Handling Missing Values in Pandas

Introduction

There are times when you want to fill missing values in a Pandas column using values from another column.
This memo records the standard way to do that, along with a detour I took
because I couldn't think of the right approach at the time.

Sample Pattern

In the following DataFrame, let's fill the missing value in col2 with the date in col1
plus 3 days.

import pandas as pd
import datetime

df = pd.DataFrame(
    {'col1': ["2021/05/08", "2021/05/08", "2021/05/08"],
     'col2': ["2021/05/09", "2021/05/11", None]},
    index=['row1', 'row2', 'row3'],
)

At this point, assume that each column has been converted to datetime as follows:

df["col1"] = pd.to_datetime(df["col1"])
df["col2"] = pd.to_datetime(df["col2"])

1: fillna

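# Assign the result back: fillna(..., inplace=True) on df["col2"] is chained
# assignment and may not update df in recent pandas versions.
df["col2"] = df["col2"].fillna(df["col1"] + datetime.timedelta(days=3))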

fillna is the standard tool for imputing missing values.
When you pass a Series, each missing entry is filled with the value at the same index.
This solves the problem, but I didn't think of it at the time.
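
To make the index alignment visible, here is a minimal sketch that builds the fill values as a separate Series first. The names fill_values and filled are only for illustration and are not part of the original code.

```python
import datetime
import pandas as pd

df = pd.DataFrame(
    {"col1": ["2021/05/08", "2021/05/08", "2021/05/08"],
     "col2": ["2021/05/09", "2021/05/11", None]},
    index=["row1", "row2", "row3"],
)
df["col1"] = pd.to_datetime(df["col1"])
df["col2"] = pd.to_datetime(df["col2"])  # None becomes NaT

# Candidate fill values for every row: col1 plus 3 days.
fill_values = df["col1"] + datetime.timedelta(days=3)

# fillna only takes a value from fill_values where col2 is missing;
# rows that already have a date are left untouched.
filled = df["col2"].fillna(fill_values)
print(filled)
```

Only row3 changes here, because it is the only row where col2 is missing.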

2: lambda function

df["col2"] = df.apply(
    lambda x: x["col1"] + datetime.timedelta(days=3) if pd.isnull(x["col2"]) else x["col2"],
    axis=1,
)

This is the method I used when I couldn't think of fillna.
I performed the imputation row by row, combining a lambda function with a conditional expression (Python's ternary operator).
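
The same row-wise logic can also be written as a named function, which is easier to extend if the processing grows. This is only a sketch: fill_from_col1 and its days parameter are hypothetical names I made up for this example.

```python
import datetime
import pandas as pd

def fill_from_col1(row, days=3):
    """Return col2, or col1 plus `days` days when col2 is missing (hypothetical helper)."""
    if pd.isnull(row["col2"]):
        return row["col1"] + datetime.timedelta(days=days)
    return row["col2"]

df = pd.DataFrame(
    {"col1": ["2021/05/08", "2021/05/08", "2021/05/08"],
     "col2": ["2021/05/09", "2021/05/11", None]},
    index=["row1", "row2", "row3"],
)
df["col1"] = pd.to_datetime(df["col1"])
df["col2"] = pd.to_datetime(df["col2"])

# axis=1 passes each row to the function as a Series.
df["col2"] = df.apply(fill_from_col1, axis=1)
print(df)
```

Because apply with axis=1 calls the function once per row, it tends to be slower than fillna or mask on large DataFrames, so I would keep it for logic that is hard to express otherwise.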

3: mask function

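# Assign the result back instead of using inplace=True on df["col2"],
# which is chained assignment and may not update df in recent pandas versions.
df["col2"] = df["col2"].mask(pd.isnull(df["col2"]), df["col1"] + datetime.timedelta(days=3))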

This is the method I thought of after method 2.
mask replaces the values where the condition is True,
so you can complete the task without having to write a lambda function.
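
As a small sketch of how the condition works, the example below keeps the boolean condition in its own variable and compares mask with Series.where, which is the inverse operation (it keeps values where the condition is True). where does not appear in my original approach; it is shown here only for comparison, and the variable names are illustrative.

```python
import datetime
import pandas as pd

df = pd.DataFrame(
    {"col1": ["2021/05/08", "2021/05/08", "2021/05/08"],
     "col2": ["2021/05/09", "2021/05/11", None]},
    index=["row1", "row2", "row3"],
)
df["col1"] = pd.to_datetime(df["col1"])
df["col2"] = pd.to_datetime(df["col2"])

is_missing = df["col2"].isna()
replacement = df["col1"] + datetime.timedelta(days=3)

# mask: replace values where the condition is True.
via_mask = df["col2"].mask(is_missing, replacement)

# where: keep values where the condition is True, replace the rest.
via_where = df["col2"].where(~is_missing, replacement)

print(via_mask.equals(via_where))
```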

Why I took a detour

Initially, I tried to perform the imputation by writing a separate function with def,
so apply was the only approach that came to mind. Method 2 is the result of that trial and error.
As it turned out, I didn't need a function at all,
so the punchline is that method 1 would have solved it from the start.

Conclusion

In this case my efforts turned out to be a detour, but the other methods still have their uses.
For instance, if you want to replace col2 when it holds a specific value rather than a missing one,
method 3 can do the conversion in a single line.
If the processing is more complex, method 2 is a good candidate, with the logic moved into a named function.
How you want to transform data changes with the situation,
and searching for the approach every time is a waste of time, so I summarized it here.
I wrote this as a memorandum for myself, but I hope it helps someone.
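
For the specific-value case mentioned above, a minimal sketch with mask might look like this. The choice of 2021-05-09 as the value to replace is purely hypothetical, picked so that the sample data has a match.

```python
import datetime
import pandas as pd

df = pd.DataFrame(
    {"col1": ["2021/05/08", "2021/05/08", "2021/05/08"],
     "col2": ["2021/05/09", "2021/05/11", None]},
    index=["row1", "row2", "row3"],
)
df["col1"] = pd.to_datetime(df["col1"])
df["col2"] = pd.to_datetime(df["col2"])

# Hypothetical value to replace; any Timestamp could be used here.
target = pd.Timestamp("2021-05-09")

# Rows where col2 equals the target get col1 plus 3 days.
# The missing value in row3 does not equal the target, so it stays missing.
df["col2"] = df["col2"].mask(df["col2"] == target, df["col1"] + datetime.timedelta(days=3))
print(df)
```

Note that this condition leaves the missing value in place, so it would still need fillna (method 1) if both cases should be handled.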