Handling Missing Values in Pandas
Introduction
Sometimes you want to fill missing values in one Pandas column using values from another column.
This memo covers the standard way to do that, along with the detour I took
because I couldn't think of the right method at the time.
Sample Pattern
In the following DataFrame, let's fill the missing value in col2 with the date in col1
plus 3 days.
import pandas as pd
import datetime
df = pd.DataFrame({'col1': ["2021/05/08", "2021/05/08", "2021/05/08"],
                   'col2': ["2021/05/09", "2021/05/11", None]},
                  index=['row1', 'row2', 'row3'])
At this point, assume that each column has been converted to datetime as follows:
df["col1"] = pd.to_datetime(df["col1"])
df["col2"] = pd.to_datetime(df["col2"])
1: fillna
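# Assign the result back: fillna(inplace=True) on df["col2"] is chained assignment and may not update df in recent pandas versions
df["col2"] = df["col2"].fillna(df["col1"] + datetime.timedelta(days=3))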
For imputing missing values, fillna is the standard choice.
It solves the problem directly, but I didn't think of it at the time.
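As a self-contained sketch of method 1, the snippet below rebuilds the sample DataFrame and applies fillna. It assumes the same column names and dates as the sample above; the point it illustrates is that fillna aligns the replacement Series on the index, so rows that already have a value are left alone.

```python
import datetime

import pandas as pd

# Rebuild the sample DataFrame so the snippet runs on its own
df = pd.DataFrame(
    {"col1": ["2021/05/08", "2021/05/08", "2021/05/08"],
     "col2": ["2021/05/09", "2021/05/11", None]},
    index=["row1", "row2", "row3"],
)
df["col1"] = pd.to_datetime(df["col1"])
df["col2"] = pd.to_datetime(df["col2"])  # None becomes NaT

# The replacement Series is aligned on the index, so only row3 (NaT) is filled
df["col2"] = df["col2"].fillna(df["col1"] + datetime.timedelta(days=3))
print(df)  # row3's col2 is now 2021-05-11; row1 and row2 are unchanged
```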
2: lambda function
df["col2"] = df.apply(lambda x: x["col1"] + datetime.timedelta(days=3) if pd.isnull(x["col2"]) else x["col2"], axis=1)
This is the method I used when I couldn't think of fillna.
I performed the imputation using a combination of a lambda function and a ternary operator.
3: mask function
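# Assign the result back: mask(inplace=True) on df["col2"] is chained assignment and may not update df in recent pandas versions
df["col2"] = df["col2"].mask(pd.isnull(df["col2"]), df["col1"] + datetime.timedelta(days=3))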
This is the method I thought of after method 2.
This allows you to complete the task without having to write a lambda function.
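To make the behaviour of mask concrete, here is a minimal sketch on the same sample data. mask replaces values where the condition is True; its counterpart Series.where keeps values where the condition is True, so the two calls below should give the same result. The variable names `other`, `masked`, and `kept` are only for illustration.

```python
import datetime

import pandas as pd

df = pd.DataFrame(
    {"col1": ["2021/05/08", "2021/05/08", "2021/05/08"],
     "col2": ["2021/05/09", "2021/05/11", None]},
    index=["row1", "row2", "row3"],
)
df["col1"] = pd.to_datetime(df["col1"])
df["col2"] = pd.to_datetime(df["col2"])

# Candidate replacement values: col1 plus 3 days, one per row
other = df["col1"] + datetime.timedelta(days=3)

# mask: replace where the condition is True (here, where col2 is missing)
masked = df["col2"].mask(df["col2"].isna(), other)

# where: keep where the condition is True (here, where col2 is present)
kept = df["col2"].where(df["col2"].notna(), other)

print(masked)
print(masked.equals(kept))
```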
Why I took a detour
Initially, I tried to do the imputation by defining a separate function with def,
so apply was the only approach that came to mind. Method 2 is the result of that trial and error.
As it turned out, I didn't need to define a function at all,
so the punchline, looking back, is that method 1 would have solved it.
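For reference, a def-based version of that first attempt might look roughly like the sketch below. The function name `fill_col2` is hypothetical and not from my original code; it does the same thing as the lambda in method 2, just written out as a named function.

```python
import datetime

import pandas as pd

df = pd.DataFrame(
    {"col1": ["2021/05/08", "2021/05/08", "2021/05/08"],
     "col2": ["2021/05/09", "2021/05/11", None]},
    index=["row1", "row2", "row3"],
)
df["col1"] = pd.to_datetime(df["col1"])
df["col2"] = pd.to_datetime(df["col2"])


def fill_col2(row):
    """Hypothetical helper: return col1 + 3 days when col2 is missing."""
    if pd.isnull(row["col2"]):
        return row["col1"] + datetime.timedelta(days=3)
    return row["col2"]


# axis=1 passes each row to the function as a Series
df["col2"] = df.apply(fill_col2, axis=1)
print(df)
```

Because apply with axis=1 calls the function once per row, this is typically slower than fillna or mask on large DataFrames, which is another reason method 1 is the better fit for this case.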
Conclusion
Although my approach turned out to be roundabout, it was not wasted.
For instance, if you want to replace values in col2 that equal a specific value rather than missing values,
method 3 can do the conversion in a single line.
And if you need more complex processing, method 2 is a good candidate, with the lambda replaced by a named function.
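As a sketch of the specific-value case, the snippet below uses mask to replace col2 wherever it equals a particular date. The target date 2021-05-09 is an arbitrary choice for illustration, not something from the original task.

```python
import datetime

import pandas as pd

df = pd.DataFrame(
    {"col1": ["2021/05/08", "2021/05/08", "2021/05/08"],
     "col2": ["2021/05/09", "2021/05/11", None]},
    index=["row1", "row2", "row3"],
)
df["col1"] = pd.to_datetime(df["col1"])
df["col2"] = pd.to_datetime(df["col2"])

# Arbitrary value to replace, chosen only for this example
target = pd.Timestamp("2021-05-09")

# Replace col2 with col1 + 3 days only where col2 equals the target date
df["col2"] = df["col2"].mask(df["col2"] == target,
                             df["col1"] + datetime.timedelta(days=3))
print(df)  # row1 is replaced; row3 stays NaT because NaT == target is False
```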
The way you want to transform data changes depending on the situation.
I summarized this because searching for it every time is a waste of time.
I wrote this as a memorandum for myself, but I hope it helps someone.