How to Handle Missing Values
What are missing values?
A missing value is an entry in a dataset for which no value has been recorded. In pandas, such entries appear as NaN, None, or NaT.
How to check for missing values
You can check for missing values with the code below, which chains the isna() and sum() methods of a pandas DataFrame.
# Check the number of missing values in each column
missing_values = data.isna().sum()
print(missing_values)
For each column that contains missing values, the count shown is greater than 0.
[When missing values exist]
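As a small illustration of what the check reports, here is a minimal sketch on a made-up DataFrame. The column names and values are assumptions chosen only for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two columns contain gaps, one is complete
data = pd.DataFrame({
    "age": [25, np.nan, 31, np.nan],
    "city": ["Tokyo", "Osaka", None, "Nagoya"],
    "score": [80, 90, 70, 60],
})

# Count the missing values in each column
missing_values = data.isna().sum()
print(missing_values)
# age      2
# city     1
# score    0
# dtype: int64
```

Both NaN and None are counted as missing, so the check works for numeric and text columns alike.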

How to fill missing values
There are several ways to fill in missing values.
Forward Fill
Forward fill is a method that uses the value immediately preceding the missing value to fill the gap.
data.ffill(inplace=True)
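To see the effect, here is a minimal sketch on a hypothetical single-column DataFrame (the column name and values are made up). It uses the returned copy instead of inplace=True so the original data stays available for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with two consecutive gaps
data = pd.DataFrame({"temperature": [20.0, np.nan, np.nan, 23.0]})

# Each gap takes the last valid value above it
filled = data.ffill()
print(filled["temperature"].tolist())  # [20.0, 20.0, 20.0, 23.0]
```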
Backward Fill
Backward fill is a method that uses the value immediately following the missing value to fill the gap.
data.bfill(inplace=True)
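The sketch below shows backward fill on hypothetical data (the column name and values are made up). Note that a gap in the last row stays missing, because there is no following value to copy.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with a gap in the middle and one at the end
data = pd.DataFrame({"temperature": [20.0, np.nan, 23.0, np.nan]})

# Each gap takes the next valid value below it
filled = data.bfill()
print(filled["temperature"].tolist())  # [20.0, 23.0, 23.0, nan]
```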
Filling with the Mean
For numerical data, you can fill in missing values with the mean of the column.
mean_value = data['column_name'].mean()
data['column_name'] = data['column_name'].fillna(mean_value)
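Here is a minimal runnable sketch of mean imputation, using a hypothetical "height" column. It assigns the result back to the column, which is the form pandas recommends over calling fillna(..., inplace=True) on a selected column.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column with one gap
data = pd.DataFrame({"height": [150.0, np.nan, 170.0]})

# mean() skips missing values by default: (150 + 170) / 2 = 160
mean_value = data["height"].mean()

# Assign the filled column back to the DataFrame
data["height"] = data["height"].fillna(mean_value)
print(data["height"].tolist())  # [150.0, 160.0, 170.0]
```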
Filling with the Median
When numerical data contains outliers, the median is usually a better choice than the mean, because extreme values do not pull it up or down.
median_value = data['column_name'].median()
data['column_name'] = data['column_name'].fillna(median_value)
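To make the difference concrete, the sketch below compares the mean and the median on a hypothetical "price" column that contains one extreme value. The data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical prices: 1000 is an outlier, and one value is missing
data = pd.DataFrame({"price": [10.0, 12.0, np.nan, 11.0, 1000.0]})

mean_value = data["price"].mean()      # pulled up by the outlier
median_value = data["price"].median()  # stays near the typical values
print(mean_value, median_value)  # 258.25 11.5

# Fill the gap with the median and assign the result back
data["price"] = data["price"].fillna(median_value)
print(data["price"].tolist())  # [10.0, 12.0, 11.5, 11.0, 1000.0]
```

Filling with the mean here would insert 258.25, a value unlike any of the ordinary prices.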
Troubleshooting when things don't go as planned
Sometimes a single method is not enough. For example, if the value in the first row of a column is missing, forward fill (ffill) cannot fill it because there is no preceding value. In such cases, you can combine multiple methods as shown below.
# Attempt forward fill
data.ffill(inplace=True)
# Process missing values that were not filled by forward fill using backward fill
data.bfill(inplace=True)
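The combination can be checked on a small example. This is a sketch with hypothetical columns "a" and "b": "a" starts with a missing value, and "b" is entirely missing to show a case that neither method can repair.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" has a leading gap, "b" has no valid values at all
data = pd.DataFrame({
    "a": [np.nan, 5.0, np.nan, 7.0],
    "b": [np.nan, np.nan, np.nan, np.nan],
})

# Forward fill first, then backward fill whatever is still missing
filled = data.ffill().bfill()
print(filled["a"].tolist())         # [5.0, 5.0, 5.0, 7.0]
print(filled["b"].isna().sum())     # 4
```

A column with no valid values stays empty, so it is worth re-running the isna().sum() check after filling.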
Deprecated methods
In recent pandas 2.x releases, the code below still runs, but it emits a warning rather than an error: "FutureWarning: DataFrame.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead."
data.fillna(method='ffill')
This is because the method argument of fillna() is already deprecated and will stop working in a future version. Use ffill() or bfill() instead.
Conclusion
Missing values can cause errors in later processing. Check for them after preprocessing, and if any exist, fill them using the methods described above.
I hope this article is helpful!