How to Handle Missing Values

What are missing values?

Missing values are entries in a dataset for which no data was recorded. In pandas they usually appear as NaN or None.

How to check for missing values

You can count the missing values in each column with the code below, which chains the isna() and sum() methods of a pandas DataFrame.

# Check the number of missing values in each column
missing_values = data.isna().sum()
print(missing_values)

The output lists the number of missing values for each column. Any column with a count other than 0 contains missing values.
[Figure: example output when missing values exist]
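As a minimal sketch of the check above, the example below builds a small DataFrame and counts its missing values. The data and the column names (age, city, score) are hypothetical and only serve as an illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only
data = pd.DataFrame({
    "age": [25, np.nan, 31, np.nan],
    "city": ["Tokyo", "Osaka", None, "Nagoya"],
    "score": [80, 90, 70, 60],
})

# Count the missing values in each column
missing_values = data.isna().sum()
print(missing_values)
# age      2
# city     1
# score    0
# dtype: int64

# Total number of missing values in the whole DataFrame
total_missing = int(missing_values.sum())
print(total_missing)  # 3
```

Both NaN and None are counted as missing by isna().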

How to fill missing values

There are several ways to fill in missing values.

Forward Fill

Forward fill is a method that uses the value immediately preceding the missing value to fill the gap.

data.ffill(inplace=True)
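To make the behavior concrete, here is a small sketch of forward fill on a hypothetical temperature column. It assigns the result back instead of using inplace=True, which has the same effect here.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with gaps
data = pd.DataFrame({"temperature": [20.0, np.nan, np.nan, 23.0, np.nan]})

# Each gap takes the most recent non-missing value above it
data = data.ffill()
print(data["temperature"].tolist())  # [20.0, 20.0, 20.0, 23.0, 23.0]
```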

Backward Fill

Backward fill is a method that uses the value immediately following the missing value to fill the gap.

data.bfill(inplace=True)
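The sketch below applies backward fill to a hypothetical column. Note that a missing value in the last row stays missing, because there is no following value to copy.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with a gap at the start, in the middle, and at the end
data = pd.DataFrame({"temperature": [np.nan, 20.0, np.nan, 23.0, np.nan]})

# Each gap takes the next non-missing value below it
data = data.bfill()
print(data["temperature"].tolist())  # [20.0, 20.0, 23.0, 23.0, nan]

# The trailing gap has no following value, so it remains missing
remaining = int(data["temperature"].isna().sum())
print(remaining)  # 1
```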

Filling with the Mean

For numerical data, you can fill in missing values with the mean of the column.

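mean_value = data['column_name'].mean()
# Assign the result back; fillna(inplace=True) on a selected column may not update data in recent pandas versions
data['column_name'] = data['column_name'].fillna(mean_value)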
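A short worked example, using hypothetical values, shows what mean filling does. By default, mean() skips missing values when it computes the average.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column with two gaps
data = pd.DataFrame({"column_name": [10.0, np.nan, 30.0, np.nan, 20.0]})

# mean() ignores NaN by default: (10 + 30 + 20) / 3 = 20.0
mean_value = data["column_name"].mean()

# Assign the filled column back to the DataFrame
data["column_name"] = data["column_name"].fillna(mean_value)
print(data["column_name"].tolist())  # [10.0, 20.0, 30.0, 20.0, 20.0]
```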

Filling with the Median

When numerical data contains outliers, it is recommended to fill the missing values with the median of the column, because the median is less affected by extreme values than the mean.

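median_value = data['column_name'].median()
# Assign the result back; fillna(inplace=True) on a selected column may not update data in recent pandas versions
data['column_name'] = data['column_name'].fillna(median_value)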
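The following sketch compares the mean and the median on a hypothetical income column with one extreme value, which illustrates why the median is the safer fill value in that situation.

```python
import numpy as np
import pandas as pd

# Hypothetical column: 5000.0 is an outlier
data = pd.DataFrame({"income": [300.0, 320.0, np.nan, 310.0, 5000.0]})

mean_value = data["income"].mean()      # (300 + 320 + 310 + 5000) / 4 = 1482.5
median_value = data["income"].median()  # middle of 300, 310, 320, 5000 = 315.0

# The median stays close to the typical values, so use it as the fill value
data["income"] = data["income"].fillna(median_value)
print(data["income"].tolist())  # [300.0, 320.0, 315.0, 310.0, 5000.0]
```

Filling with the mean would have inserted 1482.5, a value far from every typical observation in the column.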

Troubleshooting when things don't go as planned

Sometimes a single method is not enough. For example, if the first row of a dataset contains a missing value, forward fill (ffill) cannot fill it because there is no preceding value. In such cases, you can combine multiple methods as shown below.

# Attempt forward fill
data.ffill(inplace=True)
# Process missing values that were not filled by forward fill using backward fill
data.bfill(inplace=True)
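The sketch below traces the combination step by step on hypothetical data. It also includes a column that is entirely missing, to show one case that neither method can fix.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "value" starts with a gap, "empty" has no values at all
data = pd.DataFrame({
    "value": [np.nan, 1.0, np.nan, 3.0],
    "empty": [np.nan, np.nan, np.nan, np.nan],
})

# Step 1: forward fill leaves the first row of "value" missing
after_ffill = data.ffill()
left_after_ffill = int(after_ffill["value"].isna().sum())
print(left_after_ffill)  # 1

# Step 2: backward fill handles the leading gap
filled = after_ffill.bfill()
print(filled["value"].tolist())  # [1.0, 1.0, 1.0, 3.0]

# A column with no values at all has nothing to copy from and stays missing
still_missing = int(filled["empty"].isna().sum())
print(still_missing)  # 4
```

For a column like "empty", a different strategy is needed, such as filling with a constant or dropping the column.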

Deprecated methods

The code below still runs in the pandas versions that print this message, but it produces a warning rather than an error: "FutureWarning: DataFrame.fillna with 'method' is deprecated and will raise in a future version. Use obj.ffill() or obj.bfill() instead."

data.fillna(method='ffill')

This is because the method parameter of fillna() is deprecated and will stop working in a future version. It is best to call ffill() or bfill() directly instead.
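As a sketch of the migration, the example below shows the direct replacements for the two deprecated calls on hypothetical data. The deprecated calls appear only in comments so that the snippet does not depend on the installed pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps
data = pd.DataFrame({"value": [1.0, np.nan, 3.0, np.nan]})

# Instead of data.fillna(method='ffill')
forward = data.ffill()
print(forward["value"].tolist())  # [1.0, 1.0, 3.0, 3.0]

# Instead of data.fillna(method='bfill')
backward = data.bfill()
print(backward["value"].tolist())  # [1.0, 3.0, 3.0, nan]
```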

Conclusion

Missing values can cause errors or distort results in later processing. Always check for them after preprocessing and, if any exist, fill them using one of the methods described above.
I hope this article is helpful!
