Encoding Use Case: Converting Column Values to Numeric and Adding Values Based on Conditions

In this article, I will show you how to convert values in a specific column to numeric values using Pandas and Scikit-learn's OrdinalEncoder.

As an example, we will convert the 'locale' column in the train_df DataFrame. The 'locale' column contains the values "National", "Regional", and "Local", as well as NaN for missing entries.

We will convert this column to numeric values using OrdinalEncoder. After the conversion, we will display the categories as a list.

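from sklearn.preprocessing import OrdinalEncoder

# Fit the encoder on 'locale' and overwrite the column with the encoded values.
# fit_transform returns a 2D array of shape (n, 1), so flatten it before assigning.
encoder = OrdinalEncoder()
train_df['locale'] = encoder.fit_transform(train_df[['locale']]).ravel()
# Pair each learned category with its position in encoder.categories_
list(zip(encoder.categories_[0], range(len(encoder.categories_[0]))))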

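When you run the code, each category is listed together with its position in encoder.categories_, which is the number it is encoded as:

[Result Example]
[('Local', 0), ('National', 1), ('Regional', 2), (nan, 3)]

Note that NaN appears last in categories_, but with OrdinalEncoder's default settings in recent scikit-learn versions, missing values stay NaN in the encoded column rather than becoming 3.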

Next, we will process the data based on a condition.
In this case, we add a 'festival' column to train_df, initialize it to 0, and increment it depending on the value of 'locale'. For example, if 'locale' is 0 (corresponding to 'Local'), we add 1 to the 'festival' column.

# Start every row at 0
train_df['festival'] = 0
# Walk through the rows and add 1 where 'locale' is 0 ('Local')
for index, row in train_df.iterrows():
    if row['locale'] == 0:
        train_df.at[index, 'festival'] += 1
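The row-by-row loop above can also be written with a boolean mask, which is usually faster on large frames. This is a sketch under assumptions: toy_df is a hypothetical frame holding already-encoded 'locale' values, not the article's train_df.

```python
import numpy as np
import pandas as pd

# Hypothetical encoded 'locale' values (0 = Local, 1 = National, 2 = Regional).
toy_df = pd.DataFrame({"locale": [1.0, 0.0, np.nan, 2.0, 0.0]})

toy_df["festival"] = 0

# The mask is True where the locale code is 0; NaN compares as False.
is_local = toy_df["locale"] == 0

# Add 1 to 'festival' only on the selected rows.
toy_df.loc[is_local, "festival"] += 1

print(toy_df)
```

The mask form produces the same 'festival' column as the loop, and further conditions can be added with more masks in the same way.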

[Result Example]

By performing encoding, you can convert text data into numeric data, making it easier to process.
