Translated by AI
Kaggle Participation Record V0.5
Binary Classification with a Bank Dataset V0.5
I have improved the Kaggle analysis code from my previous article (V0.4).
Score before improvement (V0.4):
ROC-AUC: 0.8418812223868644
Kaggle score: 0.84043
Score after improvement (V0.5):
ROC-AUC: 0.8421719373317439 (1-16, 12 deleted)
Kaggle score: 0.84065
The gain is small (ROC-AUC +0.00029, Kaggle score +0.00022), and I am not sure how meaningful a change of this size is, but both scores went up, so the improvement succeeded.
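To put the size of the change into numbers, the short sketch below computes the absolute and relative gains from the scores reported above. The `improvement` helper is hypothetical and only does arithmetic on those published values.

```python
# Scores reported in this article: (V0.4, V0.5)
scores = {
    "ROC-AUC": (0.8418812223868644, 0.8421719373317439),
    "Kaggle score": (0.84043, 0.84065),
}


def improvement(before: float, after: float) -> tuple[float, float]:
    """Return the absolute change and the relative change in percent."""
    delta = after - before
    return delta, delta / before * 100


for name, (before, after) in scores.items():
    delta, pct = improvement(before, after)
    print(f"{name}: {before:.5f} -> {after:.5f} (+{delta:.5f}, +{pct:.3f}%)")
```

This prints a gain of about +0.00029 (+0.035%) for ROC-AUC and +0.00022 (+0.026%) for the Kaggle score: a real but very small improvement.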
Changes
- Added Features
  - pdays_is_contacted: Whether the client has been contacted at least once in the past
    1: Contacted
    0: Not contacted
  - balance_is_negative: Whether the balance is negative (i.e., the client is in debt)
    1: Negative
    0: Not negative
  - job_category_quality: Whether the job belongs to a stable-income group
    1: Job is 'management', 'technician', or 'admin.'
    0: Otherwise
  - is_peak_campaign_month: Whether the contact month is a peak campaign month (May, June, July, August, etc.)
    1: Yes
    0: No
  - age_bin_senior: Whether the client is a senior over 60 years old (a group expected to receive retirement benefits)
    1: Yes
    0: No
- Refactored Code Structure
import pandas as pd

def df_Data_Cleansing(df):
    # Deleted columns
    df = df.drop(columns=['<column_to_delete>'])
    # Added features
    df["<added_column>"] = <processing>  # xx
    # Features added in Vx.x
    df["<added_column>"] = <processing>  # xx
    return df

# Apply exactly the same cleansing to the training and test data
df = df_Data_Cleansing(pd.read_csv('data/train.csv'))
df_test = df_Data_Cleansing(pd.read_csv('data/test.csv'))
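Putting the shared preprocessing into a single function guarantees that train.csv and test.csv go through identical steps, which reduces the risk of bugs such as a feature being added to one dataset but not the other.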
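As an illustration, the five features from the list above could be written inside df_Data_Cleansing roughly as follows. This is a minimal sketch, not the author's code: the helper `add_v05_features` and the constants `STABLE_JOBS` and `PEAK_MONTHS` are hypothetical, and it assumes the column names and value encodings of the UCI Bank Marketing data (where `pdays == -1` means the client was never contacted and months are lowercase abbreviations). The article says "May, June, July, August, etc.", so the exact peak-month set is also an assumption.

```python
import pandas as pd

# Assumptions (not confirmed by the article):
# - pdays == -1 means "never contacted", as in the UCI Bank Marketing data
# - month values are lowercase abbreviations such as "may", "jun"
# - peak months are exactly May to August
# - "over 60" is interpreted as age > 60
STABLE_JOBS = ["management", "technician", "admin."]
PEAK_MONTHS = ["may", "jun", "jul", "aug"]


def add_v05_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that adds the five V0.5 binary (0/1) features."""
    df = df.copy()  # do not modify the caller's DataFrame
    df["pdays_is_contacted"] = (df["pdays"] != -1).astype(int)
    df["balance_is_negative"] = (df["balance"] < 0).astype(int)
    df["job_category_quality"] = df["job"].isin(STABLE_JOBS).astype(int)
    df["is_peak_campaign_month"] = df["month"].isin(PEAK_MONTHS).astype(int)
    df["age_bin_senior"] = (df["age"] > 60).astype(int)
    return df


# Tiny made-up sample with the columns the helper needs
sample = pd.DataFrame({
    "age": [35, 62, 45],
    "job": ["management", "retired", "admin."],
    "balance": [1200, -50, 0],
    "month": ["may", "oct", "aug"],
    "pdays": [-1, 180, -1],
})
print(add_v05_features(sample))
```

Because the helper is called from one place (the cleansing function), the same five columns are guaranteed to appear in both `df` and `df_test`.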
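# Submission file: one row per test id with the predicted value y
submission = pd.DataFrame({
    'id': df_test['id'],
    'y': test_pred
})
submission.to_csv('submission.csv', index=False)
# Separate CSV used only for final verification
submit_x.to_csv("submit_x_x_x_x_x_x.csv", index=False)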
I added code that writes a separate CSV for final verification ("submit_x_x_x_x_x_x.csv") in addition to the submission CSV ('submission.csv').
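The article does not show what the verification CSV contains. As a complementary idea (not the author's method), a hypothetical `check_submission` function can read submission.csv back and look for common problems before uploading. It assumes `test_pred` holds predicted probabilities of the positive class, so `y` should lie in [0, 1]; the ids and predictions below are toy stand-ins for `df_test` and `test_pred`.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def check_submission(path: str, test_ids: pd.Series) -> list[str]:
    """Return a list of problems found in a submission CSV (empty list = OK)."""
    sub = pd.read_csv(path)
    problems = []
    if list(sub.columns) != ["id", "y"]:
        problems.append(f"unexpected columns: {list(sub.columns)}")
        return problems  # the remaining checks need both columns
    # One row per test id, in the same order as the test data
    if len(sub) != len(test_ids):
        problems.append(f"row count {len(sub)} != {len(test_ids)} test rows")
    elif not (sub["id"].to_numpy() == test_ids.to_numpy()).all():
        problems.append("ids do not match the test set order")
    # Assumed: y is a probability, so it must be present and within [0, 1]
    if sub["y"].isna().any():
        problems.append("y contains NaN")
    elif not sub["y"].between(0, 1).all():
        problems.append("y has values outside [0, 1]")
    return problems


# Toy stand-ins for df_test and test_pred
df_test = pd.DataFrame({"id": [1, 2, 3]})
test_pred = np.array([0.12, 0.87, 0.40])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "submission.csv")
    pd.DataFrame({"id": df_test["id"], "y": test_pred}).to_csv(path, index=False)
    print(check_submission(path, df_test["id"]) or "submission.csv looks OK")
```

Returning a list of messages instead of raising on the first failure makes it easy to see every problem with a file in one run.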
Discussion