Kaggle Participation Record V0.5


Binary Classification with a Bank Dataset V0.5

I have improved the Kaggle analysis code from my previous article (V0.4).

Score Before Improvement:

V0.4
roc-auc : 0.8418812223868644
Kaggle Score : 0.84043

Score After Improvement:

V0.5
roc-auc : 0.8421719373317439 (1-16, 12 deleted)
Kaggle Score : 0.84065

The gain is small (ROC-AUC +0.00029, Kaggle score +0.00022), and I am not sure how meaningful a change of this size is. Still, both scores moved in the right direction, so I count it as a successful improvement.
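The article does not show how its local roc-auc is computed (presumably on a held-out validation split, typically with scikit-learn's `roc_auc_score`). To make the metric concrete, here is a minimal pure-Python sketch of ROC-AUC as the fraction of (positive, negative) pairs in which the positive example gets the higher score, with ties counting half. The function name `roc_auc` is my own, not from the article.

```python
def roc_auc(y_true, y_score):
    """ROC-AUC via pairwise comparison (Mann-Whitney U formulation).

    Counts, over all (positive, negative) pairs, how often the positive
    example receives the higher score; ties contribute 0.5.
    O(n_pos * n_neg), so only suitable for small illustrations.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because AUC depends only on ranking, the V0.4 to V0.5 gain of about 0.0003 corresponds, roughly, to 0.03% of positive/negative pairs being ordered better on net.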

Changes

  • Added Features
  1. pdays_is_contacted: Whether they have been contacted at least once in the past
    1: Contacted
    0: Not contacted

  2. balance_is_negative: Whether the balance is negative (in debt)
    1: Negative
    0: Not negative

  3. job_category_quality: Whether the job belongs to a stable-income group (management, technician, admin.)
    1: job is 'management', 'technician', or 'admin.'
    0: Otherwise

  4. is_peak_campaign_month: Whether the contact month is a peak campaign month (May, June, July, August, etc.)
    1: Yes
    0: No

  5. age_bin_senior: Whether the customer is over 60 (a senior group likely to have received retirement benefits)
    1: Yes
    0: No
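The five flags above can be sketched in pandas as follows. This is a minimal sketch, not the author's code. It assumes the competition's column names (`pdays`, `balance`, `job`, `month`, `age`), that `pdays == -1` marks a customer who was never contacted, and that `month` holds lowercase three-letter abbreviations. The peak-month set is limited to May through August because the article's "etc." is not spelled out. The function name `add_v05_features` is hypothetical.

```python
import pandas as pd

# Assumption: job labels as they appear in the bank dataset
STABLE_JOBS = {"management", "technician", "admin."}
# Assumption: May-August only; the article's list ends with "etc."
PEAK_MONTHS = {"may", "jun", "jul", "aug"}


def add_v05_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the five V0.5 binary flags added (hypothetical helper)."""
    df = df.copy()
    # Assumption: pdays == -1 means "never contacted before"
    df["pdays_is_contacted"] = (df["pdays"] != -1).astype(int)
    df["balance_is_negative"] = (df["balance"] < 0).astype(int)
    df["job_category_quality"] = df["job"].isin(STABLE_JOBS).astype(int)
    df["is_peak_campaign_month"] = df["month"].isin(PEAK_MONTHS).astype(int)
    df["age_bin_senior"] = (df["age"] > 60).astype(int)
    return df


# Two made-up rows to show the flags
sample = pd.DataFrame({
    "age": [35, 62],
    "job": ["technician", "retired"],
    "balance": [-120, 3000],
    "month": ["may", "nov"],
    "pdays": [-1, 90],
})
print(add_v05_features(sample))
```

Returning a copy leaves the raw frame untouched, and since every flag is a simple column-wise comparison, the same function can be called for both train and test.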

  • Refactored Code Structure

import pandas as pd

def df_Data_Cleansing(df):

    # Drop columns that are not used as features
    df = df.drop(columns=['<column_to_delete>'])

    # Features added in earlier versions
    df["<added_column>"] = <processing>  # xx

    # Features added in Vx.x
    df["<added_column>"] = <processing>  # xx

    return df

# Apply exactly the same preprocessing to train and test
df = df_Data_Cleansing(pd.read_csv('data/train.csv'))
df_test = df_Data_Cleansing(pd.read_csv('data/test.csv'))

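Wrapping the preprocessing in one function and applying it to both the training and test data guarantees that both go through identical transformations. This reduces the risk of bugs such as a feature being added to one dataset but not the other.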
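To see what the shared function buys, here is a small runnable version of the pattern, with in-memory DataFrames standing in for `data/train.csv` and `data/test.csv`. The dropped column (`day`) and the single added feature are hypothetical placeholders, not the columns the article actually uses.

```python
import pandas as pd

DROP_COLUMNS = ["day"]  # hypothetical: the article does not name the dropped column


def df_Data_Cleansing(df: pd.DataFrame) -> pd.DataFrame:
    # drop() returns a new frame, so the caller's DataFrame is not modified
    df = df.drop(columns=DROP_COLUMNS)
    # Hypothetical example feature
    df["balance_is_negative"] = (df["balance"] < 0).astype(int)
    return df


# In-memory stand-ins for data/train.csv and data/test.csv
train_raw = pd.DataFrame({"id": [0, 1], "day": [5, 17], "balance": [-10, 200], "y": [0, 1]})
test_raw = pd.DataFrame({"id": [2, 3], "day": [8, 30], "balance": [50, -3]})

df = df_Data_Cleansing(train_raw)
df_test = df_Data_Cleansing(test_raw)

# Both frames went through the same function, so their feature columns line up
feature_cols = [c for c in df.columns if c not in ("id", "y")]
assert feature_cols == [c for c in df_test.columns if c != "id"]
print(feature_cols)
```

If a new feature were added only to the training-side code, the column check above would fail immediately instead of surfacing later as a prediction-time error.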





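# test_pred: model predictions for the test set (computed earlier in the script)
submission = pd.DataFrame({
    'id': df_test['id'],
    'y': test_pred
})
# File uploaded to Kaggle
submission.to_csv('submission.csv', index=False)
# Separate CSV kept for final verification (submit_x is built earlier in the script)
submit_x.to_csv("submit_x_x_x_x_x_x.csv", index=False)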

I also added code that writes a separate CSV for final verification ("submit_x_x_x_x_x_x.csv") in addition to the submission file ('submission.csv').
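The article does not say what the verification CSV contains. As one possible complement, here is a hypothetical helper, `check_submission`, that runs basic sanity checks before upload: column names, row count, id order matching the test set, and predictions that are present and within [0, 1]. The [0, 1] check assumes `y` holds probabilities, as is usual for an ROC-AUC-scored competition.

```python
import pandas as pd


def check_submission(submission: pd.DataFrame, df_test: pd.DataFrame) -> None:
    """Raise ValueError if the submission looks malformed (hypothetical helper)."""
    if list(submission.columns) != ["id", "y"]:
        raise ValueError(f"unexpected columns: {list(submission.columns)}")
    if len(submission) != len(df_test):
        raise ValueError("row count differs from the test set")
    if not submission["id"].is_unique:
        raise ValueError("duplicate ids in submission")
    # Compare raw values so a differing index does not matter
    if not (submission["id"].to_numpy() == df_test["id"].to_numpy()).all():
        raise ValueError("ids are not in the same order as the test set")
    if submission["y"].isna().any():
        raise ValueError("missing predictions")
    # Assumption: y is a probability, so it must lie in [0, 1]
    if not submission["y"].between(0, 1).all():
        raise ValueError("predictions outside [0, 1]")


# Hypothetical example data
df_test = pd.DataFrame({"id": [0, 1, 2]})
test_pred = [0.12, 0.87, 0.40]
submission = pd.DataFrame({"id": df_test["id"], "y": test_pred})
check_submission(submission, df_test)  # passes silently
```

The same check can be run on the file as written, for example `check_submission(pd.read_csv('submission.csv'), df_test)`, to catch problems introduced while saving.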
