🎐

【Optuna】How to use Optuna (with XGB)

Published on 2024/06/27

This time, we think about hyperparameter optimization of XGBoost with Optuna.
So, let's go.

1. Normal XGB

This time, we try to predict wine quality. The plain XGBoost code is this:
・Wine quality prediction

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report
import xgboost as xgb

# Load dataset
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
data = pd.read_csv(url, sep=';')

print("Missing values")
print(data.isnull().sum())

X = data.drop('quality', axis=1)
y = data['quality']

# Binarize the target (to make the problem simpler)
y = y.apply(lambda x: 1 if x >= 7 else 0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train
# (use_label_encoder is deprecated in recent XGBoost versions and can be dropped)
model = xgb.XGBClassifier(use_label_encoder=False, eval_metric='logloss')
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
print(classification_report(y_test, y_pred))

・Result

# Accuracy: 0.915625
#               precision    recall  f1-score   support

#            0       0.94      0.97      0.95       273
#            1       0.76      0.62      0.68        47

#     accuracy                           0.92       320
#    macro avg       0.85      0.79      0.82       320
# weighted avg       0.91      0.92      0.91       320

The report shows the result for each class.
The accuracy is about 0.916.

The labels 0 and 1 are the two classes of the prediction problem (1 = good wine, quality >= 7). From the report above, class 1 seems difficult to predict: there are few class-1 samples, so its precision and recall drop easily.
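To see this imbalance concretely, you can count the labels after binarizing (a quick check added here for illustration; on this dataset class 1 is only around 14% of the samples):

# Count the binarized labels to see the class imbalance.
# Class 1 (quality >= 7) makes up only a small share of the data.
print(y.value_counts())
print(y.value_counts(normalize=True))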

2. Parameter Tuning

Here, I optimize accuracy, so I feed Optuna a function (the objective) that returns accuracy.

Here is the flow of tuning with Optuna:

  1. Define an Objective Function:
    ・Receives a trial object.
    ・Defines the set of hyperparameters to be optimized (defined as param here, but the name isn't fixed).
    ・Trains a model with these hyperparameters (each parameter you want to search has to be specified with a method like trial.suggest_float, trial.suggest_int, or trial.suggest_categorical).
    ・Evaluates the model and returns the performance metric.

    Tips: log=True in params

    log=True means the parameter is sampled on a log scale, not a linear scale.

    For example:
    ・Without log=True, trial.suggest_float('learning_rate', 0.01, 0.1) samples uniformly on a linear scale, so most draws land toward the upper end of the range (e.g. 0.05, 0.07, 0.09).
    ・With log=True, trial.suggest_float('learning_rate', 0.01, 0.1, log=True) samples uniformly in log space, so small values like 0.01 or 0.02 are explored as thoroughly as large ones (see the sketch after this list).

  2. Create a Study:
    Create a study like study = optuna.create_study(direction='maximize').
    A study in Optuna is an optimization task. It manages the trials and optimizes the objective function. You can specify the direction of optimization (e.g., maximize accuracy or minimize log loss).

  3. Run the Optimization:
    Execute it like study.optimize(objective, n_trials=50).
    This runs the objective function multiple times with different sets of hyperparameters, guided by the TPE (Tree-structured Parzen Estimator) sampler, an efficient optimization method that Optuna uses by default.

  4. Extract the Best Hyperparameters:
    After the optimization finishes, the best set of hyperparameters can be retrieved from the study, like best_params = study.best_trial.params.

  5. Train the Final Model:
    Use the best hyperparameters to train the final model on the entire training dataset.
    Like this:

    best_model = xgb.XGBClassifier(**best_params)
    best_model.fit(X_train, y_train)
    y_pred = best_model.predict(X_test)
    

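As promised above, here is a minimal sketch of the linear vs. log-scale sampling difference (my illustration, not from the original article; it uses a dummy objective and a seeded RandomSampler just to inspect the sampled values):

import optuna

optuna.logging.set_verbosity(optuna.logging.WARNING)  # keep the log output quiet

def sampling_demo(trial):
    # Dummy objective: we only care about the sampled values, not the score
    trial.suggest_float('lr_linear', 0.01, 0.1)           # linear scale
    trial.suggest_float('lr_log', 0.01, 0.1, log=True)    # log scale
    return 0.0

study = optuna.create_study(sampler=optuna.samplers.RandomSampler(seed=0))
study.optimize(sampling_demo, n_trials=1000)

df = study.trials_dataframe()
# Fraction of draws below 0.02: about 11% on the linear scale,
# but about 30% on the log scale, so small values are explored more densely.
print((df['params_lr_linear'] < 0.02).mean())
print((df['params_lr_log'] < 0.02).mean())
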
Here is the code with optimization:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report
import xgboost as xgb
import optuna

# Load dataset
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
data = pd.read_csv(url, sep=';')

# Check for missing values
print("Missing values")
print(data.isnull().sum())

# Separate features and target
X = data.drop('quality', axis=1)
y = data['quality']

# Binarize the output (for simplicity, let's predict if the wine is good or bad)
y = y.apply(lambda x: 1 if x >= 7 else 0)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

def objective(trial):
    # Fixed settings plus the hyperparameter search space
    param = {
        'booster': 'gbtree',
        'objective': 'binary:logistic',
        'eval_metric': 'logloss',
        'use_label_encoder': False,  # deprecated in recent XGBoost versions
        'lambda': trial.suggest_float('lambda', 1e-3, 10.0, log=True),
        'alpha': trial.suggest_float('alpha', 1e-3, 10.0, log=True),
        'colsample_bytree': trial.suggest_categorical('colsample_bytree', [0.5, 0.7, 0.9, 1.0]),
        'subsample': trial.suggest_categorical('subsample', [0.5, 0.7, 0.9, 1.0]),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.1, log=True),
        'n_estimators': trial.suggest_int('n_estimators', 100, 1000),
        'max_depth': trial.suggest_int('max_depth', 3, 10),
        'min_child_weight': trial.suggest_int('min_child_weight', 1, 10)
    }
    
    model = xgb.XGBClassifier(**param)
    model.fit(X_train, y_train)

    # Optuna maximizes this returned accuracy on the held-out test set
    preds = model.predict(X_test)
    accuracy = accuracy_score(y_test, preds)
    return accuracy

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)

print('Number of finished trials:', len(study.trials))
print('Best trial:', study.best_trial.params)

# Train final model with best parameters
best_params = study.best_trial.params
best_model = xgb.XGBClassifier(**best_params)
best_model.fit(X_train, y_train)

# Make predictions
y_pred = best_model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy}')
print(classification_report(y_test, y_pred))

・Result

# Number of finished trials: 50
# Best trial: {'lambda': 0.07280666976502195, 'alpha': 0.20462776346771241, 'colsample_bytree': 0.9, 'subsample': 1.0, 'learning_rate': 0.044485631291397806, 'n_estimators': 798, 'max_depth': 9, 'min_child_weight': 1}
# Accuracy: 0.91875
#               precision    recall  f1-score   support

#            0       0.94      0.97      0.95       273
#            1       0.77      0.64      0.70        47

#     accuracy                           0.92       320
#    macro avg       0.85      0.80      0.83       320
# weighted avg       0.91      0.92      0.92       320

The accuracy rose from 0.915625 to 0.91875!
This is a small improvement, but the score was raised automatically, and the effect is larger on tasks where the hyperparameters start out poorly tuned.

In any case, we managed to improve the score with Optuna.
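
One caveat (my note, not from the original run): TPE is stochastic, so the best trial can differ between runs. If you want a reproducible study, you can pass a seeded sampler:

study = optuna.create_study(
    direction='maximize',
    sampler=optuna.samplers.TPESampler(seed=42),
)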

3. Summary

This time, I briefly explained how to use Optuna.
I focused on XGBoost, but of course Optuna can be used with any other machine learning model that has hyperparameters.

Please try it with other GBDTs, SVMs, NNs, etc.
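
As one example of reuse, here is a minimal sketch of the same flow applied to a scikit-learn SVM (my illustration, assuming the X_train and y_train from the wine example above; the search ranges are arbitrary):

from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
import optuna

def svm_objective(trial):
    # Same pattern as before: define a search space, train, return a score
    params = {
        'C': trial.suggest_float('C', 1e-2, 1e2, log=True),
        'gamma': trial.suggest_float('gamma', 1e-4, 1.0, log=True),
        'kernel': trial.suggest_categorical('kernel', ['rbf', 'poly']),
    }
    model = SVC(**params)
    # Cross-validation on the training data avoids tuning against the test set
    return cross_val_score(model, X_train, y_train, cv=3, scoring='accuracy').mean()

study = optuna.create_study(direction='maximize')
study.optimize(svm_objective, n_trials=30)
print(study.best_trial.params)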
