
Using LightGBM callbacks (early_stopping_rounds and verbose)

Published 2023/01/28

Summary

I dealt with the deprecation warnings by switching to LightGBM's callbacks.
This post covers only early_stopping_rounds and verbose.
In short, it's enough to pass the following option when training with lgb:

callbacks=[
    lgb.early_stopping(stopping_rounds=50, verbose=True),
    lgb.log_evaluation(100),
],

The official docs are here:

https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.early_stopping.html

https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.log_evaluation.html


The warnings that appeared

/opt/conda/lib/python3.7/site-packages/lightgbm/sklearn.py:726: UserWarning: 'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. Pass 'early_stopping()' callback via 'callbacks' argument instead.
  _log_warning("'early_stopping_rounds' argument is deprecated and will be removed in a future release of LightGBM. "
/opt/conda/lib/python3.7/site-packages/lightgbm/sklearn.py:736: UserWarning: 'verbose' argument is deprecated and will be removed in a future release of LightGBM. Pass 'log_evaluation()' callback via 'callbacks' argument instead.
  _log_warning("'verbose' argument is deprecated and will be removed in a future release of LightGBM. "

Source code

The code runs regression on the California housing dataset.

https://zenn.dev/nishimoto/articles/e9ee95ca0c9c95

import numpy as np
import pandas as pd
import lightgbm as lgb # 3.3.2
from IPython.display import display
from sklearn.model_selection import KFold
from sklearn.datasets import fetch_california_housing

california_housing = fetch_california_housing()
train_x = pd.DataFrame(california_housing.data, columns=california_housing.feature_names)
train_y = pd.Series(california_housing.target)
display(train_x.head())
display(train_y.head())

params = {
    'n_estimators': 2000,
    'boosting_type': 'gbdt',
    'learning_rate': 0.01,
    'metric': 'rmse',
    'colsample_bytree': 0.8,
    'seed': 0,
}

cv = KFold(n_splits=3)
for fold, (trn_idx, val_idx) in enumerate(cv.split(train_x)):
    trn_x = train_x.iloc[trn_idx, :]
    trn_y = train_y[trn_idx]
    val_x = train_x.iloc[val_idx, :]
    val_y = train_y[val_idx]
    
    model = lgb.LGBMRegressor(**params)
    model.fit(
        trn_x, trn_y,
        eval_set=[(val_x, val_y)],
        callbacks=[
            lgb.early_stopping(stopping_rounds=50, verbose=True),
            lgb.log_evaluation(100),
        ],
    )
    print(model.best_score_["valid_0"]["rmse"])

Output

As expected, training stopped once the validation score failed to improve for 50 rounds, and the evaluation log was printed every 100 iterations.

Training until validation scores don't improve for 50 rounds
[100]	valid_0's rmse: 0.794887
[200]	valid_0's rmse: 0.709633
[300]	valid_0's rmse: 0.695878
[400]	valid_0's rmse: 0.688755
Early stopping, best iteration is:
[382]	valid_0's rmse: 0.687713
0.6877131846327913
Training until validation scores don't improve for 50 rounds
[100]	valid_0's rmse: 0.711681
[200]	valid_0's rmse: 0.599219
[300]	valid_0's rmse: 0.569275
[400]	valid_0's rmse: 0.558568
[500]	valid_0's rmse: 0.551141
[600]	valid_0's rmse: 0.546123
[700]	valid_0's rmse: 0.542984
Early stopping, best iteration is:
[716]	valid_0's rmse: 0.542826
0.5428263212084733
Training until validation scores don't improve for 50 rounds
[100]	valid_0's rmse: 0.874971
[200]	valid_0's rmse: 0.782944
[300]	valid_0's rmse: 0.75143
[400]	valid_0's rmse: 0.729386
[500]	valid_0's rmse: 0.716699
[600]	valid_0's rmse: 0.707987
[700]	valid_0's rmse: 0.702348
[800]	valid_0's rmse: 0.70097
[900]	valid_0's rmse: 0.699974
Early stopping, best iteration is:
[859]	valid_0's rmse: 0.6999
0.6999003302697722
