
【Optimization Method】Optuna Tutorial part3

Published on 2024/09/26

This is part 3 of the Optuna tutorial series.
Part 1
Part 2

Official Page:
https://optuna.org/
Official Tutorial:
https://optuna.readthedocs.io/en/stable/tutorial/index.html

3.0 Review the basic code

import optuna

def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

study = optuna.create_study()
study.optimize(objective, n_trials=100)

best_params = study.best_params
found_x = best_params["x"]
print("Found x: {}, (x - 2)^2: {}".format(found_x, (found_x - 2) ** 2))

3.1 Sampling Algorithms

Optuna provides many algorithms for parameter optimization. Each method has its own characteristics, so we have to choose one appropriately. (Of course, you can try all of them if you have enough time for experiments.)
A more detailed explanation of how samplers suggest parameters can be found in BaseSampler.

Optuna provides the following sampling algorithms:
・Grid Search implemented in GridSampler
・Random Search implemented in RandomSampler
・Tree-structured Parzen Estimator algorithm implemented in TPESampler
・CMA-ES based algorithm implemented in CmaEsSampler
・Gaussian process-based algorithm implemented in GPSampler
・Algorithm to enable partial fixed parameters implemented in PartialFixedSampler
・Nondominated Sorting Genetic Algorithm II implemented in NSGAIISampler
・A Quasi Monte Carlo sampling algorithm implemented in QMCSampler

The default sampler is TPESampler.

3.1.1 Switching Sampler

・Switching Sampler

import optuna

study = optuna.create_study() # default
print(f"Sampler is {study.sampler.__class__.__name__}")

study = optuna.create_study(sampler=optuna.samplers.RandomSampler())
print(f"Sampler is {study.sampler.__class__.__name__}")

study = optuna.create_study(sampler=optuna.samplers.CmaEsSampler())
print(f"Sampler is {study.sampler.__class__.__name__}")

・Result

[I 2024-09-26 03:04:53,090] A new study created in memory with name: no-name
[I 2024-09-26 03:04:53,091] A new study created in memory with name: no-name
[I 2024-09-26 03:04:53,091] A new study created in memory with name: no-name
Sampler is TPESampler
Sampler is RandomSampler
Sampler is CmaEsSampler
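
Note that GridSampler is slightly different from the samplers above: it requires the search space to be passed explicitly at construction time. A minimal sketch (the grid values below are illustrative assumptions):

import optuna

def objective(trial):
    x = trial.suggest_float("x", -10, 10)
    return (x - 2) ** 2

# GridSampler needs the full search space up front; each trial evaluates one grid point.
search_space = {"x": [-4, -2, 0, 2, 4]}
study = optuna.create_study(sampler=optuna.samplers.GridSampler(search_space))
study.optimize(objective, n_trials=5)  # one trial per grid point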

3.2 Pruning Algorithms

Pruners automatically stop unpromising trials at the early stages of training (a.k.a. automated early-stopping). Currently, the pruners module is expected to be used only for single-objective optimization.

Optuna provides the following pruning algorithms:
・Median pruning algorithm implemented in MedianPruner
・Non-pruning algorithm implemented in NopPruner
・Algorithm to operate pruner with tolerance implemented in PatientPruner
・Algorithm to prune specified percentile of trials implemented in PercentilePruner
・Asynchronous Successive Halving algorithm implemented in SuccessiveHalvingPruner
・Hyperband algorithm implemented in HyperbandPruner
・Threshold pruning algorithm implemented in ThresholdPruner
・A pruning algorithm based on Wilcoxon signed-rank test implemented in WilcoxonPruner

Most of the Optuna example code uses MedianPruner. However, it is basically outperformed by SuccessiveHalvingPruner and HyperbandPruner, as shown in this benchmark result.
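
Switching the pruner works the same way as switching the sampler: pass it to create_study(). A minimal sketch of switching to the two pruners mentioned above (the HyperbandPruner arguments are illustrative assumptions):

import optuna

study = optuna.create_study(pruner=optuna.pruners.SuccessiveHalvingPruner())
print(f"Pruner is {study.pruner.__class__.__name__}")

# min_resource / max_resource / reduction_factor should match your training budget.
study = optuna.create_study(
    pruner=optuna.pruners.HyperbandPruner(min_resource=1, max_resource=100, reduction_factor=3)
)
print(f"Pruner is {study.pruner.__class__.__name__}")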

3.2.1 Activating Pruners

To turn on the pruning feature, you need to call report() and should_prune() after each step of the iterative training. report() periodically monitors the intermediate objective values. should_prune() decides whether to terminate a trial that does not meet a predefined condition.

・Example Pruner Code

import logging
import sys

import optuna
import sklearn.datasets
import sklearn.linear_model
import sklearn.model_selection


def objective(trial):
    iris = sklearn.datasets.load_iris()
    classes = list(set(iris.target))
    train_x, valid_x, train_y, valid_y = sklearn.model_selection.train_test_split(
        iris.data, iris.target, test_size=0.25, random_state=0
    )

    alpha = trial.suggest_float("alpha", 1e-5, 1e-1, log=True)
    clf = sklearn.linear_model.SGDClassifier(alpha=alpha)

    for step in range(100):
        clf.partial_fit(train_x, train_y, classes=classes)

        # Report intermediate objective value.
        intermediate_value = 1.0 - clf.score(valid_x, valid_y)
        trial.report(intermediate_value, step)

        # Handle pruning based on the intermediate value.
        if trial.should_prune():
            raise optuna.TrialPruned()

    return 1.0 - clf.score(valid_x, valid_y)

# Add stream handler of stdout to show the messages
optuna.logging.get_logger("optuna").addHandler(logging.StreamHandler(sys.stdout))
study = optuna.create_study(pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=20)

・Result

[I 2024-09-26 10:47:43,878] A new study created in memory with name: no-name
[I 2024-09-26 10:47:43,961] Trial 0 finished with value: 0.3421052631578947 and parameters: {'alpha': 0.0004612327672225534}. Best is trial 0 with value: 0.3421052631578947.
[I 2024-09-26 10:47:44,041] Trial 1 finished with value: 0.052631578947368474 and parameters: {'alpha': 4.0761044211755606e-05}. Best is trial 1 with value: 0.052631578947368474.
[I 2024-09-26 10:47:44,114] Trial 2 finished with value: 0.1842105263157895 and parameters: {'alpha': 0.002963224994149077}. Best is trial 1 with value: 0.052631578947368474.
[I 2024-09-26 10:47:44,188] Trial 3 finished with value: 0.3157894736842105 and parameters: {'alpha': 0.00029443834676040354}. Best is trial 1 with value: 0.052631578947368474.
[I 2024-09-26 10:47:44,261] Trial 4 finished with value: 0.368421052631579 and parameters: {'alpha': 0.0005968789848913645}. Best is trial 1 with value: 0.052631578947368474.
[I 2024-09-26 10:47:44,322] Trial 5 pruned. 
[I 2024-09-26 10:47:44,403] Trial 6 finished with value: 0.10526315789473684 and parameters: {'alpha': 0.0002549952615034128}. Best is trial 1 with value: 0.052631578947368474.
[I 2024-09-26 10:47:44,407] Trial 7 pruned. 
[I 2024-09-26 10:47:44,411] Trial 8 pruned. 
[I 2024-09-26 10:47:44,414] Trial 9 pruned. 
[I 2024-09-26 10:47:44,420] Trial 10 pruned. 
[I 2024-09-26 10:47:44,425] Trial 11 pruned. 
[I 2024-09-26 10:47:44,509] Trial 12 finished with value: 0.42105263157894735 and parameters: {'alpha': 9.034472208694208e-05}. Best is trial 1 with value: 0.052631578947368474.
[I 2024-09-26 10:47:44,592] Trial 13 finished with value: 0.1842105263157895 and parameters: {'alpha': 7.208212462168948e-05}. Best is trial 1 with value: 0.052631578947368474.
[I 2024-09-26 10:47:44,600] Trial 14 pruned. 
[I 2024-09-26 10:47:44,609] Trial 15 pruned. 
[I 2024-09-26 10:47:44,614] Trial 16 pruned. 
[I 2024-09-26 10:47:44,698] Trial 17 finished with value: 0.13157894736842102 and parameters: {'alpha': 0.00024609691733570106}. Best is trial 1 with value: 0.052631578947368474.
[I 2024-09-26 10:47:44,705] Trial 18 pruned. 
[I 2024-09-26 10:47:44,788] Trial 19 finished with value: 0.07894736842105265 and parameters: {'alpha': 0.0008168248767813228}. Best is trial 1 with value: 0.052631578947368474.

As shown above, unpromising trials are pruned at an early step for efficiency.

3.3 Which Sampler and Pruner Should be Used?

・Non-deep-learning tasks

From this benchmark:
・For RandomSampler, MedianPruner is the best.
・For TPESampler, HyperbandPruner is the best.

・Deep learning tasks

From this book:

Parallel Compute Resource | Categorical/Conditional Hyperparameters | Recommended Algorithms
Limited | No | TPE, GP-EI if search space is low-dimensional and continuous.
Limited | Yes | TPE, GP-EI if search space is low-dimensional and continuous.
Sufficient | No | CMA-ES, Random Search
Sufficient | Yes | Random Search or Genetic Algorithm
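
As a concrete illustration of the non-deep-learning recommendation above, a minimal sketch that combines a sampler and a pruner in a single create_study() call:

import optuna

# TPESampler + HyperbandPruner, following the recommendation for non-deep-learning tasks.
study = optuna.create_study(
    sampler=optuna.samplers.TPESampler(),
    pruner=optuna.pruners.HyperbandPruner(),
)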

3.4 Integration

To implement the pruning mechanism in a much simpler form, Optuna provides integration modules for the following libraries.

Integration | Dependencies
AllenNLP | allennlp, torch, psutil, jsonnet
BoTorch | botorch, gpytorch, torch
CatBoost | catboost
ChainerMN | chainermn
Chainer | chainer
pycma | cma
Dask | distributed
FastAI | fastai
Keras | keras
LightGBMTuner | lightgbm, scikit-learn
LightGBMPruningCallback | lightgbm
MLflow | mlflow
MXNet | mxnet
PyTorch Distributed | torch
PyTorch (Ignite) | pytorch-ignite
PyTorch (Lightning) | pytorch-lightning
SHAP | scikit-learn, shap
Scikit-learn | pandas, scipy, scikit-learn
SKorch | skorch
TensorBoard | tensorboard, tensorflow
TensorFlow | tensorflow, tensorflow-estimator
TensorFlow + Keras | tensorflow
Weights & Biases | wandb
XGBoost | xgboost

We can check each implementation in the Optuna example code.
For example, the LightGBM version is as follows:

import lightgbm as lgb
import optuna.integration

# This snippet lives inside an objective(trial) function, with `param`, `dtrain`,
# and `dvalid` already defined. The metric name must be one that LightGBM reports
# for the validation set (here "binary_error", assuming a binary objective).
pruning_callback = optuna.integration.LightGBMPruningCallback(trial, "binary_error")
gbm = lgb.train(param, dtrain, valid_sets=[dvalid], callbacks=[pruning_callback])
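
For reference, here is a minimal self-contained sketch of how this callback is typically used inside an objective function. The dataset and hyperparameters below are illustrative assumptions, loosely following the official Optuna LightGBM example:

import lightgbm as lgb
import numpy as np
import optuna
import optuna.integration
import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection

def objective(trial):
    # Illustrative dataset; replace with your own data.
    data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
    train_x, valid_x, train_y, valid_y = sklearn.model_selection.train_test_split(
        data, target, test_size=0.25, random_state=0
    )
    dtrain = lgb.Dataset(train_x, label=train_y)
    dvalid = lgb.Dataset(valid_x, label=valid_y, reference=dtrain)

    param = {
        "objective": "binary",
        "metric": "binary_error",
        "verbosity": -1,
        "lambda_l1": trial.suggest_float("lambda_l1", 1e-8, 10.0, log=True),
        "lambda_l2": trial.suggest_float("lambda_l2", 1e-8, 10.0, log=True),
    }

    # The callback reports the validation metric to Optuna after each boosting round
    # and raises optuna.TrialPruned when the pruner decides to stop the trial.
    pruning_callback = optuna.integration.LightGBMPruningCallback(trial, "binary_error")
    gbm = lgb.train(param, dtrain, valid_sets=[dvalid], callbacks=[pruning_callback])

    preds = gbm.predict(valid_x)
    pred_labels = np.rint(preds)
    return 1.0 - sklearn.metrics.accuracy_score(valid_y, pred_labels)

study = optuna.create_study(pruner=optuna.pruners.MedianPruner())
study.optimize(objective, n_trials=20)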

3.5 Summary

This time, I explained the basics and implementation of the following:
・Sampling algorithms
・Pruners

Please choose an appropriate method for your task.
