
【ML】What is the Voting Classifier

Published on 2024/08/12

1. What is the Voting Classifier?

A voting classifier is an ensemble tool that trains multiple models and combines their predictions into a single final prediction.
There are two voting strategies: 'hard voting' and 'soft voting'.

Voting classifier mechanism with code:

  1. define the voting classifier
  2. train each model
  3. predict and ensemble

# define
voting_clf = VotingClassifier(estimators=[
    ('lr', model1), ('dt', model2), ('svc', model3)],
    voting='soft')
# train each model
voting_clf.fit(X_train, y_train)
# predict and ensemble
y_pred = voting_clf.predict(X_test)

1.1 Ensemble type

・Hard Voting
Train multiple models; each model predicts a class label, and the final prediction is decided by majority vote.
・Example
ModelA : 1
ModelB : 1
ModelC : 0
The final result is 1 (two of the three models voted for class 1).

・Soft Voting
Train multiple models and average their predicted class probabilities; the class with the highest average probability is the final prediction.
・Example
ModelA : 0.7
ModelB : 0.8
ModelC : 0.4
The final result is (0.7 + 0.8 + 0.4) / 3 = 0.633, so the predicted class is 1 (the average probability exceeds 0.5).
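The two examples above can be reproduced with a small NumPy sketch (a minimal illustration, not scikit-learn's internal implementation):

```python
import numpy as np

# Hard voting: majority vote over the class labels from the example above
hard_preds = np.array([1, 1, 0])  # ModelA, ModelB, ModelC
final_hard = np.bincount(hard_preds).argmax()
print(final_hard)  # 1

# Soft voting: average the class-1 probabilities, then threshold at 0.5
soft_probs = np.array([0.7, 0.8, 0.4])
avg = soft_probs.mean()
print(round(avg, 3))  # 0.633
final_soft = int(avg >= 0.5)
print(final_soft)  # 1
```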

2. Example Code

Here is example code for a voting classifier on the iris dataset.

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize individual models
model1 = LogisticRegression(max_iter=1000)  # raise max_iter to avoid a ConvergenceWarning
model2 = DecisionTreeClassifier()
model3 = SVC(probability=True)

# Initialize Voting Classifier
voting_clf = VotingClassifier(estimators=[
    ('lr', model1), ('dt', model2), ('svc', model3)],
    voting='soft',  # use 'hard' for hard voting
    weights=[0.3, 0.3, 0.3])  # relative weights for each model (equal here)

# Train and predict
voting_clf.fit(X_train, y_train)
y_pred = voting_clf.predict(X_test)

# Evaluate accuracy
display(y_test)  # display() is available in Jupyter/IPython; use print() in a plain script
display(y_pred)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")

・Output

array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0,
       0])
array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0,
       0])
Accuracy: 1.0
・Each model's output
Predictions by Logistic Regression: [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1
 0 0 0 2 1 1 0 0]
Predictions by Decision Tree: [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1
 0 0 0 2 1 1 0 0]
Predictions by SVM: [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1
 0 0 0 2 1 1 0 0]
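The code that produced the per-model predictions above is not shown in the article; one way to get them is through the fitted ensemble's `named_estimators_` attribute, which maps each name to its fitted base model (a self-contained sketch repeating the setup from the example):

```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

voting_clf = VotingClassifier(estimators=[
    ('lr', LogisticRegression(max_iter=1000)),
    ('dt', DecisionTreeClassifier(random_state=0)),
    ('svc', SVC(probability=True))],
    voting='soft')
voting_clf.fit(X_train, y_train)

# named_estimators_ holds the fitted clone of each base model
for name, model in voting_clf.named_estimators_.items():
    print(f"Predictions by {name}: {model.predict(X_test)}")
```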

3. Summary

The Voting Classifier is a useful ensemble tool.
It also lets you set a weight for each model, so give it a try.
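As a quick illustration of weighting (the weights here are hypothetical, not from the article): a weighted soft vote is just a weighted average of the probabilities.

```python
import numpy as np

probs = np.array([0.7, 0.8, 0.4])    # class-1 probabilities from the soft-voting example
weights = np.array([2.0, 1.0, 1.0])  # hypothetical weights favoring ModelA

weighted_avg = np.average(probs, weights=weights)
print(round(weighted_avg, 3))  # 0.65
```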

Reference

[1] VotingClassifier, scikit-learn
