
Published 2024/08/12

# 1. What is the Voting Classifier?

A voting classifier is an ensemble tool that trains multiple models and combines their predictions into a single result.
There are two voting strategies: 'hard voting' and 'soft voting'.

Voting classifier mechanism with code:

1. define the voting classifier
2. train each model
3. predict and ensemble

```python
# define
voting_clf = VotingClassifier(estimators=[
    ('lr', model1), ('dt', model2), ('svc', model3)],
    voting='soft')
# train each model
voting_clf.fit(X_train, y_train)
# predict and ensemble
y_pred = voting_clf.predict(X_test)
```

### 1.1 Ensemble types

・Hard Voting
Train multiple models and take a majority vote over their predicted class labels.
・Example
ModelA : 1
ModelB : 1
ModelC : 0
The final result is 1, because two of the three models vote for class 1.

・Soft Voting
Train multiple models and average their predicted class probabilities.
・Example
ModelA : 0.7
ModelB : 0.8
ModelC : 0.4
The final result is (0.7 + 0.8 + 0.4) / 3 ≈ 0.633.
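The two strategies above can be sketched in plain NumPy. The model outputs here are the toy numbers from the examples, not real predictions:

```python
import numpy as np

# Hard voting: majority vote over predicted class labels
hard_preds = np.array([1, 1, 0])          # ModelA, ModelB, ModelC
counts = np.bincount(hard_preds)          # votes per class: [1, 2]
hard_result = int(np.argmax(counts))
print(hard_result)                        # 1 (class 1 wins 2-to-1)

# Soft voting: average the models' predicted probabilities for class 1
soft_probs = np.array([0.7, 0.8, 0.4])
soft_result = soft_probs.mean()
print(round(soft_result, 3))              # 0.633
```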

# 2. Example Code

Here is example code for a voting classifier.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load a dataset (X and y were undefined in the original snippet; Iris matches the output below)
X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Initialize individual models
model1 = LogisticRegression(max_iter=1000)  # raise max_iter to avoid convergence warnings
model2 = DecisionTreeClassifier()
model3 = SVC(probability=True)  # probability=True is required for soft voting

# Initialize Voting Classifier
voting_clf = VotingClassifier(estimators=[
    ('lr', model1), ('dt', model2), ('svc', model3)],
    voting='soft',
    weights=[0.3, 0.3, 0.3])  # Use voting='hard' for hard voting

# Train and predict
voting_clf.fit(X_train, y_train)
y_pred = voting_clf.predict(X_test)

# Evaluate accuracy (display() is available in Jupyter notebooks)
display(y_test)
display(y_pred)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
```

・Output

```
array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0,
       0])
array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0,
       0])
Accuracy: 1.0
```
・Each model's output

```
Predictions by Logistic Regression: [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1
 0 0 0 2 1 1 0 0]
Predictions by Decision Tree: [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1
 0 0 0 2 1 1 0 0]
Predictions by SVM: [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1
 0 0 0 2 1 1 0 0]
```
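Per-model predictions like these can be obtained from the fitted ensemble's `named_estimators_` attribute, which maps each estimator name to its fitted clone. A self-contained sketch, assuming the Iris dataset as above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

voting_clf = VotingClassifier(estimators=[
    ('lr', LogisticRegression(max_iter=1000)),
    ('dt', DecisionTreeClassifier()),
    ('svc', SVC(probability=True))],
    voting='soft')
voting_clf.fit(X_train, y_train)

# named_estimators_ holds each base model after fitting
for name, label in [('lr', 'Logistic Regression'),
                    ('dt', 'Decision Tree'),
                    ('svc', 'SVM')]:
    preds = voting_clf.named_estimators_[name].predict(X_test)
    print(f"Predictions by {label}: {preds}")
```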

# 3. Summary

The Voting Classifier is a useful ensemble tool.
You can also set a weight for each model, so give it a try.
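As a quick illustration of how weights enter soft voting, the averaged probability from Section 1.1 becomes a weighted average. The weights below are made-up values for illustration, not tuned ones:

```python
import numpy as np

# Class-1 probabilities from the three models (toy values from Section 1.1)
probs = np.array([0.7, 0.8, 0.4])
# Hypothetical weights favoring the first model
weights = np.array([0.5, 0.3, 0.2])

# Weighted average: 0.5*0.7 + 0.3*0.8 + 0.2*0.4
weighted_avg = np.average(probs, weights=weights)
print(round(weighted_avg, 2))  # 0.67, vs. 0.63 for the unweighted mean
```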
