【ML】What is the Voting Classifier
1. What is the Voting Classifier?
The voting classifier is a pipeline tool for training an ensemble of multiple models and combining their predictions.
There are two voting strategies: 'hard voting' and 'soft voting'.
Voting classifier mechanism with code:
- define voting classifier
- train each model
- predict and ensemble
# define
voting_clf = VotingClassifier(estimators=[
('lr', model1), ('dt', model2), ('svc', model3)],
voting='soft')
# train each model
voting_clf.fit(X_train, y_train)
# predict and ensemble
y_pred = voting_clf.predict(X_test)
1.1 Ensemble Types
・Hard Voting
Train multiple models and take the majority vote of their predicted class labels.
・Example
ModelA : 1
ModelB : 1
ModelC : 0
The final result is 1.
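The majority vote above can be sketched with the standard library, using the three hypothetical model outputs from the example:

```python
from collections import Counter

# Hypothetical per-sample predictions from ModelA, ModelB, ModelC
preds = [1, 1, 0]

# Hard voting: the most common class label wins
final = Counter(preds).most_common(1)[0][0]
print(final)  # -> 1
```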
・Soft Voting
Train multiple models and average their predicted class probabilities; the class with the highest averaged probability is chosen.
・Example
ModelA : 0.7
ModelB : 0.8
ModelC : 0.4
The averaged probability is (0.7 + 0.8 + 0.4) / 3 ≈ 0.633, so with a 0.5 threshold the final prediction is 1.
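The averaging step can be sketched in a few lines, using the three hypothetical probabilities from the example:

```python
import numpy as np

# Hypothetical class-1 probabilities from ModelA, ModelB, ModelC
probs = np.array([0.7, 0.8, 0.4])

# Soft voting: simple average of the probabilities
avg = probs.mean()
print(round(avg, 3))  # -> 0.633

# With a 0.5 threshold, the averaged probability maps to class 1
pred = int(avg >= 0.5)
print(pred)  # -> 1
```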
2. Example Code
Here is example code for a voting classifier.
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Initialize individual models
model1 = LogisticRegression()
model2 = DecisionTreeClassifier()
model3 = SVC(probability=True)
# Initialize Voting Classifier
voting_clf = VotingClassifier(
    estimators=[('lr', model1), ('dt', model2), ('svc', model3)],
    voting='soft',            # use 'hard' for hard voting
    weights=[0.3, 0.3, 0.3])  # equal weights for all three models
# Train and predict
voting_clf.fit(X_train, y_train)
y_pred = voting_clf.predict(X_test)
# Inspect predictions (display is available in Jupyter/IPython)
display(y_test)
display(y_pred)
# Evaluate accuracy
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
・Output
array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0,
       0])
array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,
       0, 2, 2, 2, 2, 2, 0, 0, 0, 0, 1, 0, 0, 2, 1, 0, 0, 0, 2, 1, 1, 0,
       0])
Accuracy: 1.0
・Each model's output
Predictions by Logistic Regression: [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1
0 0 0 2 1 1 0 0]
Predictions by Decision Tree: [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1
0 0 0 2 1 1 0 0]
Predictions by SVM: [1 0 2 1 1 0 1 2 1 1 2 0 0 0 0 1 2 1 1 2 0 2 0 2 2 2 2 2 0 0 0 0 1 0 0 2 1
0 0 0 2 1 1 0 0]
3. Summary
The Voting Classifier is a useful ensemble pipeline.
It also lets you set a weight for each model, so please try it out.
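The weights parameter biases the ensemble toward the models you trust more. A minimal sketch with illustrative (not tuned) weights that give the SVC twice the influence of logistic regression:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# weights=[1, 2]: the SVC's probabilities count double in the soft-voting average
weighted_clf = VotingClassifier(
    estimators=[('lr', LogisticRegression(max_iter=1000)),
                ('svc', SVC(probability=True, random_state=42))],
    voting='soft',
    weights=[1, 2])
weighted_clf.fit(X_train, y_train)

acc = accuracy_score(y_test, weighted_clf.predict(X_test))
print(f"Weighted accuracy: {acc}")
```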