
【Ensemble】Ranking Explanation

Published on 2024/07/20

1. What is Ranking?

Ranking is a type of normalization, but unlike other popular normalization methods, its output can only take values from a fixed set.

A brief code example:

import pandas as pd

# Sample data
data = {'pred': [0.2, 0.5, 0.8, 0.1, 0.7]}
df = pd.DataFrame(data)

# Rank the predictions (pct=True returns each rank as a fraction of the number of rows)
df['rank'] = df['pred'].rank(pct=True)

print(df)

# output
#    pred  rank
# 0   0.2   0.4
# 1   0.5   0.6
# 2   0.8   1.0
# 3   0.1   0.2
# 4   0.7   0.8

That is all ranking does. Each pred is replaced by its rank, which comes from a fixed set of values determined by how many predictions there are (by default, tied values share the average of their ranks). Here there are 5 distinct values, so the fixed values are [0.2, 0.4, 0.6, 0.8, 1.0].
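The sample above has no duplicate predictions, so it may help to see what happens when there is a tie. The following is a minimal sketch with made-up values (the `preds` series is an assumption for illustration), using the default `method='average'` of `rank`:

```python
import pandas as pd

# Hypothetical predictions containing a tie (0.5 appears twice)
preds = pd.Series([0.1, 0.5, 0.5, 0.9])

# Default method='average': the tied values would occupy ranks 2 and 3,
# so both receive 2.5, and pct=True divides every rank by the 4 rows
avg_rank = preds.rank(pct=True)

print(avg_rank.tolist())
# [0.25, 0.625, 0.625, 1.0]
```

With ties, the output is no longer evenly spaced, but it still depends only on the ordering of the predictions, not on their magnitudes.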

2. Where is it used?

Ranking provides more stability when ensembling many models. Its benefits can be viewed from the following angles.

・Normalization
Simply put, the outputs are squeezed into the 0 to 1 range. This is useful when the models being ensembled produce outputs on different scales.
・Mitigates Model Bias
Some models might be biased towards certain ranges of probability. Ranking keeps only the ordering of the predictions, which removes these biases and makes the combined predictions more robust.
・Handles Outliers
Because a rank depends only on the order of the values, it is less sensitive to outliers (extreme values).
・Uniform Distribution
By converting ranks to percentiles, we get values spread uniformly over the 0 to 1 range (as long as there are few ties). This uniformity can improve the stability and performance of the ensemble.
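To make the points above concrete, here is a minimal sketch of rank averaging. The two models and their outputs are made up for illustration: `model_a` returns probabilities, while `model_b` returns unbounded scores that include one extreme value.

```python
import pandas as pd

# Hypothetical outputs of two models on the same 5 samples
preds = pd.DataFrame({
    'model_a': [0.2, 0.5, 0.8, 0.1, 0.7],       # probabilities
    'model_b': [1.5, -3.0, 250.0, -4.0, 2.0],   # raw scores with an outlier
})

# Convert each model's output to percentile ranks, column by column
ranks = preds.rank(pct=True)

# Rank averaging: the ensemble score is the mean rank across models
ensemble = ranks.mean(axis=1)

print(ranks)
print(ensemble.tolist())
```

After ranking, both columns take values from the same set [0.2, 0.4, 0.6, 0.8, 1.0], and the outlier 250.0 simply becomes 1.0, the same as any other maximum. A plain average of the raw outputs would instead be dominated by `model_b`.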

Summary

In summary, ranking makes ensembling more stable. Please try it!

