
A Guide to Evaluating Binary Classification Models with Synthetic Data

Published 2025/05/14

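To evaluate a machine learning model correctly, you need to understand the standard metrics (ROC curve, AUC, F1 score, confusion matrix, Precision-Recall curve, and so on) and be able to visualize them. This article walks through a binary classification evaluation workflow hands-on, using synthetic data.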

1. Generating Synthetic Data

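First, we synthesize labels and scores in Python in the simplest possible way. In this example we generate 1,000 imbalanced samples (90% positive, 10% negative) and draw each class's scores from its own normal distribution. Note that here the positive class is the majority.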

import numpy as np

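# Fix the random seed so the results are reproducible
np.random.seed(42)

# Number of samples and fraction of positives
n_samples = 1000
p_positive = 0.9

# Draw labels from a Bernoulli distribution
y_true = np.random.choice([0, 1], size=n_samples,
                          p=[1 - p_positive, p_positive])

# Sample scores from a per-class normal distribution
y_pred = np.empty(n_samples)

# Negatives (0): mean 10, standard deviation 5
neg = (y_true == 0)
y_pred[neg] = np.random.normal(10, 5, neg.sum())

# Positives (1): mean 25, standard deviation 5
pos = (y_true == 1)
y_pred[pos] = np.random.normal(25, 5, pos.sum())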

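Labels are drawn at random from a Bernoulli distribution (np.random.choice with class probabilities p)
Changing the mean (loc) and standard deviation (scale) of each normal distribution controls how much the two score distributions overlap, and therefore how hard the task is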
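To make the difficulty knob concrete, here is a minimal sketch that wraps the generation step in a function and compares the empirical AUC (covered in section 2) with its closed form. It assumes both classes share the same standard deviation σ; then the score difference between a random positive and a random negative is normal with mean Δ = μ1 - μ0 and variance 2σ², so AUC = Φ(Δ / (σ√2)), where Φ is the standard normal CDF. The helpers make_scores and theoretical_auc are hypothetical names used only for illustration.

```python
import math

import numpy as np
from sklearn.metrics import roc_auc_score


def make_scores(n_samples=1000, p_positive=0.9, mu0=10.0, mu1=25.0,
                sigma=5.0, seed=0):
    """Hypothetical helper: Bernoulli labels, per-class normal scores
    with a shared standard deviation."""
    rng = np.random.default_rng(seed)
    y_true = (rng.random(n_samples) < p_positive).astype(int)
    # Draw a candidate score from each class and keep the one matching the label
    y_pred = np.where(y_true == 1,
                      rng.normal(mu1, sigma, n_samples),
                      rng.normal(mu0, sigma, n_samples))
    return y_true, y_pred


def theoretical_auc(mu0, mu1, sigma):
    """P(positive score > negative score) for equal-variance Gaussians:
    Phi(delta / (sigma * sqrt(2))) = 0.5 * (1 + erf(delta / (2 * sigma)))."""
    delta = mu1 - mu0
    return 0.5 * (1.0 + math.erf(delta / (2.0 * sigma)))


# Shrinking the gap between the class means makes the task harder
for mu1 in (25.0, 15.0, 12.0):
    y_true, y_pred = make_scores(n_samples=20000, mu1=mu1)
    print(f"mu1={mu1:5.1f}  empirical AUC={roc_auc_score(y_true, y_pred):.3f}"
          f"  theoretical AUC={theoretical_auc(10.0, mu1, 5.0):.3f}")
```

With the article's settings (μ0 = 10, μ1 = 25, σ = 5) the theoretical AUC is about 0.983, so the default data is an easy problem; moving the means closer is the quickest way to make it harder.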

2. ROC Curve and AUC

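The ROC curve shows how the true positive rate (TPR = TP / (TP + FN)) and the false positive rate (FPR = FP / (FP + TN)) trade off as the decision threshold varies. AUC is the area under the ROC curve: a threshold-independent summary of how well the scores rank positives above negatives (1.0 means perfect ranking, 0.5 means random).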

from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

fpr, tpr, thresholds = roc_curve(y_true, y_pred)
auc = roc_auc_score(y_true, y_pred)

plt.plot(fpr, tpr, label=f"AUC = {auc:.3f}")
plt.plot([0,1], [0,1], '--', color='gray')
plt.xlabel("FPR")
plt.ylabel("TPR")
plt.title("ROC Curve")
plt.legend()
plt.grid(True)
plt.show()

roc_curve: returns the FPR, the TPR, and the corresponding thresholds
roc_auc_score: computes the AUC in a single call
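To see what these two functions compute, the sketch below recomputes TPR and FPR directly from the counts at every threshold returned by roc_curve, and recomputes AUC as the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties count as 1/2). It regenerates comparable data with a fixed seed so it runs on its own; the helpers rates_at and pairwise_auc are hypothetical names.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Same kind of data as section 1, with a fixed seed (assumption)
rng = np.random.default_rng(0)
y_true = (rng.random(500) < 0.9).astype(int)
y_pred = np.where(y_true == 1, rng.normal(25, 5, 500), rng.normal(10, 5, 500))


def rates_at(y_true, y_pred, t):
    """TPR and FPR when samples with score >= t are predicted positive."""
    y_hat = y_pred >= t
    tp = np.sum(y_hat & (y_true == 1))
    fp = np.sum(y_hat & (y_true == 0))
    tpr = tp / np.sum(y_true == 1)
    fpr = fp / np.sum(y_true == 0)
    return tpr, fpr


def pairwise_auc(y_true, y_pred):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = y_pred[y_true == 1]
    neg = y_pred[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive vs every negative
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


fpr, tpr, thresholds = roc_curve(y_true, y_pred)
manual = np.array([rates_at(y_true, y_pred, t) for t in thresholds])
print(np.allclose(manual[:, 0], tpr), np.allclose(manual[:, 1], fpr))
print(roc_auc_score(y_true, y_pred), pairwise_auc(y_true, y_pred))
```

The pairwise form makes the "ranking" interpretation of AUC explicit, but it builds an n_pos × n_neg matrix, so for large datasets roc_auc_score is the practical choice.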

3. F1 Score vs. Threshold

The F1 score is the harmonic mean of precision and recall. Plotting it against the threshold shows which threshold maximizes F1 on this data.

from sklearn.metrics import f1_score

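# The first threshold from roc_curve lies above every score (np.inf in recent
# scikit-learn), so it predicts no positives and F1 would be 0; skip it
ts = thresholds[1:]
f1_scores = [f1_score(y_true, (y_pred >= t).astype(int)) for t in ts]
best_t = ts[np.argmax(f1_scores)]

plt.plot(ts, f1_scores)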
plt.xlabel("Threshold")
plt.ylabel("F1 Score")
plt.title("F1 Score vs Threshold")
plt.grid(True)
plt.show()
print(f"Best Threshold: {best_t:.3f}")
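Looping over f1_score is easy to read but recomputes the metric once per threshold. As an alternative sketch, precision_recall_curve already returns precision and recall for every distinct score threshold, so F1 = 2PR / (P + R) can be computed in one vectorized step. The data is regenerated with a fixed seed so the block is self-contained; the last precision/recall pair (precision 1, recall 0) has no matching threshold and is dropped.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_recall_curve

# Same kind of data as section 1, with a fixed seed (assumption)
rng = np.random.default_rng(0)
y_true = (rng.random(1000) < 0.9).astype(int)
y_pred = np.where(y_true == 1, rng.normal(25, 5, 1000), rng.normal(10, 5, 1000))

precision, recall, pr_thresholds = precision_recall_curve(y_true, y_pred)
# precision/recall have one more element than the thresholds; drop the final
# (precision=1, recall=0) point, which has no threshold
p, r = precision[:-1], recall[:-1]
denom = p + r
# Harmonic mean, defined as 0 where precision and recall are both 0
f1_vec = np.divide(2 * p * r, denom, out=np.zeros_like(denom), where=denom > 0)

best_idx = np.argmax(f1_vec)
best_t_vec = pr_thresholds[best_idx]
print(f"Best threshold: {best_t_vec:.3f}, F1 = {f1_vec[best_idx]:.3f}")

# Cross-check against scikit-learn's f1_score at the same threshold
print(f1_score(y_true, (y_pred >= best_t_vec).astype(int)))
```

Both approaches evaluate "score >= threshold" as positive, so they agree wherever they share a threshold; the vectorized version simply covers every distinct score at once.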

4. Confusion Matrix (Counts & Normalized)

Binarize the scores at the best threshold chosen above and display the confusion matrix.

from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import seaborn as sns

# Binarize the scores at the chosen threshold
y_bin = (y_pred >= best_t).astype(int)

# Raw counts
cm = confusion_matrix(y_true, y_bin)
ConfusionMatrixDisplay(cm, display_labels=[0,1]).plot(cmap="Blues")
plt.show()

# Row-normalize so that each true class sums to 100%
cm_pct = cm.astype(float)
cm_pct /= cm_pct.sum(axis=1, keepdims=True)
cm_pct *= 100
plt.figure(figsize=(5,4))
sns.heatmap(cm_pct, annot=True, fmt=".1f", cmap="Blues",
            xticklabels=[0,1], yticklabels=[0,1])
plt.xlabel("Predicted label")
plt.ylabel("True label")
plt.title("Confusion Matrix (Normalized %)")
plt.show()

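Row normalization shows, for each true class, the percentage of samples assigned to each predicted class; the diagonal is the per-class recall (TNR for class 0, TPR for class 1)
ConfusionMatrixDisplay is the quickest way to plot raw counts, while seaborn.heatmap gives finer control over formatting such as percentages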
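Every threshold-based metric in this article can be read off the four cells of the binary confusion matrix. The sketch below unpacks them with cm.ravel() (for labels [0, 1] the order is TN, FP, FN, TP), derives precision, recall, specificity, and F1 by hand, and checks the row normalization against scikit-learn's normalize="true" option. The fixed seed and the threshold of 17.5 are illustrative assumptions, not values from the article.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Same kind of data as section 1, with a fixed seed (assumption)
rng = np.random.default_rng(0)
y_true = (rng.random(1000) < 0.9).astype(int)
y_pred = np.where(y_true == 1, rng.normal(25, 5, 1000), rng.normal(10, 5, 1000))
y_bin = (y_pred >= 17.5).astype(int)  # illustrative threshold

cm = confusion_matrix(y_true, y_bin, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)
recall = tp / (tp + fn)          # TPR: row-1 diagonal after row normalization
specificity = tn / (tn + fp)     # TNR: row-0 diagonal after row normalization
f1 = 2 * precision * recall / (precision + recall)

# Row normalization done by scikit-learn (fractions, not percentages)
cm_norm = confusion_matrix(y_true, y_bin, labels=[0, 1], normalize="true")

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"specificity={specificity:.3f} F1={f1:.3f}")
print(np.round(cm_norm * 100, 1))
```

Passing normalize="true" removes the need for the manual division used above; multiply by 100 when you want percentages in the heatmap.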

5. Precision-Recall Curve

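On imbalanced data, the PR curve and Average Precision (AP) are especially informative when positives are rare: FPR divides by the large number of negatives, so even many false positives barely move it, whereas precision exposes them directly. Note that in this synthetic setup class 1 is the majority (90%), so even a random scorer reaches a precision of about 0.9; to simulate rare positives, set p_positive to a small value such as 0.1.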

from sklearn.metrics import precision_recall_curve, average_precision_score

precision, recall, _ = precision_recall_curve(y_true, y_pred)
ap = average_precision_score(y_true, y_pred)

plt.plot(recall, precision, label=f"AP = {ap:.3f}")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve")
plt.legend()
plt.grid(True)
plt.show()
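Because precision depends on class prevalence, the PR curve of a random scorer is roughly a horizontal line at the positive-class fraction, so AP should be read against that baseline rather than against 0. The sketch below (fixed seed as an assumption) compares AP with its baseline when class 1, the 90% majority here, is the positive class, and when class 0 is treated as positive by flipping the labels and negating the scores so that a higher score still means "more likely positive".

```python
import numpy as np
from sklearn.metrics import average_precision_score

# Same kind of data as section 1, with a fixed seed (assumption)
rng = np.random.default_rng(0)
y_true = (rng.random(1000) < 0.9).astype(int)
y_pred = np.where(y_true == 1, rng.normal(25, 5, 1000), rng.normal(10, 5, 1000))

# Class 1 (majority) as the positive class
ap_majority = average_precision_score(y_true, y_pred)
baseline_majority = y_true.mean()

# Class 0 (minority) as the positive class: flip labels, negate scores
ap_minority = average_precision_score(1 - y_true, -y_pred)
baseline_minority = (1 - y_true).mean()

# Uninformative scores: AP stays close to the prevalence
ap_random = average_precision_score(y_true, rng.random(1000))

print(f"positive=1: AP={ap_majority:.3f} (baseline {baseline_majority:.3f})")
print(f"positive=0: AP={ap_minority:.3f} (baseline {baseline_minority:.3f})")
print(f"random scores, positive=1: AP={ap_random:.3f}")
```

The same AP value means very different things depending on which class is positive, which is why the rare-positive case is where the PR curve earns its keep.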

6. Histogram + Best Threshold

Visualize the predicted score distribution of each class and mark the best threshold with a vertical line.

plt.hist(y_pred[y_true==0], bins="auto", alpha=0.7, label="Class 0")
plt.hist(y_pred[y_true==1], bins="auto", alpha=0.7, label="Class 1")
plt.axvline(best_t, color='red', linestyle='--', label=f"Threshold={best_t:.2f}")
plt.legend()
plt.title("Score Distribution by Class")
plt.show()
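The histogram also hints at where the accuracy-optimal threshold lies: the point where the prior-weighted class densities cross. For the equal-variance Gaussian model of section 1 this has a closed form, t* = (μ0 + μ1)/2 + σ²/(μ1 - μ0) · ln(π0/π1), where π0 and π1 are the class proportions. This is a sketch for that idealized model only; t* minimizes the misclassification rate, which is a different objective from F1, so it will generally not coincide with the F1-optimal threshold. The helpers bayes_threshold and error_rate are hypothetical names.

```python
import math

import numpy as np


def bayes_threshold(mu0, mu1, sigma, p_positive):
    """Accuracy-optimal cut-off for two equal-variance Gaussians (mu1 > mu0):
    the point where p0 * N(mu0, sigma) and p1 * N(mu1, sigma) intersect."""
    p0, p1 = 1.0 - p_positive, p_positive
    return (mu0 + mu1) / 2 + sigma**2 / (mu1 - mu0) * math.log(p0 / p1)


def error_rate(y_true, y_pred, t):
    """Fraction of samples misclassified when score >= t is predicted positive."""
    return np.mean((y_pred >= t).astype(int) != y_true)


t_star = bayes_threshold(10.0, 25.0, 5.0, 0.9)
# prints t* = 13.838: below the midpoint 17.5 because positives dominate
print(f"t* = {t_star:.3f}")

# Empirical check on a large sample from the same model (seed is arbitrary)
rng = np.random.default_rng(0)
n = 200_000
y_true = (rng.random(n) < 0.9).astype(int)
y_pred = np.where(y_true == 1, rng.normal(25, 5, n), rng.normal(10, 5, n))
for t in (t_star, 17.5):
    print(f"threshold {t:.2f}: error rate {error_rate(y_true, y_pred, t):.4f}")
```

When the classes are balanced (p_positive = 0.5) the log term vanishes and t* falls exactly midway between the two means.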

Summary

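Even with synthetic data, working through the full evaluation procedure shows how model diagnosis fits together end to end
Changing the parameters (sample size, class ratio, distribution shape) lets you simulate any level of difficulty and imbalance
Combine ROC/AUC, F1, the confusion matrix, the PR curve, and score histograms to evaluate performance from multiple angles
Try applying these steps to your own data to discover your model's strengths and weaknesses!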