Statistics for ML #75 — ROC Curve & AUC

ROC Curve & AUC

Post #75/100 in the Statistics for ML series, by Md Salek Miah, Statistician & ML Researcher, SUST, Bangladesh.

The ROC Curve (Receiver Operating Characteristic) plots the True Positive Rate (Sensitivity) against the False Positive Rate (1 − Specificity) across all classification thresholds.

AUC (Area Under the Curve): the probability that the model scores a randomly chosen positive higher than a randomly chosen negative. AUC = 0.5 → no better than random; AUC = 1.0 → perfect ranking.
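This ranking interpretation can be checked directly: counting, over all (positive, negative) pairs, the fraction where the positive receives the higher score reproduces `roc_auc_score`. A minimal sketch on hypothetical toy data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical tiny example: 3 positives, 3 negatives
y_true = np.array([1, 1, 1, 0, 0, 0])
y_score = np.array([0.9, 0.7, 0.4, 0.6, 0.3, 0.2])

# AUC as a ranking probability: fraction of (positive, negative)
# pairs where the positive gets the higher score (ties count half)
pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
pairs = [(p > n) + 0.5 * (p == n) for p in pos for n in neg]
manual_auc = np.mean(pairs)

# Both give 8/9 ≈ 0.889: one of the nine pairs is ranked wrongly
print(manual_auc, roc_auc_score(y_true, y_score))
```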

Interpretation for Public Health ML

In my skilled birth attendance prediction models:

  • AUC = 0.82 → model correctly ranks 82% of (SBA, no-SBA) pairs
  • But AUC alone can be misleading on imbalanced data, so also report F1 and Precision-Recall AUC

from sklearn.metrics import roc_auc_score, roc_curve, RocCurveDisplay
import matplotlib.pyplot as plt

# For imbalanced DHS data (SBA prediction)
# y_test: held-out labels; y_proba: predicted probabilities from a fitted model
auc = roc_auc_score(y_test, y_proba)
print(f'AUC = {auc:.4f}')

fig, ax = plt.subplots(figsize=(6, 6))
RocCurveDisplay.from_predictions(y_test, y_proba, ax=ax)
ax.plot([0, 1], [0, 1], 'k--', label='Random (AUC = 0.5)')
ax.legend()  # refresh the legend so the baseline label appears
ax.set_title(f'ROC Curve — Skilled Birth Attendance Model (AUC = {auc:.3f})')
plt.tight_layout()
plt.show()
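For the companion metrics mentioned above, a sketch on hypothetical imbalanced data (the data here is simulated; in the SBA models, y_test and y_proba would come from the held-out set):

```python
import numpy as np
from sklearn.metrics import average_precision_score, f1_score

# Simulated imbalanced data: ~10% positives (assumption for illustration)
rng = np.random.default_rng(0)
y_test = (rng.random(1000) < 0.1).astype(int)
y_proba = np.clip(0.1 + 0.5 * y_test + 0.2 * rng.standard_normal(1000), 0, 1)

# PR-AUC (average precision) is more informative than ROC-AUC when
# positives are rare, because it does not reward true negatives
pr_auc = average_precision_score(y_test, y_proba)

# F1 requires a threshold; 0.5 here, but tune it for the application
f1 = f1_score(y_test, y_proba >= 0.5)
print(f'PR-AUC = {pr_auc:.3f}, F1 = {f1:.3f}')
```

Reporting PR-AUC and F1 alongside ROC-AUC gives a fuller picture of performance on the minority class.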

Series Index | Post #75/100 | Md Salek Miah | saleksta@gmail.com