Statistics for ML #75 — ROC Curve & AUC
Post #75/100 in the Statistics for ML series | Md Salek Miah, Statistician & ML Researcher, SUST, Bangladesh
The ROC (Receiver Operating Characteristic) curve plots the True Positive Rate (sensitivity) against the False Positive Rate (1 − specificity) across all classification thresholds.
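To make the threshold sweep concrete, here is a minimal sketch (on a tiny toy array, not the DHS data from this post) that recomputes TPR and FPR by hand at each threshold returned by sklearn's `roc_curve` and checks that they match:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Toy labels and scores for illustration only
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, thr = roc_curve(y_true, y_score)

# TPR at threshold t = fraction of positives scored >= t (sensitivity);
# FPR at threshold t = fraction of negatives scored >= t (1 - specificity)
manual_tpr = [np.mean(y_score[y_true == 1] >= t) for t in thr]
manual_fpr = [np.mean(y_score[y_true == 0] >= t) for t in thr]

print(np.allclose(manual_tpr, tpr), np.allclose(manual_fpr, fpr))
```

Each point on the ROC curve is just one such (FPR, TPR) pair; the curve is traced by sliding the threshold from high to low.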
AUC (Area Under the Curve): the probability that the model ranks a randomly chosen positive example above a randomly chosen negative one. AUC = 0.5 → no better than random; AUC = 1.0 → perfect ranking.
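The ranking interpretation can be verified directly. This sketch (toy arrays, not the post's DHS data) brute-forces the fraction of correctly ordered (positive, negative) pairs, counting ties as half-correct, and compares it with `roc_auc_score`:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

pos = y_score[y_true == 1]
neg = y_score[y_true == 0]

# Fraction of (positive, negative) pairs the model ranks correctly,
# with ties counted as 0.5 — this is exactly AUC
manual_auc = np.mean([1.0 if p > n else 0.5 if p == n else 0.0
                      for p in pos for n in neg])

print(manual_auc, roc_auc_score(y_true, y_score))  # the two values agree
```

Here 8 of the 9 pairs are ordered correctly, so both values come out to 8/9 ≈ 0.889.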
Interpretation for Public Health ML
In my skilled birth attendance prediction models:
- AUC = 0.82 → the model ranks a randomly chosen SBA case above a randomly chosen no-SBA case 82% of the time
- But AUC alone is insufficient for imbalanced data — also report F1 and Precision-Recall AUC (average precision)
```python
from sklearn.metrics import roc_auc_score, RocCurveDisplay
import matplotlib.pyplot as plt

# Assumes y_test (true labels) and y_proba (predicted probabilities for the
# positive class) already exist — imbalanced DHS data, SBA prediction
auc = roc_auc_score(y_test, y_proba)
print(f'AUC = {auc:.4f}')

fig, ax = plt.subplots(figsize=(6, 6))
RocCurveDisplay.from_predictions(y_test, y_proba, ax=ax)
ax.plot([0, 1], [0, 1], 'k--', label='Random (AUC = 0.5)')
ax.set_title(f'ROC Curve — Skilled Birth Attendance Model (AUC = {auc:.3f})')
ax.legend()  # redraw legend so the random-baseline line is labelled
plt.tight_layout()
plt.show()
```
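Since the post recommends reporting Precision-Recall AUC alongside ROC AUC on imbalanced data, here is a hedged sketch of that comparison on a synthetic imbalanced dataset (generated with `make_classification`, not the post's DHS data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, average_precision_score

# Synthetic data with ~5% positives to mimic class imbalance
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

proba = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

roc = roc_auc_score(y_te, proba)
pr = average_precision_score(y_te, proba)  # PR AUC (average precision)
print(f'ROC AUC = {roc:.3f}, PR AUC = {pr:.3f}')
```

On imbalanced data the PR AUC is typically noticeably lower than the ROC AUC, because precision is sensitive to the flood of negatives in a way FPR is not — which is exactly why reporting both gives a fuller picture.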
Series Index | Post #75/100 | Md Salek Miah | saleksta@gmail.com
