Statistics for ML #7 — Conditional Probability
Conditional probability is the probability of an event given that another event has occurred. It is perhaps the single most important concept in applied statistics and ML.
Definition
\[P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0\]

"The probability of A, given that B has occurred."
Intuition: Restricting the Sample Space
When we condition on B, we restrict our universe from Ω to B. All probabilities are then re-normalised within B.
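This restriction-and-renormalisation view can be checked with a quick simulation. The fair-die example below is made up for illustration: computing P(even | roll > 3) by the ratio formula and by directly averaging inside the restricted sample gives the same answer.

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)  # fair six-sided die

B = rolls > 3        # condition: roll is greater than 3
A = rolls % 2 == 0   # event: roll is even

# Ratio definition: P(A ∩ B) / P(B)
p_a_given_b = (A & B).mean() / B.mean()

# Restricted-universe view: average A only inside B
p_a_given_b_direct = A[B].mean()

print(p_a_given_b, p_a_given_b_direct)  # both ≈ 2/3 (evens in {4, 5, 6})
```

Both computations agree exactly: restricting to B and re-normalising *is* the definition.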
Example: In a DHS survey:
- P(no skilled birth attendance) = 0.33
- P(no skilled birth attendance | rural residence) = 0.48
- P(no skilled birth attendance | urban residence) = 0.12
Conditioning on location dramatically changes the probability.
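In practice, conditional probabilities like these come from normalising a cross-tabulation within each group. The records below are invented to reproduce the rural/urban pattern above (they are not real DHS data):

```python
import pandas as pd

# Hypothetical survey records chosen to match the rates quoted above.
df = pd.DataFrame({
    "residence": ["rural"] * 50 + ["urban"] * 50,
    "no_sba":    [1] * 24 + [0] * 26 + [1] * 6 + [0] * 44,
})

# normalize="index" divides each row by its total,
# giving P(no_sba | residence) rather than joint proportions.
cond = pd.crosstab(df["residence"], df["no_sba"], normalize="index")
print(cond[1])  # rural 0.48, urban 0.12
```

Normalising within rows is exactly the "restrict, then re-normalise" operation: each row of the table is its own conditional universe.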
Independence vs. Conditional Independence
- Marginal independence: P(A | B) = P(A). Knowing B tells you nothing about A.
- Conditional independence: P(A | B, C) = P(A | C). Given C, knowing B adds nothing about A.
Conditional independence is the foundation of Bayesian Networks and Naive Bayes classifiers.
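A sketch of the distinction, using a simulated common cause C (the probabilities 0.9/0.1 and 0.8/0.2 are arbitrary choices): A and B each depend only on C, so they are marginally dependent but conditionally independent given C.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# C is a common cause; A and B each depend only on C.
C = rng.random(n) < 0.5
A = rng.random(n) < np.where(C, 0.9, 0.1)
B = rng.random(n) < np.where(C, 0.8, 0.2)

p_a = A.mean()
p_a_given_b = A[B].mean()        # marginally, B IS informative about A
p_a_given_c = A[C].mean()
p_a_given_bc = A[B & C].mean()   # given C, B adds nothing

print(p_a, p_a_given_b)           # differ markedly: marginal dependence
print(p_a_given_c, p_a_given_bc)  # ≈ equal: conditional independence
```

This is precisely the structure Naive Bayes assumes: features are conditionally independent given the class label, even though they are correlated marginally.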
Confusion Matrix as Conditional Probabilities
| Metric | Formula |
|---|---|
| Sensitivity (Recall) | P(Test+ \| Disease+) |
| Specificity | P(Test− \| Disease−) |
| PPV (Precision) | P(Disease+ \| Test+) |
| NPV | P(Disease− \| Test−) |
A COVID test with 95% sensitivity does NOT mean that a positive result implies a 95% chance of having COVID. Confusing P(Test+ | Disease+) with P(Disease+ | Test+) is the base rate fallacy; see Bayes' Theorem (next post).
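A quick worked example makes the gap concrete. The sensitivity matches the 95% above; the 99% specificity and 1% prevalence are assumed for illustration:

```python
# Assumed numbers: 95% sensitivity, 99% specificity, 1% prevalence.
sens, spec, prev = 0.95, 0.99, 0.01

# P(Test+) by the law of total probability over disease status.
p_test_pos = sens * prev + (1 - spec) * (1 - prev)

# PPV = P(Disease+ | Test+): flip the conditional with Bayes' rule.
ppv = sens * prev / p_test_pos
print(f"PPV = {ppv:.1%}")  # ≈ 49%, far below 95%
```

With a 1% base rate, roughly half of all positive results are false positives, even with an excellent test.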
```python
from sklearn.metrics import confusion_matrix, classification_report

# sklearn orders the confusion matrix with rows = actual, cols = predicted,
# labels sorted ascending: cm[0,0]=TN, cm[0,1]=FP, cm[1,0]=FN, cm[1,1]=TP.
cm = confusion_matrix(y_true, y_pred)
TP, FP, FN, TN = cm[1, 1], cm[0, 1], cm[1, 0], cm[0, 0]

sensitivity = TP / (TP + FN)  # P(pred+ | actual+)
specificity = TN / (TN + FP)  # P(pred- | actual-)
ppv         = TP / (TP + FP)  # P(actual+ | pred+)

print(classification_report(y_true, y_pred))
```
Previous: #6 Probability Axioms | Next: #8 Bayes' Theorem
