Statistics for ML #7 — Conditional Probability

Conditional probability is the probability of an event given that another event has occurred. It is perhaps the single most important concept in applied statistics and ML.

Definition

\[P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0\]

“The probability of A, given that B has occurred.”
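As a quick sanity check, the definition can be computed directly on a finite sample space. A minimal sketch using a fair die, with events chosen arbitrarily for illustration:

```python
from fractions import Fraction

# Sample space: one roll of a fair die
omega = {1, 2, 3, 4, 5, 6}

A = {6}          # event A: rolled a six
B = {2, 4, 6}    # event B: rolled an even number

def p(event):
    """Probability of an event under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))

# P(A|B) = P(A ∩ B) / P(B)
p_a_given_b = p(A & B) / p(B)
print(p_a_given_b)   # 1/3
```

Knowing the roll is even shrinks the universe to three equally likely outcomes, lifting P(six) from 1/6 to 1/3.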

Intuition: Restricting the Sample Space

When we condition on B, we restrict our universe from Ω to B. All probabilities are then re-normalised within B.

Example: In a DHS survey:

  • P(no skilled birth attendance) = 0.33
  • P(no skilled birth attendance | rural residence) = 0.48
  • P(no skilled birth attendance | urban residence) = 0.12

Conditioning on location dramatically changes the probability.
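The re-normalisation can be made concrete with a small cross-tabulation. A sketch using invented counts (chosen only to roughly match the conditional probabilities above; these are not real DHS figures):

```python
# Hypothetical 2x2 counts (illustrative, not real survey data)
counts = {
    ("rural", "no_sba"): 48, ("rural", "sba"): 52,
    ("urban", "no_sba"): 12, ("urban", "sba"): 88,
}
total = sum(counts.values())

# Marginal: P(no skilled birth attendance) over the whole sample
p_no_sba = (counts[("rural", "no_sba")] + counts[("urban", "no_sba")]) / total

# Conditional: restrict the universe to rural rows, then re-normalise
rural_total = counts[("rural", "no_sba")] + counts[("rural", "sba")]
p_no_sba_given_rural = counts[("rural", "no_sba")] / rural_total

print(p_no_sba, p_no_sba_given_rural)
```

Dividing by the rural row total instead of the grand total is exactly the P(A ∩ B)/P(B) step: the denominator changes from Ω to B.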

Independence vs. Conditional Independence

Marginal independence: P(A|B) = P(A) — knowing B tells you nothing about A
Conditional independence: P(A|B,C) = P(A|C) — given C, knowing B adds nothing about A

Conditional independence is the foundation of Bayesian Networks and Naive Bayes classifiers.
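A toy sketch of the distinction: the joint distribution below is constructed so that A and B are independent given C, yet dependent marginally (all probabilities are made up for illustration):

```python
import math

# P(A,B,C) built so A ⟂ B | C holds by construction
p_c = {0: 0.5, 1: 0.5}
p_a1_c = {0: 0.1, 1: 0.8}   # P(A=1 | C=c)
p_b1_c = {0: 0.2, 1: 0.9}   # P(B=1 | C=c)

def joint(a, b, c):
    pa = p_a1_c[c] if a else 1 - p_a1_c[c]
    pb = p_b1_c[c] if b else 1 - p_b1_c[c]
    return p_c[c] * pa * pb

# Conditional independence: P(A=1 | B=1, C=1) equals P(A=1 | C=1)
p_a_bc = joint(1, 1, 1) / sum(joint(a, 1, 1) for a in (0, 1))
p_a_c = sum(joint(1, b, 1) for b in (0, 1)) / p_c[1]
assert math.isclose(p_a_bc, p_a_c)   # given C, B adds nothing

# Marginal dependence: P(A=1 | B=1) differs from P(A=1)
p_a_b = sum(joint(1, 1, c) for c in (0, 1)) / sum(
    joint(a, 1, c) for a in (0, 1) for c in (0, 1))
p_a = sum(joint(1, b, c) for b in (0, 1) for c in (0, 1))
print(round(p_a_b, 3), round(p_a, 3))   # B shifts belief about A
```

This is the Naive Bayes assumption in miniature: features (here B) are treated as independent of each other once the class (here C) is known, even though they are correlated marginally.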

Confusion Matrix as Conditional Probabilities

| Metric | Formula |
| --- | --- |
| Sensitivity (Recall) | P(Test+ \| Disease+) |
| Specificity | P(Test− \| Disease−) |
| PPV (Precision) | P(Disease+ \| Test+) |
| NPV | P(Disease− \| Test−) |

A COVID test with 95% sensitivity does NOT mean that a positive result implies a 95% chance of having COVID. Sensitivity is P(Test+ | Disease+), but what a patient cares about is P(Disease+ | Test+), which also depends on prevalence. Confusing the two is the base rate fallacy — see Bayes' Theorem (next post).

from sklearn.metrics import confusion_matrix, classification_report

# y_true, y_pred: arrays of 0/1 labels (actual and predicted)
# sklearn convention: rows = actual class, columns = predicted class
cm = confusion_matrix(y_true, y_pred)
TP, FP, FN, TN = cm[1, 1], cm[0, 1], cm[1, 0], cm[0, 0]

sensitivity = TP / (TP + FN)   # P(pred+ | actual+)
specificity = TN / (TN + FP)   # P(pred- | actual-)
ppv = TP / (TP + FP)           # P(actual+ | pred+)
npv = TN / (TN + FN)           # P(actual- | pred-)

print(classification_report(y_true, y_pred))
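To preview why sensitivity and PPV can diverge so sharply, here is a quick calculation with assumed numbers (95% sensitivity, 95% specificity, 1% prevalence — none of these figures come from the post):

```python
# Assumed test characteristics, for illustration only
prevalence, sensitivity, specificity = 0.01, 0.95, 0.95

p_tp = prevalence * sensitivity              # P(Test+, Disease+)
p_fp = (1 - prevalence) * (1 - specificity)  # P(Test+, Disease-)

# PPV = P(Disease+ | Test+): condition on testing positive
ppv = p_tp / (p_tp + p_fp)
print(round(ppv, 3))   # ≈ 0.161
```

At 1% prevalence, false positives from the large healthy group swamp the true positives, so a positive result carries only about a 16% chance of disease despite the 95% sensitivity.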

Previous: #6 Probability Axioms | Next: #8 Bayes' Theorem