Statistics for ML #7 — Conditional Probability
Conditional probability is the probability of an event given that another event has occurred. It is perhaps the single most important concept in applied statistics and ML.
Definition
\[P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0\]

"The probability of A, given that B has occurred."
Intuition: Restricting the Sample Space
When we condition on B, we restrict our universe from Ω to B. All probabilities are then re-normalised within B.
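This restriction-and-renormalisation view can be checked with a quick simulation. The fair-die example below is made up for illustration: computing P(even | roll > 3) by the ratio formula and by directly averaging inside the restricted sample gives the same answer.

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)  # fair six-sided die

B = rolls > 3        # condition: roll is greater than 3
A = rolls % 2 == 0   # event: roll is even

# Ratio definition: P(A ∩ B) / P(B)
p_a_given_b = (A & B).mean() / B.mean()

# Restricted-universe view: average A only inside B
p_a_given_b_direct = A[B].mean()

print(p_a_given_b, p_a_given_b_direct)  # both ≈ 2/3 (evens in {4, 5, 6})
```

Both computations agree exactly: restricting to B and re-normalising *is* the definition.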
Example: In a DHS survey:
- P(no skilled birth attendance) = 0.33
- P(no skilled birth attendance | rural residence) = 0.48
- P(no skilled birth attendance | urban residence) = 0.12
Conditioning on location dramatically changes the probability.
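In practice, conditional probabilities like these come from normalising a cross-tabulation within each group. The records below are invented to reproduce the rural/urban pattern above (they are not real DHS data):

```python
import pandas as pd

# Hypothetical survey records chosen to match the rates quoted above.
df = pd.DataFrame({
    "residence": ["rural"] * 50 + ["urban"] * 50,
    "no_sba":    [1] * 24 + [0] * 26 + [1] * 6 + [0] * 44,
})

# normalize="index" divides each row by its total,
# giving P(no_sba | residence) rather than joint proportions.
cond = pd.crosstab(df["residence"], df["no_sba"], normalize="index")
print(cond[1])  # rural 0.48, urban 0.12
```

Normalising within rows is exactly the "restrict, then re-normalise" operation: each row of the table is its own conditional universe.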
Independence vs. Conditional Independence
- Marginal independence: P(A | B) = P(A). Knowing B tells you nothing about A.
- Conditional independence: P(A | B, C) = P(A | C). Given C, knowing B adds nothing about A.
Conditional independence is the foundation of Bayesian Networks and Naive Bayes classifiers.
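A sketch of the distinction, using a simulated common cause C (the probabilities 0.9/0.1 and 0.8/0.2 are arbitrary choices): A and B each depend only on C, so they are marginally dependent but conditionally independent given C.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# C is a common cause; A and B each depend only on C.
C = rng.random(n) < 0.5
A = rng.random(n) < np.where(C, 0.9, 0.1)
B = rng.random(n) < np.where(C, 0.8, 0.2)

p_a = A.mean()
p_a_given_b = A[B].mean()        # marginally, B IS informative about A
p_a_given_c = A[C].mean()
p_a_given_bc = A[B & C].mean()   # given C, B adds nothing

print(p_a, p_a_given_b)           # differ markedly: marginal dependence
print(p_a_given_c, p_a_given_bc)  # ≈ equal: conditional independence
```

This is precisely the structure Naive Bayes assumes: features are conditionally independent given the class label, even though they are correlated marginally.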
Confusion Matrix as Conditional Probabilities
| Metric | Formula |
|---|---|
| Sensitivity (Recall) | P(Test+ \| Disease+) |
| Specificity | P(Test− \| Disease−) |
| PPV (Precision) | P(Disease+ \| Test+) |
| NPV | P(Disease− \| Test−) |
A COVID test with 95% sensitivity does NOT mean that a positive result implies a 95% chance of having COVID. Confusing P(Test+ | Disease+) with P(Disease+ | Test+) is the base rate fallacy; see Bayes' Theorem (next post).
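A quick worked example makes the gap concrete. The sensitivity matches the 95% above; the 99% specificity and 1% prevalence are assumed for illustration:

```python
# Assumed numbers: 95% sensitivity, 99% specificity, 1% prevalence.
sens, spec, prev = 0.95, 0.99, 0.01

# P(Test+) by the law of total probability over disease status.
p_test_pos = sens * prev + (1 - spec) * (1 - prev)

# PPV = P(Disease+ | Test+): flip the conditional with Bayes' rule.
ppv = sens * prev / p_test_pos
print(f"PPV = {ppv:.1%}")  # ≈ 49%, far below 95%
```

With a 1% base rate, roughly half of all positive results are false positives, even with an excellent test.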
```python
from sklearn.metrics import confusion_matrix, classification_report

# sklearn orders the confusion matrix with rows = actual, cols = predicted,
# labels sorted ascending: cm[0,0]=TN, cm[0,1]=FP, cm[1,0]=FN, cm[1,1]=TP.
cm = confusion_matrix(y_true, y_pred)
TP, FP, FN, TN = cm[1, 1], cm[0, 1], cm[1, 0], cm[0, 0]

sensitivity = TP / (TP + FN)  # P(pred+ | actual+)
specificity = TN / (TN + FP)  # P(pred- | actual-)
ppv         = TP / (TP + FP)  # P(actual+ | pred+)

print(classification_report(y_true, y_pred))
```
Previous: #6 Probability Axioms | Next: #8 Bayes' Theorem
