STEP 8 / 17

Logistic Regression and Odds Ratios

Case-control studies, predictive biomarkers, classifier evaluation: logistic regression is the foundation of binary-outcome analysis. Yet OR ≠ RR, and this deceptively simple model is full of pitfalls.

Why a logit link?

Binary outcomes (case/control, mutant/wild-type, responder/non-responder) cannot be sensibly fit by OLS: predicted values can fall below 0 or above 1, and because Var(y) = p(1−p) depends on the mean, homoscedasticity fails. Logistic regression uses the logit link to map probabilities in (0, 1) onto the real line (−∞, ∞):

logit(p) = log(p/(1−p)) = β₀ + β₁x₁ + ... + βₖxₖ

Each slope βⱼ is a log odds ratio: the change in log-odds per one-unit increase in xⱼ with the other covariates held fixed. exp(βⱼ) is the odds ratio (OR), and β₀ is the log-odds at x = 0. The MLE is computed by IRLS (iteratively reweighted least squares); inference uses Wald, score, or likelihood-ratio tests.
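To make the link function concrete, here is a minimal sketch that converts between probability, odds, and log-odds. The coefficient values (β₀ = −2, β₁ = 0.6) are purely illustrative, and the helper names `logit` and `inv_logit` are defined here rather than taken from any library.

```python
import math

def logit(p):
    """Map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Map a linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

b0, b1 = -2.0, 0.6  # illustrative coefficients, not estimates from real data

odds_by_x = {}
for x in (0, 1, 2):
    p = inv_logit(b0 + b1 * x)
    odds_by_x[x] = p / (1.0 - p)
    print(f"x={x}: p={p:.3f}, odds={odds_by_x[x]:.3f}")

# A one-unit increase in x multiplies the odds by exp(b1), whatever the starting x.
odds_ratio = math.exp(b1)
print(f"exp(b1) = {odds_ratio:.3f}")
print(f"odds(x=1)/odds(x=0) = {odds_by_x[1] / odds_by_x[0]:.3f}")
print(f"odds(x=2)/odds(x=1) = {odds_by_x[2] / odds_by_x[1]:.3f}")
```

The probabilities change by different amounts at each step, but the odds ratio between consecutive values of x stays constant; that constancy is what exp(β) summarizes.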

💡
Core warning: OR ≠ RR. The odds ratio approximates the relative risk only when the event is rare (baseline p below roughly 10%). For common outcomes (e.g. a 40% response rate) the OR lies further from 1 than the RR and seriously exaggerates it. To report an RR, fit a log-binomial model or a Poisson model with robust SE; never paraphrase an OR as an RR.
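The gap between OR and RR can be checked with simple arithmetic. The sketch below assumes a single binary exposure and a known baseline risk p₀; the function name `rr_from_or` is hypothetical, and the OR of 2 is an arbitrary illustration.

```python
def rr_from_or(odds_ratio, p0):
    """Relative risk implied by an odds ratio at baseline risk p0."""
    odds0 = p0 / (1.0 - p0)
    odds1 = odds_ratio * odds0          # odds in the exposed group
    p1 = odds1 / (1.0 + odds1)          # risk in the exposed group
    return p1 / p0

# Same OR, three different baseline risks
for p0 in (0.01, 0.10, 0.40):
    print(f"baseline p0={p0:.2f}: OR=2.00 -> RR={rr_from_or(2.0, p0):.3f}")
```

With a rare outcome the RR stays close to the OR of 2; at a 40% baseline the same OR corresponds to a much smaller RR, which is why describing an OR as "twice the risk" misleads for common outcomes.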

I. Model components

📈

Coefficients and OR

  • β₁ = change in log-odds per one-unit increase in x₁
  • OR = exp(β₁); OR > 1 → higher odds of the outcome; OR = 1 → no association
  • 95% CI: exp(β̂ ± 1.96·SE), symmetric on the log scale but not on the OR scale
  • Categorical variables: one OR per level, relative to the reference level
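As a worked example of the interval formula above, the following sketch builds a Wald 95% CI on the log scale and exponentiates it. The estimate and standard error are made-up numbers chosen only for illustration.

```python
import math

beta_hat, se = 0.6, 0.2   # hypothetical estimate and standard error
z = 1.96                  # normal quantile for a 95% interval

or_hat = math.exp(beta_hat)
ci_low = math.exp(beta_hat - z * se)
ci_high = math.exp(beta_hat + z * se)

print(f"OR = {or_hat:.3f}, 95% CI ({ci_low:.3f}, {ci_high:.3f})")
# Symmetric as ratios (log scale), asymmetric as differences (OR scale)
print(f"OR / lower = {or_hat / ci_low:.3f}, upper / OR = {ci_high / or_hat:.3f}")
print(f"OR - lower = {or_hat - ci_low:.3f}, upper - OR = {ci_high - or_hat:.3f}")
```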
🧮

Estimation and testing

  • MLE via Newton-Raphson / IRLS
  • Wald: β̂/SE → N(0,1) (unreliable in small samples)
  • LR: −2(logL₀ − logL₁) → χ² with df = number of parameters tested (preferred)
  • Deviance: D = −2·logL for ungrouped 0/1 data; it plays the role that RSS plays in OLS
  • Pseudo-R²: McFadden, Nagelkerke
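The estimation and testing steps listed above can be sketched in a few lines of NumPy. This is a teaching sketch under stated assumptions, not a replacement for `glm` or statsmodels: the data are simulated with β₀ = −2 and β₁ = 0.6, the covariate range (0 to 8) is an arbitrary choice, and the function names are hypothetical.

```python
import math
import numpy as np

def irls_logistic(X, y, n_iter=25, tol=1e-10):
    """Newton-Raphson / IRLS for logistic regression. X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = p * (1.0 - p)                      # IRLS weights
        score = X.T @ (y - p)                  # gradient of the log-likelihood
        info = X.T @ (X * w[:, None])          # Fisher information X'WX
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    se = np.sqrt(np.diag(np.linalg.inv(info)))  # SEs from the last information matrix
    return beta, se

def log_lik(X, y, beta):
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

# Simulated data (assumed setup)
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0.0, 8.0, size=n)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-2.0 + 0.6 * x))))

X1 = np.column_stack([np.ones(n), x])   # full model
X0 = np.ones((n, 1))                    # intercept-only null model
beta1, se1 = irls_logistic(X1, y)
beta0, _ = irls_logistic(X0, y)

ll1, ll0 = log_lik(X1, y, beta1), log_lik(X0, y, beta0)
wald_z = beta1[1] / se1[1]
lr_stat = -2.0 * (ll0 - ll1)                   # chi-square with 1 df here
lr_p = math.erfc(math.sqrt(lr_stat / 2.0))     # upper tail of chi-square(1)
deviance = -2.0 * ll1
mcfadden_r2 = 1.0 - ll1 / ll0

print("beta:", beta1, "SE:", se1)
print("Wald z:", wald_z, "LR stat:", lr_stat, "LR p:", lr_p)
print("deviance:", deviance, "McFadden R2:", mcfadden_r2)
```

The p-value line uses the identity P(χ²₁ > t) = erfc(√(t/2)), which holds only for one degree of freedom; for tests on several parameters a χ² survival function from a statistics library would be needed.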
⚠️

Separation

  • When a covariate (or a linear combination of covariates) perfectly predicts the outcome, the likelihood has no finite maximum: β̂ → ±∞ and SE → ∞
  • Common with: rare events, small subgroups, many or collinear predictors
  • Fix: Firth penalization (logistf, brglm2); the Jeffreys-prior penalty keeps the estimate finite
  • Or: Bayesian logistic regression with weakly informative priors
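A toy example may help show what separation does to the likelihood. The sketch below uses four made-up observations and a single slope with no intercept, so that the Firth penalty reduces to ½·log I(β), where I(β) is the scalar Fisher information. It is a one-parameter illustration found by grid search, not how logistf or brglm2 are implemented.

```python
import numpy as np

# Perfectly separated toy data: every x > 0 has y = 1, every x < 0 has y = 0
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

def log_lik(b):
    eta = b * x
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))

def fisher_info(b):
    p = 1.0 / (1.0 + np.exp(-b * x))
    return float(np.sum(x**2 * p * (1.0 - p)))

# Ordinary ML by gradient ascent: the slope never stops growing
b = 0.0
trace = {}
for it in range(1, 2001):
    p = 1.0 / (1.0 + np.exp(-b * x))
    b += 0.5 * float(np.sum(x * (y - p)))   # gradient is positive for every b
    if it in (200, 2000):
        trace[it] = b
print("ML slope after 200 and 2000 steps:", trace[200], trace[2000])

# Firth-type penalized likelihood: logL(b) + 0.5 * log I(b), maximized on a grid
grid = np.linspace(0.0, 20.0, 2001)
penalized = np.array([log_lik(g) + 0.5 * np.log(fisher_info(g)) for g in grid])
b_firth = float(grid[np.argmax(penalized)])
print("Firth-penalized slope:", b_firth)
```

As the slope grows the log-likelihood approaches 0 but the information shrinks toward 0, so the penalty term pulls the maximum back to a finite value.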
🎯

Discrimination vs calibration

  • Discrimination: can the model rank cases above non-cases? Measured by AUC / ROC
  • Calibration: do predicted probabilities match observed frequencies? Checked with a calibration plot, the Hosmer-Lemeshow test, or the Brier score
  • High AUC + poor calibration: the model ranks well, but its outputs cannot be reported as probabilities (risk communication fails)
  • Class imbalance: prefer PR-AUC over ROC-AUC
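The distinction above can be demonstrated by distorting a set of well-calibrated probabilities with a monotone transform. In this sketch the data are simulated, the cubic distortion is an arbitrary choice, and `auc` and `brier` are small helper functions written here for illustration.

```python
import numpy as np

def auc(y, score):
    """Probability that a random case scores above a random non-case (ties count half)."""
    pos, neg = score[y == 1], score[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

def brier(y, prob):
    """Mean squared difference between predicted probability and outcome."""
    return float(np.mean((prob - y) ** 2))

rng = np.random.default_rng(2)
n = 2000
p_true = rng.uniform(0.05, 0.95, size=n)
y = rng.binomial(1, p_true)

p_good = p_true          # calibrated by construction
p_bad = p_true ** 3      # same ordering, but probabilities systematically too low

auc_good, auc_bad = auc(y, p_good), auc(y, p_bad)
brier_good, brier_bad = brier(y, p_good), brier(y, p_bad)

print(f"AUC   good={auc_good:.4f}  distorted={auc_bad:.4f}")
print(f"Brier good={brier_good:.4f}  distorted={brier_bad:.4f}")
print(f"mean predicted: good={p_good.mean():.3f} distorted={p_bad.mean():.3f}; "
      f"observed rate={y.mean():.3f}")
```

Because the transform preserves ranks, the AUC does not change, while the Brier score and the gap between mean prediction and observed rate both reveal the miscalibration.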
⚠️
Non-collapsibility: the OR for the same exposure changes (the adjusted OR typically moves further from 1) when you adjust for a covariate that predicts the outcome, even if that covariate is unrelated to the exposure. This is a mathematical property of the logistic OR, not a confounding adjustment. Bottom line: do not compare ORs across papers that adjust for different covariate sets; for a rigorous effect-size comparison use the RR or the risk difference, which are collapsible.
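Non-collapsibility can be verified with arithmetic alone. The sketch below assumes two equally sized strata of a covariate Z that is independent of the exposure (so there is no confounding), with the same conditional OR of 3 in both; the intercepts are hypothetical values chosen to give very different baseline risks.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def odds(p):
    return p / (1.0 - p)

log_or_conditional = math.log(3.0)   # same exposure effect in both strata
intercepts = (-2.0, 1.0)             # hypothetical baseline log-odds for Z = 0 and Z = 1

p0 = [inv_logit(a) for a in intercepts]                        # unexposed risk per stratum
p1 = [inv_logit(a + log_or_conditional) for a in intercepts]   # exposed risk per stratum

stratum_or = [odds(b) / odds(a) for a, b in zip(p0, p1)]

# Marginal risks: average over the two equally sized strata
p0_marg = sum(p0) / 2.0
p1_marg = sum(p1) / 2.0
marginal_or = odds(p1_marg) / odds(p0_marg)

stratum_rd = [b - a for a, b in zip(p0, p1)]
marginal_rd = p1_marg - p0_marg

print("stratum ORs:", stratum_or)
print("marginal OR:", marginal_or)
print("mean stratum RD:", sum(stratum_rd) / 2.0, "marginal RD:", marginal_rd)
```

The marginal OR is closer to 1 than the OR shared by both strata even though Z is not a confounder, whereas the marginal risk difference equals the average of the stratum risk differences.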

Reshape the S-curve

The dots are "true data" generated from fixed β₀ = −2, β₁ = 0.6. Move the fit sliders to change the model's β₀ (intercept) and β₁ (slope) and watch the S-curve shift and stretch, with live readouts of the corresponding OR, the sample AUC, and the mean deviance. The fit is best near (−2, 0.6).

Gray dots: observations; blue line: fitted curve; orange dashed line: true curve
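For readers without access to the interactive sliders, the same experiment can be approximated offline. This sketch simulates data from β₀ = −2, β₁ = 0.6 and evaluates the mean deviance at a few candidate settings; the sample size and covariate range are assumptions, since the page does not state them.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 8.0, size=300)   # assumed covariate range and sample size
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-2.0 + 0.6 * x))))

def mean_deviance(b0, b1):
    """Average of -2 * log-likelihood per observation for a candidate (b0, b1)."""
    eta = b0 + b1 * x
    return float(-2.0 * np.mean(y * eta - np.logaddexp(0.0, eta)))

# Candidate slider positions: the generating values, a shifted intercept, a flat slope
for b0, b1 in [(-2.0, 0.6), (0.0, 0.6), (-2.0, 0.1)]:
    print(f"b0={b0:+.1f}, b1={b1:.1f}: OR={np.exp(b1):.3f}, "
          f"mean deviance={mean_deviance(b0, b1):.4f}")
```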

II. Common applications

Scenario | Model / output | Caveat
--- | --- | ---
Case-control GWAS | case ~ SNP + PC1-5 | Rare variants → Firth
Response biomarker | logistic + AUC / calibration | Cross-validate to avoid overfitting
Cell-type classification | multinomial logit / softmax | Needs one-vs-rest or softmax
Rare-variant burden test | SKAT / Firth logistic | Wald is inaccurate at small n
Clinical prediction model | calibration curve + decision curve analysis (DCA) | Calibration matters far more than AUC

Hands-on: fitting, OR, ROC, Firth

# --- R --- Full logistic workflow
library(broom); library(car); library(pROC); library(logistf); library(rms)

# 1) Standard logistic regression (case-control)
fit <- glm(case ~ exposure + age + sex,
            family = binomial(link = "logit"), data = df)

# 2) Tidy the results and exponentiate to ORs with 95% CIs
broom::tidy(fit, exponentiate = TRUE, conf.int = TRUE)
car::Anova(fit, type = "II")                  # LR tests (car's default for glm)

# 3) ROC / AUC (in-sample, so optimistic)
pred <- predict(fit, type = "response")
roc1 <- pROC::roc(df$case, pred)
pROC::auc(roc1); plot(roc1)

# 4) Calibration plot (smoothed curve) plus summary statistics such as Brier
rms::val.prob(pred, df$case)                  # df$case must be numeric 0/1

# 5) Firth penalization: handles separation and small-sample bias
fit_f <- logistf::logistf(case ~ variant + sex, data = df)
summary(fit_f)

# --- Python ---
import numpy as np
import statsmodels.formula.api as smf
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.calibration import calibration_curve

# 1) statsmodels: inference-oriented (df is a DataFrame with a 0/1 column `case`)
fit = smf.logit("case ~ exposure + age + sex", data=df).fit()
print(fit.summary())

# 2) ORs with 95% CIs
or_tab = np.exp(fit.params).to_frame("OR").join(np.exp(fit.conf_int()))
print(or_tab)

# 3) Equivalent GLM form (logit is the default link for the Binomial family)
smf.glm("case ~ exposure + age + sex", data=df, family=sm.families.Binomial()).fit()

# 4) sklearn: prediction-oriented; reuse the design matrix without its intercept column
y, X = fit.model.endog, fit.model.exog[:, 1:]
clf = LogisticRegression(C=1.0, max_iter=1000).fit(X, y)   # default L2 penalty shrinks coefficients
proba = clf.predict_proba(X)[:, 1]
print(roc_auc_score(y, proba))                             # in-sample AUC, optimistic
prob_true, prob_pred = calibration_curve(y, proba, n_bins=10)
🚫
Common mistakes: ① Reporting an OR as "X times higher risk": true only when the baseline p is small. ② Training the model and computing AUC on the same data: the result is optimistically biased. Use k-fold CV or a held-out external test set. ③ Declaring a model "clinically useful" because AUC = 0.85: if calibration is poor, a stated 70% probability may really be 30%, and the model cannot support decisions.
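Mistake ② is easy to reproduce. The sketch below fits a logistic model to pure-noise predictors and compares the in-sample (apparent) AUC with a 5-fold out-of-fold AUC. Everything here is an assumption made for illustration: the data are simulated, the outcome is unrelated to every predictor, and the fitter adds a small ridge term only to keep the Newton steps numerically stable.

```python
import numpy as np

def fit_logistic(X, y, ridge=1e-2, n_iter=50):
    """Newton-Raphson logistic fit with an intercept and a small stabilizing ridge term."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))
        grad = Xd.T @ (y - p) - ridge * beta
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None]) + ridge * np.eye(len(beta))
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def predict(beta, X):
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-(Xd @ beta)))

def auc(y, score):
    pos, neg = score[y == 1], score[y == 0]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

rng = np.random.default_rng(0)
n, k = 120, 20
X = rng.normal(size=(n, k))          # 20 noise predictors
y = rng.integers(0, 2, size=n)       # outcome generated independently of X

# Apparent AUC: fit and evaluate on the same rows
apparent_auc = auc(y, predict(fit_logistic(X, y), X))

# 5-fold cross-validation: each row is scored by a model that never saw it
folds = np.array_split(rng.permutation(n), 5)
oof = np.empty(n)
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(n), test_idx)
    oof[test_idx] = predict(fit_logistic(X[train_idx], y[train_idx]), X[test_idx])
cv_auc = auc(y, oof)

print(f"apparent AUC = {apparent_auc:.3f}, cross-validated AUC = {cv_auc:.3f}")
```

Since the predictors carry no signal, any apparent AUC well above 0.5 is pure overfitting; the cross-validated value is the honest estimate.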

📝 Self-check

1. When is OR a poor approximation to RR?

A. Never; OR always equals RR
B. When events are very rare
C. When events are common (baseline p > ~10%), where OR overstates RR
D. The two differ only for continuous outcomes

2. What is "separation" in logistic regression, and how is it fixed?

A. Train and test sets were not split; add CV
B. The two groups have unequal variances (heteroscedasticity); use robust SE
C. A covariate perfectly predicts the outcome, forcing β̂ → ±∞; fix with Firth penalization or a weakly informative Bayesian prior
D. Multicollinearity; drop redundant variables

3. Why can a model have high AUC yet poor calibration?

A. There is an uncaught bug
B. AUC measures only ranking (whether cases are ordered above non-cases), not whether predicted probabilities match observed frequencies
C. Calibration is always equivalent to AUC
D. It cannot happen