Step 8: Logistic Regression & Odds Ratios — Statistical Inference Tutorial

Overview

Why a logit link?

Binary outcomes (case/control, mutant/WT, responder/non-responder) cannot be sensibly fit by OLS. Predicted values escape (0, 1), and because Var(Y) = p(1−p) depends on the mean, homoscedasticity fails. Logistic regression uses the logit link to map probabilities in (0, 1) to the real line:

logit(p) = log(p/(1−p)) = β₀ + β₁x₁ + ... + β_kx_k

The intercept β₀ is the log-odds when all x = 0. Each slope β_j is a log odds ratio, and exp(β_j) is the odds ratio (OR). The maximum-likelihood estimate (MLE) is computed by IRLS (iteratively reweighted least squares). Inference comes from Wald, Score, or Likelihood Ratio tests.
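The IRLS loop mentioned above fits in a few lines of NumPy. This is a minimal sketch, not production code: the simulated data, the helper name `fit_logistic_irls`, and the tolerance are assumptions made for illustration. Each iteration solves a weighted least-squares problem with weights p(1−p) and working response z = η + (y − p)/(p(1−p)).

```python
import numpy as np


def fit_logistic_irls(X, y, tol=1e-10, max_iter=50):
    """IRLS for logistic regression (hypothetical helper for illustration).

    X: (n, k) design matrix that already contains an intercept column.
    y: (n,) array of 0/1 outcomes.
    Returns (beta, iterations_used).
    """
    beta = np.zeros(X.shape[1])
    for it in range(1, max_iter + 1):
        eta = X @ beta                     # linear predictor (log-odds)
        p = 1.0 / (1.0 + np.exp(-eta))     # inverse logit
        w = p * (1.0 - p)                  # IRLS weights = Var(Y | x)
        z = eta + (y - p) / w              # working response
        XtW = X.T * w                      # X'W without forming diag(W)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)   # weighted least squares
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, it
        beta = beta_new
    return beta, max_iter


# Simulated example (assumed data): true beta0 = -1, beta1 = 0.8
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * x)))
y = (rng.random(n) < p_true).astype(float)
X = np.column_stack([np.ones(n), x])

beta_hat, n_iter = fit_logistic_irls(X, y)
print("beta_hat:", beta_hat.round(3), "iterations:", n_iter)
print("odds ratio per unit x:", round(float(np.exp(beta_hat[1])), 3))
```

At convergence, the score equations X'(y − p̂) = 0 hold. This is the same fixed point that Newton-Raphson reaches.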

💡

Core warning: OR ≠ RR. The odds ratio approximates the relative risk only when events are rare (baseline p below roughly 10%). For common outcomes (e.g., a 40% response rate), the OR lies further from 1 than the true RR and so exaggerates the effect. To report an RR, use log-binomial regression or Poisson regression with robust (sandwich) SEs. Never paraphrase an OR as an RR.
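A quick numeric check makes the warning concrete. The sketch below uses two illustrative risk pairs: 5% vs 10% (rare) and 40% vs 60% (common). It also applies the identity RR = OR / (1 − p₀ + p₀·OR). This identity follows directly from the definitions of odds and risk for an unadjusted comparison with baseline risk p₀.

```python
def odds(p):
    """Convert a risk (probability) to odds."""
    return p / (1.0 - p)


def or_and_rr(p0, p1):
    """Return (odds ratio, relative risk) for baseline risk p0 vs exposed risk p1."""
    return odds(p1) / odds(p0), p1 / p0


def rr_from_or(or_, p0):
    """Exact conversion of an unadjusted OR to an RR, given baseline risk p0."""
    return or_ / (1.0 - p0 + p0 * or_)


for p0, p1 in [(0.05, 0.10), (0.40, 0.60)]:
    or_, rr = or_and_rr(p0, p1)
    print(f"p0={p0:.2f} p1={p1:.2f}  OR={or_:.3f}  RR={rr:.3f}  "
          f"RR from OR={rr_from_or(or_, p0):.3f}")
# Rare outcome:   OR = 2.111 vs RR = 2.000 (close)
# Common outcome: OR = 2.250 vs RR = 1.500 (the OR exaggerates the effect)
```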

Core concepts

1. Model structure

📈

Coefficients and OR

β₁ = change in log-odds per one-unit increase in x₁, holding the other covariates fixed
OR = exp(β₁); OR > 1 → higher odds of the outcome; OR = 1 → no association
95% CI: exp(β̂ ± 1.96·SE), symmetric on the log scale but asymmetric on the OR scale
Categorical covariates: one OR per level vs the reference level
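The coefficient-to-OR arithmetic can be checked by hand. In this sketch, β̂ = 0.5 and SE = 0.2 are made-up numbers, and `wald_or_ci` and `beta_per_year` are hypothetical names. The sketch shows two things. First, the interval is symmetric around β̂ on the log scale but not around the OR. Second, the OR for a k-unit change is exp(k·β).

```python
import math


def wald_or_ci(beta_hat, se, z=1.96):
    """Return (OR, lower, upper) from a log-odds estimate and its standard error."""
    return (math.exp(beta_hat),
            math.exp(beta_hat - z * se),
            math.exp(beta_hat + z * se))


or_, lo, hi = wald_or_ci(0.5, 0.2)
print(f"OR = {or_:.3f}, 95% CI {lo:.3f} to {hi:.3f}")   # OR = 1.649, 95% CI 1.114 to 2.440

# Symmetric on the log scale, so OR / lower equals upper / OR
print(math.isclose(or_ / lo, hi / or_))                 # True

# OR for a 10-unit increase in x (e.g., 10 years of age) is exp(10 * beta)
beta_per_year = 0.03                                    # hypothetical per-unit coefficient
print(f"OR per 10 units = {math.exp(10 * beta_per_year):.3f}")   # OR per 10 units = 1.350
```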

🧮

Estimation and testing

MLE via Newton-Raphson / IRLS (the two coincide for the canonical logit link)
Wald: z = β̂/SE ~ N(0,1) under H₀ (unreliable in small samples or near separation)
LR: −2(logL₀ − logL₁) ~ χ² with df = number of parameters dropped (preferred)
Deviance: D = −2(logL − logL_saturated); for ungrouped binary data logL_saturated = 0, so D = −2·logL. Deviance plays the role of RSS
Pseudo-R²: McFadden (1 − logL₁/logL₀), Nagelkerke
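To compare the tests side by side, the sketch below fits a null model and a one-covariate model by Newton-Raphson on simulated data. The data and helper names are assumptions. It then computes the Wald z, the LR statistic, the deviance, and McFadden's pseudo-R². For 1 df, the p-values use the identities P(|Z| > z) = erfc(z/√2) and P(χ²₁ > x) = erfc(√(x/2)), so only NumPy and the standard library are needed.

```python
import math
import numpy as np


def newton_logit(X, y, n_iter=50):
    """Newton-Raphson MLE for logistic regression; returns (beta, covariance)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = (X.T * (p * (1.0 - p))) @ X           # Fisher information X'WX
        step = np.linalg.solve(info, X.T @ (y - p))  # information^-1 * score
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = (X.T * (p * (1.0 - p))) @ X
    return beta, np.linalg.inv(info)


def loglik(X, y, beta):
    """Bernoulli log-likelihood: sum of y*eta - log(1 + exp(eta))."""
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


rng = np.random.default_rng(1)
n = 400
x = rng.normal(size=n)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-0.5 + 0.4 * x)))).astype(float)

X1 = np.column_stack([np.ones(n), x])   # full model
X0 = np.ones((n, 1))                    # intercept-only (null) model
b1, cov1 = newton_logit(X1, y)
b0, _ = newton_logit(X0, y)
ll1, ll0 = loglik(X1, y, b1), loglik(X0, y, b0)

z = b1[1] / math.sqrt(cov1[1, 1])       # Wald statistic
lr = -2.0 * (ll0 - ll1)                 # LR statistic, 1 df
deviance = -2.0 * ll1                   # ungrouped binary data: saturated logL = 0
mcfadden = 1.0 - ll1 / ll0

print(f"Wald z = {z:.2f}, p = {math.erfc(abs(z) / math.sqrt(2)):.2e}")
print(f"LR chi2 = {lr:.2f}, p = {math.erfc(math.sqrt(lr / 2)):.2e}")
print(f"Deviance = {deviance:.1f}, McFadden R2 = {mcfadden:.3f}")
```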

⚠️

Separation

When a covariate (or a linear combination of covariates) perfectly predicts the outcome, β̂ → ±∞ and SE → ∞
Common in: rare outcomes, subgroup analyses, high-dimensional models with many predictors relative to the number of events
Fix: Firth penalization (logistf, brglm2); the Jeffreys-prior penalty keeps the estimates finite
Or: Bayesian logistic regression with weakly informative priors
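The sketch below reproduces separation on a six-point toy dataset chosen for illustration. It contrasts plain Newton-Raphson with a Firth-penalized fit. The Firth part follows the standard modified-score formulation: it maximizes logL(β) + ½·log|I(β)|. The gradient of that objective is X'(y − p + h(½ − p)), where h is the diagonal of the weighted hat matrix. Step halving keeps each update uphill. This is a teaching sketch with hypothetical helper names, not a replacement for logistf or brglm2.

```python
import numpy as np


def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-eta))


def penalized_loglik(X, y, beta):
    """Firth objective: logL(beta) + 0.5 * log det I(beta)."""
    eta = X @ beta
    p = expit(eta)
    info = (X.T * (p * (1.0 - p))) @ X
    return float(np.sum(y * eta - np.logaddexp(0.0, eta))
                 + 0.5 * np.linalg.slogdet(info)[1])


def firth_logit(X, y, max_iter=200, tol=1e-9):
    """Firth-penalized logistic regression (teaching sketch, not logistf)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = expit(X @ beta)
        w = p * (1.0 - p)
        info_inv = np.linalg.inv((X.T * w) @ X)
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)   # hat-matrix diagonal
        score = X.T @ (y - p + h * (0.5 - p))                # Firth modified score
        if np.max(np.abs(score)) < tol:
            break
        step = info_inv @ score
        # Step halving: accept only moves that do not lower the penalized loglik
        current = penalized_loglik(X, y, beta)
        while (penalized_loglik(X, y, beta + step) < current
               and np.max(np.abs(step)) > tol):
            step = step / 2.0
        beta = beta + step
    return beta


# Toy data with complete separation: every x > 0 is a case
x = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])

# Plain Newton-Raphson for the MLE: the slope grows at every iteration
beta = np.zeros(2)
mle_path = []
for _ in range(25):
    p = expit(X @ beta)
    info = (X.T * (p * (1.0 - p))) @ X
    beta = beta + np.linalg.solve(info, X.T @ (y - p))
    mle_path.append(float(beta[1]))
print("MLE slope after 5, 10, 25 iterations:",
      [round(mle_path[i], 1) for i in (4, 9, 24)])

beta_f = firth_logit(X, y)
print("Firth estimate (intercept, slope):", beta_f.round(3))
```

The unpenalized slope never stabilizes because the likelihood keeps increasing as β₁ → ∞. The penalized objective, by contrast, has a finite maximizer.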

🎯

Discrimination vs calibration

Discrimination: can the model rank cases above non-cases? Measured by AUC / ROC
Calibration: do predicted probabilities match observed frequencies? Assessed with a calibration plot, the Hosmer-Lemeshow test, or the Brier score
High AUC + poor calibration: the model ranks well, but its outputs cannot be communicated as probabilities (risk communication fails)
Class imbalance: prefer PR-AUC over ROC-AUC
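The distinction can be demonstrated numerically. This sketch simulates outcomes from known, well-calibrated probabilities (an assumption). It then applies a strictly increasing distortion, q = p², to those probabilities. AUC depends only on the ranking, so it stays the same. The Brier score and calibration-in-the-large (mean predicted risk vs observed rate) both get worse. The helper names are hypothetical.

```python
import numpy as np


def auc(y, score):
    """Mann-Whitney AUC: P(score of a case > score of a non-case), ties count 1/2."""
    diff = score[y == 1][:, None] - score[y == 0][None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


def brier(y, prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return float(np.mean((prob - y) ** 2))


rng = np.random.default_rng(42)
n = 2000
p = rng.uniform(0.0, 1.0, size=n)     # well-calibrated "true" risks
y = (rng.random(n) < p).astype(int)
q = p ** 2                            # same ranking, badly miscalibrated

for name, s in [("calibrated", p), ("distorted", q)]:
    print(f"{name:10s} AUC={auc(y, s):.3f}  Brier={brier(y, s):.3f}  "
          f"mean predicted={s.mean():.3f}  observed rate={y.mean():.3f}")
```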

⚠️

Non-collapsibility: the OR for the same exposure changes (typically moving further from 1) when you add covariates that predict the outcome, even if those covariates are unrelated to the exposure and therefore are not confounders. This is a mathematical property of the logistic OR, not confounding adjustment. Bottom line: do not compare ORs across papers with different covariate sets; for effect-size comparisons, use RR or risk difference.
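Non-collapsibility can be shown with exact arithmetic, with no sampling needed. The sketch assumes a randomized binary exposure X that is independent of a binary prognostic covariate Z, so Z cannot confound. It also assumes P(Z = 1) = 0.5 and the conditional model logit P(Y=1 | X, Z) = −1 + 1·X + 2·Z. All values are illustrative. The conditional OR is e ≈ 2.718 in both strata, yet the marginal OR, obtained by averaging risks over Z, is smaller.

```python
import math


def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


B0, BX, BZ = -1.0, 1.0, 2.0   # assumed model: logit P(Y=1|X,Z) = B0 + BX*X + BZ*Z
PZ = 0.5                       # P(Z = 1); Z independent of X, so no confounding


def risk(x, z):
    """P(Y = 1 | X = x, Z = z) under the assumed model."""
    return expit(B0 + BX * x + BZ * z)


# Conditional (Z-adjusted) OR within each stratum: exp(BX) = e in both
cond_or = [odds(risk(1, z)) / odds(risk(0, z)) for z in (0, 1)]

# Marginal risks: average over the distribution of Z
marg_r1 = (1 - PZ) * risk(1, 0) + PZ * risk(1, 1)
marg_r0 = (1 - PZ) * risk(0, 0) + PZ * risk(0, 1)
marg_or = odds(marg_r1) / odds(marg_r0)

print("conditional OR by stratum:", [round(v, 3) for v in cond_or])   # [2.718, 2.718]
print(f"marginal OR: {marg_or:.3f}")                                   # marginal OR: 2.230
print(f"marginal risk difference: {marg_r1 - marg_r0:.3f}")           # marginal risk difference: 0.190
```

Nothing is confounded here, yet the adjusted OR (2.718) and the unadjusted OR (2.230) differ. This is why ORs from models with different covariate sets are not directly comparable.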

互動模擬

Reshape the S-curve

The dots are "true data" generated from fixed β₀ = −2, β₁ = 0.6. Move the fit sliders to change the model's β₀ (intercept) and β₁ (slope), and watch the S-curve shift and stretch. The corresponding OR, sample AUC, and mean deviance update live. The fit is best with the sliders near (−2, 0.6).


Gray dots: observed; blue line: fitted; orange dashed line: true curve
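The slider experiment can be reproduced offline. The page does not say how x was generated, so this sketch assumes x is drawn uniformly on [0, 8]. It simulates outcomes from β₀ = −2, β₁ = 0.6 and evaluates the OR, sample AUC, and mean deviance for any (β₀, β₁) setting. A coarse grid search then locates the best-fitting setting. The sketch also highlights something the sliders make easy to miss. For any β₁ > 0, the predicted probabilities keep the ordering of x, so the sample AUC does not change as the sliders move; only the mean deviance, which rewards calibration, does.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2000
x = rng.uniform(0.0, 8.0, size=n)                  # assumed design for x
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-2.0 + 0.6 * x)))).astype(float)


def mean_deviance(b0, b1):
    """-2 times the average Bernoulli log-likelihood at the setting (b0, b1)."""
    eta = b0 + b1 * x
    return float(-2.0 * np.mean(y * eta - np.logaddexp(0.0, eta)))


def auc(score):
    """Mann-Whitney AUC of score against y, ties counted as 1/2."""
    diff = score[y == 1][:, None] - score[y == 0][None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


def slider_readout(b0, b1):
    """What the interactive panel displays for one slider position (hypothetical)."""
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
    return {"OR": float(np.exp(b1)), "AUC": auc(p), "mean_dev": mean_deviance(b0, b1)}


for setting in [(-2.0, 0.6), (0.0, 0.3), (-4.0, 1.2)]:
    r = slider_readout(*setting)
    print(setting, {k: round(v, 3) for k, v in r.items()})

# Coarse grid search over the two sliders
b0_grid = np.arange(-4.0, 0.01, 0.1)
b1_grid = np.arange(0.1, 1.51, 0.05)
dev = np.array([[mean_deviance(a, b) for b in b1_grid] for a in b0_grid])
i, j = np.unravel_index(np.argmin(dev), dev.shape)
print(f"best grid setting: b0={b0_grid[i]:.1f}, b1={b1_grid[j]:.2f}")
```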

Bioinformatics applications

2. Common applications

Scenario	Model / output	Caveat
Case-control GWAS	`case ~ SNP + PC1 + PC2 + PC3 + PC4 + PC5`	Rare variants → Firth
Response biomarker	logistic + AUC / calibration	Cross-validate to avoid overfitting
Cell-type classification	multinomial logit / softmax	More than two classes: one-vs-rest or softmax
Rare-variant association (burden / SKAT)	SKAT / Firth logistic	Wald unreliable at small n
Clinical prediction model	calibration curve + DCA (decision curve analysis)	Calibration matters far more than AUC

Code

Implementation: fitting, OR, ROC, Firth

# --- R --- Full logistic workflow
library(broom); library(car); library(pROC); library(logistf)

# 1) Standard logistic regression (case-control); case coded 0/1
fit <- glm(case ~ exposure + age + sex,
           family = binomial(link = "logit"), data = df)

# 2) Tidy the results, exponentiated to ORs with CIs
broom::tidy(fit, exponentiate = TRUE, conf.int = TRUE)
car::Anova(fit, type = "II")                  # LR tests (the default for glm)

# 3) ROC / AUC (in-sample, hence optimistic)
pred <- predict(fit, type = "response")
roc1 <- pROC::roc(df$case, pred)
roc1$auc; plot(roc1)

# 4) Calibration plot (10 quantile groups of predicted risk)
library(rms)
val.prob(pred, df$case, g = 10)               # calibration plot + stats

# 5) Firth penalization: handles separation and small-sample bias
fit_f <- logistf::logistf(case ~ variant + sex, data = df)
summary(fit_f)

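# --- Python ---
import numpy as np
import statsmodels.formula.api as smf
import statsmodels.api as sm
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.calibration import calibration_curve

# df: pandas DataFrame with a 0/1 column "case" and numeric covariates
# (sex assumed coded 0/1 for the matrix-based APIs below)

# 1) statsmodels: inference-oriented
fit = smf.logit("case ~ exposure + age + sex", data=df).fit()
print(fit.summary())

# 2) ORs with 95% CIs
ci = np.exp(fit.conf_int()).rename(columns={0: "2.5%", 1: "97.5%"})
or_tab = np.exp(fit.params).to_frame("OR").join(ci)
print(or_tab)

# 3) Equivalent GLM form (logit is the default Binomial link); add the intercept explicitly
y = df["case"]
X = sm.add_constant(df[["exposure", "age", "sex"]])
glm_fit = sm.GLM(y, X, family=sm.families.Binomial(link=sm.families.links.Logit())).fit()

# 4) sklearn: prediction-oriented; L2-penalized by default, so coefficients are shrunk
X_feat = df[["exposure", "age", "sex"]]
clf = LogisticRegression(penalty="l2", C=1.0).fit(X_feat, y)
proba = clf.predict_proba(X_feat)[:, 1]
roc_auc_score(y, proba)                 # in-sample AUC: optimistic
prob_true, prob_pred = calibration_curve(y, proba, n_bins=10)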

🚫

Common mistakes: ① Reporting an OR as "X times higher risk": this is true only when the baseline risk is small. ② Training the model and computing AUC on the same data gives an optimistically biased estimate; use k-fold CV or a held-out test set instead. ③ Claiming that AUC = 0.85 means "clinically useful": if calibration is poor, a predicted "70% probability" might correspond to an observed rate of 30%, which makes the model useless for decisions.
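Mistake ② is easy to reproduce. In the sketch below, the simulated data and helper names are assumptions. The predictors are pure noise, so the true AUC is 0.5. Even so, the in-sample AUC of an unpenalized logistic fit is well above 0.5. The 5-fold cross-validated AUC, computed on predictions pooled across the held-out folds, stays near chance.

```python
import numpy as np


def fit_logit(X, y, n_iter=100):
    """Unpenalized logistic regression by Newton-Raphson (X includes the intercept)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        step = np.linalg.solve((X.T * (p * (1 - p))) @ X, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-8:
            break
    return beta


def auc(y, score):
    """Mann-Whitney AUC, ties counted as 1/2."""
    diff = score[y == 1][:, None] - score[y == 0][None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


rng = np.random.default_rng(3)
n, k = 100, 15
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # 15 noise predictors
y = rng.integers(0, 2, size=n).astype(float)                  # outcome unrelated to X

# In-sample AUC: train and evaluate on the same rows (optimistic)
auc_train = auc(y, X @ fit_logit(X, y))

# 5-fold CV: every row is scored by a model that never saw it
folds = rng.permutation(n) % 5
cv_score = np.empty(n)
for f in range(5):
    train, test = folds != f, folds == f
    cv_score[test] = X[test] @ fit_logit(X[train], y[train])
auc_cv = auc(y, cv_score)

print(f"in-sample AUC = {auc_train:.3f}, 5-fold CV AUC = {auc_cv:.3f}")
```

Scoring by the linear predictor instead of the probability does not change the AUC, because the inverse logit is strictly increasing.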

📝 Self-check

1. When is OR a poor approximation to RR?

A. Never; OR always equals RR

B. When events are very rare

C. When events are common (baseline p > ~10%), OR overstates RR

D. Only for continuous outcomes

2. What is "separation" in logistic regression, and how is it fixed?

A. Train and test sets are not split; add CV

B. Heteroscedasticity; use robust SEs

C. A covariate perfectly predicts the outcome, forcing β̂ → ±∞; fix with a Firth penalty or a Bayesian weakly informative prior

D. Multicollinearity; drop redundant variables

3. Why can a model have high AUC yet poor calibration?

A. There is an uncaught bug

B. AUC measures only ranking (whether cases are ordered above non-cases), not whether predicted probabilities match observed frequencies

C. Calibration is always equivalent to AUC

D. It is impossible