從二元結果到一條機率曲線
當 Y 是二元結果(病/不病、活/死、有反應/無反應、30 天再入院/未再入院),直接套線性迴歸會產生不合理的負機率或大於 1 的機率,也違反同方差與常態假設。邏輯斯迴歸(logistic regression)用 logit 連結把機率轉到實數線:logit(p) = log(p/(1−p)) = β₀ + β₁X₁ + … + βₚXₚ。反過來,sigmoid(S 型)把線性預測值再映射回 [0, 1] 的機率區間。
它不只是「分類器」——更是機率模型。臨床預測模型(clinical prediction model)的主流,從 Framingham 心血管風險、APACHE II 重症評分、到 TRIPOD(Collins et al. 2015 BMJ;2024 升級為 TRIPOD+AI),骨幹都是 logistic / Cox。掌握 logistic 不只是學一個 GLM,是學會建構、驗證、報告一個可被臨床採用的風險模型。
When Y is a binary outcome (disease/healthy, alive/dead, responder/non-responder, 30-day readmission yes/no), linear regression will produce nonsensical probabilities below 0 or above 1 and violate homoscedasticity and normality. Logistic regression uses a logit link to push probabilities onto the real line: logit(p) = log(p/(1−p)) = β₀ + β₁X₁ + … + βₚXₚ. The inverse — the sigmoid S-curve — maps the linear predictor back into [0, 1].
It is more than a classifier — it is a probability model. The workhorse behind clinical prediction (Framingham, APACHE II, and the TRIPOD reporting guideline of Collins et al. 2015 BMJ, now TRIPOD+AI 2024) is logistic or Cox regression. Mastering logistic regression is mastering how to build, validate, and report a risk model that a clinician can actually use.
一、Logit、Sigmoid、OR、MLE
Logit 連結
logit(p) = log(p/(1−p)) = β₀ + β₁X
把 [0,1] 的機率拉到 (−∞, +∞);對 X 線性。
Odds = p/(1−p);機率 0.5 → odds 1 → logit 0。
logit(p) = log(p/(1−p)) = β₀ + β₁X.
Stretches [0,1] onto (−∞, +∞); linear in X.
Odds = p/(1−p); p = 0.5 → odds 1 → logit 0.
Sigmoid 反函數
p = 1/(1 + e−η),η = β₀ + β₁X。
50% 機率出現在 X = −β₀/β₁。
β₁ 越大 → 曲線越陡。
p = 1/(1 + e−η), η = β₀ + β₁X.
50% probability at X = −β₀/β₁.
Larger |β₁| → steeper curve.
Odds Ratio
X 增 1 單位 → odds 變 eβ₁ 倍。
注意:是 odds × eβ,不是 probability × eβ。
eβ₁ = 1 → 無關;> 1 → 風險升高。
+1 unit in X → odds multiplied by eβ₁.
Beware: it's odds × eβ, not probability × eβ.
eβ₁ = 1 → no effect; > 1 → higher odds.
估計:最大概似法(MLE)。沒有閉式解,標準做法是 IRLS(Iteratively Reweighted Least Squares,疊代加權最小平方)——每一輪用上一輪的 p̂ 計算工作反應變數和權重,做加權迴歸,更新 β,直到收斂。R 的 glm() 與 Python 的 statsmodels 內部都跑 IRLS。
觀察 n 個資料 (Xᵢ, Yᵢ),對數概似為 ℓ(β) = Σ [Yᵢ log pᵢ + (1−Yᵢ) log(1−pᵢ)]。最大化 ℓ 等於最小化「log-loss / 交叉熵」——這正是機器學習裡 sigmoid 神經元的損失函數,logistic 是最簡單的單層神經網路。
Estimation: maximum likelihood (MLE). There is no closed-form solution; the standard procedure is IRLS (Iteratively Reweighted Least Squares) — each iteration uses the current p̂ to build a working response and weights, fits a weighted least-squares regression, and updates β until convergence. Both R's glm() and Python's statsmodels run IRLS under the hood.
For n observations (Xᵢ, Yᵢ) the log-likelihood is ℓ(β) = Σ [Yᵢ log pᵢ + (1−Yᵢ) log(1−pᵢ)]. Maximising ℓ equals minimising log-loss / cross-entropy — the very loss used by a sigmoid neuron in ML. Logistic regression is the simplest one-layer neural network.
Sigmoid 遊樂場
調整 β₀(截距,移動曲線左右)與 β₁(斜率,控制陡度)。30 個模擬樣本(藍/橘 = 0/1)會跟著機率曲線重新生成。觀察重點:(1) β₁ > 0 時曲線向右上升,β₁ < 0 反向;(2) 機率 50% 落在 X = −β₀/β₁;(3) OR = eβ₁;(4) β₁ → 0 時曲線變平坦——X 幾乎不影響 Y。
Adjust β₀ (intercept — shifts left/right) and β₁ (slope — controls steepness). The 30 simulated points (blue = 0, orange = 1) re-draw from the current probability curve. Notice: (1) β₁ > 0 rises right, β₁ < 0 falls; (2) p = 0.5 at X = −β₀/β₁; (3) OR = eβ₁; (4) as β₁ → 0 the curve flattens — X carries little information.
綠線 = sigmoid 機率曲線 · 灰虛線 = 50% 閾值 · 點 = 30 個模擬樣本(藍=0, 橘=1)Green = sigmoid · grey dashed = 0.5 threshold · dots = 30 simulated samples (blue = 0, orange = 1)
二、Wald / LRT / Score
對單一係數 βj 檢定「H₀: βj = 0」有三種等價的漸近方法,但在有限樣本下表現差異很大。
To test "H₀: βj = 0" we have three asymptotically equivalent tests, but their finite-sample behaviour can differ a lot.
| 檢定 | 統計量 | 優點 | 風險 | |||
|---|---|---|---|---|---|---|
| Wald | z = β̂ / SE(β̂) | 最快、軟體預設輸出 | 小樣本或 |β| 大時不可靠(Hauck-Donner 1977);可能反而拒絕變弱 | Fastest, default software output | Unreliable for small n or large |β| (Hauck-Donner 1977); power paradoxically drops | |
| Likelihood-Ratio (LRT) | 2(ℓ₁ − ℓ₀) ~ χ²(df) | 統計學界公認最可靠;對巢狀模型比較通用 | 需 refit 兩個模型;多了一次計算 | Most reliable; works for nested model comparison | Needs two fits — slight computational cost | |
| Score (Rao) | 在 H₀ 下的梯度 | 只需擬合限制模型;理論上接近 LRT | 實務罕用;某些分析(如 McNemar)等同 | Gradient evaluated under H₀ | Needs only the restricted fit; close to LRT in theory | Rarely used in practice; equivalent to McNemar in 2×2 designs |
anova(fit, test="Chisq");Python: statsmodels 的 LR-test)。報告個別係數 → 用 profile-likelihood 95% CI(R: confint(fit),預設即 profile)比 Wald CI 更可靠,尤其當 |β| 大、樣本小、或趨近完美分離。
Practical rule: for overall model fit, use the LRT (R: anova(fit, test="Chisq"); Python: statsmodels LR-test). For individual coefficients, prefer the profile-likelihood 95% CI (R's confint(fit) defaults to profile) over Wald CIs — much safer when |β| is large, n is small, or separation is approached.
Pseudo-R² 警告:三種常見指標 McFadden(1 − ℓ₁/ℓ₀)、Cox-Snell(有上限<1)、Nagelkerke(縮放到 [0,1])都不是線性 R² 的對等物。McFadden 0.2–0.4 已算「優秀」。報告時請明確標出哪一種,並不要把 0.3 跟線性 R² 的 0.3 比較。
Pseudo-R² caveats: the three common variants — McFadden (1 − ℓ₁/ℓ₀), Cox-Snell (capped < 1), and Nagelkerke (rescaled to [0,1]) — are NOT direct analogues of linear R². McFadden values of 0.2–0.4 already count as "excellent". Always state which variant, and do not compare a pseudo-R² of 0.3 to a linear-regression R² of 0.3.
三、GOF / Discrimination / Calibration / DCA
評估邏輯斯模型有四層,許多論文只報第二層就停下——這是 Steyerberg 2019《Clinical Prediction Models》與 TRIPOD 2015 一再批評的疏失。
A logistic model should be evaluated on four layers. Many papers stop at the second one — exactly the gap criticised by Steyerberg's Clinical Prediction Models (2019) and TRIPOD (2015).
① 適合度 GOF
Deviance:D = −2 ℓ。比較巢狀模型直接用 ΔD ~ χ². Hosmer-Lemeshow GOF:把 p̂ 切 10 等分,比較預期 vs 觀察事件數。
警告(Hosmer-Lemeshow-Sturdivant 2013 3rd ed):HL 檢定對分組數 g 很敏感(g 從 8 變 12 結論可能反轉),且大 n 時容易顯著、小 n 時又力不足。現代建議:看 calibration plot,而不是只盯 HL p 值。
Deviance: D = −2 ℓ. Compare nested models with ΔD ~ χ². Hosmer-Lemeshow GOF: bin p̂ into 10 deciles and compare expected vs observed counts.
Warning (Hosmer-Lemeshow-Sturdivant 2013, 3rd ed): the HL test is sensitive to the bin count g (changing g from 8 to 12 can flip the verdict), tends to be over-significant at large n and under-powered at small n. Modern advice: look at the calibration plot, not the HL p-value alone.
② 區辨力(ROC / AUC)
ROC 曲線:在所有閾值下繪 TPR (sensitivity) vs FPR (1 − specificity)。AUC = 隨機抽一正一負,模型給正樣本較高分的機率。0.5 = 亂猜;0.7–0.8 普通;0.8–0.9 好;> 0.9 罕見、需檢查 overfit 或資訊洩漏。
陷阱:類別不平衡(如 1:99 罕病)時 ROC/AUC 可能虛胖——Saito & Rehmsmeier 2015 PLOS ONE 建議改用 Precision-Recall curve / AUPRC。
ROC curve: TPR (sensitivity) vs FPR (1 − specificity) across thresholds. AUC = the probability the model scores a random positive higher than a random negative. 0.5 = chance; 0.7–0.8 fair; 0.8–0.9 good; > 0.9 rare — check for overfitting or data leakage.
Pitfall: with severe class imbalance (e.g. 1:99 rare disease), ROC/AUC can be misleadingly optimistic — Saito & Rehmsmeier 2015 PLOS ONE recommend the Precision-Recall curve / AUPRC instead.
③ 校準度 Calibration
Calibration plot:x = 預測機率 p̂(分十等分或用 loess 平滑),y = 該段的觀察事件率。完美時落在 y = x 對角線。
三層 (Van Calster 2019 BMC Med): 平均校準(calibration-in-the-large)、弱校準(intercept + slope)、強校準(每個 p̂ 都對齊)。
Brier score = mean(p̂i − Yi)²;越小越好;含 calibration + discrimination;可分解為 reliability − resolution + uncertainty。
Calibration plot: x = predicted probability p̂ (deciles or loess smooth), y = observed event rate. Perfect lies on the y = x diagonal.
Three layers (Van Calster 2019 BMC Med): mean calibration (calibration-in-the-large), weak (intercept + slope), strong (every p̂ aligned).
Brier score = mean(p̂i − Yi)²; smaller is better; combines calibration + discrimination; decomposable into reliability − resolution + uncertainty.
④ 臨床效益(DCA)
Decision Curve Analysis(Vickers & Elkin 2006 Med Decis Making):在各閾值 pt 計算 net benefit = (TP/n) − (FP/n) × (pt / (1 − pt))。同圖比較「模型 vs 全治療 vs 全不治療」三條線——模型只有在「贏過兩個極端策略」的閾值範圍內才有臨床價值。
實作:R 套件 dcurves(Vickers 維護);Python 套件 dcurves。
Decision Curve Analysis (Vickers & Elkin 2006 Med Decis Making): at each threshold pt compute net benefit = (TP/n) − (FP/n) × (pt / (1 − pt)). Plot the model against "treat all" and "treat none" — the model has clinical value only over the threshold range where it beats both extremes.
Implementation: R package dcurves (maintained by Vickers); Python dcurves.
四、EPV:從 1996 到 2020
EPV ≥ 10
Peduzzi et al. 1996 J Clin Epidemiol(DOI:10.1016/S0895-4356(96)00236-3)以模擬證明:每個 predictor 至少需要 10 個 events(注意:是「事件數」即少數類別的個數,不是總 n)。EPV < 10 時,β̂ 嚴重偏誤、CI 涵蓋率錯誤、變數選擇不穩定。例:若預期事件率 20%,目標 n = 200,則最多可放 200×0.2 / 10 = 4 個變數。
Peduzzi et al. 1996 J Clin Epidemiol (DOI:10.1016/S0895-4356(96)00236-3) showed via simulation that each predictor needs at least 10 events (the count of the minority class, not the total n). Below EPV = 10, β̂ is biased, CI coverage is wrong, and variable selection becomes unstable. Example: expected event rate 20%, target n = 200 → at most 200×0.2 / 10 = 4 predictors.
Riley 2020 BMJ
Riley et al. 2020 BMJ「Calculating the sample size required for developing a clinical prediction model」指出 EPV ≥ 10 是粗糙的經驗法則。新方法基於三個目標:(1) 低 optimism (overfitting);(2) Cox-Snell R²app − R²true 小於 0.05;(3) calibration slope 接近 1。R 套件 pmsampsize 是官方實作;Python 對應為 pmsampsize-py。
典型結果:當預測變數多、預期 R² 低、結局少見時,新公式需要的 n 往往遠大於 EPV=10 給的。例如 10 個變數、event rate 10%、預期 Cox-Snell R²=0.05 → 至少需 n ≈ 900(EPV 規則只說 n=100)。
Riley et al. 2020 BMJ ("Calculating the sample size required for developing a clinical prediction model") shows EPV ≥ 10 is a coarse rule of thumb. The newer derivation targets three criteria: (1) low optimism (overfitting); (2) Cox-Snell R²app − R²true < 0.05; (3) calibration slope close to 1. R package pmsampsize is the canonical implementation; Python has pmsampsize-py.
Typical outcome: with many predictors, low expected R², and a rare outcome, the modern formula usually demands far more than EPV=10. E.g. 10 predictors, 10% event rate, expected Cox-Snell R² = 0.05 → n ≈ 900 (EPV rule would say only n = 100).
ROC 曲線實驗
調整兩組(正/負)的「分離度」(類似 Cohen's d)。1000 個模擬樣本在每個閾值下計算 TPR / FPR,畫出 ROC 曲線並計算 AUC。觀察:(1) 分離度 = 0 時 AUC ≈ 0.5(亂猜);(2) 分離度 = 3 時 AUC > 0.95;(3) Youden 最佳閾值(紅點)通常不是 0.5,會隨類別重疊改變。
Adjust the separation (Cohen's d-like) between the positive and negative groups. 1000 simulated points produce TPR / FPR at every threshold to draw the ROC curve and compute AUC. Watch: (1) separation = 0 → AUC ≈ 0.5 (chance); (2) separation = 3 → AUC > 0.95; (3) Youden's optimal cut (red dot) is rarely 0.5 and shifts with overlap.
綠線 = ROC · 灰虛線 = 亂猜對角 · 紅點 = Youden's J 最佳閾值Green = ROC · grey dashed = chance · red dot = Youden's J optimum
五、Logistic 還是別的家族?
🌳 結果型別決策樹
MASS::polr;先檢查 PO 假設。nnet::multinom。geepack)或 GLMM(個體效應,lme4::glmer)。見 Step 13。MASS::polr; check the PO assumption first.nnet::multinom).geepack) or GLMM (subject-specific; lme4::glmer). See Step 13.六、Perfect Separation & Firth
完美分離(complete separation):某個 X 可以 100% 區分 Y = 0 / 1 → 對應 β → ±∞,MLE 不存在。軟體會給「algorithm did not converge」或極大的 SE(如 SE = 10⁹)。準完美分離(quasi-complete):分離只在部分資料發生。常見於罕見事件、小樣本、稀有 category。
解法:Firth's 罰估計(Firth 1993 Biometrika)。對概似乘上 Jeffreys prior 的開根號項,等價於對 score 加上偏誤修正項。可得到有限的 β̂ 與合理的 CI。R 套件 logistf(Heinze & Schemper 2002 Stat Med);Python 有 firthlogist。
近年也常見 L2 (Ridge) 或 L1 (Lasso) 罰估計(glmnet、sklearn LogisticRegression(penalty='l1'/'l2')),用於高維情境(變數 > n),但會犧牲係數的因果可解釋性,預測為主時才適用。
Complete separation: some X perfectly splits Y = 0 / 1 → the corresponding β → ±∞ and MLE does not exist. Software will throw "algorithm did not converge" or huge SE (e.g. 10⁹). Quasi-complete separation occurs partially. Common with rare events, small samples, or sparse categories.
Fix: Firth's penalised likelihood (Firth 1993 Biometrika). Multiplies the likelihood by the square root of the Jeffreys prior, equivalently adds a score-function bias correction. Yields finite β̂ and well-behaved CIs. R: logistf (Heinze & Schemper 2002 Stat Med); Python: firthlogist.
Modern alternatives include L2 (ridge) or L1 (lasso) penalties (glmnet, sklearn LogisticRegression(penalty='l1'/'l2')) for high-dimensional settings (p > n), but they sacrifice causal interpretability of coefficients — fine when prediction is the only goal.
glm 警告 "fitted probabilities 0 or 1" → 強烈警示;(3) cross-tab 看每個 binary X 對 Y 的 2×2 表,若有 0 cell → 高度可疑;(4) 解法優先序:合併稀有 category > Firth > 移除該變數。
Diagnostic checklist: (1) any |β̂| > 10 or SE > 10 → suspect separation; (2) glm warns "fitted probabilities 0 or 1" → strong signal; (3) cross-tab each binary X against Y — any zero cell is a red flag; (4) priority order: collapse sparse categories > Firth > drop the variable.
七、實作範例
# R: glm + diagnostics + ROC + calibration + Brier + DCA library(tidyverse); library(pROC); library(rms); library(dcurves); library(logistf); library(pmsampsize) # --- Fit --- fit <- glm(disease ~ age + bmi + smoke, data = df, family = binomial(link = "logit")) summary(fit) exp(cbind(OR = coef(fit), confint(fit))) # profile-likelihood CI # --- LRT vs Wald --- anova(fit, test = "Chisq") # LRT (preferred) fit0 <- update(fit, . ~ . - smoke) anova(fit0, fit, test = "LRT") # nested LRT # --- Discrimination: ROC + AUC --- p <- predict(fit, type = "response") roc1 <- pROC::roc(df$disease, p, ci = TRUE) roc1$auc; pROC::ci.auc(roc1) pROC::coords(roc1, "best", ret = c("threshold", "sens", "spec"), best.method = "youden") # --- Calibration plot + Brier (rms package) --- ddist <- rms::datadist(df); options(datadist = "ddist") fit_lrm <- rms::lrm(disease ~ age + bmi + smoke, data = df, x = TRUE, y = TRUE) rms::calibrate(fit_lrm, method = "boot", B = 200) |> plot() mean((p - df$disease)^2) # Brier score # --- Hosmer-Lemeshow GOF (use with caution) --- ResourceSelection::hoslem.test(df$disease, p, g = 10) # --- Decision Curve Analysis (Vickers 2006) --- dcurves::dca(disease ~ p, data = df, thresholds = seq(0.05, 0.5, 0.01)) |> plot() # --- Perfect separation → Firth --- fit_firth <- logistf::logistf(disease ~ rare_var + age, data = df) summary(fit_firth) # --- Sample size: Riley 2020 --- pmsampsize::pmsampsize(type = "b", csrsquared = 0.05, parameters = 10, prevalence = 0.10)
import numpy as np, pandas as pd import statsmodels.api as sm from statsmodels.formula.api import logit from sklearn.metrics import roc_auc_score, roc_curve, brier_score_loss from sklearn.calibration import calibration_curve import matplotlib.pyplot as plt # --- Fit --- fit = logit("disease ~ age + bmi + smoke", df).fit(disp=0) print(fit.summary()) OR = np.exp(fit.params) CI = np.exp(fit.conf_int()) # Wald CI # --- LRT (preferred over Wald for inference) --- fit0 = logit("disease ~ age + bmi", df).fit(disp=0) LR = 2*(fit.llf - fit0.llf) from scipy.stats import chi2 p_LR = 1 - chi2.cdf(LR, df=1) # --- ROC + AUC + Youden --- p = fit.predict(df) auc = roc_auc_score(df.disease, p) fpr, tpr, thr = roc_curve(df.disease, p) j = tpr - fpr; t_opt = thr[j.argmax()] # --- Calibration plot + Brier --- prob_pred, prob_obs = calibration_curve(df.disease, p, n_bins=10) brier = brier_score_loss(df.disease, p) plt.plot(prob_pred, prob_obs, marker="o"); plt.plot([0,1], [0,1], "k--") # --- Decision Curve Analysis (dcurves package) --- # pip install dcurves from dcurves import dca, plot_graphs res = dca(data=df, outcome="disease", modelnames=["p"], thresholds=np.arange(0.05, 0.5, 0.01)) plot_graphs(res) # --- Firth penalised likelihood (separation) --- # pip install firthlogist from firthlogist import FirthLogisticRegression flr = FirthLogisticRegression().fit(X, y) flr.coef_, flr.ci_ # finite β̂ + CIs
pmsampsize 算 n;(2) glm 擬合,profile CI;(3) LRT 報告整體 / 子組顯著性;(4) ROC + AUC + Youden 區辨;(5) calibration plot + Brier 校準;(6) DCA 評估臨床效益;(7) bootstrap 內部驗證(200–500 次)或外部驗證集;(8) 依 TRIPOD 2015 / TRIPOD+AI 2024 撰寫報告。
Workflow: (1) pmsampsize for n; (2) glm + profile CIs; (3) LRT for overall/sub-effects; (4) ROC + AUC + Youden for discrimination; (5) calibration plot + Brier; (6) DCA for clinical utility; (7) bootstrap internal validation (200–500 reps) or an external cohort; (8) write the report against TRIPOD 2015 / TRIPOD+AI 2024.
八、六個經典錯誤
❌ OR 當 RR 解讀
「吸菸 OR=2 → 風險變兩倍」是常見誤讀。當基準率高時 OR 會誇大 RR;當基準率 < 5% 時 OR ≈ RR 才成立。常見結局請用 log-binomial 或 Poisson with robust SE(Zou 2004)。
"OR=2 for smoking means double the risk" is a common misread. With a common outcome OR overstates RR; only when prevalence < 5% is OR ≈ RR. For common outcomes use log-binomial or Poisson with robust SE (Zou 2004).
❌ 小 EPV → overfit
事件數少卻塞入十幾個變數是「醫學期刊 reviewer 看一眼就拒」的紅旗。先用 pmsampsize 算 n,事件多的最好;變數多時請改用 Lasso 或 penalised logistic 並做 bootstrap 內部驗證。
Stuffing a dozen predictors into a model with few events is a reviewer's instant red flag. Compute n with pmsampsize; with many variables use Lasso or penalised logistic plus bootstrap internal validation.
❌ 完美分離未處理
輸出顯示 OR = 4.7 × 10⁸、SE = 9.9 × 10⁹,仍照常報告——這個結果無意義。請改用 Firth (logistf) 或合併稀有 category。Heinze & Schemper 2002 是經典實作參考。
Reporting OR = 4.7 × 10⁸ with SE = 9.9 × 10⁹ as if it were real — that result is meaningless. Use Firth (logistf) or collapse sparse categories. Heinze & Schemper 2002 Stat Med is the canonical reference.
❌ 預設閾值 0.5
在不平衡或 false-positive / false-negative 成本不對稱時,0.5 幾乎肯定不是最佳。請用 Youden's J、Net Benefit、或最大化「clinical utility」的閾值;報告整條 ROC + DCA,而不是單一混淆矩陣。
With class imbalance or asymmetric costs of FP/FN, 0.5 is almost surely suboptimal. Use Youden's J, net benefit, or the threshold maximising "clinical utility"; report the full ROC + DCA, not a single confusion matrix.
❌ 只報 AUC
AUC 高的模型可能 systematically 高估或低估機率(calibration 差),臨床上仍會給錯藥。Van Calster 2019 BMC Med:必報 calibration plot + intercept + slope + Brier。
A model with high AUC can still systematically over- or under-estimate probabilities (poor calibration), leading clinicians astray. Van Calster 2019 BMC Med: always report calibration plot + intercept + slope + Brier.
❌ 內部 ≠ 外部驗證
train/test split 與 cross-validation 是內部驗證(同分布)。模型搬到不同醫院、不同年代、不同人口要做外部驗證,TRIPOD 2015 / TRIPOD+AI 2024 強調分開報告 development、internal validation、external validation 三階段。
Train/test splits and cross-validation are internal validation (same distribution). Transporting the model to a different hospital, era, or population requires external validation. TRIPOD 2015 / TRIPOD+AI 2024 require separate reporting of development, internal validation, and external validation.
九、必讀文獻
- Peduzzi P et al. 1996 J Clin Epidemiol — EPV ≥ 10 原始模擬。DOI:10.1016/S0895-4356(96)00236-3
- Riley RD et al. 2020 BMJ — 現代樣本數規則,
pmsampsize對應。DOI:10.1136/bmj.m441 - Firth D 1993 Biometrika — 罰估計法,解決完美分離。DOI:10.1093/biomet/80.1.27
- Hosmer DW, Lemeshow S, Sturdivant RX 2013 Applied Logistic Regression 3rd ed, Wiley.
- Steyerberg EW 2019 Clinical Prediction Models 2nd ed, Springer — 預測模型 SOP。
- Collins GS, Reitsma JB, Altman DG, Moons KGM 2015 BMJ — TRIPOD reporting guideline。DOI:10.1136/bmj.g7594
- Collins GS et al. 2024 BMJ — TRIPOD+AI 更新版。DOI:10.1136/bmj-2023-078378
- Vickers AJ, Elkin EB 2006 Med Decis Making — Decision Curve Analysis。DOI:10.1177/0272989X06295361
- Saito T, Rehmsmeier M 2015 PLOS ONE — 類別不平衡時 PR 優於 ROC。DOI:10.1371/journal.pone.0118432
- Van Calster B et al. 2019 BMC Med — Calibration「未完成」的呼籲。DOI:10.1186/s12916-019-1466-7
- Zou G 2004 Am J Epidemiol — Modified Poisson for RR。DOI:10.1093/aje/kwh090
- Heinze G, Schemper M 2002 Stat Med — Firth 在邏輯斯迴歸中的實作 (
logistf)。DOI:10.1002/sim.1047
- Peduzzi P et al. 1996 J Clin Epidemiol — the original EPV ≥ 10 simulation. DOI:10.1016/S0895-4356(96)00236-3
- Riley RD et al. 2020 BMJ — modern sample-size formulae behind
pmsampsize. DOI:10.1136/bmj.m441 - Firth D 1993 Biometrika — penalised likelihood for separation. DOI:10.1093/biomet/80.1.27
- Hosmer DW, Lemeshow S, Sturdivant RX 2013 Applied Logistic Regression 3rd ed, Wiley.
- Steyerberg EW 2019 Clinical Prediction Models 2nd ed, Springer — the prediction-modelling SOP.
- Collins GS, Reitsma JB, Altman DG, Moons KGM 2015 BMJ — the TRIPOD guideline. DOI:10.1136/bmj.g7594
- Collins GS et al. 2024 BMJ — TRIPOD+AI update. DOI:10.1136/bmj-2023-078378
- Vickers AJ, Elkin EB 2006 Med Decis Making — Decision Curve Analysis. DOI:10.1177/0272989X06295361
- Saito T, Rehmsmeier M 2015 PLOS ONE — PR over ROC under class imbalance. DOI:10.1371/journal.pone.0118432
- Van Calster B et al. 2019 BMC Med — the "calibration is not done" call. DOI:10.1186/s12916-019-1466-7
- Zou G 2004 Am J Epidemiol — modified Poisson for RR. DOI:10.1093/aje/kwh090
- Heinze G, Schemper M 2002 Stat Med — Firth implementation in logistic regression (
logistf). DOI:10.1002/sim.1047
📝 自我檢測
1. 你訓練了一個 30 天死亡率的 logistic 模型,總 n = 1200,但事件僅 60,候選變數 12。下列何者最合適?
1. You build a logistic model for 30-day mortality, total n = 1200, events = 60, candidate predictors = 12. Best step?
2. 一篇論文報告 AUC = 0.88 但沒有 calibration plot。下列何者是最強的疑慮?
2. A paper reports AUC = 0.88 but no calibration plot. Strongest concern?
3. glm() 警告 "fitted probabilities numerically 0 or 1 occurred",且某係數 SE = 1.2 × 10⁹。最佳處理?
3. glm() warns "fitted probabilities numerically 0 or 1 occurred" and one SE = 1.2 × 10⁹. Best fix?
4. 30 天再入院率約 25%(常見結局)。研究者報告「OR=2.5 表示再入院風險變 2.5 倍」。下列何者最正確?
4. 30-day readmission rate ≈ 25% (common outcome). The authors say "OR=2.5 means 2.5× the risk." Most correct?