為什麼存活分析 ≠ 二元結果?
臨床試驗最常見的錯誤之一:把「兩年內復發 / 未復發」當作二元結果跑 logistic 迴歸——這會浪費大量資訊。存活分析的關鍵在於「時間」:兩個都復發的病人,一個在第 3 個月、另一個在第 23 個月,臨床意義完全不同。更重要的是,研究結束時尚未復發的病人,我們只知道「至少存活到觀察時間」——這就是 censoring(設限),是存活分析的靈魂。
若你把 censored 病人標記成「未事件」並用 logistic 分析,會嚴重低估事件率;若標記成「事件」更糟。Kaplan & Meier 1958(JASA 53:457,引用次數史上前 10)的 product-limit estimator,第一次正確處理這個問題:在每個事件發生的時間點,重新計算「在那一刻還在風險中的人」的條件存活機率。
One of the most common errors in clinical research: collapsing a follow-up study into "relapsed within 2 years (yes/no)" and running logistic regression — that throws away most of the information. The key to survival analysis is time: two patients who both relapsed but one at month 3 and the other at month 23 carry very different clinical signal. Most importantly, patients who have not yet relapsed when the study ends tell us only "at least survived until the observation time" — this is censoring, the soul of survival analysis.
If you mark censored patients as "no event" and run logistic, you severely underestimate the event rate; marking them as "event" is even worse. Kaplan & Meier 1958 (JASA 53:457, one of the top-10 most-cited papers in history) introduced the product-limit estimator that handles this correctly — at every event time, it recomputes the conditional survival probability among those still at risk.
一、S(t)、h(t)、censoring——三把鑰匙
存活函數 S(t)
S(t) = P(T > t)
「活到時間 t 以上的機率」。S(0)=1,t→∞ 時 S(t)→0 或趨於某下限。
圖形是單調遞減階梯。
S(t) = P(T > t)
"Probability of surviving past time t." S(0)=1; S(t)→0 (or a plateau) as t grows.
The plot is a monotone-decreasing step function.
風險函數 h(t)
h(t) = limΔt→0 P(t ≤ T < t+Δt | T ≥ t)/Δt
「仍存活的條件下,下一瞬間發生事件的速率」——不是機率,是「每單位時間的瞬間事件率」,可以 > 1。
h(t) = limΔt→0 P(t ≤ T < t+Δt | T ≥ t)/Δt
"The instantaneous event rate among those still alive." Not a probability — a rate per unit time; values > 1 are legal.
Censoring 設限
Right(最常見):研究結束、失訪、撤出。
Left:事件已在觀察前發生(HIV 感染日期不明)。
Interval:兩次回診之間發生(每 6 個月掃描偵測腫瘤)。
Right (most common): study ends, lost-to-follow-up, withdrawal.
Left: event occurred before observation (e.g. unknown HIV infection date).
Interval: between two visits (e.g. tumour detected on the 6-monthly scan).
二、Kaplan-Meier 曲線建構器
每組各 100 位「病人」隨機生成 exponential 事件時間(hazard λ)+ uniform censoring。觀察兩組 KM 曲線何時分離;中位存活時間、log-rank χ²、p-value、與經驗 HR 同步更新。試試 λ₁ = 0.04, λ₂ = 0.08——看曲線拉開。
For each arm, 100 "patients" with exponential event times (hazard λ) plus uniform censoring. Watch when the two KM curves separate; median survival, log-rank χ², p-value and empirical HR update live. Try λ₁ = 0.04, λ₂ = 0.08 — see the curves diverge.
階梯線 = KM 估計值 · 短豎線 = censoring · 下表 = at-risk 人數Step lines = KM estimates · Ticks = censoring · Table = number at risk
三、從 KM 到 Cox PH
Kaplan-Meier (1958)
Non-parametric 估計 S(t) 的乘積極限估計:
Ŝ(t) = ∏tᵢ≤t (1 − dᵢ/nᵢ)
tᵢ = 事件時間;dᵢ = 該時點事件數;nᵢ = 該時點仍在風險者。
每出現一個事件,把存活機率乘上「沒事件的條件機率」。Censored 在該時點離開分母 nᵢ,但不直接降低 Ŝ——這就是它正確處理 censoring 的關鍵。CI 用 Greenwood 公式(變異數的乘積展開)。
The non-parametric product-limit estimator:
Ŝ(t) = ∏tᵢ≤t (1 − dᵢ/nᵢ)
tᵢ = event time, dᵢ = events at tᵢ, nᵢ = those still at risk just before tᵢ.
At each event we multiply by the conditional "no-event" probability. Censored subjects leave the denominator nᵢ but do not drop Ŝ — that is how KM handles censoring correctly. CI uses Greenwood's variance formula.
Log-rank (Mantel-Haenszel)
檢驗 H₀: S₁(t) = S₂(t)(兩組存活曲線全等)。對每個事件時點,把觀察事件數 − 期望事件數加總,做 χ² 檢定(自由度 = 組數 − 1)。
注意:log-rank 對 比例風險(PH) 偏離最敏感。若兩條 KM 曲線交叉(HR 隨時間變),log-rank 力量會驟降——這時應改用 RMST 或 Fleming-Harrington 加權 log-rank(強調晚期差異 ρ=0,γ=1)。
Tests H₀: S₁(t) = S₂(t) (the two curves are identical). At each event time it sums observed − expected events and forms a χ² statistic (df = groups − 1).
Caveat: log-rank is most powerful when hazards are proportional. When KM curves cross (HR varies with time), log-rank power crashes — switch to RMST or the Fleming-Harrington weighted log-rank (ρ=0, γ=1 emphasises late differences).
Cox 比例風險模型 (Cox 1972)
對 hazard 建模而非 S(t):
h(t | X) = h₀(t) · exp(β₁X₁ + β₂X₂ + …)
h₀(t) 是未指定的 baseline hazard(non-parametric),exp(βⱼ) 是hazard ratio (HR)——X 增加 1 單位的瞬間風險倍率。
Partial likelihood(Cox 1975 Biometrika):避開估 h₀(t),只用每個事件時點上「實際出事者 vs 風險集合中其他人」的條件機率乘積:
R(tᵢ) = risk set at tᵢ。最大化 log L 取得 β̂。HR = exp(β̂),95% CI = exp(β̂ ± 1.96·SE)。
Models the hazard (not S directly):
h(t | X) = h₀(t) · exp(β₁X₁ + β₂X₂ + …)
h₀(t) is the unspecified baseline hazard (non-parametric); exp(βⱼ) is the hazard ratio (HR) for a one-unit increase in X.
Partial likelihood (Cox 1975 Biometrika) bypasses h₀(t) by multiplying, over each event time, the conditional probability that the actual case (vs others in the risk set) is the one who fails:
R(tᵢ) = risk set at tᵢ. Maximise log L for β̂. HR = exp(β̂); 95% CI = exp(β̂ ± 1.96·SE).
四、PH 假設與補救
Schoenfeld residuals (Schoenfeld 1982 Biometrika; Grambsch-Therneau 1994 Biometrika 81:515, DOI:10.1093/biomet/81.3.515):對每個事件時點計算「實際 X − 期望 X (給定風險集合)」。若 PH 成立,這些殘差對時間應為平坦(無趨勢)。R 的 cox.zph() 用 scaled Schoenfeld residuals 對時間做迴歸並輸出全域 + 每個共變數的 χ² 檢定。
視覺化:(1) 對 log-log scale 畫 KM(log[−log Ŝ(t)] vs log t),兩條線應該平行;(2) 畫 scaled Schoenfeld residuals vs time 加 LOESS 平滑線,應為水平。
Schoenfeld residuals (Schoenfeld 1982 Biometrika; Grambsch-Therneau 1994 Biometrika 81:515, DOI:10.1093/biomet/81.3.515): at each event time, compute "actual X − expected X (given the risk set)". Under PH these residuals should be flat over time. R's cox.zph() regresses scaled Schoenfeld residuals on time and gives global + per-covariate χ² tests.
Visual checks: (1) log-log KM (plot log[−log Ŝ(t)] vs log t) — parallel lines support PH; (2) scaled Schoenfeld residuals vs time with a LOESS smooth — should be flat.
| 補救 | 怎麼用 | 何時最適合 | ||
|---|---|---|---|---|
| 時間相依共變數 | tt(x) 在 survival 套件中,或對 X 與 log t 的交互作用建模 | HR 隨時間單調變化(漸增 / 漸減) | tt(x) in R's survival, or model X × log(t) interaction | HR drifts monotonically with time |
| 分層 Cox | strata(z):對每個 strata 用不同 h₀(t),但共享 β | 某干擾因子 PH 違反但不感興趣(如 stage I/II/III) | strata(z): separate h₀(t) per stratum, common β | Some nuisance factor (stage I/II/III) violates PH but isn't of interest |
| 參數 AFT 模型 | Weibull / log-normal / log-logistic 假設 log T 是線性函數 | 想做外推(追蹤期外的存活預測);HR 不恆定 | Weibull / log-normal / log-logistic — log T as linear function | Need extrapolation beyond follow-up; non-PH |
| RMST (Royston-Parmar 2013) | RMST(τ) = ∫₀^τ S(u) du;兩組差 / 比例可檢定,單位是「天」 | KM 交叉、PH 嚴重違反;臨床要可解讀的「平均存活時間差」 | RMST(τ) = ∫₀^τ S(u) du; difference/ratio testable, units = "days" | Crossing KM, severely non-PH; clinically interpretable mean-time gain |
五、HR vs RMST
固定 control hazard λ = 0.1 /year,調整 HR。Treatment 的 hazard = λ · HR。觀察:(1) HR = 0.7 看似「30% 降低」,但 5 年絕對存活差可能只 5%;(2) RMST(τ=5) 的差值 (天) 才是臨床能溝通的語言;(3) 兩條曲線「越拉越開」是 PH 模型的特徵——真實世界很少完美。
Hold control hazard at λ = 0.1 /year; slide HR. Treatment hazard = λ · HR. Observe: (1) HR = 0.7 sounds like "30% reduction" but 5-year absolute survival difference may be only 5%; (2) RMST(τ=5) difference (days) is the language clinicians use; (3) the "ever-diverging" pattern is a feature of the PH model — real-world rarely matches it perfectly.
六、競爭風險
研究「乳癌死亡」時,若一位 80 歲病人死於心衰,傳統 KM 把她當 censored——這假設「死於心衰的人若還活著,未來死於乳癌的風險與其他人相同」。對老年族群明顯錯誤。
兩種正確做法:
- Cause-specific hazard:對每種事件型別跑分開的 Cox,把其他事件當 censored。回答「條件上,治療對該死因 hazard 的效應」——適用於探討生物機制。
- Fine-Gray subdistribution hazard (Fine & Gray 1999 JASA 94:496):對 cumulative incidence function (CIF) 建模,不把競爭事件當 censored,直接回答「治療對某死因絕對發生機率的影響」——適用於溝通臨床絕對風險。
Studying "death from breast cancer": if an 80-year-old dies of heart failure, classical KM marks her as censored, assuming "if she'd survived, her future breast-cancer risk would match the others." For elderly cohorts this is obviously wrong.
Two correct approaches:
- Cause-specific hazard: a separate Cox for each event type, treating the other as censored. Answers "conditionally, treatment's effect on the cause-specific hazard" — good for biological mechanism questions.
- Fine-Gray subdistribution hazard (Fine & Gray 1999 JASA 94:496): models the cumulative incidence function (CIF) without censoring the competing event; directly answers "treatment's effect on the absolute incidence of the cause" — good for communicating absolute clinical risk.
七、選哪個存活方法?
🌳 存活分析方法決策樹
cox.zph() p > 0.05 通常 OK。cox.zph() p > 0.05 is typically fine.八、Cox vs AFT vs RMST
| 特性 | Cox PH | 參數 AFT | RMST | |||
|---|---|---|---|---|---|---|
| 模型化什麼 | hazard h(t|X) | log T 線性,含 h₀(t) 形狀 | ∫₀^τ S(u) du | hazard h(t|X) | log T linear, with h₀(t) shape | ∫₀^τ S(u) du |
| 效應量 | HR = exp(β) | 時間倍率 (time ratio, TR) = exp(β) | RMST 差 (天/月) 或比例 | time ratio (TR) = exp(β) | RMST difference (days/months) or ratio | |
| 假設 | PH(HR 隨時間恆定) | 指定分布(Weibull/log-normal…) | 無 PH 假設;需選 τ | PH (HR constant in time) | Specified distribution (Weibull/log-normal…) | No PH; choose τ |
| 可外推? | 否(h₀ unspecified) | 可以(已參數化) | 只到 τ | No (h₀ unspecified) | Yes (fully parametric) | Only up to τ |
| PH 違反時 | ⚠️ β̂ 是「平均」HR,解讀失真 | 改用 log-logistic / generalised gamma | 不受影響 | ⚠️ β̂ becomes an "average" HR — distorted | Switch to log-logistic / generalised gamma | Unaffected |
| 臨床溝通 | 「風險減 30%」 | 「存活時間延長 1.5 倍」 | 「5 年內平均多活 3.2 個月」← 最直觀 | "30% lower risk" | "1.5× longer survival time" | "3.2 more months on average over 5 yrs" ← clearest |
| R 套件 | survival::coxph() | survival::survreg(), flexsurv | survRM2, survival::survfit() + RMST | survival::coxph() | survival::survreg(), flexsurv | survRM2, survival::survfit() + RMST |
九、實作範例
# --- KM + log-rank + Cox + Schoenfeld + RMST + Fine-Gray --- library(survival) library(survminer) # nice KM plots library(survRM2) # RMST library(cmprsk) # Fine-Gray # --- 1. Kaplan-Meier --- fit <- survfit(Surv(time, status) ~ arm, data = trial) summary(fit, times = c(12, 24, 36)) # S(t) at landmarks ggsurvplot(fit, risk.table = TRUE, conf.int = TRUE, pval = TRUE, surv.median.line = "hv") # --- 2. Log-rank test --- survdiff(Surv(time, status) ~ arm, data = trial) # --- 3. Cox PH --- cox <- coxph(Surv(time, status) ~ arm + age + sex + stage, data = trial) summary(cox) # HR, 95% CI, Wald / LR / score tests # --- 4. PH check (Schoenfeld) --- zph <- cox.zph(cox) print(zph) # global + per-covariate chi-sq, p plot(zph) # Schoenfeld vs time + LOESS # --- 5. If PH fails: stratified Cox + time-varying --- cox2 <- coxph(Surv(time, status) ~ arm + age + strata(stage), data = trial) cox3 <- coxph(Surv(time, status) ~ arm + tt(arm) + age, tt = function(x,t,...) x*log(t+1), data = trial) # --- 6. RMST (preferred when PH violated) --- rm <- rmst2(time = trial$time, status = trial$status, arm = trial$arm, tau = 36) print(rm) # RMST diff + ratio, 95% CI # --- 7. Competing risks: Fine-Gray --- # status: 0=censored, 1=cancer death, 2=other death fg <- crr(ftime = trial$time, fstatus = trial$status, cov1 = trial[, c("arm","age")], failcode = 1, cencode = 0) summary(fg)
import pandas as pd, numpy as np from lifelines import KaplanMeierFitter, CoxPHFitter, WeibullAFTFitter from lifelines.statistics import logrank_test, restricted_mean_survival_time from lifelines import NelsonAalenFitter import matplotlib.pyplot as plt T = trial['time']; E = trial['status']; arm = trial['arm'] # --- 1. Kaplan-Meier --- kmf = KaplanMeierFitter() for g in arm.unique(): kmf.fit(T[arm==g], E[arm==g], label=f'arm={g}') kmf.plot_survival_function(at_risk_counts=True) plt.show() # --- 2. Log-rank --- lr = logrank_test(T[arm==0], T[arm==1], E[arm==0], E[arm==1]) print(lr.test_statistic, lr.p_value) # --- 3. Cox PH --- cph = CoxPHFitter() cph.fit(trial, duration_col='time', event_col='status', formula='arm + age + sex + stage') cph.print_summary() # HR + 95% CI # --- 4. PH assumption --- cph.check_assumptions(trial, p_value_threshold=0.05, show_plots=True) # --- 5. RMST --- rmst_a = restricted_mean_survival_time(kmf, t=36) # arm = 0 # or: from lifelines.utils import restricted_mean_survival_time # --- 6. Weibull AFT (for extrapolation) --- aft = WeibullAFTFitter() aft.fit(trial, duration_col='time', event_col='status', formula='arm + age') aft.print_summary() # --- 7. Competing risks: Fine-Gray (use scikit-survival) --- from sksurv.linear_model import CoxPHSurvivalAnalysis # For Fine-Gray, see: pycr or use cmprsk via rpy2
十、五個常見錯誤
❌ 不檢查 PH 假設
最常見的紅旗。跑了 Cox 報 HR,卻沒提 cox.zph。若 PH 違反,HR 變成「對 baseline hazard 加權的平均」——時間早期效應與晚期效應若相反,HR 會誤導。解:永遠跑 cox.zph、看 Schoenfeld 殘差、必要時改用 RMST 或時間相依共變數。
The single most common red flag. Run Cox, report HR, never mention cox.zph. If PH fails, HR becomes a hazard-weighted average — early and late effects of opposite sign can mask each other. Fix: always run cox.zph, inspect Schoenfeld residuals, switch to RMST or time-varying covariates as needed.
❌ Informative censoring 被忽略
若病情惡化的病人比較容易撤出研究(撤出 → 被 censor),KM 與 Cox 會系統性高估存活。解:(1) Methods 報撤出原因;(2) 做敏感性分析:把所有 censored 當「事件」與「無事件」兩極情境,看結論是否穩定(Tipping point analysis);(3) 用 IPCW(inverse probability of censoring weighting)。
If sicker patients drop out (withdraw → censor), KM and Cox systematically overestimate survival. Fix: (1) report reasons for censoring; (2) sensitivity analysis: treat all censored cases as events vs no events to see how robust your conclusion is (tipping-point analysis); (3) IPCW (inverse probability of censoring weighting).
❌ Immortal time bias
Lévesque 2010 BMJ 340:b5087。觀察性研究:把「曾經接受治療」病人從 diagnosis 起算 follow-up,但他們必須活到治療日才能被分到「治療組」——這段「不死的時間」被歸功於治療。解:用時間相依的治療指標(time-varying covariate)或 landmark analysis(從某個時間點開始觀察,之前的事件全納入)。
Lévesque 2010 BMJ 340:b5087. In observational studies, counting follow-up of "ever-treated" patients from diagnosis even though they must survive until the treatment day to be classified as "treated" — that "immortal time" gets credited to the treatment. Fix: use a time-varying treatment indicator or landmark analysis (start the clock at a fixed landmark, count pre-landmark events).
❌ 競爭風險當 censoring
研究「乳癌死」時把「心血管死」當 censored——KM 會高估乳癌死的累積發生率(CIF)。在老年族群、心衰、移植研究中尤其嚴重。解:報 CIF(用 cmprsk::cuminc 而非 1−KM)+ Fine-Gray 或 cause-specific Cox 並列。
Studying "cancer death": treating "cardiovascular death" as censored — KM overestimates the cancer-death cumulative incidence (CIF). Especially severe in elderly cohorts, heart-failure, and transplant studies. Fix: report CIF (cmprsk::cuminc, not 1−KM) + Fine-Gray or cause-specific Cox in parallel.
❌ 混淆 median follow-up 與 median survival
「median follow-up = 18 months」≠「median survival = 18 months」。前者描述「資料的成熟度」,用 reverse KM(把 censored 視為事件、事件視為 censored);後者是 S(t)=0.5 對應的 t。若超過一半病人仍存活,median survival 應報 NR (not reached)。
"median follow-up = 18 months" ≠ "median survival = 18 months". The first is data maturity, computed by reverse KM (swap event/censor roles); the second is the t where S(t)=0.5. If > 50% are still alive, report median survival as NR (not reached).
❌ HR 誤譯為「兩倍速度」
HR = 2 不表示「死亡快兩倍」,而是「瞬間風險加倍」。Hernán 2010 Epidemiology 提醒:對病人溝通改用「絕對風險差」或「RMST 差(多活 X 個月)」——病人能聽懂,HR 不能。
HR = 2 does not mean "dying twice as fast" — it means "twice the instantaneous risk". Hernán 2010 Epidemiology: for patient communication, use absolute risk difference or RMST difference ("X more months on average"). Patients understand those; they don't understand HR.
📝 自我檢測
1. 你跑 Cox 模型,cox.zph 全域 p = 0.001,且該共變數的 Schoenfeld residuals vs time 有明顯下降趨勢。最佳做法?
1. You run Cox; cox.zph global p = 0.001, with a clear downward trend in Schoenfeld residuals vs time. Best action?
2. 一項老年心衰隊列研究探討某藥物對「心臟猝死」的影響。其他死因(癌症、肺炎)占了 40% 死亡。下列何者最合適?
2. An elderly heart-failure cohort study examines a drug's effect on "sudden cardiac death". Non-cardiac deaths (cancer, pneumonia) account for 40% of deaths. Best approach?
3. 觀察性研究發現「接受 X 治療的病人存活較好」,但 X 在中位 6 個月後才開始。下列何者最可能是錯誤來源?
3. An observational study finds "patients receiving treatment X survive longer", but X usually starts at median month 6 post-diagnosis. The most likely bias is:
4. 你的試驗 KM 曲線在前 12 個月幾乎重疊,之後才分開。log-rank p = 0.15。下列何者最合適?
4. In your trial, KM curves overlap for the first 12 months and separate later. Log-rank p = 0.15. Best approach?