為什麼存活分析需要自成一格的工具
「存活時間」可以泛指任何「事件發生前所經歷的時間」——患者死亡、腫瘤復發、機器故障、細胞分化。這類資料有兩個一般迴歸無法處理的特性:
- 審查 (censoring):在研究結束時,有些個體仍未發生事件——我們只知道「至少活到了 t 時間」,而非確切的存活時間。右審查最常見;左審查(事件發生前個體已存在)與區間審查也存在。
- 截斷 (truncation):個體只在某時間範圍內被觀察到。例如延遲進入 (left truncation) 在 TCGA 隊列中常見。
把存活時間當成普通連續變數做 t 檢定或線性迴歸,會把審查資料當「事件已發生」,整體存活時間嚴重低估、效應方向甚至顛倒。存活分析的核心思想是:對每個時間點 t,問「在 t 時刻仍存活的個體中,下一刻死亡的瞬時風險是多少?」——這就是 hazard function h(t)。所有方法(KM、log-rank、Cox PH、Weibull AFT)都建立在這個 hazard 框架上。
"Survival time" generalises to any "time until an event": patient death, tumour relapse, machine failure, cell differentiation. Such data have two features ordinary regression cannot handle:
- Censoring: at study end, some individuals have not yet had the event — we only know "alive past t", not the actual time. Right-censoring is most common; left and interval censoring also occur.
- Truncation: individuals are only observed within a window. Left truncation (delayed entry) is common in TCGA cohorts.
Treating survival time as ordinary continuous data and running a t-test or linear regression treats censored observations as "events occurred", severely underestimating survival and sometimes flipping the effect direction. The core idea of survival analysis: at each time t, ask "among individuals still alive at t, what is the instantaneous risk of dying in the next moment?" — this is the hazard function h(t). Every method (KM, log-rank, Cox PH, Weibull AFT) builds on this hazard framework.
一、KM、log-rank 與 Cox PH
Kaplan-Meier (KM)
- 非參數的 S(t) 估計:
S(t) = Π_{t_i ≤ t} (1 − d_i / n_i)。 - 在每個事件時間點根據「事件數 d / 風險集 n」更新存活機率,產生階梯式曲線。
- 逐點 CI 用 Greenwood 公式;中位存活時間 = S(t) 首次跨越 0.5 的 t。
- 適合描述、繪圖與報告;不調整共變數。
- Non-parametric S(t) estimator:
S(t) = Π_{t_i ≤ t} (1 − d_i / n_i). - Updates at each event time using d / n; produces a step function.
- Pointwise CI via Greenwood's formula; median survival is where S(t) first crosses 0.5.
- Good for description and figures; cannot adjust for covariates.
log-rank 檢定
- 比較兩條或多條 KM 曲線的整體差異。
- 檢定觀察到的事件數 vs 各時點的期望事件數(在 H₀ 下兩組 hazard 相同)。
- 在比例風險假設下檢力最高;若曲線交叉,檢力會大降。
- 變體:Gehan-Breslow(早期加權)、Tarone-Ware、Fleming-Harrington(給特定時段加權)。
- Compares two or more KM curves overall.
- Tests observed vs expected events at each time point under H₀: equal hazards.
- Most powerful under proportional hazards; loses power if curves cross.
- Variants: Gehan-Breslow (early-weighted), Tarone-Ware, Fleming-Harrington (time-weighted).
Cox PH 模型
- 半參數:
h(t | x) = h₀(t) · exp(Xβ),h₀(t) 不指定形式。 - 係數 β 是 log hazard ratio;exp(β) 是 HR。
- partial likelihood 估計——不需估 h₀(t)。
- 連續共變數直接放——比 dichotomise 強大得多。
- 支援 strata(不同 baseline hazard)、time-varying 共變數。
- Semi-parametric:
h(t | x) = h₀(t) · exp(Xβ); baseline h₀(t) unspecified. - Coefficients β are log hazard ratios; exp(β) is HR.
- Estimated via partial likelihood — h₀(t) is never explicitly modelled.
- Continuous covariates enter directly — far more powerful than dichotomising.
- Supports strata (separate baselines) and time-varying covariates.
PH 假設與補救
- Schoenfeld 殘差:對時間有殘留趨勢 → PH 違反。
cox.zph()給每個共變數做檢定。 - log-log plot:log(−log S(t)) 在不同組應為平行直線。
- 違反 → time-varying coefficient (
tt())、stratification (strata())、或改用 RMST、AFT 模型。 - 競爭風險:當另一個事件會「先發生」(如非癌死亡讓癌復發無法觀察)→ Fine-Gray subdistribution hazards。
- Schoenfeld residuals: residual trend vs time → PH violated.
cox.zph()tests each covariate. - log-log plot: log(−log S(t)) should be parallel across groups.
- If violated → time-varying coefficient (
tt()), stratification (strata()), or switch to RMST / AFT. - Competing risks: when another event preempts the one of interest (non-cancer death prevents observing relapse) → Fine-Gray subdistribution hazards.
二、生信應用情境
TCGA pan-cancer survival:把基因表現量分位(或直接放連續值)與 OS / PFS 一起放入 Cox PH,篩出 HR 顯著的「預後基因」。連續放入 + spline 比中位切分檢力大很多,也避免 cutpoint hacking。
biomarker discovery:常見錯誤是「試遍所有 cutpoint,挑 p 最小的那個」——這會把 Type I 錯誤率拉到 30% 以上。正確做法:maxstat + 校正、Cox PH on 連續變數、或 cross-validation 選 cutpoint。
scRNA-seq + 預後:先做 cell-type 解構(CIBERSORT、scaden)得到每位病人的免疫成分→ 把成分當共變數放進 Cox。或先做 trajectory 取得 pseudotime,再分析 pseudotime 階段與生存的關聯。
臨床試驗:主要終點通常是 OS / PFS;分析計畫需事先指定(pre-specified)log-rank vs stratified log-rank,並設好 interim analysis 的 α 花費函數。
TCGA pan-cancer survival: feed gene expression (preferably continuous + spline) alongside OS / PFS into Cox PH to find prognostic genes by HR. Continuous + spline is far more powerful than median dichotomisation and avoids cutpoint hacking.
Biomarker discovery: a common mistake is "search all cutpoints, pick the smallest p" — Type I error blows up past 30%. Correct approaches: maxstat with adjustment, Cox PH on continuous values, or cross-validation for cutpoint selection.
scRNA-seq + outcomes: deconvolve cell types per patient (CIBERSORT, scaden) and feed fractions as covariates into Cox. Or extract pseudotime stages from trajectory analysis and test their association with survival.
Clinical trials: primary endpoint is typically OS / PFS; the statistical analysis plan must pre-specify log-rank vs stratified log-rank and set α-spending functions for interim analyses.
KM 沙盒
合成兩臂試驗:每組 n 人,存活時間為指數分布,治療組的 hazard 為對照組乘以 HR。拖動滑桿觀察 KM 曲線、log-rank p、Cox HR 估計與其 95% CI 如何隨樣本量與審查率變化。
Synthetic two-arm trial: n per arm, exponential survival, treatment hazard = control hazard × HR. Slide sample size and censoring rate to watch the KM curves, log-rank p, Cox HR estimate, and 95% CI respond.
X: time | Y: S(t) | 藍:對照;紅:治療
存活分析決策樹
🌳 從研究問題到方法選擇
| 情境 | 推薦方法 | 診斷 | |||
|---|---|---|---|---|---|
| 兩臂臨床試驗 OS | KM + log-rank + Cox HR | cox.zph | Two-arm trial OS | KM + log-rank + Cox HR | cox.zph |
| TCGA 基因預後 | Cox PH on continuous + spline | spline 顯著性 + cox.zph | TCGA gene prognosis | Cox PH on continuous + spline | spline significance + cox.zph |
| 免疫治療曲線交叉 | RMST 或 weighted log-rank | cox.zph 顯著 | Immunotherapy crossing curves | RMST or weighted log-rank | significant cox.zph |
| 老年隊列(非癌死亡多) | Fine-Gray | competing event % | Elderly cohort (many non-cancer deaths) | Fine-Gray | competing event % |
| 基因組高維預測 | penalised Cox / RSF | C-index, time-dependent AUC | High-dim genomic prediction | penalised Cox / RSF | C-index, time-dependent AUC |
實作:完整流程
# --- R --- survival / survminer 標準流程 library(survival); library(survminer) # df 需有 time, event(0/1), group, age, stage, gene_expr # 1) Surv object 與 KM 估計 S <- Surv(df$time, df$event) km <- survfit(S ~ group, data = df) ggsurvplot(km, data=df, pval=TRUE, risk.table=TRUE, conf.int=TRUE, xlab="Months", ylab="Overall survival") # 2) log-rank 檢定 survdiff(S ~ group, data = df) # 3) Cox PH(連續共變數 + strata) fit <- coxph(S ~ gene_expr + age + strata(stage), data = df) summary(fit) # HR, CI, z, p cox.zph(fit) # PH 診斷 # 4) 連續變數效應視覺化(spline) library(rms); ddist <- datadist(df); options(datadist="ddist") fit2 <- cph(S ~ rcs(gene_expr, 4) + age, data = df, x=TRUE, y=TRUE) plot(Predict(fit2, gene_expr)) # 5) Time-varying coefficient / 時變共變數 df_tv <- tmerge(df, df, id=patient, event=event(time, event)) coxph(Surv(tstart, tstop, event) ~ treatment + tt(treatment), data=df_tv, tt = function(x, t, ...) x * log(t+1)) # 6) Fine-Gray 競爭風險 library(cmprsk); crr(ftime=df$time, fstatus=df$status, cov1=df[,c("age","gene")])
# --- Python --- lifelines / scikit-survival from lifelines import KaplanMeierFitter, CoxPHFitter, CoxTimeVaryingFitter from lifelines.statistics import logrank_test, multivariate_logrank_test # 1) KM curve kmf = KaplanMeierFitter() for g, sub in df.groupby("group"): kmf.fit(sub.time, sub.event, label=g).plot_survival_function() print(kmf.median_survival_time_) # 2) log-rank res = logrank_test(t_A, t_B, event_observed_A=e_A, event_observed_B=e_B) res.p_value, res.test_statistic # 3) Cox PH on continuous + strata cph = CoxPHFitter() cph.fit(df, duration_col="time", event_col="event", formula="gene_expr + age", strata=["stage"]) cph.print_summary(); cph.plot() # 4) PH assumption check cph.check_assumptions(df, p_value_threshold=0.05, show_plots=True) # 5) Time-varying covariate ctv = CoxTimeVaryingFitter() ctv.fit(df_long, id_col="id", event_col="event", start_col="start", stop_col="stop") # 6) High-dim: scikit-survival (penalised Cox) from sksurv.linear_model import CoxnetSurvivalAnalysis from sksurv.util import Surv y = Surv.from_dataframe("event", "time", df) CoxnetSurvivalAnalysis(l1_ratio=0.9).fit(X, y)
cox.zph — PH 違反時 HR 失去解釋。(4) 把 KM 中位生存當成「平均」 — 重審查時兩者差異很大。(5) 忽略競爭風險 — KM 在競爭事件多時高估 cause-specific 機率。
Common mistakes: (1) Treating survival time as ordinary continuous data — censoring is mis-coded. (2) Median-splitting a continuous biomarker before KM — power loss + cutpoint hacking. (3) Reporting Cox HRs without cox.zph — HR loses meaning under PH violation. (4) Reporting KM median as a "mean" — they diverge under heavy censoring. (5) Ignoring competing risks — KM overestimates cause-specific probabilities when competing events are frequent.
📝 自我檢測
1. 「非資訊性審查 (non-informative censoring)」假設的內容與為何重要?
1. What does the "non-informative censoring" assumption state and why does it matter?
2. Schoenfeld 殘差如何用來診斷 PH 違反?
2. How do Schoenfeld residuals diagnose violation of proportional hazards?
3. 為什麼在 Cox PH 中把連續生物標誌物在中位數切成「高 / 低」是統計上不佳的做法?
3. Why is median-dichotomising a continuous biomarker in Cox PH statistically poor practice?