STEP 16 / 17

排列、Bootstrap、GSEA

當參數假設失效或統計量沒有閉合分布時,重抽樣(resampling)是最可靠的工具。本章把排列檢定、bootstrap、基因集富集(GSEA)整合成一個 resampling 工具箱。

When parametric assumptions break or no closed-form null exists, resampling is the most reliable tool. This chapter unifies permutation tests, bootstrap, and gene-set enrichment (GSEA) into one resampling toolkit.

為何在生信中重抽樣是必要技能

許多生物資料分析的統計量並沒有漂亮的閉合分布:基因集富集分數 (ES)、網絡中介中心度、scRNA-seq 軌跡距離、複雜的 ratio 與 trimmed mean——這些都很難套教科書的 t/F 分布。重抽樣用「資料本身」生成虛無分布或不確定度估計,繞過參數假設:

  • 排列檢定:在 H₀ 下「標籤可交換」,重複洗牌組標籤,計算每次的統計量,比較觀察值與這些排列值——黃金標準的 FWER / FDR 控制。
  • Bootstrap:從原資料抽樣(with replacement),重複計算統計量得其抽樣分布——拿來算 SE、CI、bias。
  • GSEA:以「樣本標籤排列」對排序後的基因清單做 KS-like 富集檢定——是排列檢定在生物路徑層級的應用。

Many biological statistics lack a closed-form distribution: GSEA enrichment scores, network betweenness, scRNA-seq trajectory distances, complex ratios and trimmed means — none fit textbook t / F. Resampling uses the data itself to generate a null or an uncertainty estimate, bypassing parametric assumptions:

  • Permutation tests: under H₀ labels are exchangeable; shuffle labels repeatedly, recompute the statistic, compare the observed value to the permutation distribution — gold-standard FWER / FDR control.
  • Bootstrap: resample with replacement from the data; recompute the statistic to obtain its sampling distribution — gives SE, CI, bias.
  • GSEA: applies sample-label permutation to a KS-like enrichment score on a ranked gene list — permutation testing at the pathway level.
💡
選擇原則:檢定 H₀ → 排列;估計 SE / CI → bootstrap。兩者用法不可互換。Bootstrap 的 CI 並不保證 Type I 控制;排列的不確定度不能直接拿來當 SE。 How to choose: testing H₀ → permutation; estimating SE / CI → bootstrap. They are NOT interchangeable. A bootstrap CI does not guarantee Type I control; permutation does not directly give a SE.

一、排列檢定與 Bootstrap

🔀

排列檢定 (permutation)

  • H₀:兩組來自同一分布,組別標籤可交換。
  • 洗牌標籤 B 次,每次重算統計量 → 得到虛無分布。
  • p = (未來排列中 ≥ 觀察值的比例)。
  • 對小樣本、非常態資料、自訂統計量都有效。
  • 限制:n 太小 → 排列數有限 → p 無法低於 1/(B+1)。
  • H₀: both groups come from the same distribution; labels are exchangeable.
  • Shuffle labels B times, recompute the statistic → null distribution.
  • p = (fraction of permutations ≥ observed value).
  • Works for small samples, non-normal data, custom statistics.
  • Caveat: very small n → few unique permutations → p cannot go below 1/(B+1).
🥾

Bootstrap

  • 從原資料抽 n 個(with replacement),重複 B 次。
  • 得到統計量的經驗抽樣分布,可算 SE、percentile CI、BCa CI。
  • 適用:複雜統計量(中介效應、變異成分、網絡度量)、無 SE 公式的場合。
  • 需大量重抽樣(≥1000);可能在尾端表現不佳。
  • 對相依資料用 block bootstrapcluster bootstrap
  • Sample n observations with replacement from the data; repeat B times.
  • Yields the empirical sampling distribution: SE, percentile CI, BCa CI.
  • Use for: complex statistics (mediation, variance components, network metrics) where no SE formula exists.
  • Needs many resamples (≥1000); tail behaviour can be poor.
  • For dependent data use block or cluster bootstrap.

二、GSEA 與基因集檢定

把所有基因按某個統計量(log fold change、moderated t、signal-to-noise)排序,GSEA 的問題是:某條基因集 S(例如 Hallmark INTERFERON_ALPHA)的成員,是否「集中在排序的上端或下端」?

Running Enrichment Score (ES):從上而下掃描排序清單,遇到 S 的成員就加分(依排序值加權),遇到非成員就減分。ES = 全程偏離 0 的最大絕對值。顯著性由樣本標籤排列得到——重新洗組別→重排基因→重算 ES→得 ES 虛無分布。

兩種對立的虛無:(a) self-contained(如 roast)—— H₀:S 中無基因有差異表現。(b) competitive(如 GSEA、camera)—— H₀:S 中的基因「比 S 外的基因更顯著」。生物學家通常想要 competitive,因為它回答「這條 pathway 比背景更被擾動嗎?」。

ORA(Over-Representation Analysis,hypergeometric / Fisher)用 DE 基因清單對基因集的交集做超幾何檢定——速度快,但忽略了排序、丟失了亞顯著的累積訊號,且不處理基因間相關性。camera / roast 校正了相關性;fgsea 用適應性排列大幅加速。

Rank all genes by a statistic (log fold change, moderated t, signal-to-noise). GSEA asks: do the members of a gene set S (e.g. Hallmark INTERFERON_ALPHA) cluster at the top or bottom of this ranking?

Running Enrichment Score (ES): walk the ranked list top to bottom; add to a running sum at each S member (weighted by rank value), subtract at non-members. ES is the maximum absolute deviation from 0. Significance comes from sample-label permutation — shuffle group labels → re-rank genes → recompute ES → build an ES null distribution.

Two competing nulls: (a) self-contained (e.g. roast) — H₀: no gene in S is differentially expressed. (b) competitive (e.g. GSEA, camera) — H₀: genes in S are no more differential than genes outside S. Biologists usually want the competitive null because it answers "is this pathway more perturbed than background?".

ORA (Over-Representation Analysis, hypergeometric / Fisher's exact) computes the overlap of a DE gene list with a set — fast, but ignores ranking, loses sub-significant accumulating signal, and ignores inter-gene correlation. camera / roast correct for correlation; fgsea uses adaptive permutation for huge speedups.

⚠️
樣本 vs 基因排列:GSEA 預設用樣本標籤排列——保留基因間相關性,是正確的虛無。基因排列(隨機洗基因 ID)會破壞 pathway 內基因間共變異,膨脹 ES 顯著性、產生大量假陽性。當樣本太少(n < 8)以致排列數不足時,採用 fgsea 的 gamma 近似或 roast 的旋轉檢定。
Sample vs gene permutation: GSEA's default is sample-label permutation — it preserves inter-gene correlation and is the correct null. Gene-set permutation (shuffling gene IDs) destroys within-pathway correlation, inflates ES significance, and floods false positives. When n is too small (n < 8) for enough sample permutations, use fgsea's gamma approximation or roast's rotation test.

三、計算捷徑

對 m=20,000 基因 × 1,000 排列 = 2×10⁷ 次測試,naive 實作會在 2026 年也跑很久。常用加速:

  • fgsea::fgsea:用 adaptive multilevel 排列,把 p < 10⁻⁶ 的 pathway 也能精準估計。
  • limma::camera:用 inter-gene correlation 解析調整 t 統計量,無須排列。
  • roast / mroast:用旋轉檢定(rotation test)取代排列——適合 n ≥ 3 的小樣本。
  • SAM (Significance Analysis of Microarrays):對排列得到的虛無分布做 FDR——早期但仍實用。
  • permutation-based GWAS p-value:對表型排列以校正 LD 結構。

For m = 20,000 genes × 1,000 permutations = 2×10⁷ tests, naive code would still be slow in 2026. Standard accelerations:

  • fgsea::fgsea: adaptive multilevel permutation — accurately estimates p < 10⁻⁶ for top pathways.
  • limma::camera: analytic correction using inter-gene correlation — no permutation needed.
  • roast / mroast: rotation test replaces permutation — viable for n ≥ 3.
  • SAM: early FDR-via-permutation method, still useful.
  • Permutation GWAS p-values: phenotype permutation to retain LD structure.

GSEA:滾動 ES 與排列虛無

下方是合成的 500 基因排序清單,目標基因集(綠色標記)共 30 個成員,集中於上端。running ES 沿排序累積:紅曲線上揚=集合的「命中」加分,下落=「未命中」減分。調整集中強度n_perm,右下小圖即時更新排列虛無,黑線為觀察 ES,紅虛線為 p < 0.05 閾值。

Below is a synthetic ranked list of 500 genes; the target gene set (green marks) has 30 members concentrated near the top. The running ES walks the list: red rises at "hits", falls at "misses". Adjust concentration and n_perm; the inset updates the permutation null in real time — black line is observed ES, red dashed line is the p = 0.05 quantile.

ES — p — B —

主圖:running ES(紅)+ 命中位置(綠線) · 副圖:排列下 ES 虛無分布

實作

# --- R --- 排列 / Bootstrap / GSEA

# 1) 自製排列檢定(兩組均值差)
obs <- mean(x[g=="A"]) - mean(x[g=="B"])
perm <- replicate(10000, {
  gp <- sample(g)
  mean(x[gp=="A"]) - mean(x[gp=="B"])
})
mean(abs(perm) >= abs(obs))             # two-sided p

# 2) Bootstrap CI of a complex statistic
library(boot)
b <- boot(data, statistic = function(d, i) cor(d[i,1], d[i,2]), R = 2000)
boot.ci(b, type = c("perc", "bca"))

# 3) fgsea — fast GSEA on ranked stats
library(fgsea); library(msigdbr)
pathways <- split(msigdbr_df$gene_symbol, msigdbr_df$gs_name)
ranked   <- setNames(res$stat, res$gene)               # named numeric
fg <- fgsea(pathways, stats = ranked, minSize=15, maxSize=500)
head(fg[order(fg$padj), ], 10)

# 4) limma::camera / roast — correlation-aware
library(limma)
idx <- ids2indices(pathways, rownames(v$E))
camera(v, idx, design, contrast = cont)
mroast(v, idx, design, contrast = cont, nrot = 9999)

# 5) ORA(hypergeometric / Fisher)
library(clusterProfiler)
enrichGO(de_genes, OrgDb="org.Hs.eg.db", ont="BP", pAdjustMethod="BH")
# --- Python ---
import numpy as np
from scipy import stats

# 1) scipy permutation_test — vectorised, exact when feasible
def mean_diff(a, b): return a.mean() - b.mean()
res = stats.permutation_test((x_A, x_B), mean_diff, n_resamples=10000,
                             alternative="two-sided", permutation_type="independent")
res.statistic, res.pvalue

# 2) Bootstrap CI
res = stats.bootstrap((x,), np.median, n_resamples=2000, method="BCa")
res.confidence_interval

# 3) GSEA via gseapy
import gseapy as gp
pre = gp.prerank(rnk=ranked_df, gene_sets="MSigDB_Hallmark_2020",
                 min_size=15, max_size=500, permutation_num=1000)
pre.res2d.sort_values("NOM p-val").head(10)

# 4) Hypergeometric ORA
gp.enrichr(gene_list=de_genes, gene_sets="GO_Biological_Process_2023")

# 5) Block permutation (for time-series / clustered data)
def block_perm(x, block_size, B):
    blocks = x.reshape(-1, block_size)
    out = []
    for _ in range(B):
        idx = np.random.permutation(blocks.shape[0])
        out.append(blocks[idx].ravel())
    return np.array(out)
🚫
常見錯誤:(1) 用基因排列跑 GSEA — 破壞相關性結構、假陽性爆增。(2) n=3 vs 3 跑只有 20 種唯一排列,卻報 p = 0.001 — 這個 p 不可能存在。(3) 把 bootstrap percentile CI 直接拿來宣稱「95% Type I 控制」— bootstrap 是估計工具,不是檢定。(4) ORA 用「padj < 0.05」的 DE 基因清單跑 — 對小研究幾乎沒結果;GSEA 用全部排序基因更穩健。 Common mistakes: (1) Using gene permutation in GSEA — destroys correlation, inflates false positives. (2) n=3 vs 3 has only 20 unique permutations, yet reporting p = 0.001 — that p is impossible. (3) Using a bootstrap percentile CI as a "95% Type I" claim — bootstrap estimates uncertainty, not error rate. (4) ORA over the "padj < 0.05" DE list — yields nothing in underpowered studies; GSEA over the full ranking is more robust.

📝 自我檢測

1. 為什麼樣本標籤排列得到的 GSEA p-value 比基因排列的更保守、更可信?

1. Why does sample-label permutation yield a more conservative and reliable GSEA p-value than gene-set permutation?

A. 樣本排列計算速度更慢A. Sample permutation runs more slowly
B. 樣本排列只測試 H₀ 的一部分B. Sample permutation only tests part of H₀
C. 樣本排列保留基因間的相關結構;基因排列破壞共變異,讓 ES 的虛無太窄、p 太小C. Sample permutation preserves inter-gene correlation; gene permutation destroys it, making the ES null too narrow and p-values too small
D. 樣本排列不需要 FDR 校正D. Sample permutation requires no FDR correction

2. 何種情境下,bootstrap 比排列檢定更合適?

2. When is the bootstrap preferable to a permutation test?

A. 要嚴格控制 Type I 錯誤率A. When strict Type I control is required
B. 要估計複雜統計量的 SE / CI(沒有自然「可交換標籤」可洗牌)B. When estimating SE / CI for a complex statistic (no natural exchangeable labels to permute)
C. 樣本量極小(n=3)C. With extremely small samples (n=3)
D. 兩者完全可互換D. The two are completely interchangeable

3. Competitive 基因集檢定(如 GSEA、camera)與 self-contained(如 roast)的本質差別?

3. The essential difference between competitive (GSEA, camera) and self-contained (roast) gene-set tests?

A. competitive 只能用排列;self-contained 必須用旋轉A. Competitive must use permutation; self-contained must use rotation
B. competitive 只適用單樣本;self-contained 適用兩樣本B. Competitive only fits one-sample designs; self-contained fits two-sample designs
C. competitive 問「S 中基因比 S 外更顯著嗎?」;self-contained 問「S 中至少一個基因有差異嗎?」C. Competitive asks "are S members more differential than non-S?"; self-contained asks "is any gene in S differentially expressed?"
D. competitive 不能校正基因間相關D. Competitive cannot correct for inter-gene correlation