為什麼基因組規模的篩查必須校正?
假設你檢定 m = 20,000 個基因,全部在虛無假設下,仍會期待 m·α = 1,000 個 p < 0.05 的「顯著」結果。不校正的 p-value 在多重檢定下沒有意義。校正方法分兩條路:(1) FWER(Family-Wise Error Rate)控制「至少一個」假陽性的機率——非常嚴格,適合驗證型研究;(2) FDR(False Discovery Rate)控制「被宣告為顯著之中的假陽性比例」——更貼近發現型研究的需求。
本章介紹三個現代 FDR 標準:(a) Benjamini-Hochberg (BH) 程序——簡單、無分布假設、PRDS 相依下仍然有效;(b) Storey 的 pFDR / q-value——估計 π₀(真的虛無比例),比 BH 更有檢力;(c) Independent Hypothesis Weighting (IHW)——利用「均值表現量」等共變數,把檢力分配給更有信號的子集,仍保有 FDR 控制。
If you test m = 20,000 genes all under the null, you expect m·α = 1,000 "significant" results at p < 0.05. Unadjusted p-values are meaningless at genome scale. Two correction families exist: (1) FWER (Family-Wise Error Rate) controls the probability of any false positive — strict, suits confirmatory work; (2) FDR (False Discovery Rate) controls the proportion of false positives among rejections — better suited to discovery.
This chapter introduces three modern FDR standards: (a) Benjamini-Hochberg (BH) — simple, distribution-free, valid under PRDS dependence; (b) Storey's pFDR / q-value — estimates π₀ (true-null fraction), more powerful than BH; (c) Independent Hypothesis Weighting (IHW) — uses a covariate (e.g. mean expression) to redirect power toward signal-rich subsets while preserving FDR control.
一、FWER vs FDR 與 BH 程序
FWER:Bonferroni / Holm
- Bonferroni:在 α/m 下拒絕——最簡單但最保守。
- Holm:步降程序——比 Bonferroni 嚴格更有檢力,仍控 FWER。
- 適用情境:驗證型研究、安全性報告、臨床終點。
- 代價:在 m=20,000 時幾乎無檢力,會錯失真實效應。
- Bonferroni: reject at α/m — simplest but most conservative.
- Holm: step-down — strictly more powerful than Bonferroni, still FWER-valid.
- Best for: confirmatory work, safety endpoints, clinical primary outcomes.
- Cost: at m=20,000 power collapses; real effects get missed.
FDR:BH / BY / Storey q
- BH (1995):排序 p-value,找最大的 i 使 p₍ᵢ₎ ≤ i·q/m,拒絕前 i 個。
- BY (2001):對任意相依有效——乘以調和級數,較保守。
- Storey q:估計 π₀ 後得 q-value——通常比 BH 多 10–30% 發現。
- local FDR (lfdr):每個特徵的「來自虛無」後驗機率。
- BH (1995): sort p-values; find largest i with p₍ᵢ₎ ≤ i·q/m; reject top i.
- BY (2001): valid under arbitrary dependence (× harmonic factor) — conservative.
- Storey q: estimates π₀ → q-value; typically 10–30% more discoveries than BH.
- local FDR (lfdr): per-feature posterior probability of being null.
IHW:共變數加權
- 用獨立共變數(如基因平均表現量)對檢定加權。
- 有資料的子集 → 更多 α 預算;低表現子集 → 較少預算。
- 共變數需與 p-value 在 H₀ 下獨立——否則破壞 FDR 控制。
- DESeq2 預設用 IHW 替代 BH。
- Weights tests by an independent covariate (e.g. mean expression).
- Signal-rich strata get more α budget; low-expression strata less.
- Covariate must be independent of p-value under H₀ — else FDR breaks.
- DESeq2 ships with IHW as a drop-in replacement for BH.
相依性與 BH 的有效性
- PRDS(正相依)下 BH 仍有效——大多數基因表現、GWAS LD 屬此類。
- 任意相依 → 用 BY,或permutation-based FDR。
- 強相關(基因網絡、長 LD 區塊)會讓 BH 略保守,但不會失效。
- 對極端相依結構,permutation 是黃金標準。
- Under PRDS (positive dependence) BH remains valid — covers most expression / GWAS LD.
- Arbitrary dependence → use BY or permutation-based FDR.
- Strong correlation (networks, long LD blocks) makes BH slightly conservative, not invalid.
- For extreme dependence, permutation is the gold standard.
二、過濾、診斷與生信流程整合
p-value 直方圖是 FDR 最重要的診斷:H₀ 下 p-value 應為 Uniform(0,1),所以直方圖應該平坦——加上 0 附近的尖峰(真實訊號)。如果出現「中段凸起」或「右側上升」,表示你的檢定本身偏離 H₀ 分布(批次效應、隱性共變數、模型錯置),這時做任何 FDR 校正都無效——必須先修檢定。
獨立過濾(independent filtering):在做 FDR 前,移除與 p-value 在 H₀ 下獨立的低訊號特徵(如極低 count 的基因)——這會增加檢力但不破壞 FDR。但若過濾準則基於檢定統計量本身(如先取 |LFC| > 1 才做 FDR),則 FDR 失效。IHW 把過濾的精神原則化:用共變數加權,理論完備。
生信標準流程:RNA-seq DE 用 BH 或 IHW(DESeq2 預設);GWAS 用 5×10⁻⁸ 的 Bonferroni 門檻;MS 蛋白質體(FragPipe / MaxQuant)用 target-decoy FDR;ChIP-seq peak calling 用 MACS2 的 q-value;single-cell DE 用每組獨立的 BH。
The p-value histogram is the single most important FDR diagnostic: under H₀ p is Uniform(0,1), so the histogram should be flat with a spike near 0 (real signal). A bump in the middle or rising right tail means your test itself deviates from its null distribution (batch effects, hidden covariates, misspecified model) — no amount of FDR correction can fix this; fix the test first.
Independent filtering: remove low-signal features whose distribution is independent of the p-value under H₀ (e.g. very-low-count genes) before applying FDR — this boosts power without invalidating FDR. But filtering on the test statistic itself (e.g. first keeping |LFC| > 1 and then computing FDR) breaks FDR. IHW formalises filtering into principled covariate weighting.
Standard bioinfo pipelines: RNA-seq DE uses BH or IHW (DESeq2 default); GWAS uses the 5×10⁻⁸ Bonferroni threshold; MS proteomics (FragPipe / MaxQuant) uses target-decoy FDR; ChIP-seq peak calling uses MACS2 q-values; single-cell DE applies BH within each cluster.
BH 程序視覺化
下方散布圖中,每點是一個基因的 p-value 與排序 i 的對應位置。紅線是 i·q/m——BH 接受所有低於此線的點。拖動滑桿觀察:q 增加,紅線陡升,更多點被接受;π₀ 降低(更多基因是真實訊號),p-value 集中在左下角,BH 與 Storey 的差距變大。
Each dot is a gene at rank i with its p-value. The red line is i·q/m — BH rejects every dot below it. Slide q upward and the threshold rises; lower π₀ (more truly-non-null genes) packs p-values into the lower-left, widening the gap between BH and Storey.
X: 排序 i | Y: p-value | 紅線:BH 閾值 i·q/m
實作:BH / Storey / IHW
# --- R --- 多重檢定校正 # p 是長度 m 的 p-value 向量;res 是 DESeq2 結果 # 1) 基本 BH / Holm / Bonferroni padj_bh <- p.adjust(p, method="BH") # FDR padj_holm <- p.adjust(p, method="holm") # FWER padj_bon <- p.adjust(p, method="bonferroni") sum(padj_bh < 0.05) # 發現數 # 2) Storey q-value(更高檢力) library(qvalue) qv <- qvalue(p) qv$pi0 # 估計的真實虛無比例 sum(qv$qvalues < 0.05) # 3) IHW —— 用共變數加權 (DESeq2 預設) library(IHW); library(DESeq2) ihw_res <- ihw(pvalue ~ baseMean, data=res, alpha=0.05) rejections(ihw_res) plot(ihw_res) # 看各組權重分布 # 4) p-value 直方圖診斷(最重要的 sanity check) hist(p, breaks=50, main="p-value histogram") # 理想:左側尖峰 + 右側平坦; 中段凸起 = 模型有問題
# --- Python --- import numpy as np, matplotlib.pyplot as plt from statsmodels.stats.multitest import multipletests # 1) BH / Holm / Bonferroni rej, padj, _, _ = multipletests(p, alpha=0.05, method="fdr_bh") np.sum(rej) _, padj_holm, _, _ = multipletests(p, method="holm") # 2) Storey π₀ 估計(簡易版) def pi0_storey(p, lam=0.5): return np.mean(p > lam) / (1 - lam) pi0 = pi0_storey(p) # 3) IHW 透過 rpy2 橋接 R(沒有原生 Python 版) from rpy2.robjects import r, FloatVector r.assign("p", FloatVector(p)); r.assign("bm", FloatVector(base_mean)) r('library(IHW); ihw_res <- ihw(p, bm, alpha=0.05)') # 4) p-value 直方圖診斷 plt.hist(p, bins=50); plt.xlabel("p-value"); plt.show() # 平坦 + 左側尖峰 → 模型 OK
📝 自我檢測
1. 對基因組規模篩查(m = 20,000),為什麼 FDR 通常比 FWER 更合理?
1. For a genome-wide screen (m = 20,000), why is FDR generally preferred over FWER?
2. 「這個基因的 q-value = 0.05」最精確的解讀是?
2. The most precise interpretation of "this gene's q-value = 0.05" is:
3. BH 程序在何種相依性下嚴格有效?
3. Under what dependence is the BH procedure strictly valid?