為什麼需要校正?
單一檢定下 α = 0.05 的意思是:「如果 H₀ 真的成立,我有 5% 機率犯第一型錯誤」。這在「一篇論文一個假設」的年代沒問題。但當代生物學一次測整個基因組:RNA-seq 大約 m ≈ 20,000 個基因、GWAS 大約 m ≈ 10⁶ 個 SNP、proteomics 大約 m ≈ 5,000 個 peptide。
假設所有 H₀ 都成立(沒有任何基因真的差異表現)。在 α = 0.05 下,預期偽陽性數 = 0.05 × m:m = 20 時 1 個、m = 1,000 時 50 個、m = 20,000 時 1,000 個。一篇宣稱「我們找到 1,200 個 DEG, p < 0.05」的論文,其中 1,000 個可能根本是噪音。
更精確地說,「至少 1 個偽陽性」的機率叫 FWER(Family-Wise Error Rate)。獨立檢定下 FWER = 1 − (1 − α)^m。m = 14 時 FWER 已經 ≈ 51%;m = 100 時 ≈ 99.4%。多重檢定校正就是把這個失控的錯誤率拉回可控範圍。
For a single test, α = 0.05 says: "If H₀ is true, I have a 5% chance of a Type I error." That worked fine in the one-hypothesis-per-paper era. Modern biology measures the whole genome at once: RNA-seq has m ≈ 20,000 genes, GWAS m ≈ 10⁶ SNPs, proteomics m ≈ 5,000 peptides.
Suppose all H₀ are true (no gene is truly differential). At α = 0.05, the expected number of false positives is 0.05 × m: 1 for m = 20, 50 for m = 1,000, and 1,000 for m = 20,000. A paper claiming "we found 1,200 DEGs at p < 0.05" may well be reporting 1,000 noise hits.
The probability of at least one false positive is the FWER (Family-Wise Error Rate). Under independence, FWER = 1 − (1 − α)^m. By m = 14, FWER already ≈ 51%; by m = 100, ≈ 99.4%. Multiple-testing correction drags this runaway error rate back into something controllable.
一、FWER vs FDR
多重檢定有兩個競爭目標。FWER 控制「至少一個錯」的機率——適合臨床試驗、確認性研究,但對 m 大時極度保守。FDR(Benjamini & Hochberg 1995)改控制「宣稱為顯著的之中,有多少比例是錯」——犧牲嚴格性換來在 m 大時的可用 power,是高通量資料的事實標準。
Multiple testing has two competing objectives. FWER controls the chance of any false positive — appropriate for clinical trials and confirmatory work, but ruthlessly conservative when m is large. FDR (Benjamini & Hochberg 1995) instead controls the proportion of false positives among declared discoveries — trading strict guarantees for usable power at high m. It is the de-facto standard in high-throughput biology.
| 概念 | 定義 | 保證什麼 | 適用情境 | |||
|---|---|---|---|---|---|---|
| FWER | P(V ≥ 1),V = 偽陽性數 | 整批分析中「連 1 個錯都不要」的機率 ≤ α | 少量檢定、確認性試驗、Phase III 臨床試驗、藥物 multiple endpoint | P(V ≥ 1), V = # false positives | Chance of any false positive in the family ≤ α | Few tests, confirmatory trials, Phase III, regulatory multiple endpoints |
| FDR | E[V / R],R = 宣稱顯著數(V/R = 0 當 R = 0) | 在被「宣稱顯著」的清單中,偽陽性的期望比例 ≤ q | RNA-seq DEG、GWAS、proteomics、scRNA-seq、screen——所有 m > 100 的探索性分析 | E[V / R], R = # rejections (V/R := 0 if R = 0) | Expected fraction of false positives among declared discoveries ≤ q | RNA-seq DE, GWAS, proteomics, scRNA-seq, screens — any exploratory analysis with m > 100 |
| pFDR | E[V/R | R > 0](Storey 2002) | 條件版本,給予 q-value 的精確語意 | 當「至少有一個顯著」幾乎必然時(高訊號資料) | E[V/R | R > 0] (Storey 2002) | Conditional version, defining the q-value | When ≥ 1 discovery is essentially certain (high-signal data) |
| local FDR | P(H₀ true | T = t)(Efron 2008) | 給定觀察到的統計量值,個別檢定為偽陽性的後驗機率 | empirical Bayes、需要逐 gene 的偽陽性風險 | P(H₀ true | T = t) (Efron 2008) | Posterior probability of false positive for an individual test | Empirical Bayes; when you need per-gene false-positive risk |
m 越大,問題越大
拖動 m 滑桿(1 → 20,000)。觀察三個量:(1) 未校正的 FWER(至少一個偽陽性的機率,獨立時 = 1 − 0.95^m,m 大時極快趨近 1);(2) 預期偽陽性數 = 0.05 × m(線性增長);(3) Bonferroni 校正後個別檢定的 α' = 0.05/m。試試 m = 14:FWER 已過 50%;m = 20,000:α' = 2.5 × 10⁻⁶——這就是 GWAS 用 5×10⁻⁸ 的 Bonferroni 邏輯。
Drag the m slider (1 → 20,000). Watch three quantities: (1) uncorrected FWER (probability of ≥1 false positive; under independence = 1 − 0.95^m, blows up fast); (2) expected false positives = 0.05 × m (linear); (3) per-test Bonferroni threshold α' = 0.05/m. Try m = 14: FWER already > 50%. m = 20,000: α' = 2.5 × 10⁻⁶ — the same logic behind GWAS's 5×10⁻⁸ Bonferroni cut-off.
三條曲線:未校正 FWER(橘)· 預期 FP 數(沙)· Bonferroni 後每檢定 α'(綠)Three curves: uncorrected FWER (orange) · expected FP (sand) · Bonferroni per-test α' (teal)
二、方法家譜
FWER 派從 Bonferroni(1936)走到 Holm(1979)和 Hochberg(1988),追求嚴格的「至少 1 個錯」控制。FDR 派則從 BH(1995)出發,衍生出 BY(任意相依)、Storey q-value(自適應 π₀)、local FDR(Efron empirical Bayes)、IHW(共變量加權)、2dGBH(結構化資料)。
The FWER tradition runs Bonferroni (1936) → Holm (1979) → Hochberg (1988), insisting on strict "no false positives at all" control. The FDR tradition starts at BH (1995) and branches into BY (arbitrary dependence), Storey q-value (adapts to π₀), local FDR (Efron empirical Bayes), IHW (covariate-weighted), and 2dGBH (structured data).
Bonferroni
拒絕 H₀ᵢ 若 pᵢ < α/m。控制 FWER 在任何相依結構下(這是它的賣點與包袱)。缺點:m 大時極度保守,幾乎丟掉所有 power。
Carlo E. Bonferroni 在 1936 年提出,奠基於 Bonferroni 不等式 P(∪ Aᵢ) ≤ Σ P(Aᵢ)。GWAS 的 5×10⁻⁸ 門檻 = 0.05/10⁶。
Reject H₀ᵢ if pᵢ < α/m. Controls FWER under any dependence (its selling point and its burden). Downside: brutal at large m — drops nearly all power.
Carlo E. Bonferroni proposed it in 1936, leaning on P(∪ Aᵢ) ≤ Σ P(Aᵢ). The 5×10⁻⁸ GWAS threshold is just 0.05 / 10⁶.
Holm step-down
把 p 值由小到大排序 p₍₁₎ ≤ p₍₂₎ ≤ ⋯;從最小開始檢視 p₍ᵢ₎ ≤ α / (m − i + 1),若不滿足則停止,後面全部接受。仍控制 FWER、永遠至少和 Bonferroni 一樣強(uniformly more powerful)——沒有理由用 Bonferroni 而不用 Holm。
Sort p-values smallest first p₍₁₎ ≤ p₍₂₎ ≤ ⋯; starting from the smallest, reject as long as p₍ᵢ₎ ≤ α / (m − i + 1); stop on first failure. Still controls FWER, but is uniformly more powerful than Bonferroni — there is no reason to use Bonferroni instead of Holm.
Hochberg step-up
反向:從最大 p 往小檢視,找最大的 i 使 p₍ᵢ₎ ≤ α / (m − i + 1),拒絕所有 ≤ i 的假設。比 Holm 略 strong,但需要獨立或 PRDS 假設(Bonferroni / Holm 不需要)。
Reverse direction: scan from the largest p downward, find the largest i with p₍ᵢ₎ ≤ α / (m − i + 1), reject all j ≤ i. Slightly more powerful than Holm but requires independence or PRDS (Bonferroni and Holm do not).
BH-FDR ⭐
排序 p₍₁₎ ≤ ⋯ ≤ p₍ₘ₎,找最大的 k 滿足 p₍ₖ₎ ≤ (k/m)·q,拒絕所有 i ≤ k。在獨立或 PRDS(Positive Regression Dependence on Subsets)下控制 FDR ≤ q。大多數基因表達資料、GWAS 同一染色體 SNP 都滿足 PRDS。高通量資料的工業標準。
原始論文 JRSS-B 57:289–300,DOI 10.1111/j.2517-6161.1995.tb02031.x。
Sort p₍₁₎ ≤ ⋯ ≤ p₍ₘ₎; find the largest k with p₍ₖ₎ ≤ (k/m)·q; reject all i ≤ k. Controls FDR ≤ q under independence or PRDS (Positive Regression Dependence on Subsets), satisfied by most gene-expression tests and same-chromosome GWAS SNPs. The industry standard for high-throughput data.
JRSS-B 57:289–300, DOI 10.1111/j.2517-6161.1995.tb02031.x.
BY for arbitrary dependence
當相依結構未知或負相依時,BH 不保證控制 FDR。BY 把 BH 的門檻乘以 1/Σᵢ₌₁ᵐ (1/i) ≈ 1/(ln m + 0.5772)。m = 10,000 時這倍率約 1/9.8——非常保守,但任何相依結構都成立。Ann. Statist. 29:1165–1188。
When dependence is unknown or negative, BH may not control FDR. BY multiplies BH's threshold by 1/Σᵢ₌₁ᵐ (1/i) ≈ 1/(ln m + 0.5772). For m = 10,000 that's roughly 1/9.8 — very conservative, but valid under any dependence. Ann. Statist. 29:1165–1188.
Storey q-value ⭐
估計 π₀(真 H₀ 比例),用 π₀·m 取代 BH 的 m。如果只有 60% 基因為真 null,q-value 比 BH power 高 ~40%。q-value(i) = 「使第 i 個檢定恰好被宣稱顯著時的最小 pFDR」——具有與 p-value 對偶的逐項解讀。
qvalue R 套件(Storey, Bass, Dabney, Robinson);scRNA-seq DE 的主力工具。Storey 2002 JRSS-B 64:479; Storey & Tibshirani 2003 PNAS 100:9440。
Estimates π₀ (the true-null proportion) and replaces m by π₀·m in BH. If only 60% of genes are truly null, q-value yields ~40% more power than BH. q-value(i) = "the smallest pFDR at which test i would just be rejected" — a per-test reading dual to the p-value.
The qvalue R package (Storey, Bass, Dabney, Robinson) is the scRNA-seq DE workhorse. Storey 2002 JRSS-B 64:479; Storey & Tibshirani 2003 PNAS 100:9440.
Local FDR
fdr(z) = P(H₀ | T = z) = π₀·f₀(z) / f(z),empirical-Bayes 框架。q-value 是「尾端偽陽性比例」,local FDR 是「該點偽陽性機率」。Efron 2007 JASA、2008 Stat Sci、2010 書《Large-Scale Inference》系統論述。R: locfdr。
fdr(z) = P(H₀ | T = z) = π₀·f₀(z) / f(z), an empirical-Bayes formulation. The q-value gives the tail false-positive rate; local FDR gives the pointwise posterior. See Efron 2007 JASA, 2008 Stat Sci, and the 2010 book Large-Scale Inference. R: locfdr.
IHW — covariate-weighted
許多檢定有共變量(covariate)和 power 相關但和 p-value 在 H₀ 下獨立——例如 RNA-seq DE 中的 mean expression、GWAS 中的 MAF。IHW(Independent Hypothesis Weighting)按 covariate 分箱、學習每箱權重,在控制 FDR 下大幅提升 power。Nat Methods 13:577,DOI 10.1038/nmeth.3885。Bioconductor: IHW。
Many tests have a covariate correlated with power but independent of the p-value under H₀ — mean expression in RNA-seq DE, MAF in GWAS. IHW (Independent Hypothesis Weighting) bins by the covariate, learns per-bin weights, and gains substantial power while still controlling FDR. Nat Methods 13:577, DOI 10.1038/nmeth.3885. Bioconductor: IHW.
2dGBH — structured / spatial
當檢定有二維結構(如空間轉錄組的 spot × gene、scATAC 的 peak × cluster)時,傳統一維 BH 忽略此結構。2dGBH 在二維 group 上做 group BH,提升 power 同時控制 FDR。Liu 2024 提出。
When tests share a two-dimensional structure (spatial transcriptomics spot × gene, scATAC peak × cluster, etc.), 1D BH ignores it. 2dGBH applies group BH across the 2D grouping, raising power while keeping FDR control. Proposed in Liu 2024.
BH-FDR demonstrator
模擬 m 個 p-value:比例 π₀ 為真 null(U(0,1))、其餘為真 alt(偏小 p)。排序後畫散佈圖 (i, p₍ᵢ₎),疊上 BH 線 y = q·i/m。BH 規則:找最大的 i 使點落在線下,拒絕所有 ≤ 此 i 的假設。拖滑桿改 π₀、q、m,觀察拒絕門檻怎麼移動——π₀ 越小(真訊號越多),步距越長,BH 也越強。
Simulate m p-values: a fraction π₀ are true nulls (U(0,1)), the rest are true alternatives (small p). Sort and scatter (i, p₍ᵢ₎), overlay the BH line y = q·i/m. BH rule: find the largest i with the point under the line; reject all j ≤ i. Drag the sliders for π₀, q, m and watch the rejection threshold slide — the smaller π₀ is (more real signal), the further BH walks.
綠 = 拒絕(≤ k)· 灰 = 接受 · 紅虛線 = BH 門檻 y = q·i/m · 藍虛線 = Bonferroni α/mGreen = rejected (≤ k) · grey = accepted · red dashed = BH line y = q·i/m · blue dashed = Bonferroni α/m
三、決策樹
🌳 多重檢定方法選擇
四、方法比較表
| 方法 | 控制什麼 | 假設 | Power | 適用情境 | |||
|---|---|---|---|---|---|---|---|
| Bonferroni 1936 | FWER | 任意相依 | 最低 | 少量檢定、GWAS 5×10⁻⁸ | Any dependence | Lowest | Few tests, GWAS 5×10⁻⁸ |
| Holm 1979 | FWER | 任意相依 | 略高於 Bonferroni(始終 ≥) | 取代 Bonferroni 的預設選項 | Any dependence | ≥ Bonferroni, always | Default replacement for Bonferroni |
| Hochberg 1988 | FWER | 獨立 / PRDS | ≥ Holm | 少量檢定、相依結構良好 | Independence / PRDS | ≥ Holm | Few tests with mild dependence |
| BH 1995 ⭐ | FDR ≤ q | 獨立 / PRDS | 高 | RNA-seq, proteomics, microbiome — 預設 | Independence / PRDS | High | RNA-seq, proteomics, microbiome — default |
| BY 2001 | FDR ≤ q | 任意相依 | ≈ BH × 1/(ln m + 0.58) | 負相依或未知結構 | Any dependence | ≈ BH × 1/(ln m + 0.58) | Negative or unknown dependence |
| Storey q-value | FDR / pFDR ≤ q | 獨立 / 弱相依、估 π₀ 可靠 | ≥ BH(π₀ < 1 時更高) | scRNA-seq DE、microarray、π₀ 顯著 < 1 的場景 | Independence / weak dep., reliable π₀ | ≥ BH (more so when π₀ < 1) | scRNA-seq DE, microarray, π₀ < 1 regimes |
| IHW 2016 | FDR ≤ q | 獨立 covariate(H₀ 下 p ⊥ covariate) | > BH(covariate 資訊量大時顯著) | DESeq2 預設、ATAC peak filtering | Independent covariate (p ⊥ covariate under H₀) | > BH (large gain when covariate informative) | DESeq2 default, ATAC peak filtering |
| local FDR | per-test 後驗 | 大 m(> 500)、可估 f, f₀ | 不直接比較(不同 metric) | empirical Bayes、需要逐 gene 風險 | Large m (> 500), estimable f, f₀ | Different metric, not directly comparable | Empirical Bayes, per-gene risk |
五、組學慣例
RNA-seq DE
m ≈ 15,000–25,000 基因。預設:BH 或 BH + IHW(DESeq2 預設)。apeglm shrinkage 後配 BH。報告 padj < 0.05 或 0.1。
m ≈ 15,000–25,000 genes. Default: BH or BH + IHW (DESeq2 default). apeglm shrinkage then BH. Report padj < 0.05 or 0.1.
GWAS
m ≈ 10⁶ SNP。歷史共識:Bonferroni α/m = 5×10⁻⁸("genome-wide significance")。同一染色體 LD 區塊滿足 PRDS,可用 BH 但傳統仍偏好 Bonferroni 以利跨研究 meta。
m ≈ 10⁶ SNPs. Historical consensus: Bonferroni α/m = 5×10⁻⁸ ("genome-wide significance"). Same-chromosome LD blocks satisfy PRDS so BH is valid, but Bonferroni remains preferred for cross-study meta-analysis.
Proteomics
m ≈ 5,000–10,000 peptide/protein。BH 在 PSM、peptide、protein 三層分別校正。MaxLFQ 後常用 limma + BH。
m ≈ 5,000–10,000 peptides / proteins. BH applied separately at PSM, peptide, protein levels. Post-MaxLFQ, the limma + BH pipeline is standard.
scRNA-seq DE
cluster-vs-rest 每 cluster 數萬個比較。Seurat FindMarkers 預設 BH;Storey q-value 在 ASAP、MAST 中常見。注意 pseudobulk vs single-cell-level 的選擇影響 m 與 FDR 估計。
cluster-vs-rest yields tens of thousands of comparisons per cluster. Seurat FindMarkers defaults to BH; Storey q-value appears in ASAP / MAST. Note: pseudobulk vs single-cell-level analysis changes both m and the FDR estimate.
ChIP / ATAC
MACS2 peak calling 用 q-value(BH-FDR)作為 cutoff。peak × condition 差異分析則用 DESeq2 + BH,或 DiffBind + IHW(mean signal 作 covariate)。
MACS2 peak calling uses a q-value (BH-FDR) cutoff. Differential peak × condition is DESeq2 + BH or DiffBind + IHW (mean signal as covariate).
Microbiome
m ≈ 100–2,000 OTU/ASV,相依強(compositionality)。ANCOM-BC + BH 或 BY 對任意相依更穩健。比例壓制下假陽性極多——多重檢定校正只是底線,先做 compositional transform。
m ≈ 100–2,000 OTU/ASV with strong dependence (compositionality). ANCOM-BC + BH or BY for arbitrary dependence. False positives are rampant under compositional constraint — correction is the floor; transform first.
六、實作範例
# R — Multiple-testing toolbox library(qvalue) # Storey q-value library(IHW) # covariate-weighted FDR library(locfdr) # Efron local FDR # Suppose p_vec is a length-m vector of p-values m <- length(p_vec) # --- FWER family --- p_bonf <- p.adjust(p_vec, method = "bonferroni") p_holm <- p.adjust(p_vec, method = "holm") p_hoch <- p.adjust(p_vec, method = "hochberg") # --- FDR family --- p_bh <- p.adjust(p_vec, method = "BH") # Benjamini-Hochberg ⭐ p_by <- p.adjust(p_vec, method = "BY") # arbitrary dependence # Storey q-value (adapts to π₀) qv <- qvalue::qvalue(p_vec) qv$pi0; qv$qvalues summary(qv) # reports π₀ and number of discoveries by q # IHW with a covariate (e.g. mean expression for RNA-seq DE) res <- IHW::ihw(p_vec ~ mean_expr, alpha = 0.05) rejections(res) adj_p <- IHW::adj_pvalues(res) # Local FDR (Efron empirical Bayes; needs z-scores, not p) z <- qnorm(1 - p_vec) lf <- locfdr::locfdr(z, plot = 0) lf$fdr # per-test local FDR # --- DESeq2 pipeline (the modern default) --- # library(DESeq2); res <- results(dds, filterFun = ihw); res$padj
import numpy as np from statsmodels.stats.multitest import multipletests from scipy.stats import norm # --- FWER family --- _, p_bonf, _, _ = multipletests(p_vec, method='bonferroni') _, p_holm, _, _ = multipletests(p_vec, method='holm') _, p_hoch, _, _ = multipletests(p_vec, method='hommel') # Hommel ~ Hochberg variant # --- FDR family --- rej, p_bh, _, _ = multipletests(p_vec, method='fdr_bh', alpha=0.05) _, p_by, _, _ = multipletests(p_vec, method='fdr_by', alpha=0.05) _, p_tsbh, _, _ = multipletests(p_vec, method='fdr_tsbh', alpha=0.05) # two-stage # --- Storey q-value (requires the qvalue package on PyPI) --- from qvalue import estimate result = estimate(p_vec) # returns qvalues + pi0 # --- IHW in Python: via rpy2 or pyihw (Bioconductor port) --- from pyihw import ihw adj_p, weights = ihw(p_vec, mean_expr, alpha=0.05) # --- Local FDR: convert to z, fit two-group empirical Bayes --- z = norm.isf(p_vec) # custom fit or use sklearn / locfdr-python # --- DESeq2 (via pyDESeq2) pipeline --- # from pydeseq2.dds import DeseqDataSet # res = dds.results(); res['padj'] # already BH
p.adjust(p, method="BH") 與 statsmodels 的 multipletests(p, method='fdr_bh') 是 99% 的場合會用到的;其餘是 IHW(power 增益最顯著)與 q-value(π₀ 估計)。寫論文 Methods:「P-values were adjusted using the Benjamini-Hochberg procedure (BH; Benjamini and Hochberg, 1995). Significant if BH-adjusted p < 0.05.」
Practical cheatsheet: R's p.adjust(p, method="BH") and statsmodels' multipletests(p, method='fdr_bh') cover 99% of cases. The rest are IHW (biggest power gain) and q-value (π₀ estimation). Methods boilerplate: "P-values were adjusted using the Benjamini-Hochberg procedure (BH; Benjamini and Hochberg, 1995). Discoveries were declared at BH-adjusted p < 0.05."
七、常見陷阱
❌ BH 在負相依下不保證
BH 保證 FDR ≤ q 需要獨立或 PRDS。當資料負相依(例如 compositional microbiome、強相關 features),BH 可能失效。實務做法:改用 BY(永遠保證但更保守)或先做 decorrelation / compositional transform。Benjamini & Yekutieli 2001 給的反例至今仍是教科書。
BH guarantees FDR ≤ q only under independence or PRDS. With negative dependence (compositional microbiome, strongly correlated features), BH may fail. Fix: use BY (always valid, more conservative) or decorrelate / compositionally transform first. Benjamini & Yekutieli 2001 give the canonical counter-example.
❌ raw p 不該是主要結果
m 很大時,raw p < 0.05 幾乎沒有資訊。論文常見錯誤:「我們找到 1,200 個基因 p < 0.05(其中 80 個 padj < 0.05)」——80 才是結論,1,200 是噪音。主要結果一定是 padj / q,raw p 只能輔助。
For large m, raw p < 0.05 carries almost no information. The classic paper error: "We found 1,200 genes at p < 0.05 (80 at padj < 0.05)" — 80 is the result; 1,200 is noise. The headline must be padj / q-value; raw p is supplementary at best.
❌ 事後挑 q 門檻
「q < 0.05 沒幾個顯著,那我用 q < 0.10⋯⋯還是不夠,q < 0.20 試試」——這是嚴重 p-hacking。q 必須在看資料前選定。若需要寬鬆 cutoff,預先註明(如 q < 0.10 作 hypothesis generation, q < 0.05 作 confirmation)並 pre-register。
"q < 0.05 gives nothing... try q < 0.10... still few... q < 0.20?" — that's serious p-hacking. The q threshold must be chosen before looking at the data. If a generous cutoff is needed, state it (e.g., q < 0.10 for hypothesis generation, q < 0.05 for confirmation) and pre-register it.
❌ 小 m 估 FDR 不可靠
FDR 的良好行為來自大 m 的大數法則。m < 20 時,FDR 估計的變異性大、π₀ 幾乎沒法估、q-value 不穩。建議 m < 20 用 Holm;m = 20–100 用 BH 但解讀保守;m > 100 才放心用 q-value / IHW。
FDR's good behaviour comes from the law of large numbers on m. With m < 20, FDR estimates are highly variable, π₀ is essentially unestimable, and q-values wobble. Recommendation: m < 20 → Holm; m = 20–100 → BH but interpret cautiously; m > 100 → q-value / IHW are reliable.
❌ FDR 不是 p
「padj < 0.05」不代表「該基因為 null 的機率 < 5%」。padj < 0.05 表示:在所有 padj < 0.05 的基因中,平均 5% 是錯的——這是清單層級而非單一基因層級的保證。要單一層級用 local FDR 或 Bayes posterior。
"padj < 0.05" does not mean "this gene has < 5% chance of being null." It means: among all genes with padj < 0.05, an average of 5% are false — a list-level, not per-gene, guarantee. For per-gene risk, use local FDR or Bayesian posterior probabilities.
❌ 同一檢定校正兩次
常見錯誤:先在 cluster 內做 BH,再對所有 cluster 結果再做一次 BH。這會雙重收縮,過度保守。正確做法是一次性對整批檢定(或 grouped FDR 方法如 2dGBH)做校正。
Common mistake: BH within each cluster, then BH again across clusters. This double-shrinks and is over-conservative. Apply correction once to the full family, or use a grouped-FDR method (2dGBH) that handles structure explicitly.
❌ 獨立 filter 才不算多重性
RNA-seq 常先用 mean count 過濾低表達基因再做 DE。filter 必須與後續檢定獨立(如用 row-mean 過濾、不用 group difference 過濾),否則 FDR 失效。這正是 IHW(Ignatiadis 2016)的數學基礎——它允許 filter,前提是 filter 與 H₀ 下 p 獨立。
RNA-seq often pre-filters low-count genes by mean. The filter must be independent of the downstream test (row-mean OK; group-difference filter not OK), or FDR control breaks. This is the mathematical basis of IHW (Ignatiadis 2016) — it permits filters as long as they are independent of p under H₀.
❌ 事後挑校正法
「Bonferroni 都沒顯著,那我用 BH」——分析前決定。Methods 段預先寫明方法是 reproducibility 的基本要求。換方法等於增加 type-I error 的搜索空間。
"Bonferroni found nothing — try BH" inflates Type-I error by silently expanding the search space. Decide before analysing; pre-specify in Methods. This is reproducibility 101.
📝 自我檢測
1. 你做 RNA-seq DE,m = 18,000 個基因,要報告主要結果。最合適的多重檢定校正?
1. You run RNA-seq DE on m = 18,000 genes and need a headline result. Best correction?
2. 「padj < 0.05」的正確解讀是?
2. What does "padj < 0.05" actually mean?
3. m = 20,000 個 RNA-seq 檢定,假設所有 H₀ 為真,在未校正 α = 0.05 下預期偽陽性數約是?
3. m = 20,000 RNA-seq tests, all H₀ true, uncorrected α = 0.05. Expected false positives?
4. 你的代謝體學資料 features 之間強相關(負相依),BH 校正可能失效。最穩健的選擇?
4. Your metabolomics features are strongly (negatively) correlated; BH may fail. Most robust choice?