從 James-Stein 到 limma:收縮為何贏
1961 年 Stein 證明了一個令所有頻率學家震驚的結果:當你要同時估計 k ≥ 3 個獨立常態均值,把每個樣本均值朝整體中心收縮得到的估計量,在所有真值組合下、均勻地比 MLE 有更小的均方誤差 (MSE)。MLE 雖然無偏,卻不是最佳——這就是James-Stein paradox。
同樣的邏輯讓 limma::eBayes() 成為微陣列/RNA-seq 差異表達分析的事實標準:每個基因 g 的樣本變異 s_g² 在 n=3 時極不穩定,t 統計量會被小變異基因「人為放大」假陽性;eBayes 將 s_g² 朝整體變異趨勢線收縮,得到 moderated t——分母更穩定、推論更可靠。
這一章把階層模型 (full Bayesian) 與經驗貝氏 (EB) 並列:兩者都靠 partial pooling / borrow strength,只是 EB 用資料估計超先驗參數,不傳播超參數的不確定性。
In 1961 Stein proved a result that shocked frequentists: when jointly estimating k ≥ 3 independent normal means, shrinking each sample mean toward the overall centre uniformly beats the MLE in mean squared error for every true parameter vector. The MLE is unbiased but inadmissible — this is James-Stein paradox.
The same logic makes limma::eBayes() the de-facto standard for microarray / RNA-seq differential expression. With n = 3 per group the per-gene variance s_g² is wildly unstable, and t-statistics get artificially inflated for low-variance genes. eBayes shrinks s_g² toward the global mean-variance trend, producing the moderated t — a stabler denominator and far more reliable inference.
This chapter sets hierarchical (full Bayes) next to empirical Bayes (EB): both partial-pool by borrowing strength; EB plugs in hyperparameters estimated from the data instead of propagating their uncertainty.
一、階層、EB、shrinkage
階層模型 (partial pooling)
每個 g 的參數 θ_g 來自一個共同的母體分布 θ_g ~ N(μ, τ²),其中 μ, τ² 也有自己的先驗(hyperprior)。後驗 p(θ_g|y) 自動把「單獨估計 y_g」與「整體平均 μ̂」做加權平均——這就是 partial pooling。
極限:τ → 0 退化為 complete pooling(所有基因一個值);τ → ∞ 退化為 no pooling(每個基因獨立)。「適度收縮」是兩端之間最優。
Each gene's parameter θ_g is drawn from a common population θ_g ~ N(μ, τ²) with its own hyperprior. The posterior p(θ_g|y) automatically takes a weighted average of "fit gene g alone" and "the overall μ̂" — that is partial pooling.
Limits: τ → 0 → complete pooling (one value for all); τ → ∞ → no pooling (each gene independent). Moderate shrinkage sits in the middle and is provably better.
James-Stein paradox
對 X_i ~ N(θ_i, 1), i=1..k (k ≥ 3),James-Stein 估計量
θ̂_JS = (1 − (k−2)/Σ X_i²) · X
在任何真值組合下都有比 MLE = X 更低的 MSE。違反直覺處:即使各組獨立、看似無關,「同時收縮」仍嚴格更好。
這提供了 limma / DESeq2 shrinkage 的理論基礎——只是把「均值朝中心收縮」改為「變異 / log-FC 朝整體趨勢收縮」。
For X_i ~ N(θ_i, 1), i = 1..k (k ≥ 3), the James-Stein estimator
θ̂_JS = (1 − (k−2)/Σ X_i²) · X
has strictly lower MSE than the MLE for every θ. Counter-intuitive bit: even when the groups are independent and seemingly unrelated, joint shrinkage strictly improves total MSE.
This is the theoretical backbone for limma / DESeq2 shrinkage — just applied to variances or log-FCs instead of means.
limma 的 eBayes() 到底做什麼?
- 對 expression matrix
E[g, sample]做基因內 OLS,得到β̂_g與s_g²。 - EB shrinkage 變異:假設
s_g² ~ s₀² · χ²/d₀(scaled inverse-χ²),從所有基因估出s₀²與超先驗自由度d₀。 - 後驗變異 = 加權平均:
s̃_g² = (d₀ · s₀² + d_g · s_g²) / (d₀ + d_g)。 - moderated t =
β̂_g / (s̃_g · sqrt(v_g)),自由度d₀ + d_g— 比原 t 多出d₀度自由度。 - 結果:n=3 的實驗也能跑「等效於 n=3+d₀」的穩定 t 檢定。
- Per-gene OLS on the expression matrix
E[g, sample]givesβ̂_gands_g². - EB shrinks the variance: assume
s_g² ~ s₀² · χ²/d₀(scaled inverse-χ²); estimates₀²and prior dfd₀from all genes. - Posterior variance = weighted average:
s̃_g² = (d₀ · s₀² + d_g · s_g²) / (d₀ + d_g). - moderated t =
β̂_g / (s̃_g · √v_g), df =d₀ + d_g— i.e. extrad₀degrees of freedom borrowed from the genome. - Net effect: even an n = 3 experiment behaves like "n = 3 + d₀" for the variance estimator.
ashr 與 lfcShrink
- ashr (Adaptive Shrinkage):直接對效應大小 β̂_g 做 EB,先驗用非對稱混合常態,靈活適應稀疏 vs 密集真效應分布。
- DESeq2 的
lfcShrink(type="apeglm"):使用 apeglm 對 NB GLM 的 log-FC 做 EB 收縮,減少低 count 基因的雜訊噴發。 - 產出的 LFC 適合做 effect-size 排序、ranked GSEA。
- ashr (Adaptive SHRinkage): EB on the effect sizes β̂_g directly, with a flexible asymmetric mixture-normal prior — adapts to sparse vs dense true effects.
- DESeq2's
lfcShrink(type="apeglm"): EB shrinks the NB-GLM log-FCs, taming the noisy explosions at low counts. - Shrunken LFCs are ideal for effect-size ranking and ranked-GSEA.
eBayes 變異收縮視覺化
每個圓點是一個基因的 (均值, 原始變異)。拉動 d₀ 滑桿增加先驗自由度,觀察每個 s_g² 被拉向整體中心 s₀²;連動下方的 t-統計量散布——收縮後 t-stat 不再被低變異基因人為放大。
Each dot is a gene at (mean, raw variance). Push the d₀ slider to inject more prior df and watch every s_g² migrate toward the global s₀². The t-statistic panel updates in real time — moderated t no longer explodes at low-variance genes.
X: 原始 log(s_g²);Y: 收縮後 log(s̃_g²)
二、EB 還是 full Bayes?
🌳 收縮策略決策樹
lfcShrink(type="apeglm") 或 ashr。lfcShrink(type="apeglm") or ashr.| 場景 | 方法 | 收縮什麼 | |||
|---|---|---|---|---|---|
| 微陣列 DE (n=3 vs 3) | limma::eBayes(trend=TRUE) | 基因內變異 s_g² | Microarray DE (n=3 vs 3) | limma::eBayes(trend=TRUE) | per-gene variance s_g² |
| RNA-seq DE,需要顯示 LFC | DESeq2 + lfcShrink(apeglm) | log fold-change β̂_g | RNA-seq DE with LFC display | DESeq2 + lfcShrink(apeglm) | log fold-change β̂_g |
| 非對稱效應分布 (sparse hits) | ashr::ash() | β̂_g 與 lfsr (false sign rate) | Asymmetric / sparse effects | ashr::ash() | β̂_g and lfsr |
| 多病人 / 多組織階層 | brms / Stan hierarchical | θ_g 朝亞群均值收縮 | Multi-donor / multi-tissue | brms / Stan hierarchical | θ_g toward subgroup means |
| 少基因 + 臨床決策 | full Bayes + 弱先驗 | 完整 hyperparameter UQ | Few genes + clinical decision | Full Bayes + weak priors | full hyperparameter UQ |
實作:limma eBayes、ashr、PyMC 階層
# --- R --- 經典 limma + ashr library(limma); library(ashr) # 1) 標準 limma 流程 design <- model.matrix(~ group, data=metadata) fit <- lmFit(expr, design) # expr: log2 expression matrix contr <- makeContrasts(groupB - groupA, levels=design) fit2 <- contrasts.fit(fit, contr) fit2 <- eBayes(fit2, trend=TRUE) # 平均-變異 trended prior topTable(fit2, coef=1, number=20, adjust="BH") # 2) 看看 eBayes 學到的超參數 fit2$df.prior # d_0 — 從基因組借來的「額外」自由度 fit2$s2.prior # s_0^2 — 收縮的目標 plotSA(fit2) # 平均 vs 變異 + trend # 3) 直接對效應大小做 ashr ash_out <- ash(fit2$coefficients[, 1], fit2$stdev.unscaled[, 1] * sqrt(fit2$s2.post)) head(ash_out$result[, c("PosteriorMean", "lfsr")]) # 4) DESeq2 對應:lfcShrink # dds <- DESeqDataSetFromMatrix(...); dds <- DESeq(dds) # res <- lfcShrink(dds, coef="group_B_vs_A", type="apeglm")
# --- Python --- import numpy as np, statsmodels.api as sm import pymc as pm, arviz as az # 1) 手動 EB 變異收縮 (簡化版 limma) # expr: (G genes, N samples), design: (N, p) G, N = expr.shape; p = design.shape[1] betas, s2 = np.zeros((G, p)), np.zeros(G) for g in range(G): res = sm.OLS(expr[g], design).fit() betas[g], s2[g] = res.params, res.mse_resid d_g = N - p # 估計 d_0, s_0^2 via method-of-moments on log(s2) (Smyth 2004) z = np.log(s2) e = z - np.log(d_g) + np.special.digamma(d_g/2) # adjusted moments e_mean = e.mean(); e_var = e.var(ddof=1) # solve for d_0 from trigamma(d_0/2) = e_var - trigamma(d_g/2) from scipy.special import polygamma from scipy.optimize import brentq target = e_var - polygamma(1, d_g/2) d0 = brentq(lambda d: polygamma(1, d/2) - target, 0.01, 1000) if target > 0 else np.inf s0_2 = np.exp(e_mean + polygamma(0, d0/2) - np.log(d0)) # shrunken variance s2_post = (d0 * s0_2 + d_g * s2) / (d0 + d_g) # 2) 階層 PyMC:θ_g | μ, τ ~ Normal(μ, τ) with pm.Model() as hb: mu = pm.Normal("mu", 0, 2) tau = pm.HalfNormal("tau", 1) z = pm.Normal("z", 0, 1, shape=G) theta= pm.Deterministic("theta", mu + tau*z) sigma= pm.HalfNormal("sigma", 1) pm.Normal("y", mu=theta[gene_idx], sigma=sigma, observed=y_obs) idata = pm.sample(target_accept=0.95) # 3) ashr:透過 rpy2 # from rpy2.robjects.packages import importr; ashr = importr("ashr") # res = ashr.ash(betas, ses)
t.test + BH on an n=2 vs 2 study. The raw s_g has only 1 df and can be arbitrarily small, inflating t and producing massive false positives. Always use limma + eBayes (or DESeq2 / edgeR) — this is field consensus.
📝 自我檢測
1. 既然每個基因的 MLE 無偏,為什麼收縮估計量在 MSE 上仍然更好?
1. Each gene's MLE is unbiased — why does shrinkage still improve total MSE?
2. limma 的 eBayes() 實際上收縮什麼?這帶來什麼好處?
2. What does limma's eBayes() actually shrink, and what does that buy you?
3. 什麼時候 EB 不夠、需要完整階層貝氏?
3. When is empirical Bayes inadequate compared with full hierarchical Bayes?