STEP 13 / 13

混合效應模型 (Mixed-Effects Models)

生物 vs 技術重複是一個統計問題——隨機效應、ICC、lme4,避免 pseudoreplication 的最後一道防線。

"Biological vs technical replicate" is a statistical question — random effects, ICC, lme4, and the last line of defence against pseudoreplication.

為什麼「生物 vs 技術重複」是一個統計問題?

實驗室白板上常寫著「n = 60」。但仔細追問:是 60 隻獨立小鼠?還是 3 隻小鼠 × 20 顆細胞?這兩者在統計上不是同一件事。前者的自由度接近 60;後者真正獨立的單位只有 3——做 t 檢定假裝 n = 60,就是 Hurlbert (1984) Ecological Monographs 54:187 所定義的 pseudoreplication(假性重複)

Hurlbert 1984 那篇論文檢視 156 篇生態學論文,發現近一半存在 pseudoreplication;後續神經科學(Lazic 2010 BMC Neurosci 11:5;Aarts et al. 2014 Nat Neurosci 17:491)、細胞生物學(Lord et al. 2020 J Cell Biol 219:e202001064 SuperPlots)也重複驗證了同一個問題。混合效應模型(mixed-effects models)就是處理這類「巢狀/群聚(nested / clustered)」資料的標準工具:透過 random effect 把 cluster 結構顯式建模,讓 SE 與 p 值反映真正的資訊量。

The lab whiteboard says "n = 60". Ask twice: sixty independent mice, or 3 mice × 20 cells? Statistically these are not the same thing. The first has roughly 60 degrees of freedom; the second has only 3 truly independent units — running a t-test as if n = 60 is the textbook case of pseudoreplication, the term coined by Hurlbert (1984) Ecological Monographs 54:187.

Hurlbert reviewed 156 ecology papers and found nearly half contained pseudoreplication; the same problem has since been documented in neuroscience (Lazic 2010 BMC Neurosci 11:5; Aarts et al. 2014 Nat Neurosci 17:491) and cell biology (Lord et al. 2020 J Cell Biol 219:e202001064, SuperPlots). Mixed-effects models are the standard tool for nested / clustered data: they make the cluster structure explicit through random effects, so SEs and p-values reflect the actual amount of independent information.

💡
核心訊息:當你的資料有「結構」——同一個病人多次測量、同一隻小鼠多顆神經元、同一片組織多個視野、同一塊培養盤多孔位、同一家醫院多病人——觀察值不獨立,必須用混合模型。OLS/普通 t 檢定是「單層模型」,它假設每個觀察值都是 i.i.d.,但生物學幾乎從不滿足這個假設。 The headline: when your data have structure — repeated measures on the same patient, many neurons per mouse, many fields per slice, many wells per plate, many patients per hospital — the observations are not independent, and you need a mixed model. OLS / a vanilla t-test is a "flat" model that assumes i.i.d. observations; biology almost never delivers i.i.d.

一、Fixed vs Random、Intercept vs Slope、ICC

混合模型由三件事組成:固定效應(fixed effects)是你關心、想要做 inference 的參數;隨機效應(random effects)是分群的變異成分,你假設群組是來自某個更大母體的「樣本」;以及殘差變異(residual)。模型形式:y_ij = X_ij β + Z_ij b_i + ε_ij,其中 b_i ~ N(0, G)、ε_ij ~ N(0, σ²)。

A mixed model has three ingredients: fixed effects — the parameters you care about and want to infer; random effects — variance components for grouping variables (you treat the groups as a sample from a wider population); and the residual. The form: y_ij = X_ij β + Z_ij b_i + ε_ij, with b_i ~ N(0, G) and ε_ij ~ N(0, σ²).

🎯

Fixed Effects

關心的效應,所有 level 都觀察到、想要做點估計與假設檢定。
例:treatment(藥 vs 安慰劑)、timesexgenotype
類比:你想知道「這個藥對所有人是否有效」。

The effects you care about — all levels observed, want a point estimate plus a hypothesis test.
e.g. treatment (drug vs placebo), time, sex, genotype.
Analogy: "does the drug work, on average, across people?"

🎲

Random Effects

群組是母體的樣本,你不關心特定 level 的效應,只關心群組之間有多少變異。
例:mouse_idpatient_idplate_idcliniccage
類比:你不想知道「小鼠 7 號比 8 號高多少」,只想知道「小鼠之間有多少基線差異」。

Groups are a sample from a population; you do not care about individual level effects, only how much they vary.
e.g. mouse_id, patient_id, plate_id, clinic, cage.
Analogy: not "how much does mouse #7 differ from #8", but "how much do mice vary?"

📐

ICC

Intraclass Correlation Coefficient
ICC = σ²_between / (σ²_between + σ²_within)
同 cluster 內兩觀察值的相關係數。ICC = 0 表完全獨立,ICC → 1 表 cluster 內幾乎相同。神經元 / 細胞層級研究常見 ICC = 0.1–0.5。

Intraclass Correlation Coefficient:
ICC = σ²_between / (σ²_between + σ²_within)
The correlation between two observations from the same cluster. ICC = 0 → fully independent; ICC → 1 → within-cluster nearly identical. Neuron / cell studies often show ICC ≈ 0.1–0.5.

隨機截距 Random Intercept

公式:y ~ x + (1 | mouse)
每隻小鼠有自己的基線(intercept),但反應斜率相同
適用:個體基線差異大(基礎血壓、基礎表達量),但對處理的反應方向類似。

Formula: y ~ x + (1 | mouse)
Each mouse has its own baseline, but a shared slope.
Use when: baselines differ a lot (baseline BP, baseline expression), but the response to treatment is similar across subjects.

隨機斜率 Random Slope

公式:y ~ time + (1 + time | mouse)
每隻小鼠不僅基線不同,對 time 的反應強度也不同
適用:個體間反應差異大(drug responder vs non-responder、成長曲線斜率不一)。
限制:每群至少要有多筆觀察(建議 ≥ 5),否則無法估計群組層級的斜率。

Formula: y ~ time + (1 + time | mouse)
Each mouse has its own baseline and its own response strength to time.
Use when: subjects respond very differently (drug responders vs non-responders, varying growth-curve slopes).
Constraint: each group needs multiple observations (≥ 5 is a healthy minimum) — otherwise group-level slopes cannot be identified.

⌜ y_ij = β₀ + β₁·x_ij + b_0i + b_1i·x_ij + ε_ij  ,  b_i ~ N(0, G)  ,  ε ~ N(0, σ²) ⌝ b_0i = 群 i 的截距偏移;b_1i = 群 i 的斜率偏移。當只有 b_0i 時 = 隨機截距;同時有 b_0i 與 b_1i 時 = 隨機斜率。Laird-Ware (1982 Biometrics 38:963) 是現代混合模型的奠基公式。 ⌜ y_ij = β₀ + β₁·x_ij + b_0i + b_1i·x_ij + ε_ij  ,  b_i ~ N(0, G)  ,  ε ~ N(0, σ²) ⌝ b_0i = group-i intercept offset; b_1i = group-i slope offset. With b_0i alone: random intercept. With both b_0i and b_1i: random slope. Laird & Ware (1982 Biometrics 38:963) is the foundational formulation of modern mixed models.

巢狀資料模擬器:OLS vs 混合模型

下面模擬「小鼠 × 細胞」實驗:產生 n_mice 隻小鼠,每隻測 m 顆細胞,真實處理效應為 β。比較兩種分析:(A) OLS(pseudoreplication)把每顆細胞當獨立觀察,會嚴重低估 SE;(B) 混合模型把 mouse_id 加為隨機截距,SE 反映真實的小鼠數。看看當 between-mouse SD 變大、ICC 變高時,兩種分析的差異會放多大。

Below: a "mouse × cell" experiment. Generate n_mice mice, m cells per mouse, with a true treatment effect β. Compare: (A) OLS (pseudoreplication) — treats every cell as independent, severely underestimates SE; (B) mixed model — adds mouse_id as a random intercept, SE reflects the real number of mice. Increase the between-mouse SD or the ICC and watch the gap grow.

每隻小鼠一種顏色,左 = 對照,右 = 處理。橫線 = 該小鼠的細胞 mean。One colour per mouse; left = control, right = treated. Bar = that mouse's cell-level mean.

⚠️
觀察:把 between-mouse SD 從 0.2 拉到 2.5,OLS 的 p 值幾乎不變,但混合模型的 SE 會明顯放大。這就是為什麼用 OLS 分析巢狀資料會給你「假陽性」。極端例子:n_mice = 3、m = 100 時,OLS 「n = 300」聲稱有顯著差異,混合模型可能告訴你 p > 0.3。 What to watch: push between-mouse SD from 0.2 to 2.5: the OLS p barely budges, but the mixed-model SE inflates a lot. That is why OLS on nested data gives you false positives. Extreme case: n_mice = 3, m = 100 — OLS claims "n = 300" and a huge significance; the mixed model can give p > 0.3.

二、lme4、REML、df 修正、BLUP、GLMM、GEE、Bayesian

📦 lme4

R 的事實標準。lmer(y ~ treat + (1|mouse))。基於 sparse Cholesky 與 PIRLS 的 ECM 演算法,可以處理交叉(crossed)與巢狀(nested)隨機效應,速度比 nlme 快數十倍。但 lme4 預設不給 p 值——Doug Bates 的立場是:適當的 reference distribution 在混合模型中沒有單一正確答案。

The de facto R standard. lmer(y ~ treat + (1|mouse)). Sparse Cholesky + PIRLS ECM; handles crossed and nested random effects far faster than nlme. By design lme4 does not return p-values — Doug Bates argues there is no single correct reference distribution for mixed models.

📦 nlme

R 的「老牌」混合模型套件,lme()nlme()。比 lme4 慢,但支援相關結構(correlation structures)——AR(1)、exchangeable、compound symmetry,這對縱貫資料 (longitudinal) 非常重要。教科書 Pinheiro & Bates 2000《Mixed-Effects Models in S and S-PLUS》仍是經典。

The classic R package: lme() and nlme(). Slower than lme4 but supports correlation structures — AR(1), exchangeable, compound symmetry — which matter for longitudinal data. The textbook Pinheiro & Bates 2000 Mixed-Effects Models in S and S-PLUS remains the canonical reference.

🧪 REML vs ML

REML (Restricted Maximum Likelihood):預設,估變異成分時無偏(unbiased),是推論變異 / ICC 的正確方法。
ML (Maximum Likelihood):估變異會略微低估,但當你要比較固定效應結構不同的兩個模型(用 likelihood ratio test, LRT),必須改成 ML。
實務:lmer(..., REML=TRUE) 報告變異;REML=FALSE 跑 LRT 比模型。

REML: default; produces unbiased variance estimates and is the right choice for inferring variance components / ICC.
ML: slightly biased variance, but required when comparing two models that differ in fixed effects via a likelihood ratio test.
Rule: lmer(..., REML=TRUE) to report; REML=FALSE for LRTs on fixed effects.

🔢 Satterthwaite vs Kenward-Roger

lmer 不直接給 p 值,因為自由度 (df) 在混合模型下沒有清晰定義。近似方案:(1) lmerTest 套件用 Satterthwaite 近似(速度快、適合中大樣本);(2) pbkrtest 套件用 Kenward-Roger(小樣本最準確,是 Luke 2017 J Memory Lang 強力推薦的方法)。小 n 一律用 KR

lmer doesn't print p-values because df is not cleanly defined for mixed models. Approximations: (1) lmerTest uses Satterthwaite (fast, fine for medium/large samples); (2) pbkrtest uses Kenward-Roger (most accurate at small n, recommended by Luke 2017 J Memory Lang). Always prefer KR when n is small.

🎁 BLUPs

Best Linear Unbiased Predictor:每個隨機效應 level 的「最佳預測值」。ranef(fit) 取出每隻小鼠的截距偏移。BLUPs 比 group-wise mean 更穩定,因為它向總平均「收縮」(shrinkage)——這是 James-Stein 估計子的近親,也是 Bayesian 視角下的後驗 mode。

Best Linear Unbiased Predictor: the best predicted value for each random-effect level. ranef(fit) returns per-mouse intercept offsets. BLUPs are more stable than per-group means because they shrink toward the grand mean — a cousin of the James-Stein estimator, and the posterior mode in a Bayesian view.

🔁 GLMM

當反應變數不是常態(binary outcome、count)時,用 glmer(y ~ x + (1|cluster), family=binomial)。例:cluster-randomised trial 的二分結果、神經元 spike count(Poisson)、過度離散計數(negative binomial)。算法用 Laplace 近似或 adaptive Gauss-Hermite quadrature。

When the response is not Gaussian (binary, count), use glmer(y ~ x + (1|cluster), family=binomial). Examples: binary outcomes in cluster-RCTs, neuron spike counts (Poisson), overdispersed counts (negative binomial). Fitted with Laplace approximation or adaptive Gauss-Hermite quadrature.

🌐 GEE

Generalized Estimating Equations(Liang & Zeger 1986 Biometrika 73:13)。提供母體層級(population-average / marginal)推論——你想知道「整個母體的平均效應」,而不是「給定 cluster 的條件效應」。使用 sandwich estimator,對 working correlation 的設定有 robust 性質。GLMM 給 subject-specific 估計、GEE 給 population-average,兩者在非線性連結(logit)下會不同。

Generalized Estimating Equations (Liang & Zeger 1986 Biometrika 73:13). Provides population-average (marginal) inference — "the average effect over the whole population", not the conditional effect given a cluster. Uses a sandwich variance estimator, robust to misspecified working correlation. GLMM gives subject-specific, GEE gives population-average estimates; under nonlinear links (logit) the two differ.

🛠 glmmTMB

建立在 Template Model Builder 之上,速度快、支援 lme4 無法處理的家族:zero-inflated、negative binomial(NB1/NB2)、beta、Tweedie、ordered beta、Conway-Maxwell-Poisson。當 lme4 收斂困難或需要 ZINB 時,第一個替代品就是 glmmTMB。

Built on TMB; fast and supports families lme4 cannot: zero-inflated, negative binomial (NB1/NB2), beta, Tweedie, ordered beta, Conway-Maxwell-Poisson. When lme4 will not converge or you need zero-inflated NB, glmmTMB is the first place to look.

🧠 brms / MCMCglmm

brms(Bürkner 2017 J Stat Softw 80:1)是 R 對 Stan 的封裝,公式語法和 lme4 幾乎一致:brm(y ~ x + (1|mouse)),背後是 NUTS-HMC。當隨機效應 level 很少(< 5),lme4 容易給 singular fit / 邊界估計,這時 weakly informative prior 的 Bayesian 模型(brms / MCMCglmm Hadfield 2010)給的後驗區間更可靠。也支援 missing data 與複雜似然。

brms (Bürkner 2017 J Stat Softw 80:1) wraps Stan in lme4-style syntax: brm(y ~ x + (1|mouse)), fitted with NUTS-HMC. When the number of random-effect levels is small (< 5), lme4 often returns singular fits / boundary estimates; a Bayesian fit (brms / MCMCglmm, Hadfield 2010) with weakly informative priors gives more reliable posterior intervals. Also handles missing data and complex likelihoods.

「Three-of-them」實務口訣:(1) lmer + lmerTest + Kenward-Roger 是 80% 場合的最佳組合;(2) 二分/計數結果改 glmerglmmTMB;(3) 隨機效應 level < 5 或 singular fit → 直接上 brms。記住 Gelman-Hill 2006 的名言:「If you have fewer than 5 levels, partial pooling will help you most.」 The "three-of-them" rule: (1) lmer + lmerTest with Kenward-Roger covers 80% of cases; (2) binary / count outcomes → glmer or glmmTMB; (3) fewer than 5 random-effect levels, or a singular fit → go straight to brms. Remember Gelman & Hill 2006: "If you have fewer than 5 levels, partial pooling helps you the most."

ICC 與設計效應:cluster 越大、ICC 越高,樣本需求飆升

在 cluster-randomised trial(醫院 / 班級 / 動物欄)中,實際所需樣本 = 標稱 n × Design Effect,其中 DE = 1 + (m − 1) · ICC,m = 每 cluster 觀察數。當 ICC = 0.05、m = 20,DE ≈ 1.95——你需要兩倍樣本才能達到原本獨立樣本的 power。Donner & Klar 2000 是 cluster-RCT 設計的標準參考。

In a cluster-randomised trial (hospital, classroom, animal-pen unit), required sample = nominal n × Design Effect, where DE = 1 + (m − 1) · ICC and m = observations per cluster. With ICC = 0.05 and m = 20, DE ≈ 1.95 — you need twice the sample for the same power as an independent design. Donner & Klar 2000 is the standard reference for cluster-RCT design.

🚨
給 clinical trial 設計者:計算 cluster-RCT 樣本時,必須先估計 ICC。常見值:學校層級教育介入 ICC ≈ 0.01–0.10;醫院層級照護品質研究 ICC ≈ 0.01–0.05(Adams et al. 2004);動物欄 cage effect ICC 可達 0.1–0.3。若忽略 cluster 結構,研究會嚴重 underpowered。 For trialists: when sizing a cluster-RCT, estimate ICC first. Typical values: school-level education interventions ICC ≈ 0.01–0.10; hospital-level quality studies ICC ≈ 0.01–0.05 (Adams et al. 2004); animal-cage effects ICC ≈ 0.1–0.3. Ignoring the cluster structure leaves your study badly underpowered.

三、Fixed / Random / GEE / Bayesian——怎麼選?

🌳 混合模型方法選擇樹

Q1:
觀察值彼此獨立嗎?→ 是 → 用普通 OLS / t-test / GLM 即可,不需要混合模型。
Q2:
分群變數的 level 是研究中關心的全部(如 2 種藥)嗎?→ 是 → fixed effect。若 level 是更大母體的樣本(病人 ID、實驗批次) random effect
Q3:
反應變數是常態?→ 是 → lmer(lme4)+ lmerTest / pbkrtest 取得 p 值。→ 否 → glmer / glmmTMB
Q4:
你關心「母體平均效應」而非「條件效應」(如公衛政策評估)?→ 是 → GEE(geepack / statsmodels GEE)+ exchangeable 或 AR(1) working correlation。
Q5:
隨機效應 level < 5、或 lmer 報 singular fit、或樣本極小?→ 是 → Bayesian(brms / MCMCglmm),用 weakly informative prior 穩定變異成分。
Q6:
縱貫資料,且時間點間相關隨距離衰減?→ 是 →nlme::lmecorrelation = corAR1(),或 glmmTMB 的 ar1() 結構。
Q1:
Are observations independent? → Yes → plain OLS / t-test / GLM — no mixed model needed.
Q2:
Are the levels of a grouping variable all of interest (e.g. 2 drugs)? → Yes → fixed effect. If levels are a sample from a larger population (patient ID, batch) random effect.
Q3:
Gaussian response? → Yes → lmer (lme4) + lmerTest / pbkrtest for p-values. → No → glmer / glmmTMB.
Q4:
You want the population-average effect rather than conditional (e.g. public-health policy)? → Yes → GEE (geepack / statsmodels GEE) with exchangeable or AR(1) working correlation.
Q5:
Fewer than 5 random-effect levels, singular fit, or very small n? → Yes → Bayesian (brms / MCMCglmm) with weakly informative priors.
Q6:
Longitudinal data with time-distance-decaying correlation? → Yes → use nlme::lme with correlation = corAR1(), or glmmTMB's ar1() structure.

四、五大工具對照

套件 語言 強項 弱項 典型場景
lme4
Bates 2015
R速度、交叉隨機效應、生態系大不直接給 p 值;非常態家族有限linear / logistic / Poisson 混合模型主力Fast, crossed random effects, huge ecosystemNo p-values out of the box; limited non-GaussianWorkhorse for linear / logistic / Poisson mixed
nlme
Pinheiro-Bates 2000
R支援 correlation structures(AR1、CS、UN)速度慢;不支援交叉隨機效應縱貫資料、藥動力學 (PK/PD)Supports correlation structures (AR1, CS, UN)Slow; no crossed random effectsLongitudinal, PK/PD modelling
statsmodels MixedLMPythonPython 原生、整合 pandas功能比 lme4 少;速度較慢;GLMM 較弱Python pipeline 簡單混合模型Native Python, pandas integrationFewer features than lme4; slower; weak GLMMSimple mixed models in a Python pipeline
glmmTMB
Brooks 2017
RZero-inflated、NB、beta、Tweedie;速度快學習曲線略陡;診斷工具較少過度離散計數、零膨脹資料Zero-inflated, NB, beta, Tweedie; fastSteeper learning curve; fewer diagnosticsOverdispersed counts, zero-inflated data
brms
Bürkner 2017
R (Stan)任何 likelihood、完整不確定性、小 n 穩定慢(MCMC);需要 prior 思考小樣本、複雜模型、報告 credible intervalAny likelihood, full uncertainty, robust at small nSlow (MCMC); requires prior thinkingSmall samples, complex models, credible intervals

五、實作範例

# R: lme4 + lmerTest + pbkrtest + geepack + brms
library(lme4); library(lmerTest)      # Satterthwaite df
library(pbkrtest)                  # Kenward-Roger df (small n)
library(geepack); library(brms)

# --- Linear mixed model: random intercept ---
fit1 <- lmer(expr ~ treat + time + (1 | mouse), data = df,
             REML = TRUE)                    # REML for variance reporting
summary(fit1)                            # lmerTest gives p via Satterthwaite
anova(fit1, ddf = "Kenward-Roger")        # KR df — preferred when n small

# --- Random slope + intercept ---
fit2 <- lmer(expr ~ time + (1 + time | mouse), data = df)
VarCorr(fit2)                            # variance components
ranef(fit2)$mouse                       # BLUPs per mouse
fixef(fit2)                             # fixed-effect estimates

# --- Compare fixed-effect structures: USE ML, not REML ---
m_null <- lmer(expr ~ 1     + (1|mouse), data = df, REML = FALSE)
m_alt  <- lmer(expr ~ treat + (1|mouse), data = df, REML = FALSE)
anova(m_null, m_alt)                     # likelihood ratio test

# --- GLMM: binary outcome with random clinic ---
g <- glmer(recovered ~ drug + age + (1|clinic),
            data = trial, family = binomial)

# --- GEE: population-average estimate ---
gee <- geeglm(recovered ~ drug + age, id = clinic, data = trial,
              family = binomial, corstr = "exchangeable")

# --- ICC from a fitted lmer ---
v <- as.data.frame(VarCorr(fit1))
icc <- v$vcov[1] / sum(v$vcov)

# --- Bayesian fallback for small n / singular fits ---
fit_b <- brm(expr ~ treat + (1|mouse), data = df,
             prior = prior(student_t(3, 0, 2.5), "sd"),
             chains = 4, cores = 4)
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.genmod.generalized_estimating_equations import GEE
from statsmodels.genmod.families import Binomial

# --- Linear mixed model: random intercept ---
m = smf.mixedlm('expr ~ treat + time', df, groups=df['mouse'])
fit = m.fit(reml=True)
print(fit.summary())

# --- Random slope: vc_formula or re_formula ---
m2 = smf.mixedlm('expr ~ time', df, groups=df['mouse'],
                   re_formula='~time')
fit2 = m2.fit()
print(fit2.random_effects)         # BLUPs per mouse
print(fit2.fe_params)              # fixed effects

# --- ICC ---
sig2_b = fit.cov_re.iloc[0, 0]
sig2_w = fit.scale
icc = sig2_b / (sig2_b + sig2_w)

# --- GLMM: BinomialBayesMixedGLM (limited) ---
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM
glmm = BinomialBayesMixedGLM.from_formula(
    'recovered ~ drug + age', {'clinic': '0 + C(clinic)'}, trial)
res = glmm.fit_vb()

# --- GEE: population-average ---
gee = GEE.from_formula('recovered ~ drug + age', 'clinic', trial,
                       family=Binomial(),
                       cov_struct=sm.cov_struct.Exchangeable())
gee_fit = gee.fit()
print(gee_fit.summary())
💡
實作小撇步:(1) Always plot residuals(fit) vs fitted——看是否還有殘留結構;(2) ranef 取出 BLUPs 後畫 caterpillar plot(dotplot(ranef(fit))),檢查極端 outlier group;(3) performance::icc() 一行算 conditional 與 marginal ICC(Nakagawa-Schielzeth 2013 Methods Ecol Evol 4:133)。 Practical tips: (1) Always plot residuals(fit) vs fitted to check for leftover structure; (2) pull BLUPs with ranef and draw a caterpillar plot (dotplot(ranef(fit))) to spot outlier groups; (3) performance::icc() gives conditional and marginal ICC in one line (Nakagawa-Schielzeth 2013 Methods Ecol Evol 4:133).

六、六大陷阱

Pseudoreplication

Hurlbert 1984「假裝相關資料是獨立」是這個領域最古老、最普遍的錯誤。電生理(多細胞 / 單鼠)、組織學(多切片 / 單樣本)、影像(多視野 / 單皿)皆是高風險區。解方:cluster 變數一律放成 random effect。Lazic 2010 用一篇 BMC Neurosci 整篇文章證明:神經科學論文中此錯誤普及率高得令人擔憂。

Hurlbert 1984's "treating correlated data as independent" remains the oldest and most common offence. High-risk zones: electrophysiology (many cells / one mouse), histology (many slices / one block), imaging (many fields / one dish). Fix: any clustering variable becomes a random effect. Lazic 2010 BMC Neurosci showed the problem is widespread in neuroscience papers.

Singular fit

lme4 警告 boundary (singular) fit: see ?isSingular,通常表示某個變異成分被估到 0、或隨機斜率與截距完美相關。常見原因:random-effect level 太少(< 5)、模型過度複雜(random slope 但每群觀察太少)、資料的真實 ICC 確實接近 0。解方:簡化隨機結構,或改用 brms(Bayesian 自然處理邊界)。

lme4 warns boundary (singular) fit: see ?isSingular — usually a variance component estimated at zero or an intercept-slope correlation pinned at ±1. Causes: too few random-effect levels (< 5), an over-rich random structure (random slope with few obs per group), or a genuinely tiny ICC. Fix: simplify the random structure, or switch to brms (Bayesian handles boundaries naturally).

REML 比較固定效應

REML 的 likelihood 與固定效應設定有關,用 REML 跑 LRT 比較不同固定效應結構是錯的。改用 REML = FALSE(ML)來比較固定效應;要報告變異成分時再切回 REML。anova(m_null, m_alt) 在 lme4 中會自動 refit 到 ML,但別忽視這個細節。

The REML likelihood depends on the fixed-effects specification, so using REML for a LRT across different fixed-effect models is invalid. Switch to REML = FALSE (ML) for fixed-effects comparisons; switch back to REML when reporting variance components. anova(m_null, m_alt) in lme4 quietly refits to ML — don't ignore the detail.

「lmer 不給 p」

原生 lme4 的 summary 不印 p 值——但很多論文卻寫了「p < 0.001」。必須附上方法:Satterthwaite (lmerTest) 或 Kenward-Roger (pbkrtest),或用 LRT、parametric bootstrap、profile CI。Luke 2017 J Memory Lang 強烈推薦 KR 為小樣本黃金標準。

The native summary(lmer) prints no p-values — yet papers cheerfully report "p < 0.001". State the method: Satterthwaite (lmerTest), Kenward-Roger (pbkrtest), LRT, parametric bootstrap, or profile CI. Luke 2017 J Memory Lang strongly recommends KR as the small-sample gold standard.

收斂警告被忽視

Model failed to convergemaxfun=2*10^4 警告經常被當「擺著就好」。但這通常代表優化器沒走到 ML/REML 真正的極值,估計值不可靠。處置:(1) 換 optimizer:control = lmerControl(optimizer = "bobyqa");(2) 中心化/標準化 predictor;(3) 簡化隨機結構;(4) 改 brms / glmmTMB。

Model failed to converge and maxfun=2*10^4 warnings are routinely "left in place". They usually mean the optimiser did not reach the actual ML/REML optimum, so estimates are unreliable. What to do: (1) try another optimiser: control = lmerControl(optimizer = "bobyqa"); (2) centre / scale predictors; (3) simplify the random structure; (4) switch to brms / glmmTMB.

忽略交叉隨機效應

當「受試者」與「項目(item / stimulus)」彼此都重複(如 psycholinguistic 實驗),必須同時放 random effecty ~ x + (1|subject) + (1|item)。Baayen-Davidson-Bates 2008 J Memory Lang 是這個觀念的標誌。多個生物實驗(trial × cell line、reader × image)也適用。lme4 對交叉結構支援良好,nlme 不行。

When "subjects" and "items" both repeat (e.g. psycholinguistics, reader × image, trial × cell line) you must include both as random effects: y ~ x + (1|subject) + (1|item). Baayen, Davidson & Bates 2008 J Memory Lang is the landmark paper. lme4 supports crossed random effects natively; nlme does not.

陷阱:random effect 不是萬靈丹 「我加了 random effect,所以小 n 沒問題」——錯。隨機效應可以正確處理結構,但無法憑空產生資訊。當你只有 3 隻小鼠,混合模型誠實地告訴你「自由度只有 ~2」,這正是它的價值。若連 2 自由度都不夠做你想做的 inference,真正的答案是再做幾隻小鼠,不是換另一個模型。 "I added a random effect, so n=3 is fine" — no. Random effects handle structure correctly, but they cannot manufacture information. With three mice, the mixed model honestly tells you "df ≈ 2" — that is its value. If 2 df is not enough for your inference, the real answer is more mice, not a different model.

七、四個典型場景

🧬 縱貫病人測量

50 位糖尿病患者,每位每月測 HbA1c 共 12 個月。模型HbA1c ~ time + treat + time:treat + (1 + time | patient)。time:treat 是「處理改變斜率」的 interaction;random slope 容許每位患者自己的血糖控制軌跡。

50 diabetes patients, HbA1c measured monthly for 12 months. Model: HbA1c ~ time + treat + time:treat + (1 + time | patient). The time:treat interaction captures the treatment effect on the slope; the random slope lets each patient have their own glycaemic trajectory.

🏥 Cluster RCT

20 家診所被隨機分配「新照護模式 vs 標準」,每家 100 位病人。模型outcome ~ arm + age + (1 | clinic);二分結果用 glmer binomial 或 GEE exchangeable。樣本量計算必須乘以 design effect(見模擬 ②)。

20 clinics randomised to "new care model vs standard", 100 patients per clinic. Model: outcome ~ arm + age + (1 | clinic); binary outcome with glmer binomial or GEE exchangeable. Sample-size calculation must multiply by the design effect (see Sim ②).

🧠 多電極神經記錄

5 隻大鼠,每隻 32 通道,每通道記錄 ~200 神經元 spike rate。三層巢狀rate ~ stim + (1 | rat / channel / unit)。完美對應 Aarts 2014 Nat Neurosci 的警告:別把 32 × 200 當 6400 個獨立樣本。

5 rats, 32 channels each, ~200 neuron spike rates per channel. Three-level nesting: rate ~ stim + (1 | rat / channel / unit). Exactly the scenario Aarts et al. 2014 Nat Neurosci warn about — don't treat 32 × 200 as 6,400 independent samples.

🌐 多地點基因體研究

5 個中心,每個中心 200 個樣本,做 RNA-seq 表達量分析。模型expression ~ disease + age + sex + (1 | center),或把 center 與 batch 雙隨機效應 + (1 | batch)。這也是消除批次效應 (batch effect) 的補充工具,跟 ComBat / sva 互補。

5 centres, 200 RNA-seq samples each. Model: expression ~ disease + age + sex + (1 | center), or include both centre and batch + (1 | batch). Complements ComBat / sva for batch-effect handling.

📝 自我檢測

1. 你用 5 隻小鼠,每隻取 20 顆神經元做電生理。最合適的分析是?

1. You record 20 neurons each from 5 mice in electrophysiology. The right analysis is?

A. 把 100 顆神經元當獨立樣本做 t 檢定(n = 100)A. Pool 100 neurons as independent and run a t-test (n = 100)
B. 把神經元按 mouse 平均後做 t 檢定(n = 5),其他全部丟掉B. Average to per-mouse means and run a t-test (n = 5), discarding the rest
C. 用混合模型 y ~ treat + (1 | mouse),把 mouse 設為隨機效應C. Fit a mixed model y ~ treat + (1 | mouse) with mouse as a random effect
D. 改用非參數檢定就好D. Just switch to a non-parametric test

2. 同事用 lmery ~ x1 + (1|group)y ~ x1 + x2 + (1|group),皆用預設 REML,然後 anova() 比較。正確嗎?

2. A colleague fits y ~ x1 + (1|group) and y ~ x1 + x2 + (1|group) in lmer with default REML, then compares them with anova(). Correct?

A. 完全正確,REML 是預設A. Completely correct, REML is the default
B. 不對。比較固定效應應該用 ML(REML = FALSE);lme4 的 anova 會自動 refit 但要明白為什麼B. Wrong. Compare fixed effects under ML (REML = FALSE); lme4's anova refits automatically but you should know why
C. 應該用 Bayesian factorC. Should use a Bayes factor
D. 應該改成 GEED. Should switch to GEE

3. ICC = 0.1,每 cluster 21 人。Design effect 與「為了相同 power 所需樣本」的關係是?

3. ICC = 0.1, 21 per cluster. What's the design effect and its impact on required sample size?

A. DE = 1.0,跟獨立樣本一樣A. DE = 1.0, same as independent
B. DE = 1.21,輕微增加B. DE = 1.21, slight increase
C. DE = 2.10,但對 power 沒影響C. DE = 2.10, but no impact on power
D. DE = 1 + 20×0.1 = 3.0,需要 3 倍樣本D. DE = 1 + 20×0.1 = 3.0; need 3× the sample

4. 在分析 cluster-RCT 的二分結果時,你想要「整體母體的處理效應」而非「條件於 clinic 的效應」。最合適的方法?

4. Analysing a cluster-RCT binary outcome and you want the "population-average" effect rather than the conditional-on-clinic effect. Best method?

A. 普通 logistic regression,忽略 clinicA. Plain logistic regression, ignoring clinic
B. glmer(GLMM)binomial + 隨機截距B. glmer (GLMM) binomial with random intercept
C. GEE,exchangeable working correlation + sandwich SEC. GEE with exchangeable working correlation + sandwich SE
D. 用 chi-square 檢定即可D. A chi-square test will do