STEP 7 / 13

Analysis of Variance (ANOVA)

F = between-group variance / within-group variance. Fisher's 1925 invention is still the workhorse for comparing three or more means, and a close cousin of regression.

Why do we need ANOVA?

With three or more groups of continuous data (for example, blood pressure under placebo / low / medium / high dose), the instinct is to run every pairwise t-test. That is a mistake. K groups need C(K,2) t-tests, and the overall Type I error balloons. Treating the tests as independent at α = 0.05: 3 groups, 3 tests → 1 − 0.95³ ≈ 14%; 4 groups, 6 tests → ≈ 26%; 6 groups, 15 tests → ≈ 54%. This is the classic multiple-comparisons problem.
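The arithmetic above can be reproduced with a minimal Python sketch. It assumes the pairwise tests are independent (they are not exactly, since they share data, so the numbers are an approximation), and the helper name familywise_error is made up for illustration.

```python
from math import comb

def familywise_error(k_groups, alpha=0.05):
    """Return (number of pairwise tests, approximate family-wise error rate).

    Assumes the m = C(K, 2) tests are independent, which is only an approximation.
    """
    m = comb(k_groups, 2)
    return m, 1 - (1 - alpha) ** m

for k in (3, 4, 6):
    m, fwer = familywise_error(k)
    print(f"K={k}: {m} pairwise tests, FWER about {fwer:.1%}")
```

The printed rates should match the 14%, 26% and 54% quoted in the text after rounding.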

Ronald A. Fisher's 1925 Statistical Methods for Research Workers introduced ANOVA: a single test of the null hypothesis that all group means are equal, built by partitioning total variability into a between-group piece and a within-group piece. Under that null (and the model assumptions), the ratio F = MSbetween / MSwithin follows the Fisher–Snedecor F-distribution; a sufficiently large F rejects the null in favour of "at least one mean differs".

Deeper insight: ANOVA is just linear regression with dummy variables. aov(y ~ group) gives the same fit and the same F-test as lm(y ~ factor(group)), which is why modern statistics treats ANOVA as a special case of the general linear model, itself a special case of the generalized linear model (McCullagh & Nelder 1989).

💡
Historical note: Fisher developed ANOVA at Rothamsted Experimental Station while analysing wheat yields across fertilizer treatments, the prototypical multi-group comparison. The "F" in F-distribution honours Fisher; the name was coined by George W. Snedecor in 1934.

1. What the F statistic really is

Let y_ij be observation j in group i, with group mean ȳ_i· and grand mean ȳ··. The core ANOVA identity is SS_total = SS_between + SS_within, where

$$
SS_{\text{between}} = \sum_{i} n_i\,(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2,
\qquad
SS_{\text{within}} = \sum_{i}\sum_{j} (y_{ij} - \bar{y}_{i\cdot})^2
$$

$$
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{SS_{\text{between}}/(K-1)}{SS_{\text{within}}/(N-K)}
$$

df_between = K−1 (K groups), df_within = N−K (N total observations). When all μᵢ are equal, E[MS_between] = E[MS_within] = σ², so F ≈ 1; if a real difference exists, E[MS_between] > σ² and F grows.
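As a rough check of the decomposition, the following Python sketch computes the sums of squares by hand and compares the resulting F with scipy.stats.f_oneway. The simulated blood-pressure values and the function name one_way_f are illustrative assumptions, not part of the original material.

```python
import numpy as np
from scipy import stats

def one_way_f(groups):
    """Compute SS_between, SS_within, F and p from a list of 1-D samples."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_y = np.concatenate(groups)
    grand_mean = all_y.mean()
    k, n_total = len(groups), all_y.size
    # Between: group size times squared distance of group mean from grand mean
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within: squared distance of each observation from its own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    p_value = stats.f.sf(f_stat, k - 1, n_total - k)  # upper tail of F(K-1, N-K)
    return ss_between, ss_within, f_stat, p_value

rng = np.random.default_rng(0)
groups = [rng.normal(mu, 8, 20) for mu in (140, 135, 130, 122)]  # simulated data
ss_b, ss_w, f_stat, p = one_way_f(groups)

all_y = np.concatenate(groups)
ss_total = ((all_y - all_y.mean()) ** 2).sum()
ref = stats.f_oneway(*groups)
print(f"SS_between + SS_within = {ss_b + ss_w:.2f}, SS_total = {ss_total:.2f}")
print(f"hand F = {f_stat:.3f}, scipy F = {ref.statistic:.3f}")
```

The two SS totals and the two F values should agree up to floating-point error.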
🔢

One-way

One categorical predictor (e.g., four treatments); tests "all group means are equal". The most common form. Equivalent to lm(y ~ group); the omnibus F equals the regression's overall F-test.

🔀

Two-way

Two categorical predictors plus an interaction. Example: drug × dose. Read the main effects and ask "does the drug effect vary with dose?". A significant interaction means the effects are not additive.

🔁

RM-ANOVA

The same subject measured repeatedly (baseline / week 2 / week 4). Accounts for within-subject correlation and assumes sphericity; when Mauchly's test indicates a violation, apply the Greenhouse–Geisser correction. Now largely superseded by mixed-effects models (see Step 13).

⚖️

Welch / B-F

Welch (1951) and Brown & Forsythe (1974): when group variances are unequal, classical ANOVA no longer holds its nominal Type I error rate. Welch ANOVA does not assume equal variances and should be the default, mirroring the Welch t-test.

🪜

Kruskal–Wallis

Kruskal & Wallis 1952 JASA: a rank-based non-parametric alternative with no normality assumption. Use it when n is small and the data are clearly skewed. Follow up with Dunn's test or pairwise Wilcoxon tests with BH adjustment.

📐

ANOVA = regression

K groups → K−1 dummy variables (reference coding). The F-test from lm(y ~ group) equals aov's F; each β is the mean difference between that group and the reference group. Grasping this leads straight into ANCOVA (add covariates).
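The equivalence can be illustrated in Python with statsmodels, as a sketch on simulated data (the column names dose and bp and the group means are assumptions chosen for illustration). With treatment coding and placebo as the reference, the intercept should equal the placebo mean and each coefficient should equal that group's mean minus the placebo mean.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.formula.api import ols

rng = np.random.default_rng(1)
levels = ["placebo", "low", "med", "high"]
df = pd.DataFrame({
    "dose": np.repeat(levels, 20),
    "bp": np.concatenate([rng.normal(m, 8, 20) for m in (140, 135, 130, 122)]),
})

# Regression with K-1 = 3 dummy variables, placebo as the reference level
fit = ols("bp ~ C(dose, Treatment(reference='placebo'))", data=df).fit()

# Classical one-way ANOVA on the same data
f_anova = stats.f_oneway(*[df.loc[df.dose == g, "bp"] for g in levels]).statistic

means = df.groupby("dose")["bp"].mean()
print("regression F:", fit.fvalue, " ANOVA F:", f_anova)
print(fit.params)                    # intercept and three dummy coefficients
print(means - means["placebo"])      # group mean minus reference mean
```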

Three-group F-statistic playground

Tune the three means μ₁, μ₂, μ₃, the common within-group SD, and the per-group n, and watch F and p change. Intuition: bigger group separation → larger SS_between → larger F; bigger within-group SD (more noise) → larger SS_within → smaller F; larger n → F is more sensitive to small real differences. η² is the effect size (see below).

[Figure: three-group dot plot of simulated data; each point is one observation, the horizontal bar marks the group mean.]
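The same intuition can be explored offline with a small simulation. This is a sketch only: the function simulate_anova and the parameter values are invented for illustration, and a fixed seed is reused so that the three scenarios share the same underlying noise.

```python
import numpy as np
from scipy import stats

def simulate_anova(mus, sd, n, seed=0):
    """Draw n normal observations per group; return (F, p, eta squared)."""
    rng = np.random.default_rng(seed)
    groups = [rng.normal(mu, sd, n) for mu in mus]
    f_stat, p_value = stats.f_oneway(*groups)
    df_between, df_within = len(groups) - 1, n * len(groups) - len(groups)
    # eta^2 = SS_between / SS_total, recovered from F and the degrees of freedom
    eta_sq = (f_stat * df_between) / (f_stat * df_between + df_within)
    return f_stat, p_value, eta_sq

baseline = simulate_anova(mus=(100, 110, 120), sd=10, n=20)
wider = simulate_anova(mus=(100, 120, 140), sd=10, n=20)    # more separation
noisier = simulate_anova(mus=(100, 110, 120), sd=30, n=20)  # more within-group noise

for label, (f_stat, p_value, eta_sq) in [("baseline", baseline),
                                         ("wider means", wider),
                                         ("noisier", noisier)]:
    print(f"{label:12s} F = {f_stat:7.2f}  p = {p_value:.2g}  eta^2 = {eta_sq:.2f}")
```

Changing n in the calls shows the third effect: with the same means and SD, larger samples tend to give a larger F.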

2. Three assumptions and how to check them

1️⃣ Normality of residuals

Not "the data are normal in each group": it is the model residuals e_ij = y_ij − ȳ_i· that should be approximately normal. Check with a residual QQ plot (most informative) and Shapiro–Wilk (n < 50). Thanks to the CLT, ANOVA is fairly robust to mild violations when n is large (Glass et al. 1972).

2️⃣ Homogeneity of variance

Equal σ² across groups. Levene's test (the median-centred version is more robust); Bartlett's test (sensitive to non-normality). The same critique as in the t-test chapter applies: testing first and then choosing between the classical and Welch versions is a data-driven decision that inflates Type I error (Zimmerman 2004). Defaulting to Welch ANOVA is the cleaner choice.

3️⃣ Independence

Observations are independent of each other. This is the hardest assumption to check and the most dangerous to violate: multiple slices from one mouse, multiple wells per dish, and repeated blood draws from one patient all break it (pseudoreplication, Hurlbert 1984). When it is violated, switch to RM-ANOVA or a mixed model (Step 13).

⚠️
Common mistake: running Shapiro–Wilk on each group separately. With small n and many groups, each test has low power, and it checks the wrong thing: ANOVA's normality assumption is about the residuals, not the raw data within each group. The right way: fit the model first, then QQ-plot resid(fit).
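A Python counterpart of "fit first, then check the residuals" might look like the sketch below. The data are simulated, the column names are assumptions, and scipy.stats.probplot is used only to obtain the QQ-plot coordinates without drawing anything.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.formula.api import ols

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "group": np.repeat(list("ABCDEF"), 6),   # K = 6 groups, only n = 6 each
    "y": np.concatenate([rng.normal(mu, 5, 6) for mu in (50, 55, 60, 65, 70, 75)]),
})

fit = ols("y ~ C(group)", data=df).fit()
resid = fit.resid                            # each y minus its own group mean

# One Shapiro-Wilk test on all 36 residuals, not six tests on 6 points each
w_stat, p_resid = stats.shapiro(resid)

# QQ-plot coordinates: theoretical normal quantiles vs ordered residuals
(theoretical_q, ordered_resid), (slope, intercept, r) = stats.probplot(resid, dist="norm")
print(f"Shapiro-Wilk on residuals: W = {w_stat:.3f}, p = {p_resid:.3f}")
print(f"QQ-plot correlation r = {r:.3f}")
```

Pooling the residuals gives one test on 36 values rather than six underpowered tests, and it targets the quantity the assumption is actually about.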

3. Post-hoc: who differs from whom?

A significant omnibus F tells you "at least one pair differs" but not which pair. Answering that requires a post-hoc test. The options differ in which error rate they control (family-wise, FWER, or false discovery rate, FDR) and in how conservative they are. The five workhorses, all of which control FWER:

| Method | Comparisons covered | Controls | Conservativeness | When to use |
| Tukey HSD | All K(K−1)/2 pairs | FWER | Medium | Default; optimal with equal n (Tukey 1949) |
| Bonferroni | Any m tests | FWER | High | Few comparisons; simple and transparent; underpowered for large m |
| Holm | Any m tests | FWER | Medium (never more conservative than Bonferroni) | Step-down version of Bonferroni with higher power (Holm 1979) |
| Dunnett | K−1 (each group vs a single control) | FWER | Medium | Dose-response; drug vs placebo (Dunnett 1955) |
| Scheffé | All possible linear contrasts | FWER | Highest (very conservative) | Complex contrasts thought of after the fact, such as (A+B)/2 vs C (Scheffé 1959) |
Practical rules: (1) all pairs → Tukey HSD; (2) each group vs control → Dunnett (higher power than Tukey because there are fewer comparisons); (3) a few pre-planned contrasts → Holm; (4) exploratory, complex contrasts thought of after the fact → Scheffé. Don't look at the data first and then decide which pairs to test (cherry-picking): that is p-hacking.

How post-hoc adjustment scales with K

Slide K (the number of groups) and watch how the effective per-test α shrinks under each method as K grows. Bonferroni: α/m, inversely proportional to m = C(K,2). Tukey HSD: based on the studentized range distribution; a gentler adjustment. Holm step-down: sequential, never more conservative than Bonferroni.

[Figure: effective per-test α threshold (y-axis; lower = more conservative) against K for the three methods.]
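For the two methods with closed-form thresholds, the adjustment can be sketched directly in Python. Tukey HSD is left out here because it needs the studentized range distribution. The helper names and the six p-values are invented for illustration; the hand-rolled Holm adjustment is compared against statsmodels' multipletests.

```python
import numpy as np
from math import comb
from statsmodels.stats.multitest import multipletests

def per_test_thresholds(k_groups, alpha=0.05):
    """Per-test alpha thresholds for all pairwise comparisons among k groups."""
    m = comb(k_groups, 2)
    bonferroni = np.full(m, alpha / m)      # one fixed threshold for every test
    holm = alpha / (m - np.arange(m))       # alpha/m for the smallest p, up to alpha/1
    return bonferroni, holm

def holm_adjust(pvals):
    """Holm step-down adjusted p-values, returned in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = (m - np.arange(m)) * p[order]  # multiply i-th smallest p by (m - i + 1)
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted

for k in (3, 4, 6, 8):
    bonf, holm = per_test_thresholds(k)
    print(f"K={k}: m={bonf.size}, Bonferroni {bonf[0]:.4f}, Holm {holm[0]:.4f} to {holm[-1]:.4f}")

pvals = [0.001, 0.012, 0.020, 0.030, 0.200, 0.600]   # made-up p-values, K = 4
print(holm_adjust(pvals))
print(multipletests(pvals, method="holm")[1])        # reference implementation
```

Holm starts at the Bonferroni threshold for the smallest p-value and relaxes from there, which is why it can reject more hypotheses while still controlling FWER.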

4. Decision tree

🌳 ANOVA decision tree

Q1: Clearly non-normal data with small n (< 15 per group)? → Yes → Kruskal–Wallis + Dunn post-hoc, or pairwise Wilcoxon + BH.
Q2: Same subjects measured at several timepoints? → Yes → Prefer a linear mixed model (Step 13); classical RM-ANOVA requires checking sphericity with Mauchly's test first and applying the Greenhouse–Geisser correction if it is violated.
Q3: Two categorical factors with a possible interaction? → Yes → Two-way ANOVA; for unbalanced designs use Type II or Type III SS (car::Anova).
Q4: Levene's test significant, or boxplots showing visibly unequal spread? → Yes → Welch ANOVA (oneway.test); follow up with the Games–Howell post-hoc.
Q5: One categorical factor, roughly equal variances, residuals near-normal? → Yes → One-way ANOVA. Post-hoc: all pairs → Tukey HSD; vs control → Dunnett.
Q6: Covariates to adjust for (age, baseline value)? → Yes → ANCOVA (lm(y ~ group + covariate)), which is equivalent to regression.

5. Effect size and unbalanced designs

Effect size

η² (eta squared) = SS_between / SS_total, the proportion of total variance explained by group membership. Cohen 1988 benchmarks: 0.01 small, 0.06 medium, 0.14 large.

ω² (omega squared): less biased than η² and preferable for small samples. Partial η², used in multi-factor designs, takes only "this effect + residual" as the denominator, so each main effect can be read on its own. Report p, F, df, η²/ω², and 95% CIs together.
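Both effect sizes follow directly from the sums of squares. The sketch below uses the standard one-way formula ω² = (SS_between − df_between·MS_within) / (SS_total + MS_within); the function names and the simulated data are assumptions for illustration. A second helper recovers ω² from a reported F and its degrees of freedom, which is handy for sanity-checking published numbers.

```python
import numpy as np

def effect_sizes(groups):
    """Return (eta squared, omega squared) for a one-way layout."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    y = np.concatenate(groups)
    k, n_total = len(groups), y.size
    ss_between = sum(g.size * (g.mean() - y.mean()) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ss_total = ss_between + ss_within
    ms_within = ss_within / (n_total - k)
    eta_sq = ss_between / ss_total
    omega_sq = (ss_between - (k - 1) * ms_within) / (ss_total + ms_within)
    return eta_sq, omega_sq

def omega_sq_from_f(f_stat, df_between, df_within):
    """Omega squared from a reported one-way F(df_between, df_within)."""
    n_total = df_between + df_within + 1
    return df_between * (f_stat - 1) / (df_between * (f_stat - 1) + n_total)

rng = np.random.default_rng(0)
groups = [rng.normal(mu, 8, 20) for mu in (140, 135, 130, 122)]  # simulated data
eta_sq, omega_sq = effect_sizes(groups)
print(f"eta^2 = {eta_sq:.3f}, omega^2 = {omega_sq:.3f}")
print(f"omega^2 implied by F(3, 76) = 18.4: {omega_sq_from_f(18.4, 3, 76):.3f}")
```

ω² is always somewhat smaller than η² on the same data, because it subtracts the share of SS_between expected from noise alone.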

Type I / II / III SS

With unequal group sizes, the way the SS for main effects is computed changes the result. Type I (sequential): what R's anova() gives by default; sensitive to the order in which terms enter. Type II: main effects computed ignoring the interaction (recommended when the interaction is not significant). Type III: the SAS default, available as car::Anova(fit, type=3); it requires sum-to-zero contrasts (contr.sum) to be valid.

Langsrud (2003, Statistics and Computing) walks through the differences. With balanced designs (equal n) all three types agree; with unbalanced designs, state explicitly in Methods which type you used.

🚨
"My anova() and car::Anova() give different answers!" This is almost always an unbalanced design combined with different SS types. R defaults to Type I (sequential); SAS and most textbooks default to Type III. Don't paste R's default output into a paper without saying what it is. State, for example: "Type III sums of squares were computed using car::Anova with sum-to-zero contrasts."

6. Worked examples

# R: classical, Welch, Kruskal–Wallis, post-hoc, two-way
library(tidyverse); library(car); library(emmeans); library(rstatix)

set.seed(1)  # make the simulated data reproducible
# Drug dose-response: placebo / low / medium / high (n=20 each)
df <- tibble(
  dose  = factor(rep(c("placebo","low","med","high"), each=20),
                levels=c("placebo","low","med","high")),
  bp    = c(rnorm(20,140,8), rnorm(20,135,8),
           rnorm(20,130,8), rnorm(20,122,8)))

# --- 1. Classical one-way ANOVA (assumes equal variance) ---
fit <- aov(bp ~ dose, data = df)
summary(fit)                       # F, df, p
summary(lm(bp ~ dose, data = df))   # identical F — ANOVA = regression

# --- 2. Assumption checks on RESIDUALS ---
plot(fit, which = 2)                # residual QQ plot
shapiro.test(resid(fit))           # residual normality
car::leveneTest(bp ~ dose, data = df, center = median)

# --- 3. Welch ANOVA (default if variances unequal) ---
oneway.test(bp ~ dose, data = df, var.equal = FALSE)

# --- 4. Post-hoc ---
TukeyHSD(fit)                       # all pairs, FWER
emmeans::emmeans(fit, pairwise ~ dose, adjust = "tukey")
# Dunnett: each vs placebo (control)
emmeans::emmeans(fit, trt.vs.ctrl ~ dose, ref = "placebo")
# Games–Howell for unequal variance
rstatix::games_howell_test(df, bp ~ dose)

# --- 5. Kruskal–Wallis non-parametric ---
kruskal.test(bp ~ dose, data = df)
rstatix::dunn_test(df, bp ~ dose, p.adjust.method = "BH")

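# --- 6. Two-way ANOVA + interaction ---
# df2 is a hypothetical data frame with columns bp, drug, dose (not built above)
# Type III SS is only meaningful with sum-to-zero contrasts
options(contrasts = c("contr.sum", "contr.poly"))
fit2 <- lm(bp ~ drug * dose, data = df2)
car::Anova(fit2, type = 3)         # Type III SS for an unbalanced design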

# --- 7. Effect size ---
rstatix::anova_summary(fit, effect.size = "pes")   # partial η²
effectsize::omega_squared(fit)              # ω²

# Python: classical, Welch, Kruskal–Wallis, post-hoc, two-way
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.multicomp import pairwise_tukeyhsd
import pingouin as pg          # clean ANOVA + effect size

rng = np.random.default_rng(1)
df = pd.DataFrame({
  "dose": np.repeat(["placebo","low","med","high"], 20),
  "bp":   np.concatenate([rng.normal(m,8,20) for m in [140,135,130,122]])
})

# --- 1. Classical one-way ANOVA ---
fit = ols("bp ~ C(dose)", data=df).fit()
sm.stats.anova_lm(fit, typ=2)
stats.f_oneway(*[df[df.dose==g].bp for g in df.dose.unique()])

# pingouin returns F, df, p and partial η² (np2) in one call
pg.anova(data=df, dv="bp", between="dose", effsize="np2")

# --- 2. Assumption checks on residuals ---
stats.shapiro(fit.resid)
stats.levene(*[df[df.dose==g].bp for g in df.dose.unique()], center="median")

# --- 3. Welch ANOVA ---
pg.welch_anova(data=df, dv="bp", between="dose")

# --- 4. Post-hoc ---
pairwise_tukeyhsd(df.bp, df.dose)              # Tukey HSD
pg.pairwise_gameshowell(data=df, dv="bp", between="dose")
pg.pairwise_tests(data=df, dv="bp", between="dose",
                   parametric=True, padjust="holm")

# --- 5. Kruskal–Wallis + Dunn ---
stats.kruskal(*[df[df.dose==g].bp for g in df.dose.unique()])
import scikit_posthocs as sp
sp.posthoc_dunn(df, val_col="bp", group_col="dose", p_adjust="fdr_bh")

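# --- 6. Two-way + interaction, Type III SS ---
# df2 is a hypothetical DataFrame with columns bp, drug, dose (not built above)
# Sum-to-zero coding keeps the Type III main effects interpretable
fit2 = ols("bp ~ C(drug, Sum) * C(dose, Sum)", data=df2).fit()
sm.stats.anova_lm(fit2, typ=3)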
💡
Minimal recommended reporting: "One-way ANOVA showed a significant effect of dose on systolic BP (F(3,76)=18.4, p<0.001, ω²=0.39). Tukey HSD: high vs placebo Δ=−18 mmHg, 95% CI [−24, −12], p<0.001; med vs placebo Δ=−10 mmHg, 95% CI [−16, −4], p=0.001."

7. Six pitfalls

Omnibus ≠ specific contrast

A significant F means "at least one pair differs"; it does not by itself prove "treatment vs placebo works". Run a post-hoc test before making a claim about a specific pair. Conversely, a pre-planned contrast can still be significant even when the omnibus test is not (Hsu 1996).

Pseudoreplication

3 mice × 4 slices ≠ n=12. Hurlbert's classic 1984 paper documents this as the most common error in biological and ecological studies. The independent unit is the mouse; slices are within-subject replicates, so fit a mixed model with mouse as a random effect.
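To see what pseudoreplication does to the degrees of freedom, here is a Python sketch on simulated data with no true treatment effect. Note the swap: instead of the mixed model recommended above, it uses a simpler stand-in, averaging the slices within each mouse and analysing the mouse-level means. All names and variance values are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

rng = np.random.default_rng(5)
rows = []
for treatment in ["ctrl", "drugA", "drugB"]:
    for m in range(3):                          # 3 mice per treatment
        mouse_id = f"{treatment}_m{m}"
        mouse_effect = rng.normal(0, 4)         # shared by all slices of this mouse
        for _ in range(4):                      # 4 slices per mouse
            rows.append({"treatment": treatment, "mouse": mouse_id,
                         "y": 20 + mouse_effect + rng.normal(0, 1)})
slices = pd.DataFrame(rows)

# Naive analysis: pretends the 36 slices are independent observations
naive = ols("y ~ C(treatment)", data=slices).fit()

# Mouse-level analysis: one averaged value per independent unit
per_mouse = slices.groupby(["treatment", "mouse"], as_index=False)["y"].mean()
mouse_level = ols("y ~ C(treatment)", data=per_mouse).fit()

print(f"naive:       F = {naive.fvalue:.2f}, residual df = {naive.df_resid:.0f}")
print(f"mouse-level: F = {mouse_level.fvalue:.2f}, residual df = {mouse_level.df_resid:.0f}")
```

The naive fit claims 33 residual degrees of freedom where only 6 are justified, so its p-value is judged against the wrong reference distribution.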

No adjustment in follow-up tests

"Omnibus p=0.04 ✓; now run a few t-tests to see which pair is significant" is a textbook garden-of-forking-paths route. When you compare m pairs, report adjusted p-values (Tukey/Bonferroni/Holm) and name the method in Methods.

Unbalanced design without stating the SS type

Type I/II/III diverge on unbalanced data. Following Langsrud 2003, papers should state something like "Type III SS via car::Anova with contr.sum"; otherwise the analysis cannot be reproduced. R's default (Type I) is a frequent trap.

Outliers poison F

SS uses squared distances, so a single extreme point can inflate MS_within and push F below significance. Inspect boxplots and QQ plots first; in severe cases, switch to Kruskal–Wallis or a robust ANOVA (Wilcox 2017; WRS2::t1way uses trimmed means).
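A small Python sketch makes the effect concrete. The data are simulated and the single contaminating value of 200 is an arbitrary choice; Kruskal–Wallis is shown as the rank-based alternative mentioned above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
groups = [rng.normal(mu, 5, 10) for mu in (50, 58, 66)]   # clearly separated groups
f_clean, p_clean = stats.f_oneway(*groups)

contaminated = [g.copy() for g in groups]
contaminated[0][0] = 200.0        # one wild value in the lowest-mean group
f_bad, p_bad = stats.f_oneway(*contaminated)
h_bad, p_kw = stats.kruskal(*contaminated)   # ranks blunt the outlier's influence

print(f"clean ANOVA:        F = {f_clean:.2f}, p = {p_clean:.2g}")
print(f"with one outlier:   F = {f_bad:.2f}, p = {p_bad:.2g}")
print(f"Kruskal-Wallis on the contaminated data: H = {h_bad:.2f}, p = {p_kw:.2g}")
```

The outlier both inflates the within-group variance and drags its group mean toward the others, so F drops on two fronts, while the rank-based test only sees one observation change rank.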

p > 0.05 ≠ no difference

Failing to reject the null does not imply the groups are equal: a small n or a large variance (Type II error) can hide a real difference. Report the effect size with its 95% CI to tell "truly no difference" from "not enough power to detect one". For a strong claim of no difference, run an equivalence test.

📝 Self-check

1. You compare cell viability across 4 drug concentrations (n=15 per group). Levene's test gives p=0.02 and the residual QQ plot is roughly linear. Which primary test fits best?

A. Just run all six pairwise t-tests
B. Classical one-way ANOVA (aov)
C. Welch ANOVA (oneway.test, var.equal=FALSE) + Games–Howell post-hoc
D. Kruskal–Wallis

2. A paper writes "ANOVA significant (F=4.2, p=0.01), so treatment beats control". What is the main problem?

A. It should have used a t-test instead of ANOVA
B. F is too small to trust
C. p should be <0.001 to count
D. The omnibus F only says "at least one pair differs"; the specific treatment-vs-control claim needs a post-hoc test (e.g., Tukey or Dunnett)

3. You compare 6 cytokine treatments against a single untreated control. Which post-hoc is most efficient?

A. Tukey HSD (all 21 pairs among the 7 groups)
B. Dunnett (only 6 tests vs control; higher power)
C. Bonferroni on all 21 pairs
D. Scheffé

4. Five brain slices from each of three mice (3 × 5 = 15 observations) are fed into ANOVA. What is the problem?

A. n=15 is too small
B. Bonferroni adjustment should be applied
C. Pseudoreplication: slices are not independent observations; the independent unit is the mouse, so fit a mixed model with mouse as a random effect
D. Welch ANOVA should be used