STEP 6 / 13

Chi-Square Tests and Categorical Data

The toolkit for categorical data: independence / goodness-of-fit / homogeneity, when to switch to Fisher's exact, and the real distinction between OR, RR, and RD.

Why does categorical data need its own chapter?

Continuous data go through t-tests and ANOVA, but the central questions in clinical, epidemiologic, and genetic research are often about proportions and counts: mortality in treatment vs control, disease frequency among AA / Aa / aa genotypes, Mendel's 9:3:3:1 offspring split. Running a t-test on data like these isn't merely "imprecise"; it is conceptually wrong. The t-test assumes normality and constant variance, but for a binary variable the variance is locked to the mean (Var = p(1−p)).

The story starts with Karl Pearson, who in 1900 (Philosophical Magazine) turned "observed vs expected" into a single additive distance: the χ² goodness-of-fit test. In 1922 R. A. Fisher fixed the degrees of freedom (df = (r−1)(c−1), not rc−1); his exact test for small samples followed in 1934-35. McNemar (1947) handled before/after pairing in the same subjects; Mantel and Haenszel (1959) handled stratified 2×2 tables. The whole skeleton of categorical-data analysis was built in the roughly 60 years between 1900 and 1959.

💡
One-line summary: categorical-data analysis hinges on the distance between the observed counts O and the counts E expected under H₀. The chi-square statistic χ² = Σ(O−E)²/E standardizes and aggregates that distance, and is approximately chi-square distributed in large samples. Agresti's Categorical Data Analysis (3rd ed., 2018) is the standard textbook.

I. Contingency tables, expected counts, and the chi-square statistic

Lay the data out as an r × c contingency table: rows are one categorical variable (treatment / control), columns are another (event / no event). Under the null hypothesis of independence, the expected count in each cell equals the product of the two marginal probabilities times N, which simplifies to row total × column total / N:

E_ij = (row_i total × col_j total) / N   ·   χ² = Σ_i,j (O_ij − E_ij)² / E_ij   ·   df = (r−1)(c−1)

Pearson (1900) introduced χ²; Fisher (1922) corrected the degrees of freedom for an r×c table (Pearson originally wrote rc−1). For a 2×2 table, df = 1.
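To make the formula concrete, here is a minimal Python sketch (standard library only) that computes the expected counts, χ², and the df = 1 p value for a 2×2 table. The function name `chi_square_2x2` and the example counts are illustrative assumptions; the p value uses the identity that the upper tail of a chi-square distribution with 1 degree of freedom equals erfc(√(x/2)).

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table given as [[a, b], [c, d]]."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: E_ij = row_i total * col_j total / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    # With df = 1 the upper-tail probability is erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, p_value

expected, chi2, p_value = chi_square_2x2([[40, 60], [20, 80]])
print(expected, round(chi2, 4), round(p_value, 4))
```

The closed-form tail only holds for df = 1; for larger tables a library routine for the chi-square distribution is the safer choice.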
🔢

Independence

Are two categorical variables associated? Examples: drug × outcome, genotype × disease, smoking × lung cancer. H₀: the two variables are independent; E is computed from the marginals.

🎯

Goodness-of-fit

Do observed counts match a theoretical distribution? Examples: Mendel 9:3:3:1, Hardy-Weinberg p², 2pq, q², uniform (is the die fair?). E = theoretical probability × N. df = k − 1 − m (m = number of parameters estimated from the data).

🔁

Homogeneity

Do multiple independent samples come from the same population? The test is mathematically identical to the independence test (same χ² formula); the difference is purely in the sampling design. Homogeneity fixes the row margins (a fixed n is sampled from each group), whereas independence fixes only the total N. Agresti 2018 Ch.2.

Intuition: χ² = 0 when O = E; χ² grows as O drifts from E. Why divide by E? A gap of 5 around an expected count of 100 is far milder than a gap of 5 around an expected count of 5; dividing by E rescales the squared gap by the expected magnitude. This mirrors the Poisson property variance ≈ mean for count data.

2×2 contingency table calculator

Enter four counts: top-left a (treatment + event), top-right b (treatment + no event), bottom-left c (control + event), bottom-right d (control + no event). The panel shows the expected counts E, the χ² statistic (with and without Yates), its p value, Fisher's exact p, and the three effect sizes OR / RR / RD with 95% CIs. If any expected count is < 5, a warning appears recommending Fisher's exact.

Chart legend: dark = observed O · light = expected E

II. Four variants you must recognize

🎲 Fisher's exact

When expected counts are small, the chi-square approximation becomes unreliable (the usual Cochran 1954 rule of thumb: every E ≥ 5 in a 2×2 table; in larger tables, no E < 1 and at least 80% of cells with E ≥ 5). Fisher's exact test instead uses the hypergeometric distribution to enumerate every table at least as extreme as the observed one, conditional on fixed margins.

Used for: small clinical trials, rare-variant GWAS subsets, and single-cell cluster-vs-marker overlap tests (for example with R's fisher.test).
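The enumeration can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a replacement for a library routine: the function name `fisher_exact_2x2` is an assumption, and the two-sided rule used here (sum every table whose probability does not exceed that of the observed table) is one common convention among several.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, all margins fixed
        return comb(row1, k) * comb(row2, col1 - k) / total

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Sum the tables that are no more probable than the observed one
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

p = fisher_exact_2x2(3, 27, 1, 29)
print(round(p, 4))
```

The small tolerance factor guards against floating-point ties between tables that are equally probable in exact arithmetic.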

⚙️ Yates correction

For 2×2 tables: subtract 0.5 from each |O−E| before squaring. Motivation: the χ² distribution is continuous but counts are integers; the correction is meant to smooth over that discreteness gap.

Modern verdict: over-conservative. Camilli & Hopkins (1979), Camilli 1995 Psychol Bull, and Sokal & Rohlf (2012) all report simulations in which Yates pushes the actual Type I error rate well below the nominal α. R defaults to correct = TRUE; set it to FALSE, or switch to Fisher's exact.
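A small sketch, assuming an illustrative 2×2 table of 40/60 vs 20/80, shows how much the correction shrinks the statistic. The helper name `chi2_2x2` is hypothetical; the p value again relies on the df = 1 identity p = erfc(√(χ²/2)).

```python
import math

def chi2_2x2(a, b, c, d, yates=False):
    """Chi-square and df = 1 p value for [[a, b], [c, d]], optionally Yates-corrected."""
    n = a + b + c + d
    observed = [a, b, c, d]
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    chi2 = 0.0
    for o, e in zip(observed, expected):
        gap = abs(o - e)
        if yates:
            gap = max(gap - 0.5, 0.0)   # continuity correction, never below zero
        chi2 += gap ** 2 / e
    return chi2, math.erfc(math.sqrt(chi2 / 2))

plain = chi2_2x2(40, 60, 20, 80)
corrected = chi2_2x2(40, 60, 20, 80, yates=True)
print(plain, corrected)
```

The corrected statistic is always the smaller of the two, so its p value is always the larger one; that is the conservatism the text describes.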

🔄 McNemar: paired binary

Binary outcomes before vs after on the same subjects, or matched case-control pairs. The table is (pre+/post+, pre+/post−, pre−/post+, pre−/post−); only the two discordant cells b and c matter:

χ²_McN = (b − c)² / (b + c) · df = 1

Example: hypertension status in 100 patients before vs after a drug. Running an ordinary chi-square treats the paired data as independent, which is wrong. Bennett (2017 BMJ): "Using an unpaired test on paired data throws away SD/√n of power."
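The statistic is simple enough to compute by hand. A minimal sketch, assuming illustrative discordant counts b = 12 and c = 25 and a hypothetical helper name `mcnemar_chi2`:

```python
import math

def mcnemar_chi2(b, c):
    """McNemar chi-square (no continuity correction) from the discordant counts."""
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))   # df = 1 upper tail
    return chi2, p_value

chi2, p_value = mcnemar_chi2(b=12, c=25)
print(round(chi2, 4), round(p_value, 4))
```

Note that the concordant cells never enter the calculation; when b + c is small, an exact binomial test on b out of b + c is the usual alternative.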

🧭 CMH: stratified 2×2

Stratify the 2×2 tables by a confounder (age band, sex, study site) and pool a common OR. CMH lets you (1) control for the confounder and (2) test, via Breslow-Day, whether the OR is constant across strata (if the OR varies across strata there is an interaction and a single pooled OR is inappropriate).

Examples: multi-center trials, stratified epidemiologic analyses. The standard antidote to Simpson's paradox.
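As a rough sketch of what the pooling does, the Mantel-Haenszel odds ratio is Σ(a·d/n) / Σ(b·c/n) over strata, and the CMH statistic compares the summed top-left cells with their summed expectations. The function name and the three strata below are illustrative assumptions, and no continuity correction is applied (some software applies one by default, so results can differ slightly).

```python
def mantel_haenszel(strata):
    """Pooled OR and CMH chi-square (df = 1) for a list of ((a, b), (c, d)) strata."""
    num = den = 0.0      # numerator / denominator of the pooled odds ratio
    diff = var = 0.0     # summed (observed - expected) and summed variance of cell a
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
        row1, row2, col1, col2 = a + b, c + d, a + c, b + d
        diff += a - row1 * col1 / n
        var += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    return num / den, diff ** 2 / var

strata = [((12, 88), (5, 95)), ((28, 72), (15, 85)), ((35, 65), (20, 80))]
or_mh, cmh_chi2 = mantel_haenszel(strata)
print(round(or_mh, 4), round(cmh_chi2, 3))
```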

⚠️
The Cochran 1954 rule in detail: for r×c tables larger than 2×2, "no cell with E < 1, and at most 20% of cells with E < 5." For 2×2 tables the rule is stricter: all four cells must have E ≥ 5. When the rule is violated: for r×c, collapse categories or use the Fisher-Freeman-Halton exact test; for 2×2, go straight to Fisher's exact.
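The rule can be turned into a small guard that runs before any chi-square test. A minimal sketch, with the hypothetical helper name `cochran_ok`, taking the table of expected counts as a list of rows:

```python
def cochran_ok(expected):
    """True if the expected counts satisfy the rule of thumb described above."""
    cells = [e for row in expected for e in row]
    if len(expected) == 2 and len(expected[0]) == 2:
        return all(e >= 5 for e in cells)            # 2x2: every cell needs E >= 5
    small = sum(e < 5 for e in cells)                # number of cells with E < 5
    return min(cells) >= 1 and small / len(cells) <= 0.20

print(cochran_ok([[30, 70], [30, 70]]))   # large 2x2 table
print(cochran_ok([[2, 28], [2, 28]]))     # sparse 2x2 table
```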

OR vs RR vs RD comparator

Pick a relative risk (say RR = 2) and drag the baseline risk p₀ from 0.01 to 0.5. At low p₀ (< 10%), OR ≈ RR; as p₀ grows, the OR balloons far past the RR. This is the "common outcome bias" of the OR. Epidemiology and clinical journals advise: rare outcomes (< 10%) can be reported as OR, common outcomes (≥ 10%) should be reported as RR or RD (Zhang & Yu 1998 JAMA, Pearce 2024 IJE).

Chart legend: x = baseline risk p₀ · red = OR · blue = RR · green = RD

Pitfall: reading an OR as "the risk is X times higher". Journalists routinely read "OR = 3.0" as "three times the risk", which is only approximately true when the outcome is rare. At p₀ = 30%, OR = 3 corresponds to RR = 3 / (1 − 0.3 + 0.3 × 3) ≈ 1.88. This mistranslation pollutes news coverage and low-tier journals; Greenland (1987 AJE) and Schmidt & Kohlmann (2008 IJPH) both call it out.
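The relationship between the two measures follows directly from the definitions of odds and risk: RR = OR / (1 − p₀ + p₀ · OR), where p₀ is the risk in the reference group. A minimal sketch (the function names are illustrative):

```python
def or_to_rr(odds_ratio, p0):
    """Relative risk implied by an odds ratio at baseline risk p0."""
    return odds_ratio / (1 - p0 + p0 * odds_ratio)

def rr_to_or(rr, p0):
    """Odds ratio implied by a relative risk at baseline risk p0."""
    p1 = rr * p0
    if not 0 < p1 < 1:
        raise ValueError("rr * p0 must lie strictly between 0 and 1")
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# The same OR of 3 implies very different relative risks as p0 grows
for p0 in (0.01, 0.10, 0.30):
    print(p0, round(or_to_rr(3.0, p0), 3))
```

Running the loop shows the implied RR falling steadily as the baseline risk rises, even though the OR is held fixed.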

III. How to choose?

🌳 Decision tree for categorical-data tests

Q1: Paired binary data (same subject pre/post, matched case-control)? → Yes → McNemar test (looks only at the two discordant cells b and c).
Q2: Need to control for a confounder (age band, sex, site)? → Yes → Cochran-Mantel-Haenszel; first run the Breslow-Day test to confirm the OR is not significantly heterogeneous across strata.
Q3: All expected E ≥ 5 (2×2), or no E < 1 and at least 80% of cells with E ≥ 5 (r×c)? → Yes → Pearson χ² (and skip Yates on 2×2).
Q4: Expected counts too small? → Yes → 2×2 → Fisher's exact; r×c → Fisher-Freeman-Halton exact (R: fisher.test(tab); add simulate.p.value = TRUE for a Monte Carlo p value when the table is too large to enumerate).
Q5: Not just a p value, but adjustment for multiple covariates? → Yes → go to Step 9, logistic regression (the multivariable analogue of χ² / CMH).
Q6: Observed vs theoretical distribution (Mendel, HWE, fair die)? → Yes → χ² goodness-of-fit, df = k − 1 − (number of parameters estimated).
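The tree can be read as a chain of early returns. The sketch below is only one way to encode it: the function name, the argument names, and the order in which the questions are checked are all assumptions made for illustration, not part of any library.

```python
def choose_test(paired=False, goodness_of_fit=False, many_covariates=False,
                confounder=False, expected_ok=True, is_2x2=True):
    """Walk the decision tree; every argument name here is illustrative."""
    if paired:
        return "McNemar"
    if goodness_of_fit:
        return "chi-square goodness-of-fit"
    if many_covariates:
        return "logistic regression"
    if confounder:
        return "Cochran-Mantel-Haenszel"
    if expected_ok:
        return "Pearson chi-square (no Yates)"
    return "Fisher's exact" if is_2x2 else "Fisher-Freeman-Halton exact"

print(choose_test(expected_ok=False))             # sparse 2x2 table
print(choose_test(confounder=True))               # stratified analysis
```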

IV. Properties of OR / RR / RD

| Effect size | Formula | Range | Suitable designs | Additive / multiplicative | Pitfall |
|---|---|---|---|---|---|
| OR (odds ratio) | ad / bc | (0, ∞) | case-control, logistic regression, rare outcomes | multiplicative (log OR additive) | overstates RR when the outcome is common |
| RR (relative risk) | [a/(a+b)] / [c/(c+d)] | (0, ∞) | cohort, RCT, prospective epidemiology | multiplicative (log RR additive) | cannot be computed directly in case-control designs (no denominator) |
| RD (risk difference) | a/(a+b) − c/(c+d) | (−1, 1) | RCT, absolute-risk communication, NNT | additive (direct subtraction) | insensitive at very low / very high p₀ |
| NNT | 1 / abs(RD) | [1, ∞) | RCT clinical decision communication | (derived) | meaningless when the CI of RD spans 0; report RD + 95% CI instead |
💡
95% CIs (on the log scale): OR and RR are heavily skewed on the raw scale, so compute the CI on the log scale and exponentiate back.
· log(OR) ± 1.96 × √(1/a + 1/b + 1/c + 1/d)
· log(RR) ± 1.96 × √[(1/a − 1/(a+b)) + (1/c − 1/(c+d))]
· RD ± 1.96 × √[p₁(1−p₁)/n₁ + p₂(1−p₂)/n₂] (directly on the raw scale).
R: epitools::oddsratio(tab, method = "wald") and epitools::riskratio(tab, method = "wald") report Wald intervals. epitools expects the reference group in the first row and the non-event in the first column, so reorder the table (or use the rev argument) to match.
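The three formulas translate line by line into code. A minimal sketch using only the standard library; the function name `effect_sizes` and the example counts are illustrative, and the layout assumed is rows = groups with events in the first column.

```python
import math

def effect_sizes(a, b, c, d, z=1.96):
    """OR, RR, RD with Wald CIs for [[a, b], [c, d]] (first column = events)."""
    p1, p2 = a / (a + b), c / (c + d)

    def log_ci(estimate, se):
        # Build the interval on the log scale, then exponentiate back
        return (math.exp(math.log(estimate) - z * se),
                math.exp(math.log(estimate) + z * se))

    odds_ratio = (a * d) / (b * c)
    rr = p1 / p2
    rd = p1 - p2
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
    se_log_rr = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
    se_rd = math.sqrt(p1 * (1 - p1) / (a + b) + p2 * (1 - p2) / (c + d))
    return {"OR": (odds_ratio, log_ci(odds_ratio, se_log_or)),
            "RR": (rr, log_ci(rr, se_log_rr)),
            "RD": (rd, (rd - z * se_rd, rd + z * se_rd))}

res = effect_sizes(40, 60, 20, 80)
for name, (estimate, ci) in res.items():
    print(name, round(estimate, 3), tuple(round(x, 3) for x in ci))
```

These Wald intervals break down when any cell is zero; adding 0.5 to every cell is a common workaround, but it is not applied here.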

V. The chi-square tradition in genetics

Mendel 9:3:3:1

F₂ counts (round/yellow, round/green, wrinkled/yellow, wrinkled/green) = 315, 108, 101, 32; total N = 556. Expected under 9:3:3:1:

E = (556 × 9/16, 556 × 3/16, 556 × 3/16, 556 × 1/16) = 312.75, 104.25, 104.25, 34.75.

χ² = Σ(O−E)²/E ≈ 0.47, df = 3, p ≈ 0.93: the observed counts agree with theory. Fisher's famous 1936 reanalysis argued that Mendel's aggregated χ² is "too good a fit" (suspiciously close to expectation), suggesting the data may have been polished. A classic case where a fit that is too good becomes grounds for suspicion.

HWE test

For a SNP with genotypes AA, Aa, aa, HWE expects proportions p², 2pq, q² (p = frequency of allele A, q = 1 − p). Estimate p̂ from the observed counts, compute E, then χ². df = 1 (k = 3 genotypes − 1 − 1 estimated parameter).

GWAS QC convention: SNPs with HWE p < 1e−6 in controls are typically dropped (deviation from HWE among cases can be real signal). Wigginton et al. (2005, AJHG) introduced an exact HWE test that avoids the failure of the χ² approximation at low MAF; PLINK's --hardy uses the exact test by default.
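The χ² version of the test is short enough to sketch directly. The function name is illustrative, and the genotype counts (298, 489, 213) are simply example numbers; for low-MAF SNPs the exact test mentioned above is preferable.

```python
import math

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square test of Hardy-Weinberg equilibrium from genotype counts (df = 1)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)          # estimated frequency of allele A
    q = 1 - p
    expected = (n * p * p, n * 2 * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e
               for o, e in zip((n_AA, n_Aa, n_aa), expected))
    # One parameter (p) was estimated, so df = 3 - 1 - 1 = 1
    return p, chi2, math.erfc(math.sqrt(chi2 / 2))

p_hat, chi2, p_value = hwe_chi_square(298, 489, 213)
print(round(p_hat, 4), round(chi2, 4), round(p_value, 3))
```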

💡
The χ² heritage in GWAS: the "allelic test" for each SNP is a 2×2 χ² (allele × case/control). The genotypic test is 2×3 (df = 2); the Cochran-Armitage trend test (1955) is a 2×3 with a dosage assumption (df = 1). When running millions of χ² tests across SNP × disease, apply Bonferroni / FDR correction (see Step 12).

VI. Worked examples

# R: chi-square family + effect sizes
library(epitools)   # oddsratio(), riskratio()
library(vcd)        # mosaic plots, assocstats()

# --- 2x2 table: drug vs outcome ---
tab <- matrix(c(40, 60, 20, 80),
              nrow = 2, byrow = TRUE,
              dimnames = list(treat = c("drug", "placebo"),
                              event = c("yes", "no")))

# Pearson chi-square: turn OFF Yates explicitly (R's default is correct = TRUE)
chisq.test(tab, correct = FALSE)
chisq.test(tab)$expected      # inspect E_ij

# Fisher's exact (any expected < 5, or just safer at small n)
fisher.test(tab)

# Effect sizes with 95% CI
# epitools expects the reference group in row 1 and the non-event in column 1,
# so rev = "both" flips this table (drug/placebo x yes/no) into that layout.
epitools::oddsratio(tab, method = "wald", rev = "both")$measure   # OR + Wald CI
epitools::riskratio(tab, method = "wald", rev = "both")$measure   # RR + Wald CI
# RD + CI with base R: difference in event proportions (drug minus placebo)
prop.test(x = tab[, "yes"], n = rowSums(tab), correct = FALSE)$conf.int

# --- McNemar: paired binary (before/after) ---
paired <- matrix(c(30, 12, 25, 33), nrow = 2, byrow = TRUE,
                 dimnames = list(pre = c("+", "-"),
                                 post = c("+", "-")))
mcnemar.test(paired, correct = FALSE)

# --- Cochran-Mantel-Haenszel: stratify by department ---
data(UCBAdmissions)             # classic Simpson's paradox
mantelhaen.test(UCBAdmissions)  # pools OR across departments

# --- Goodness-of-fit: Mendel 9:3:3:1 ---
obs <- c(315, 108, 101, 32)
chisq.test(obs, p = c(9, 3, 3, 1) / 16)

# --- HWE exact via HardyWeinberg pkg ---
# HardyWeinberg::HWExact(c(AA=298, Aa=489, aa=213))

# Python: the same analyses with scipy + statsmodels
import numpy as np
from scipy import stats
from statsmodels.stats.contingency_tables import Table2x2, mcnemar, StratifiedTable

# --- 2x2 table ---
tab = np.array([[40, 60],
                [20, 80]])

# Pearson chi-square (correction=False turns Yates off)
chi2, p, dof, expected = stats.chi2_contingency(tab, correction=False)

# Fisher's exact (alternative = "two-sided", "less", or "greater")
odds, p_fisher = stats.fisher_exact(tab, alternative="two-sided")

# Effect sizes + 95% CI from statsmodels (oddsratio / riskratio are attributes)
t = Table2x2(tab)
print(t.oddsratio, t.oddsratio_confint())
print(t.riskratio, t.riskratio_confint())
print(t.summary())               # OR and RR on the raw and log scales

# Risk difference is not part of summary(); compute the Wald CI by hand
n1, n2 = tab.sum(axis=1)
p1, p2 = tab[0, 0] / n1, tab[1, 0] / n2
rd = p1 - p2
se_rd = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
print(rd, (rd - 1.96 * se_rd, rd + 1.96 * se_rd))

# --- McNemar paired ---
paired = np.array([[30, 12], [25, 33]])
print(mcnemar(paired, exact=False, correction=False))

# --- Cochran-Mantel-Haenszel: one 2x2 table per stratum ---
strata = np.array([[[12, 88], [5, 95]],
                   [[28, 72], [15, 85]],
                   [[35, 65], [20, 80]]])
# StratifiedTable takes a list of 2x2 tables (or a 2x2xK array), not K x 2 x 2
st = StratifiedTable(list(strata))
print(st.test_null_odds())       # CMH test of common OR = 1
print(st.test_equal_odds())      # Breslow-Day homogeneity
print(st.oddsratio_pooled, st.oddsratio_pooled_confint())

# --- Goodness-of-fit: Mendel 9:3:3:1 ---
obs = np.array([315, 108, 101, 32])
expected_gof = obs.sum() * np.array([9, 3, 3, 1]) / 16
print(stats.chisquare(obs, expected_gof))
💡
One detail that bites everyone: R's chisq.test() defaults to correct = TRUE on 2×2 tables (Yates); set it to FALSE in most cases. Python's scipy.stats.chi2_contingency defaults to correction = True; same fix. Two languages, same trap.

VII. The six most common mistakes in papers

Yates applied by default

R's chisq.test() on a 2×2 table applies Yates by default, systematically inflating p. Camilli (1995) and Sokal & Rohlf (2012) both recommend turning it off. Practical fix: chisq.test(tab, correct = FALSE), or just use Fisher's exact.

OR misread as RR

When the outcome is common (> 10–20%), the OR overstates the RR substantially. Greenland (1987 AJE), Zhang & Yu (1998 JAMA). For clinical communication, report RR or RD + NNT, and report both measures in Methods.

Running χ² with E < 5

Violating the Cochran (1954) conditions makes the χ² approximation unreliable, especially for borderline p values (0.01–0.10). R warns "Chi-squared approximation may be incorrect"; scipy's chi2_contingency does not warn, so inspect the returned expected counts yourself. Either way, don't ignore small expected counts; switch to Fisher's exact.

Simpson's paradox

Simpson 1951 JRSS-B: the direction of an association in pooled data can flip after stratification. Classic case: UC Berkeley 1973 admissions (lower female admission rate overall, higher within most departments). Fix: CMH + Breslow-Day, or logistic regression with the confounders as covariates.

Independent-sample tests on paired data

Same subject before/after, or matched case-control → use McNemar. Treating paired data as independent ignores the within-pair correlation and loses power. Bennett (2017 BMJ).

Uncorrected multiple comparisons

GWAS, scRNA marker tests, drug screens: when you run one chi-square per variable, the p < 0.05 threshold is swamped by multiplicity long before you notice. See Step 12; at minimum apply Bonferroni (strictest) or BH-FDR (most common).

📝 Self-check

1. Your 2×2 table has cells 3, 27, 1, 29 with N = 60. Best choice?

A. Pearson χ² with Yates correction
B. Pearson χ² without correction
C. Fisher's exact test, because at least one expected count is < 5
D. A t-test on the two proportions

2. RCT event rates: treatment 30%, control 50%. Which statement is WRONG?

A. RR = 0.6 (30%/50%)
B. RD = −20% → NNT = 5
C. OR ≈ 0.43; and since OR equals RR, OR is also 0.6
D. With a common outcome (> 10%), OR diverges from RR; report both

3. You want to test whether a drug changes BP control status (yes/no) before vs after treatment in 100 hypertensive patients. Best test?

A. Pearson χ² independence test
B. McNemar test, using only the discordant cells b and c
C. Fisher's exact test
D. Two-sample independent t-test

4. Pooled across three hospitals, OR = 1.5 (against the drug); within each hospital, OR < 1 (the drug helps). What is happening, and what analysis should you run?

A. Underpowered: add more hospitals
B. Statistical noise: trust the pooled estimate
C. Simpson's paradox: use a Cochran-Mantel-Haenszel stratified analysis or logistic regression to adjust for the confounder
D. Use a t-test instead