Step 6: Chi-Square & Categorical Data — Biostatistics Tutorial

Overview

Why does categorical data need its own chapter?

Continuous data go through t-tests and ANOVA, but the central questions in clinical, epidemiologic, and genetic research are often about proportions and counts: mortality in treatment vs control, disease frequency among AA / Aa / aa genotypes, Mendel's 9:3:3:1 offspring split. Running a t-test on data like these isn't merely "imprecise"; it is conceptually wrong. A t-test assumes normality and constant variance, but for a binary variable the variance is locked to the mean (Var = p(1−p)).

The story starts with Karl Pearson, who in 1900 (Philosophical Magazine) turned "observed vs expected" into a single additive distance: the χ² goodness-of-fit test. R. A. Fisher corrected the degrees of freedom in 1922 (df = (r−1)(c−1), not rc−1) and introduced his exact test for small samples in the mid-1930s. McNemar (1947) handled before/after pairing in the same subjects; Mantel and Haenszel (1959) handled stratified 2×2 tables. The whole skeleton of categorical-data analysis was built between 1900 and 1959.

💡

One-line summary: categorical-data analysis hinges on the distance between the observed counts O and the counts E expected under H₀. The chi-square statistic χ² = Σ(O−E)²/E standardizes and sums that distance, and it is approximately chi-square distributed in large samples. Agresti's Categorical Data Analysis (3rd ed., 2018) is the standard textbook.

Core concepts

1. Contingency tables, expected counts, and the chi-square statistic

Lay the data out as an r × c contingency table: rows are one categorical variable (treatment / control), columns are another (event / no event). Under the null hypothesis of independence, the expected count in each cell equals the product of the two marginal probabilities times N:

E_ij = (row_i total × col_j total) / N
χ² = Σ_i,j (O_ij − E_ij)² / E_ij
df = (r−1)(c−1)

Pearson (1900) introduced χ²; Fisher (1922) corrected the degrees of freedom for an r×c table (Pearson originally used rc−1). For a 2×2 table, df = 1.
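The three formulas can be traced by hand. The sketch below is a plain-Python illustration, not a replacement for a statistics library: the helper name pearson_chi2 is hypothetical, the counts (40, 60, 20, 80) are an illustrative table, and the p-value line uses the closed-form tail erfc(√(χ²/2)), which holds only for df = 1.

```python
import math

def pearson_chi2(table):
    """Return (expected, chi2, df) for an r x c table of observed counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    # E_ij = row_i total * col_j total / N
    expected = [[r * c / n for c in col_tot] for r in row_tot]
    chi2 = sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, expected)
               for o, e in zip(o_row, e_row))
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return expected, chi2, df

table = [[40, 60],   # treatment: event, no event
         [20, 80]]   # control:   event, no event
expected, chi2, df = pearson_chi2(table)
p_value = math.erfc(math.sqrt(chi2 / 2))   # chi-square upper tail, valid for df = 1 only

print(expected)              # [[30.0, 70.0], [30.0, 70.0]]
print(round(chi2, 3), df)    # 9.524 1
print(round(p_value, 4))
```

Each cell contributes (O−E)²/E, so the two event cells (E = 30) contribute more than the two no-event cells (E = 70) even though every |O−E| is 10.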

🔢

Independence

Are two categorical variables associated? Examples: drug × outcome, genotype × disease, smoking × lung cancer. H₀: the two variables are independent; E is computed from the marginals.

🎯

Goodness of fit

Do observed counts match a theoretical distribution? Examples: Mendel's 9:3:3:1, Hardy-Weinberg p², 2pq, q², a uniform distribution (is the die fair?). E = theoretical probability × N. df = k − 1 − m, where k is the number of categories and m is the number of parameters estimated from the data.

🔁

Homogeneity

Do multiple independent samples come from the same population? The test is mathematically identical to the independence test (same χ² formula); the difference lies purely in the sampling design. Homogeneity fixes the row margins (a fixed n sampled from each group), whereas independence fixes only the total N. Agresti 2018 Ch.2.

Intuition: χ² = 0 when O = E, and χ² grows as O drifts from E. Why divide by E? A gap of 5 around an expected count of 100 is far milder than a gap of 5 around an expected count of 5; dividing by E rescales the squared gap by the expected magnitude. This mirrors the Poisson property variance ≈ mean for count data.

Interactive simulation ①

2×2 contingency table calculator

Enter four counts: top-left a (treatment + event), top-right b (treatment + no event), bottom-left c (control + event), bottom-right d (control + no event). The panel shows the expected counts E, the χ² statistic (with and without Yates), its p-value, Fisher's exact p, and the three effect sizes OR / RR / RD with 95% CIs. If any expected count is below 5, a warning appears recommending Fisher's exact test.

Default inputs: a = 40, b = 60, c = 20, d = 80. Plot legend: dark = observed O, light = expected E.

The extended family

2. Four variants you must recognize

🎲 Fisher's exact

When any expected count is below 5, the chi-square approximation breaks down (Cochran's 1954 rule: all E ≥ 5, or at least 80% of cells with E ≥ 5). Fisher's test uses the hypergeometric distribution to enumerate every table at least as extreme as the observed one, conditional on fixed margins.

Used for: small clinical trials, rare-variant GWAS subsets, and single-cell cluster-vs-marker overlap (for example, R's fisher.test on a 2×2 overlap table).
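The enumeration can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the function name fisher_exact_two_sided is hypothetical, the counts are illustrative, and "at least as extreme" is taken to mean "no more probable than the observed table", which is one common two-sided convention.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c       # margins, held fixed
    denom = comb(r1 + r2, c1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k
        return comb(r1, k) * comb(r2, c1 - k) / denom

    lo, hi = max(0, c1 - r2), min(r1, c1)  # feasible values of the top-left cell
    p_obs = prob(a)
    # Sum every table no more probable than the observed one; the small
    # relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-7))

p = fisher_exact_two_sided(3, 27, 1, 29)   # illustrative small-sample table
print(round(p, 3))   # 0.612
```

With margins of 30, 30 and only 4 events in total, there are just five possible tables, which is why an exact enumeration is feasible where the χ² approximation is not trustworthy.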

⚙️ Yates correction

For 2×2 tables: subtract 0.5 from each |O−E| before squaring. Motivation: the χ² distribution is continuous but counts are integers, and the correction is meant to smooth over that discreteness.

Modern verdict: over-conservative. Camilli & Hopkins (1979), Camilli 1995 Psychol Bull, and Sokal & Rohlf (2012) all report simulations in which Yates pushes the Type I error rate well below the nominal α. R's chisq.test defaults to correct = TRUE; set it to FALSE, or switch to Fisher's exact.
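To see how much the correction shrinks the statistic, here is a small sketch that computes the 2×2 χ² with and without it. The function name chi2_2x2 and the counts are illustrative assumptions; the clamp at zero is a defensive choice so the correction never overshoots.

```python
def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square for [[a, b], [c, d]], optionally with Yates' correction."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(((a, b), (c, d))):
        for j, o in enumerate(row):
            e = rows[i] * cols[j] / n
            gap = abs(o - e)
            if yates:
                gap = max(gap - 0.5, 0.0)   # subtract 0.5, but never go below zero
            chi2 += gap ** 2 / e
    return chi2

plain = chi2_2x2(40, 60, 20, 80)
corrected = chi2_2x2(40, 60, 20, 80, yates=True)
print(round(plain, 3), round(corrected, 3))   # 9.524 8.595
```

The corrected statistic is always the smaller of the two, so its p-value is always the larger one; that is the mechanical source of the conservatism described above.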

🔄 McNemar: paired binary

Binary outcomes before vs after on the same subjects, or matched case-control pairs. The cells of the table are (pre+/post+, pre+/post−, pre−/post+, pre−/post−); only the two discordant cells, b (pre+/post−) and c (pre−/post+), matter:

χ²_McN = (b − c)² / (b + c) · df = 1

Example: hypertension status in 100 patients before vs after a drug. Running an ordinary chi-square treats the paired observations as independent, which is wrong. Bennett (2017 BMJ): "Using an unpaired test on paired data throws away SD/√n of power."
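The McNemar statistic needs only the two discordant counts, as this short sketch shows. The helper name mcnemar_chi2 and the paired counts are hypothetical; the p-value uses the df = 1 chi-square tail, erfc(√(χ²/2)), without continuity correction.

```python
import math

def mcnemar_chi2(b, c):
    """McNemar statistic from the two discordant counts (df = 1)."""
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))   # chi-square upper tail for df = 1
    return chi2, p_value

# Hypothetical paired table for 100 patients:
# 30 (pre+/post+), 12 (pre+/post-), 25 (pre-/post+), 33 (pre-/post-)
chi2, p_value = mcnemar_chi2(b=12, c=25)
print(round(chi2, 3))      # 4.568
print(round(p_value, 3))
```

The 63 concordant patients never enter the calculation: they carry no information about the direction of change.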

🧭 CMH: stratified 2×2

Stratify the 2×2 table by a confounder (age band, sex, study site) and pool a common OR across strata. This lets you (1) control for the confounder and (2) check, with the companion Breslow-Day test, whether the OR is constant across strata (if the OR varies by stratum there is an interaction, and a single pooled CMH estimate is inappropriate).

Examples: multi-center trials, stratified epidemiologic analyses. The standard antidote to Simpson's paradox.
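The pooling step can be illustrated with the Mantel-Haenszel estimator of the common odds ratio, Σ(a·d/n) / Σ(b·c/n) over strata. The sketch below covers only that point estimate, not the CMH test statistic or the Breslow-Day test; the function name and the three strata of counts are illustrative assumptions.

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio over 2x2 strata, each given as [[a, b], [c, d]]."""
    num = sum(a * d / (a + b + c + d) for (a, b), (c, d) in strata)
    den = sum(b * c / (a + b + c + d) for (a, b), (c, d) in strata)
    return num / den

# Illustrative counts for three strata (for example, three study sites)
strata = [[[12, 88], [5, 95]],
          [[28, 72], [15, 85]],
          [[35, 65], [20, 80]]]

per_stratum = [a * d / (b * c) for (a, b), (c, d) in strata]
pooled = mantel_haenszel_or(strata)
print([round(x, 2) for x in per_stratum])   # [2.59, 2.2, 2.15]
print(round(pooled, 3))                     # 2.241
```

Here the stratum-specific ORs are similar, so a single pooled value is a reasonable summary; if they diverged sharply, pooling would hide an interaction.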

⚠️

The Cochran 1954 rule in detail: for r×c tables larger than 2×2, "no cell with E < 1, and at most 20% of cells with E < 5." For 2×2 tables the rule is strict: all four cells must have E ≥ 5. If the rule is violated, collapse categories or use the Fisher-Freeman-Halton exact test for r×c tables, and go straight to Fisher's exact for 2×2.
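The rule is mechanical enough to check in code before choosing a test. This is a sketch of the rule exactly as stated here; the function name cochran_rule_ok is hypothetical and the tables are illustrative.

```python
def cochran_rule_ok(table):
    """Return True if the expected counts satisfy Cochran's rule as stated above."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    expected = [r * c / n for r in row_tot for c in col_tot]
    if len(row_tot) == 2 and len(col_tot) == 2:
        return all(e >= 5 for e in expected)            # strict 2x2 version
    n_small = sum(e < 5 for e in expected)
    return min(expected) >= 1 and n_small <= 0.2 * len(expected)

print(cochran_rule_ok([[40, 60], [20, 80]]))   # True  (every E is 30 or 70)
print(cochran_rule_ok([[3, 27], [1, 29]]))     # False (two cells have E = 2)
```

Note that the check uses the expected counts, not the observed ones: a table can contain an observed zero and still pass.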

Interactive simulation ②

OR vs RR vs RD comparator

Pick a relative risk (say RR = 2) and drag the baseline risk p₀ from 0.01 to 0.5. Notice: at low p₀ (< 10%), OR ≈ RR; but as p₀ grows, OR balloons far past RR. This is the "common outcome bias of OR". Epidemiology and clinical journals advise: rare outcomes (< 10%) can be reported as OR, while common outcomes (≥ 10%) should be reported as RR or RD (Zhang & Yu 1998 JAMA, Pearce 2024 Int J Epidemiol).

Default input: target RR = 2.0. Plot: x-axis = baseline risk p₀; red = OR, blue = RR, green = RD.

Pitfall: reading an OR as "the risk is X times higher". Journalists routinely translate "OR = 3.0" as "a three-fold risk", which is close to correct only when the outcome is rare. At p₀ = 30%, OR = 3 corresponds to RR ≈ 1.88. This mistranslation appears throughout news coverage and low-quality journals; Greenland (1987 Am J Epidemiol) and Schmidt & Kohlmann (2008 Int J Public Health) both warn against it.
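The divergence between OR and RR follows directly from the definitions, and a short sketch makes it concrete. The function names are hypothetical; rr_from_or uses the conversion RR = OR / (1 − p₀ + p₀·OR), where p₀ is the baseline (control-group) risk.

```python
def or_from_rr(rr, p0):
    """Odds ratio implied by a relative risk rr at baseline risk p0."""
    p1 = rr * p0                        # risk in the exposed group
    if not 0 < p1 < 1:
        raise ValueError("rr * p0 must lie strictly between 0 and 1")
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

def rr_from_or(odds_ratio, p0):
    """Relative risk implied by an odds ratio at baseline risk p0."""
    return odds_ratio / (1 - p0 + p0 * odds_ratio)

for p0 in (0.01, 0.10, 0.30, 0.40):
    print(p0, round(or_from_rr(2.0, p0), 2))
# 0.01 2.02
# 0.1 2.25
# 0.3 3.5
# 0.4 6.0

print(round(rr_from_or(3.0, 0.30), 3))   # 1.875
```

With RR fixed at 2, the OR is nearly 2 at a 1% baseline risk but reaches 6 at a 40% baseline risk, which is the ballooning the comparator displays.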

Decision guide

3. How to choose?

🌳 Decision tree for categorical-data tests

Q1: Paired binary data (same subject pre/post, matched case-control)? → Yes → McNemar test (uses only the two discordant cells b and c).

Q2: Need to control for a confounder (age band, sex, site)? → Yes → Cochran-Mantel-Haenszel; first confirm with the Breslow-Day test that the OR is not significantly heterogeneous across strata.

Q3: All expected E ≥ 5 (2×2), or at least 80% of cells with E ≥ 5 (r×c)? → Yes → Pearson χ² (and skip Yates on 2×2).

Q4: Expected counts too small? → Yes → 2×2: Fisher's exact; r×c: Fisher-Freeman-Halton exact (R: fisher.test(tab), adding simulate.p.value = TRUE when exact enumeration is too slow).

Q5: Not just a p-value, but adjustment for multiple covariates? → Yes → go to Step 9, logistic regression (the multivariable analogue of χ² / CMH).

Q6: Observed vs theoretical distribution (Mendel, HWE, fair die)? → Yes → χ² goodness-of-fit, df = k − 1 − (number of estimated parameters).
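For readers who think in code, the tree can be written as a small lookup function. This is a hypothetical helper, not part of any library: the argument names are invented for illustration, the branch order is one reasonable reading of Q1 to Q6, and the r×c expected-count check is simplified to a single minimum.

```python
def choose_test(*, goodness_of_fit=False, paired=False, stratified=False,
                many_covariates=False, min_expected=None, is_2x2=True):
    """Map the answers to Q1-Q6 onto a test name (illustrative only)."""
    if goodness_of_fit:                       # Q6: observed vs theoretical
        return "chi-square goodness-of-fit"
    if paired:                                # Q1: paired binary data
        return "McNemar"
    if many_covariates:                       # Q5: several covariates to adjust for
        return "logistic regression"
    if stratified:                            # Q2: one confounder to stratify on
        return "Cochran-Mantel-Haenszel (+ Breslow-Day)"
    if min_expected is not None and min_expected < 5:   # Q4 (simplified for r x c)
        return "Fisher's exact" if is_2x2 else "Fisher-Freeman-Halton exact"
    return "Pearson chi-square (no Yates)"    # Q3

print(choose_test(paired=True))                        # McNemar
print(choose_test(min_expected=2.0))                   # Fisher's exact
print(choose_test(min_expected=30.0))                  # Pearson chi-square (no Yates)
```

A function like this is only a memory aid; the design of the study, not the shape of the table, decides which branch applies.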

Effect-size comparison

4. Properties of OR / RR / RD

Effect size	Formula	Range	Suited designs	Scale	Pitfall
OR (odds ratio)	ad / bc	(0, ∞)	case-control, logistic regression, rare outcomes	multiplicative (log OR is additive)	overstates RR when the outcome is common
RR (relative risk)	[a/(a+b)] / [c/(c+d)]	(0, ∞)	cohort, RCT, prospective epidemiology	multiplicative (log RR is additive)	cannot be computed directly from case-control data (no denominator)
RD (risk difference)	a/(a+b) − c/(c+d)	(−1, 1)	RCT, absolute-risk communication, NNT	additive (direct subtraction)	insensitive at very low / very high p₀
NNT (number needed to treat)	1 / |RD|	[1, ∞)	RCT clinical decision communication	(derived)	not meaningful when the RD interval spans 0; report RD + 95% CI instead

💡

95% CIs (on the log scale): OR and RR are heavily skewed on the raw scale, so compute the CI on the log scale and then exponentiate.
· log(OR) ± 1.96 × √(1/a + 1/b + 1/c + 1/d)
· log(RR) ± 1.96 × √[(1/a − 1/(a+b)) + (1/c − 1/(c+d))]
· RD ± 1.96 × √[p₁(1−p₁)/n₁ + p₂(1−p₂)/n₂] (directly on the raw scale), where p₁ = a/(a+b), n₁ = a+b, p₂ = c/(c+d), n₂ = c+d.
R: epitools::oddsratio(tab) and epitools::riskratio(tab) return the estimate with its CI; pass method = "wald" to get the Wald interval above (oddsratio defaults to a mid-p interval).
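As a worked instance of these three intervals, the sketch below evaluates them for an illustrative table with a = 40, b = 60, c = 20, d = 80. The function name effect_sizes is hypothetical, and these are plain Wald intervals, which assume no zero cells.

```python
import math

def effect_sizes(a, b, c, d, z=1.96):
    """OR, RR, RD with Wald CIs for [[a, b], [c, d]] (rows: treatment, control)."""
    n1, n2 = a + b, c + d
    p1, p2 = a / n1, c / n2

    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
    or_ci = tuple(math.exp(math.log(odds_ratio) + s * z * se_log_or) for s in (-1, 1))

    rr = p1 / p2
    se_log_rr = math.sqrt((1/a - 1/n1) + (1/c - 1/n2))
    rr_ci = tuple(math.exp(math.log(rr) + s * z * se_log_rr) for s in (-1, 1))

    rd = p1 - p2                                  # computed on the raw scale
    se_rd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    rd_ci = (rd - z * se_rd, rd + z * se_rd)
    return {"OR": (odds_ratio, or_ci), "RR": (rr, rr_ci), "RD": (rd, rd_ci)}

results = effect_sizes(40, 60, 20, 80)
for name, (est, (lo, hi)) in results.items():
    print(f"{name} = {est:.3f}  95% CI ({lo:.3f}, {hi:.3f})")
# OR = 2.667  95% CI (1.417, 5.020)
# RR = 2.000  95% CI (1.263, 3.167)
# RD = 0.200  95% CI (0.076, 0.324)
```

The OR and RR intervals are asymmetric around the estimate because they are symmetric on the log scale; the RD interval is symmetric on the raw scale, and 1/|RD| gives NNT = 5.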

Goodness-of-fit applications

5. The chi-square tradition in genetics

Mendel 9:3:3:1

F₂ counts (round/yellow, round/green, wrinkled/yellow, wrinkled/green) = 315, 108, 101, 32; total N = 556. Expected under 9:3:3:1:

E = (556 × 9/16, 556 × 3/16, 556 × 3/16, 556 × 1/16) = 312.75, 104.25, 104.25, 34.75.

χ² = Σ(O−E)²/E ≈ 0.47, df = 3, p ≈ 0.93, so the observed counts agree with theory. The famous Fisher (1936) reanalysis: Mendel's aggregated χ² is "too good a fit" (suspiciously close to expectation), suggesting the data may have been polished. A classic case where the fit is too good to be true.
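The arithmetic can be reproduced in plain Python. In this sketch the counts are listed in the order that matches the 9:3:3:1 ratio, and the helper chi2_sf_df3 (a hypothetical name) is the closed-form upper tail of a chi-square variable with exactly 3 degrees of freedom, so it is not a general-purpose p-value routine.

```python
import math

observed = [315, 108, 101, 32]     # ordered to match the 9:3:3:1 classes
ratio = [9, 3, 3, 1]
n = sum(observed)
expected = [n * r / sum(ratio) for r in ratio]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1             # no parameters estimated from the data, so m = 0

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_sf_df3(chi2)
print(expected)               # [312.75, 104.25, 104.25, 34.75]
print(round(chi2, 2), df)     # 0.47 3
print(round(p_value, 3))
```

A p-value this close to 1 means the counts sit nearer to the expected values than chance alone would usually produce, which is the observation behind the "too good a fit" debate.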

HWE test

For a SNP with genotypes AA, Aa, aa, HWE expects proportions p², 2pq, q² (p = frequency of allele A, q = 1 − p). Estimate p̂ from the observed counts, compute E, then χ². df = 1 (k = 3 genotypes − 1 − 1 estimated parameter p̂).

GWAS QC convention: SNPs with HWE p < 1e−6 in controls are typically dropped (deviation from HWE among cases can be real signal). Wigginton et al. 2005 AJHG introduced an exact HWE test, which avoids the failure of the χ² approximation at low MAF; PLINK's --hardy uses the exact test by default.
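A minimal sketch of the χ² version of the HWE test follows (the approximate test, not the exact test of Wigginton et al.). The function name hwe_chi2 is hypothetical and the genotype counts are illustrative; the p-value uses the df = 1 tail, erfc(√(χ²/2)).

```python
import math

def hwe_chi2(n_AA, n_Aa, n_aa):
    """Chi-square test of Hardy-Weinberg proportions for one biallelic SNP."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)           # estimated frequency of allele A
    q = 1 - p
    expected = [n * p * p, n * 2 * p * q, n * q * q]
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_AA, n_Aa, n_aa), expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # df = 3 - 1 - 1 = 1
    return p, chi2, p_value

p_hat, chi2, p_value = hwe_chi2(298, 489, 213)   # illustrative genotype counts
print(round(p_hat, 4), round(chi2, 3))           # 0.5425 0.221
print(round(p_value, 2))
```

One degree of freedom is lost to the constraint that the counts sum to N and another to estimating p̂ from the same data, which is why three genotype classes leave df = 1.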

💡

The χ² heritage in GWAS: the "allelic test" for each SNP is a 2×2 χ² (allele × case/control). The genotypic test is 2×3; the Cochran-Armitage trend test (1955) is a 2×3 with an added dosage-effect assumption (df = 1). When running millions of χ² tests across SNP × disease, remember Bonferroni / FDR correction (see Step 12).

Code

6. Worked examples

# R: chi-square family + effect sizes
library(epitools)   # oddsratio(), riskratio()
library(vcd)        # mosaic plots, assocstats()

# --- 2x2 table: drug vs outcome ---
tab <- matrix(c(40, 60, 20, 80),
              nrow = 2, byrow = TRUE,
              dimnames = list(treat = c("drug", "placebo"),
                              event = c("yes", "no")))

# Pearson chi-square — turn OFF Yates by default
chisq.test(tab, correct = FALSE)
chisq.test(tab)$expected      # inspect E_ij

# Fisher's exact (any expected < 5, or just safer at small n)
fisher.test(tab)

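# Effect sizes with 95% CI
# epitools treats row 1 / column 1 as the reference level, so rev = "both"
# makes "placebo" and "no" the references for this table layout
epitools::oddsratio(tab, method = "wald", rev = "both")$measure   # OR + Wald CI
epitools::riskratio(tab, method = "wald", rev = "both")$measure   # RR + Wald CI
prop.test(tab, correct = FALSE)$conf.int   # CI for RD = p(drug) - p(placebo)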

# --- McNemar: paired binary (before/after) ---
paired <- matrix(c(30, 12, 25, 33), nrow = 2, byrow = TRUE,
                 dimnames = list(pre = c("+", "-"),
                                 post = c("+", "-")))
mcnemar.test(paired, correct = FALSE)   # uses only b = 12 and c = 25

# --- Cochran-Mantel-Haenszel: stratify by department ---
data(UCBAdmissions)             # classic Simpson's paradox; 2 x 2 x 6 table
mantelhaen.test(UCBAdmissions)  # pools OR across departments

# --- Goodness-of-fit: Mendel 9:3:3:1 ---
obs <- c(315, 108, 101, 32)
chisq.test(obs, p = c(9, 3, 3, 1) / 16)

# --- HWE exact via HardyWeinberg pkg ---
# HardyWeinberg::HWExact(c(AA=298, Aa=489, aa=213))

# Python: chi-square family + effect sizes
import numpy as np
from scipy import stats
from statsmodels.stats.contingency_tables import Table2x2, mcnemar, StratifiedTable

# --- 2x2 table ---
tab = np.array([[40, 60],
                [20, 80]])

# Pearson chi-square (scipy's correction=False)
chi2, p, dof, expected = stats.chi2_contingency(tab, correction=False)

# Fisher's exact (one-sided / two-sided)
stats.fisher_exact(tab, alternative="two-sided")

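# Effect sizes + 95% CI from statsmodels
t = Table2x2(tab)
t.oddsratio, t.oddsratio_confint()     # attribute, then method
t.riskratio, t.riskratio_confint()
print(t.summary())                     # OR and RR with CIs and p-values

# Risk difference with a Wald 95% CI, computed by hand
n1, n2 = tab.sum(axis=1)
p1, p2 = tab[0, 0] / n1, tab[1, 0] / n2
rd = p1 - p2
se = np.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
rd, (rd - 1.96 * se, rd + 1.96 * se)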

# --- McNemar paired ---
paired = np.array([[30, 12], [25, 33]])
mcnemar(paired, exact=False, correction=False)

# --- Cochran-Mantel-Haenszel: list of 2x2 tables, one per stratum ---
# (StratifiedTable also accepts an ndarray, but it must have shape 2 x 2 x k)
strata = [np.array([[12, 88], [5, 95]]),
          np.array([[28, 72], [15, 85]]),
          np.array([[35, 65], [20, 80]])]
st = StratifiedTable(strata)
st.test_null_odds()              # CMH null test
st.test_equal_odds()             # Breslow-Day homogeneity
st.oddsratio_pooled, st.oddsratio_pooled_confint()

# --- Goodness-of-fit: Mendel 9:3:3:1 ---
obs = np.array([315, 108, 101, 32])
exp = obs.sum() * np.array([9, 3, 3, 1]) / 16
stats.chisquare(obs, exp)

💡

One detail that bites everyone: R's chisq.test() defaults to correct = TRUE on 2×2 tables (Yates); set it to FALSE in most cases. Python's scipy.stats.chi2_contingency defaults to correction=True; same fix. Two languages, same trap.

Common pitfalls

7. The six most common errors in papers

❌ Leaving the Yates default on

R's chisq.test on a 2×2 table applies Yates by default, systematically inflating p. Camilli (1995) and Sokal & Rohlf (2012) both recommend turning it off. Practical fix: chisq.test(tab, correct = FALSE), or just use Fisher's exact.

❌ Reading OR as RR

When the outcome is common (> 10–20%), OR overstates RR substantially. Greenland (1987 AJE), Zhang & Yu (1998 JAMA). For clinical communication, use RR or RD + NNT, and report both in Methods.

❌ Running χ² with E < 5

Violating Cochran's (1954) condition makes the χ² approximation unreliable, especially for borderline p-values (the 0.01–0.10 range). R warns "Chi-squared approximation may be incorrect"; scipy.stats.chi2_contingency issues no such warning, so inspect the returned expected counts yourself. Either way, don't ignore small E; switch to Fisher's exact.

❌ Simpson's paradox

Simpson 1951 JRSS-B: the direction of an association in pooled data can flip after stratification. Classic case: UC Berkeley 1973 admissions (a lower admission rate for women overall, but a higher one within most departments). Fix: CMH + Breslow-Day, or logistic regression with the confounders as covariates.

❌ Independent tests on paired data

Same subject before/after, or matched case-control → use McNemar. Treating paired observations as independent ignores the within-pair correlation and loses power. Bennett (2017 BMJ).

❌ Uncorrected multiple comparisons

GWAS, scRNA marker tests, drug screens: when you run one chi-square per variable, a p < 0.05 threshold is swamped by multiplicity. See Step 12; at minimum apply Bonferroni (strictest) or BH-FDR (most common).

📝 Self-check

1. Your 2×2 table has cells 3, 27, 1, 29 with N = 60. Which is the best choice?

A. Pearson χ² with Yates correction

B. Pearson χ² without correction

C. Fisher's exact test, because at least one expected count is below 5

D. A t-test on the two proportions

2. RCT event rates: treatment 30%, control 50%. Which statement is WRONG?

A. RR = 0.6 (30%/50%)

B. RD = −20% → NNT = 5

C. OR ≈ 0.43; and since OR equals RR, OR is also 0.6

D. With a common outcome (> 10%), OR diverges from RR; report both

3. You want to test whether a drug changes BP control status (yes/no) before vs after treatment in 100 hypertensive patients. Which is the best test?

A. Pearson χ² independence test

B. McNemar test, using only the discordant cells b and c

C. Fisher's exact test

D. Two-sample independent t-test

4. Pooled across three hospitals, OR = 1.5 (against the drug); within each hospital, OR < 1 (the drug helps). What is happening, and what analysis should you run?

A. Underpowered: add more hospitals

B. Statistical noise: trust the pooled estimate

C. Simpson's paradox: use a Cochran-Mantel-Haenszel stratified analysis or logistic regression to adjust for the confounder

D. Use a t-test instead