Step 3: Sampling & CLT — Biostatistics Tutorial

Overview

Why is the CLT the "reactor core" of biostatistics?

Nearly every inferential procedure you have used leans on the Central Limit Theorem (CLT). Why does the t-test hold up on non-normal data? Why does the ANOVA F-ratio mean anything? Why is a regression's 95% CI valid? In each case the answer is the CLT: when n is large enough and the variance is finite, the sampling distribution of the mean (or of regression coefficients, proportions, or differences) is approximately normal, whether or not the raw data are normal.

A deeper idea: SE (standard error) and SD (standard deviation) are entirely different things. SD captures how spread out the data are; SE = SD/√n captures how precisely you have estimated the mean. Curran-Everett 2008 (Adv Physiol Educ) flagged this as the most common confusion in biology papers: people report SEM as if it described variability, which makes error bars look tidy while hiding the actual spread.

💡

What is a sampling distribution? Imagine drawing n observations from the same population an infinite number of times and computing a statistic (the mean, say) each time. Those statistics themselves form a distribution, and that hypothetical distribution is the sampling distribution. The CLT tells us that for large n it is approximately normal. Inferential statistics compare an observed statistic against its sampling distribution.

Three core concepts

I. Sampling distribution, SE, and CLT

🎯

Sampling distribution

Not the distribution of the data, but the distribution of a statistic (such as X̄) over hypothetical repeated samples. Each experiment yields a single X̄, but the CLT tells us it comes from an approximately normal distribution with mean μ and standard deviation σ/√n.
Every p-value, CI, and t-statistic is computed from this distribution.

📏

SE vs SD

SD: how spread out the data are (bigger SD = more variation between individuals).
SE = SD/√n: precision of the estimate (smaller SE = mean estimated more precisely).
At n = 100, SE = SD/10; at n = 10 000, SE = SD/100. The SD itself does not shrink as n grows.
Report mean ± SD (spread) or mean (95% CI) (precision); do not report mean ± SEM.
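The contrast between SD and SE can be checked with a short simulation. The sketch below is illustrative only: the population (normal, mean 120, SD 15), the seed, and the helper name `sd_and_se` are assumptions chosen for the example, not part of the tutorial's own code.

```python
import math
import random
import statistics

random.seed(1)

def sd_and_se(n, mu=120.0, sigma=15.0):
    """Draw n values from Normal(mu, sigma); return (sample SD, SE of the mean)."""
    x = [random.gauss(mu, sigma) for _ in range(n)]
    sd = statistics.stdev(x)      # sample SD, n - 1 denominator
    se = sd / math.sqrt(n)        # standard error of the mean
    return sd, se

# Same population, two very different sample sizes.
results = {n: sd_and_se(n) for n in (100, 10_000)}
for n, (sd, se) in results.items():
    print(f"n={n:>6}  SD={sd:6.2f}  SE={se:6.3f}")
```

Both runs should report an SD near 15, while the SE drops by roughly a factor of 10 when n grows by a factor of 100.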

🌐

CLT statement

Let X₁, …, Xₙ be i.i.d. with finite mean μ and finite variance σ². Then
(X̄ − μ) / (σ/√n) → N(0, 1) in distribution
as n → ∞ (Lindeberg-Lévy CLT, 1922). In practice n ≥ 30 often suffices, but heavily skewed or heavy-tailed data need a larger n.

E(X̄) = μ · Var(X̄) = σ²/n · SE(X̄) = σ/√n · X̄ ≈ N(μ, σ²/n) for large n

Two key consequences: (1) E(X̄) = μ, so the sample mean is unbiased; (2) Var(X̄) is inversely proportional to n, so SE shrinks as 1/√n. "Quadruple n to halve SE" is exactly this formula. In practice σ is unknown, and SE is estimated as s/√n, where s is the sample SD.
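The relations E(X̄) = μ and SE(X̄) = σ/√n can be verified numerically. The following is a minimal sketch, assuming an Exponential(1) population (so μ = σ = 1); the helper name `mean_and_sd_of_sample_means`, the seed, and the replication count are choices made for this illustration.

```python
import random
import statistics

random.seed(2)

def mean_and_sd_of_sample_means(n, reps=4000):
    """Simulate `reps` sample means of n Exponential(1) draws (mu = sigma = 1)."""
    means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
             for _ in range(reps)]
    return statistics.fmean(means), statistics.stdev(means)

mean_25, se_25 = mean_and_sd_of_sample_means(25)     # theory: SE = 1/sqrt(25)  = 0.2
mean_100, se_100 = mean_and_sd_of_sample_means(100)  # theory: SE = 1/sqrt(100) = 0.1
print(mean_25, se_25)
print(mean_100, se_100)
print("SE ratio after quadrupling n:", se_25 / se_100)  # theory: 2
```

The empirical SD of the simulated means plays the role of the SE; going from n = 25 to n = 100 should roughly halve it.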

Interactive simulation ①

CLT demo

Pick a population (uniform / exponential / bimodal / Cauchy) and drag the n slider. Each run draws n observations and computes their mean; this is repeated 2 000 times and the resulting distribution of sample means is plotted. Watch for two things:
· The more skewed the population, the larger the n needed for convergence.
· The Cauchy distribution has no defined mean or variance, so the CLT does not apply: sample means keep jumping around no matter how big n is. This is exactly where the finite-variance assumption breaks.
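The Cauchy behaviour described above can be reproduced without the interactive widget. This sketch assumes a standard Cauchy generated by the inverse-CDF method and uses the interquartile range (IQR) of the sample means as a spread measure, because the variance is undefined; the function names and replication counts are illustrative choices.

```python
import math
import random
import statistics

random.seed(3)

def expo():
    """One Exponential(1) draw (finite variance, the CLT applies)."""
    return random.expovariate(1.0)

def cauchy():
    """One standard Cauchy draw via the inverse CDF: tan(pi * (U - 0.5))."""
    return math.tan(math.pi * (random.random() - 0.5))

def iqr_of_means(draw, n, reps=1000):
    """Interquartile range of `reps` sample means, each taken over n draws."""
    means = [statistics.fmean(draw() for _ in range(n)) for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1

# spread[n] = (IQR of exponential means, IQR of Cauchy means)
spread = {n: (iqr_of_means(expo, n), iqr_of_means(cauchy, n)) for n in (10, 400)}
for n, (iqr_expo, iqr_cauchy) in spread.items():
    print(f"n={n:>4}  exponential IQR={iqr_expo:.3f}  Cauchy IQR={iqr_cauchy:.3f}")
```

For the exponential population the IQR of the means should shrink markedly between n = 10 and n = 400, while for the Cauchy population it should stay about the same, since the mean of n standard Cauchy draws is itself standard Cauchy.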

[Interactive figure. Controls: population, sample size n (default 5). Top panel: population distribution (one large sample, N = 5000). Bottom panel: sampling distribution of the mean (2000 reps, n observations per draw); red dashed curve = theoretical Normal(μ, σ/√n).]

Deep dive

II. History and assumptions of the CLT

A brief history of the CLT

1733 De Moivre: derived the normal approximation to the binomial, i.e. the proportion of heads in many coin tosses is approximately normal. The result was later included in the second edition of The Doctrine of Chances (1738) and is the earliest CLT (binomial → normal).
1810 Laplace: extended the result to sums of i.i.d. variables, giving the first general form of what is now called the Central Limit Theorem (the name itself is due to Pólya, 1920).
1901 Lyapunov: proved a version for independent but not identically distributed variables under a moment condition (Lyapunov's CLT).
1922 Lindeberg-Lévy: gave the now-standard statement, which requires only i.i.d. observations with finite variance.

Two key assumptions

(1) i.i.d. (independent, identically distributed): observations are independent of each other and come from the same distribution. Panel data, time series, and cluster sampling violate this; use a mixed model or GEE instead.
(2) Finite variance σ² < ∞: the Cauchy distribution (t₁) has no variance, so the CLT fails. The same holds for Pareto distributions with α < 2.
Beyond these two, normality of the raw data is not required. This is the most widely misread point.

⚠️

Rate of convergence, the Berry-Esseen bound (1941–1942): for i.i.d. data with finite third absolute moment E|X|³ < ∞, sup_x |F_n(x) − Φ(x)| ≤ C·ρ/(σ³√n), where F_n is the CDF of the standardized sample mean, ρ = E|X − μ|³, and C ≤ 0.4748 (Shevtsova 2011, the best known upper bound on the constant). Intuition: the larger the standardized third moment ρ/σ³ (strong skew or heavy tails), the more slowly the CLT kicks in. Exponential data are fine at n = 30, but log-normal data with σ = 2 may need n > 500.
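To get a feel for the numbers, the bound can be evaluated directly. The sketch below assumes an Exponential(1) population, for which σ = 1 and ρ = E|X − 1|³ = 12/e − 2 in closed form; the function names and the 0.05 target are hypothetical choices for illustration.

```python
import math

C_BE = 0.4748  # upper bound on the Berry-Esseen constant quoted in the text

def berry_esseen_bound(rho, sigma, n, c=C_BE):
    """Upper bound on sup_x |F_n(x) - Phi(x)| for the standardized mean."""
    return c * rho / (sigma ** 3 * math.sqrt(n))

def n_for_bound(rho, sigma, target, c=C_BE):
    """Smallest n for which the bound is at or below `target`."""
    return math.ceil((c * rho / (sigma ** 3 * target)) ** 2)

# Exponential(1): sigma = 1, rho = E|X - 1|^3 = 12/e - 2.
rho_exp = 12 / math.e - 2
bound_30 = berry_esseen_bound(rho_exp, 1.0, 30)
n_needed = n_for_bound(rho_exp, 1.0, 0.05)
print(f"rho = {rho_exp:.4f}, bound at n = 30: {bound_30:.4f}")
print(f"n needed for bound <= 0.05: {n_needed}")
```

The bound is a worst-case guarantee over all distributions with the same ρ/σ³, so the actual approximation error is usually smaller than the value it returns. Its main use is to show the 1/√n rate and the dependence on the third moment.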

🚫

Real CLT failures: (1) Cauchy or other heavy-tailed data: single-cell expression data with "dropout" are often heavy-tailed, and t-CIs come out too narrow. (2) Non-independent data: repeated measurements on the same mouse, or repeated blood draws from the same patient, need a mixed model (Step 13). (3) Tiny n with extreme skew: with n = 5 from a log-normal with σ = 3, the normal approximation is nowhere near adequate. Hall (1992, Bootstrap and Edgeworth Expansion) derives in detail how skewness enters through the Edgeworth expansion and distorts CI coverage.

Interactive simulation ②

Bootstrap demo

When n is small or the data are skewed, CLT-based t-CIs can mislead. Efron 1979 introduced the bootstrap: treat your n data points as a mini-population, resample n of them with replacement B times, compute the statistic each time, and use the resulting distribution as an empirical sampling distribution.
· Percentile CI: the 2.5th and 97.5th percentiles of the bootstrap statistics.
· t-CI: mean ± t·SE, where SE = s/√n; it assumes the sampling distribution of the mean is approximately normal.
For skewed data, the percentile CI is usually more honest.
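The two intervals listed above can be computed side by side in a few lines. This is a minimal sketch under stated assumptions: the data are a hypothetical right-skewed sample of n = 15 log-normal draws, the critical value t(14, 0.975) ≈ 2.145 is taken from a standard t table rather than computed, and the percentile cut-offs use simple index positions.

```python
import math
import random
import statistics

random.seed(4)

# Hypothetical right-skewed sample of n = 15 (log-normal draws).
x = [random.lognormvariate(0.0, 1.0) for _ in range(15)]
n = len(x)
mean_x = statistics.fmean(x)
se_x = statistics.stdev(x) / math.sqrt(n)

# t-CI: mean +/- t * SE, with t_{14, 0.975} ~= 2.145 from a t table.
T_14 = 2.145
ci_t = (mean_x - T_14 * se_x, mean_x + T_14 * se_x)

# Percentile bootstrap: resample n values with replacement, B times.
B = 2000
boot_means = sorted(statistics.fmean(random.choices(x, k=n)) for _ in range(B))
# Approximate 2.5th and 97.5th percentiles of the bootstrap means.
ci_pct = (boot_means[int(0.025 * B)], boot_means[int(0.975 * B) - 1])

print("observed mean:", mean_x)
print("t-CI:         ", ci_t)
print("percentile CI:", ci_pct)
```

The t-CI is symmetric around the mean by construction, whereas the percentile interval is free to be asymmetric, which is how it reflects skew in the data.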

[Interactive figure. Controls: original n (default 15), bootstrap B (default 2000), population skewness (default 1.0). Green = bootstrap distribution of the mean; blue dashed = 95% percentile CI; red dashed = t-CI; black = observed mean.]

BCa (bias-corrected and accelerated) CI: introduced by Efron (1987) and reviewed in DiCiccio & Efron (1996), BCa refines the percentile CI by correcting for median bias and for skewness through an "acceleration" constant a. R: boot::boot.ci(b, type="bca"); Python: scipy.stats.bootstrap(..., method="BCa"). For moderately skewed data, BCa is usually more accurate than the plain percentile interval.

Decision guide

III. Which CI should you use?

🌳 Confidence-interval decision tree

Q1: Roughly normal data with n > 20? → Yes → t-CI: mean ± t_{n−1, 0.975}·s/√n. Simple, precise, and the default for papers.

Q2: Skewed data but n ≥ 30 and no extreme outliers? → Yes → A CLT-based t-CI is still acceptable (the CLT is converging), or log-transform first and then use a t-CI.

Q3: Strong skew, small n (< 20), or a statistic other than the mean (median, correlation coefficient)? → Yes → Bootstrap percentile / BCa CI (Efron-Tibshirani 1993).

Q4: Proportion data? → Yes → Wilson or Clopper-Pearson CI. Avoid the Wald interval p̂ ± 1.96·SE, whose coverage is poor for small n.

Q5: Repeated measures, clusters, or time series? → Yes → Do not bootstrap individual observations; use a cluster bootstrap or a mixed model (Step 13).

Q6: Sampling more than 5% of a finite population? → Yes → Multiply SE by the finite-population correction (FPC) = √((N−n)/(N−1)).

Comparison table

IV. How should the five error-bar quantities be used?

Quantity	Meaning	Formula	When to use	Pitfall
SD	Data spread	s = √(Σ(xᵢ−x̄)²/(n−1))	Describe individual variation	Does not shrink with n
SE (SEM)	Precision of the mean estimate	SE = s/√n	Rarely the right thing to report; usually use a CI	Do not report as SD
95% t-CI	95% CI for the mean (CLT-based)	x̄ ± t_{n−1, 0.975}·s/√n	Approx. normal data / n ≥ 20	Poor coverage when n is small and data are skewed
Bootstrap CI	Quantiles of the empirical sampling distribution	2.5th / 97.5th percentile, or BCa	Skewed data, small n, complex statistics	Unstable when n < 10
Wilson CI (proportion)	95% CI for a proportion	(p̂ + z²/2n ± z√(p̂q̂/n + z²/4n²)) / (1+z²/n)	Binary outcome, small n or extreme p	Avoid p̂ ± 1.96·SE (Wald)
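The Wilson formula in the table can be written out directly and compared with the Wald interval. The sketch below is illustrative: the function names `wald_ci` and `wilson_ci` and the example of 0 events in 10 patients are assumptions made for this demonstration.

```python
import math
from statistics import NormalDist

def wald_ci(successes, n, level=0.95):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def wilson_ci(successes, n, level=0.95):
    """Wilson score interval, term by term as in the table formula."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

# Hypothetical example: 0 events observed in 10 patients.
wald = wald_ci(0, 10)
wilson = wilson_ci(0, 10)
print("Wald:  ", wald)
print("Wilson:", wilson)
```

With zero observed events the Wald interval collapses to a single point at 0, which wrongly suggests certainty, while the Wilson interval still has a positive upper limit. This is the small-n, extreme-p situation the table warns about.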

Reporting advice (following Cumming 2014, Psychol Sci): in results paragraphs report mean (95% CI); in Table 1 and other descriptive statistics report mean ± SD; on figures state clearly "error bars = 95% CI" or "error bars = SD". Avoid SEM error bars: they show neither the spread of the data nor a stated confidence level.

Code

V. Worked examples

# R: CLT demo + bootstrap CI
library(boot)
set.seed(1)   # reproducible simulations

# --- 1) Simulate CLT: exponential population ---
n  <- 30
sims <- replicate(5000, mean(rexp(n, rate = 1)))
hist(sims, breaks = 40, main = "Sampling distribution of X̄")
qqnorm(sims); qqline(sims, col = "red")

# --- 2) SE vs SD ---
x  <- rnorm(100, mean = 120, sd = 15)
sd_x <- sd(x)                   # data spread (~15)
se_x <- sd_x / sqrt(length(x))   # mean precision (~1.5)

# --- 3) Classic t-CI (CLT-based) ---
ci_t <- t.test(x)$conf.int          # 95% t-CI

# --- 4) Bootstrap percentile + BCa CI ---
b <- boot(x, statistic = function(d, i) mean(d[i]), R = 2000)
boot.ci(b, type = c("perc", "bca"))

# --- 5) Finite-population correction ---
# Illustration only: sd_x stands in for the sample SD of a survey of n_s units.
N <- 500; n_s <- 60                 # population size, sample size
fpc <- sqrt((N - n_s) / (N - 1))
se_fpc <- (sd_x / sqrt(n_s)) * fpc

# Python: CLT demo + bootstrap CI
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# --- 1) Simulate CLT: exponential population ---
n = 30
sims = np.array([rng.exponential(1, n).mean() for _ in range(5000)])
plt.hist(sims, bins=40); plt.title("Sampling distribution of X̄"); plt.show()
stats.probplot(sims, plot=plt); plt.show()

# --- 2) SE vs SD ---
x = rng.normal(120, 15, 100)
sd_x = x.std(ddof=1)             # data spread (~15)
se_x = sd_x / np.sqrt(len(x))    # mean precision (~1.5)

# --- 3) Classic t-CI (CLT-based) ---
ci_t = stats.t.interval(0.95, df=len(x)-1, loc=x.mean(), scale=se_x)

# --- 4) Bootstrap percentile + BCa (scipy ≥ 1.7) ---
res_pct = stats.bootstrap((x,), np.mean,
                          n_resamples=2000, method="percentile",
                          confidence_level=0.95, random_state=rng)
res = stats.bootstrap((x,), np.mean,
                       n_resamples=2000, method="BCa",
                       confidence_level=0.95, random_state=rng)
res_pct.confidence_interval     # percentile (low, high)
res.confidence_interval         # BCa (low, high)

# --- 5) Finite-population correction ---
# Illustration only: sd_x stands in for the sample SD of a survey of n_s units.
N, n_s = 500, 60                 # population size, sample size
fpc  = np.sqrt((N - n_s) / (N - 1))
se_fpc = (sd_x / np.sqrt(n_s)) * fpc

💡

Exercise: before running your next t-test, also compute a BCa CI on the same data with boot::boot.ci() or scipy.stats.bootstrap() and compare the two intervals. If they differ noticeably, your data are too skewed for the CLT at this n; prefer the bootstrap result and consider a log transform.

Common pitfalls

VI. Six common mistakes

❌ SE is not SD

A bigger n shrinks SEM and makes error bars look tidy, but that reflects the precision of the estimate, not the spread of the data. Curran-Everett 2008: report SD or IQR for spread and a 95% CI for precision. SEM has almost no standalone use.

❌ The CLT is about the mean, not the raw data

Common mistake: "My data aren't normal, so the t-test isn't valid." The CLT is a statement about the sample mean. If n is large enough and the variance is finite, X̄ is approximately normal and the t-test remains approximately valid. Testing raw-data normality (Shapiro-Wilk) is of limited value here.

❌ The bootstrap is not a cure-all

With n < 10, the bootstrap only reshuffles the same handful of values, so the resampled distribution carries little information about the true sampling distribution. Chernick 2008: bootstrap CIs are unstable when n < 10; n ≥ 30 is ideal. Extreme small-sample problems call for Bayesian or exact methods.

❌ Ignoring dependence

Three slices from the same mouse are not independent. Counting n as "mice × slices" inflates the effective sample size, which makes the SE too small and the p-value too small. Lazic 2010 BMC Neurosci: use a mixed model or a cluster bootstrap so that biological and technical replicates are handled separately.
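A cluster bootstrap for this situation might look like the sketch below. It is illustrative only: the data are simulated (8 hypothetical mice with 3 slices each, large between-mouse and small within-mouse variation), and the function name `cluster_bootstrap_se` is an assumption for the example.

```python
import random
import statistics

random.seed(5)

# Hypothetical data: 8 mice, 3 slices each. Mice differ a lot (SD 5);
# slices within a mouse differ little (SD 1).
mice = []
for _ in range(8):
    mouse_level = random.gauss(50.0, 5.0)
    mice.append([random.gauss(mouse_level, 1.0) for _ in range(3)])

all_slices = [v for m in mice for v in m]

# Naive SE: treats the 24 slices as 24 independent observations.
naive_se = statistics.stdev(all_slices) / len(all_slices) ** 0.5

def cluster_bootstrap_se(clusters, reps=2000):
    """Resample whole mice with replacement, keeping each mouse's slices together."""
    boot_means = []
    for _ in range(reps):
        sample = random.choices(clusters, k=len(clusters))
        boot_means.append(statistics.fmean(v for m in sample for v in m))
    return statistics.stdev(boot_means)

cluster_se = cluster_bootstrap_se(mice)
print(f"naive SE (n = 24 slices): {naive_se:.3f}")
print(f"cluster bootstrap SE (8 mice): {cluster_se:.3f}")
```

Because the resampling unit is the mouse, the resulting SE reflects the 8 biological replicates rather than the 24 slices, and it should come out clearly larger than the naive value.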

❌ The CLT does not apply to Cauchy-like data

Single-cell expression, income, and insurance payouts can all be heavy-tailed. For Pareto with α < 2 or for Cauchy, the variance does not exist, so the CLT fails. Use medians and quantile-based inference instead, or model the data with stable distributions.

❌ Forgetting the FPC

Sampling 100 of a country's 500 hospitals (20%) without the finite-population correction (FPC) overstates the SE. FPC = √((N−n)/(N−1)); as n/N → 1, SE → 0. See Cochran 1977, Sampling Techniques.

Pitfall: n = 30 is not a magic number. Textbooks repeat "n ≥ 30 and the CLT is fine." In reality, for uniform or near-normal populations n = 10 is plenty; for exponential populations n = 30 is roughly OK; for a log-normal population with σ = 2, even n = 500 may not be enough. The Berry-Esseen bound C·ρ/(σ³√n) shrinks only as 1/√n, and the standardized third absolute moment ρ/σ³ grows rapidly with skewness and tail weight. In practice: simulate and inspect the QQ plot of X̄, or simply compare a bootstrap CI with a t-CI.
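One way to run such a check is to estimate the actual coverage of the nominal 95% t-CI at n = 30 for several populations. The sketch below is a rough illustration, assuming three example populations (standard normal, Exponential(1), log-normal with σ = 2), a t critical value of 2.045 for 29 degrees of freedom taken from a t table, and 2000 replications; the function name `t_ci_coverage` is hypothetical.

```python
import math
import random
import statistics

random.seed(6)

N_OBS = 30
T_29 = 2.045  # t_{29, 0.975} from a t table, matching N_OBS = 30

def t_ci_coverage(draw, true_mean, reps=2000):
    """Fraction of nominal 95% t-CIs (n = 30) that contain the true mean."""
    hits = 0
    for _ in range(reps):
        x = [draw() for _ in range(N_OBS)]
        m = statistics.fmean(x)
        half = T_29 * statistics.stdev(x) / math.sqrt(N_OBS)
        hits += (m - half <= true_mean <= m + half)
    return hits / reps

cov_normal = t_ci_coverage(lambda: random.gauss(0.0, 1.0), 0.0)
cov_expo = t_ci_coverage(lambda: random.expovariate(1.0), 1.0)
# A log-normal with sigma = 2 has mean exp(sigma^2 / 2) = exp(2).
cov_lognorm = t_ci_coverage(lambda: random.lognormvariate(0.0, 2.0), math.exp(2.0))

print(f"normal:      {cov_normal:.3f}")
print(f"exponential: {cov_expo:.3f}")
print(f"log-normal:  {cov_lognorm:.3f}")
```

Coverage close to 0.95 means the t-CI is doing its job at that n; coverage well below 0.95, as is typical for the log-normal case, means n = 30 is not enough for that population.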

📝 Self-check

1. A paper reports systolic blood pressure (SBP) for 100 patients as "mean ± SEM = 132 ± 1.4 mmHg." What is the best correction?

A. Keep the number but relabel SEM as SD (132 ± 1.4)

B. Keep SEM because the error bar looks small

C. Report mean ± SD (132 ± 14) for spread and add a 95% CI for precision

D. Always report only median + IQR and never the mean

2. A colleague sees that the QQ plot of a gene's expression in single-cell RNA-seq data deviates badly from the 45° line and says "don't run a t-test." What is the best response?

A. Agree: t-tests always require raw-data normality

B. Delete outliers until the data look normal

C. The CLT is about the sample mean: for large n, X̄ is still approximately normal. Confirm with a bootstrap CI

D. Switch to a chi-square test

3. You have n = 8 samples that look right-skewed. Which CI method is least reliable?

A. Log-transform, then a t-CI

B. Bootstrap BCa CI (note the small n and report the limitation)

C. mean ± 1.96·SE, assuming the CLT has converged

D. Median + non-parametric Hodges-Lehmann CI

4. From 500 hospitals you randomly sample 100 to survey the mean hospitalization cost. To compute a 95% CI for the mean you should:

A. Apply the finite-population correction: SE × √((N−n)/(N−1))

B. Change n from 100 to 500, because the sample represents all hospitals

C. Conclude that no CI is possible because the sampling is not fully random

D. Use SE = s/√n without correction