95% 信賴區間到底是什麼意思?
頻率派 CI 的正確詮釋是:「如果無限重複實驗,依此方法構造的區間中,有 95% 包含真實參數」。注意主詞是區間而非「真實值」──真實值是固定的,是區間在隨機。「真值落在這個 CI 的機率是 0.95」是錯誤敘述──那是貝氏可信區間 (credible interval) 的口吻。
本章涵蓋:(1) 頻率派覆蓋率詮釋,(2) Wald / score / profile likelihood 三種漸近 CI,(3) 非參數 bootstrap (percentile / basic / studentized / BCa)、參數 bootstrap,(4) CI vs 預測區間 (prediction interval) 的差別。
The frequentist reading: "If we repeated this experiment infinitely, 95% of intervals constructed this way would contain the true parameter." The subject is the interval, not the true value — the truth is fixed, the interval is random. "The probability that the truth lies in this CI is 0.95" is wrong — that statement belongs to a Bayesian credible interval.
This chapter covers (1) the coverage interpretation, (2) Wald / score / profile-likelihood asymptotic CIs, (3) the nonparametric bootstrap with its variants (percentile, basic, studentized, BCa) and the parametric bootstrap, (4) CI versus prediction interval.
一、漸近 CI 三大家族
Wald CI
θ̂ ± z_{1−α/2} · SE(θ̂)。最常見、計算最快──但仰賴抽樣分布近似常態,對比例 p 接近 0 或 1、或樣本量極小時非常不準。修補:對 OR / log-fold-change 等先做對數變換再取 Wald。
θ̂ ± z_{1−α/2} · SE(θ̂). Most common, fastest — but relies on a normal approximation. Breaks for proportions near 0 or 1, and for tiny samples. Fix: for ORs / log-fold-changes, take Wald on the log scale first.
Score / Profile
Score CI:反轉 score 檢定 (Wilson CI 是其特例,比例問題表現極佳)。Profile likelihood CI:對 θ 掃描,記錄輪廓概似 ℓ_p(θ),取 2(ℓ_p(θ̂) − ℓ_p(θ)) ≤ χ²_{1, 0.95} 的範圍。對非線性、不對稱問題遠優於 Wald。confint(glm_fit) 預設就是 profile。
Score CI: inverts the score test (the Wilson interval is the special case for proportions — far better than Wald near 0/1). Profile-likelihood CI: scan θ, take values where 2(ℓ_p(θ̂) − ℓ_p(θ)) ≤ χ²_{1, 0.95}. Better than Wald for nonlinear / asymmetric problems. confint(glm_fit) defaults to profile in R.
二、Bootstrap:當公式失效時的萬用工具
非參數 bootstrap:從原始資料 (n 筆) 以可放回抽 n 筆,計算統計量 θ̂*;重複 B 次,得到 θ̂* 的「抽樣分布近似」。基於此可造 CI:
- Percentile: [θ̂*_{α/2}, θ̂*_{1−α/2}]──最直觀,但對偏態估計有偏。
- Basic (reverse percentile): [2θ̂ − θ̂*_{1−α/2}, 2θ̂ − θ̂*_{α/2}]──修正部分偏誤。
- Studentized:對每次 bootstrap 計算 t* = (θ̂* − θ̂)/SE*,需要 nested bootstrap 估 SE*,但二階準確。
- BCa (bias-corrected and accelerated):修正偏誤與不對稱──通常是預設首選。
參數 bootstrap:從擬合模型 (e.g. NB(μ̂, α̂)) 生成 pseudo-data,再擬合得到 θ̂*。比非參數版本更銳利──若模型對。
Nonparametric bootstrap: resample n observations with replacement from the original n, compute θ̂*, repeat B times to get an empirical sampling distribution. Then build CIs:
- Percentile: [θ̂*_{α/2}, θ̂*_{1−α/2}] — intuitive but biased when θ̂ is skewed.
- Basic (reverse percentile): [2θ̂ − θ̂*_{1−α/2}, 2θ̂ − θ̂*_{α/2}] — corrects part of the bias.
- Studentized: use t* = (θ̂* − θ̂)/SE*; needs a nested bootstrap for SE*, but second-order accurate.
- BCa: bias-corrected and accelerated — usually the recommended default.
Parametric bootstrap: draw pseudo-data from the fitted model (e.g. NB(μ̂, α̂)) and refit. Sharper than nonparametric — provided the model is correct.
100 個 CI 模擬:覆蓋率現場演示
對 N(μ=10, σ=3) 重複抽 100 個樣本 (每次 n 筆),畫出每次的 95% CI。紅色表示該次 CI 沒蓋住真值──理論上長期約 5% 會紅。改變 n 觀察區間如何收縮但覆蓋率保持 ≈ 95%。
Draw 100 samples (n each) from N(μ=10, σ=3) and plot each 95% CI. Red = the CI missed the truth — about 5% in the long run. Vary n to see how intervals shrink while coverage stays ≈ 95%.
X:模擬編號;Y:CI 範圍。紅線:真值。
三、CI 方法選擇
| 場景 | 推薦 | 原因 | |||
|---|---|---|---|---|---|
| 大樣本均值 / 迴歸係數 | Wald | 快速、足夠準 | Large-sample mean / regression β | Wald | fast and accurate |
| 比例接近 0 或 1 | Wilson / Clopper-Pearson | Wald 嚴重低估邊界區間 | Proportion near 0 or 1 | Wilson / Clopper-Pearson | Wald collapses at boundaries |
| GLM / 非線性參數 | Profile likelihood | 不對稱形狀比 Wald 對 | GLM / nonlinear param | Profile likelihood | captures asymmetry |
| 複雜統計量 (中位數、GSEA、網路度) | BCa bootstrap | 無需解析式 | Complex statistics (median, GSEA, network metric) | BCa bootstrap | no closed-form needed |
| 時間序列 / 相關樣本 | Block bootstrap | 保留依賴結構 | Time series / correlated | Block bootstrap | preserves dependence |
| 下一筆觀測值的不確定性 | Prediction interval | CI 只描述參數,不含 ε | Uncertainty of next obs | Prediction interval | CI omits ε |
實作:CI 與 bootstrap
# --- R --- CI 三種寫法 + bootstrap # 1) Wald CI (lm / glm) fit <- lm(y ~ x, data=d) confint(fit) # lm: Wald (基於 t-分布) library(MASS); g <- glm(y ~ x, family=poisson, data=d) confint(g) # glm: profile-likelihood (預設) confint.default(g) # glm: Wald (顯式) # 2) Proportion CI library(binom) binom.confint(3, 20, methods=c("asymptotic", "wilson", "exact")) # 3) Bootstrap CI library(boot) boot_fn <- function(data, idx) median(data[idx]) b <- boot(x, boot_fn, R=2000) boot.ci(b, type=c("perc", "basic", "bca")) # 4) 覆蓋率驗證 (一次性檢查 CI 是否真正 95%) covered <- replicate(2000, { x <- rnorm(20, mean=10, sd=3) ci <- t.test(x)$conf.int ci[1] < 10 && ci[2] > 10 }) mean(covered) # 應接近 0.95
# --- Python --- CI 三種寫法 + bootstrap import numpy as np from scipy import stats import statsmodels.api as sm from statsmodels.stats.proportion import proportion_confint # 1) Wald CI (OLS / GLM) ols = sm.OLS(y, sm.add_constant(X)).fit() ols.conf_int(alpha=0.05) # Wald-t # 2) Proportion CI proportion_confint(3, 20, method="wilson") proportion_confint(3, 20, method="beta") # Clopper-Pearson # 3) Bootstrap CI res = stats.bootstrap((x,), np.median, n_resamples=2000, method="BCa", confidence_level=0.95) res.confidence_interval # 4) 覆蓋率驗證 covered = [] for _ in range(2000): x = np.random.normal(10, 3, 20) lo, hi = stats.t.interval(0.95, len(x)-1, np.mean(x), stats.sem(x)) covered.append(lo < 10 < hi) np.mean(covered) # ≈ 0.95
📝 自我檢測
1. 對 95% CI [3.1, 5.2] 的正確頻率派詮釋是?
1. The correct frequentist reading of a 95% CI [3.1, 5.2] is?
2. BCa 相對於 percentile bootstrap 主要修正什麼?
2. BCa improves over percentile bootstrap mainly by correcting what?
3. 為什麼 naive bootstrap 對「樣本最大值 max(X)」會失效?
3. Why does naive bootstrap fail for the sample max?