Step 3: Confidence Intervals & Bootstrap — Statistical Inference Tutorial

概覽

95% 信賴區間到底是什麼意思？

頻率派 CI 的正確詮釋是：「如果無限重複實驗，依此方法構造的區間中，有 95% 包含真實參數」。注意主詞是區間而非「真實值」──真實值是固定的，是區間在隨機。「真值落在這個 CI 的機率是 0.95」是錯誤敘述──那是貝氏可信區間 (credible interval) 的口吻。

本章涵蓋：(1) 頻率派覆蓋率詮釋，(2) Wald / score / profile likelihood 三種漸近 CI，(3) 非參數 bootstrap (percentile / basic / studentized / BCa)、參數 bootstrap，(4) CI vs 預測區間 (prediction interval) 的差別。

The frequentist reading: "If we repeated this experiment infinitely, 95% of intervals constructed this way would contain the true parameter." The subject is the interval, not the true value — the truth is fixed, the interval is random. "The probability that the truth lies in this CI is 0.95" is wrong — that statement belongs to a Bayesian credible interval.

This chapter covers (1) the coverage interpretation, (2) Wald / score / profile-likelihood asymptotic CIs, (3) the nonparametric bootstrap with its variants (percentile, basic, studentized, BCa) and the parametric bootstrap, (4) CI versus prediction interval.

💡

核心區別：CI 描述「參數的位置」的不確定性；prediction interval 描述「下一個觀測值」的不確定性──後者必然更寬，因為要把抽樣變異加上觀測雜訊。 Key distinction: a CI describes uncertainty about a parameter; a prediction interval describes uncertainty about the next observation — necessarily wider because it adds observation noise on top of sampling variance.

核心概念

一、漸近 CI 三大家族

📏

Wald CI

θ̂ ± z_{1−α/2} · SE(θ̂)。最常見、計算最快──但仰賴抽樣分布近似常態，對比例 p 接近 0 或 1、或樣本量極小時非常不準。修補：對 OR / log-fold-change 等先做對數變換再取 Wald。

θ̂ ± z_{1−α/2} · SE(θ̂). Most common, fastest — but relies on a normal approximation. Breaks for proportions near 0 or 1, and for tiny samples. Fix: for ORs / log-fold-changes, take Wald on the log scale first.

📐

Score / Profile

Score CI：反轉 score 檢定 (Wilson CI 是其特例，比例問題表現極佳)。Profile likelihood CI：對 θ 掃描，記錄輪廓概似 ℓ_p(θ)，取 2(ℓ_p(θ̂) − ℓ_p(θ)) ≤ χ²_{1, 0.95} 的範圍。對非線性、不對稱問題遠優於 Wald。confint(glm_fit) 預設就是 profile。

Score CI: inverts the score test (the Wilson interval is the special case for proportions — far better than Wald near 0/1). Profile-likelihood CI: scan θ, take values where 2(ℓ_p(θ̂) − ℓ_p(θ)) ≤ χ²_{1, 0.95}. Better than Wald for nonlinear / asymmetric problems. confint(glm_fit) defaults to profile in R.

二、Bootstrap：當公式失效時的萬用工具

非參數 bootstrap：從原始資料 (n 筆) 以可放回抽 n 筆，計算統計量 θ̂*；重複 B 次，得到 θ̂* 的「抽樣分布近似」。基於此可造 CI：

Percentile： [θ̂*_{α/2}, θ̂*_{1−α/2}]──最直觀，但對偏態估計有偏。
Basic (reverse percentile)： [2θ̂ − θ̂*_{1−α/2}, 2θ̂ − θ̂*_{α/2}]──修正部分偏誤。
Studentized：對每次 bootstrap 計算 t* = (θ̂* − θ̂)/SE*，需要 nested bootstrap 估 SE*，但二階準確。
BCa (bias-corrected and accelerated)：修正偏誤與不對稱──通常是預設首選。

參數 bootstrap：從擬合模型 (e.g. NB(μ̂, α̂)) 生成 pseudo-data，再擬合得到 θ̂*。比非參數版本更銳利──若模型對。

Nonparametric bootstrap: resample n observations with replacement from the original n, compute θ̂*, repeat B times to get an empirical sampling distribution. Then build CIs:

Percentile: [θ̂*_{α/2}, θ̂*_{1−α/2}] — intuitive but biased when θ̂ is skewed.
Basic (reverse percentile): [2θ̂ − θ̂*_{1−α/2}, 2θ̂ − θ̂*_{α/2}] — corrects part of the bias.
Studentized: use t* = (θ̂* − θ̂)/SE*; needs a nested bootstrap for SE*, but second-order accurate.
BCa: bias-corrected and accelerated — usually the recommended default.

Parametric bootstrap: draw pseudo-data from the fitted model (e.g. NB(μ̂, α̂)) and refit. Sharper than nonparametric — provided the model is correct.

⚠️

Bootstrap 的失效場景：① 極值統計 (min, max, 上分位數)──naive bootstrap 漸近偏誤不消失；② 依賴資料 (時間序列、序列計數)──須用 block bootstrap；③ 邊界參數 (σ² = 0)──分布退化。

Where bootstrap fails: ① extreme statistics (min, max, upper quantiles) — naive bootstrap is asymptotically biased; ② dependent data (time series, sequences) — use a block bootstrap; ③ boundary parameters (σ² = 0) — the distribution degenerates.

互動模擬

100 個 CI 模擬：覆蓋率現場演示

對 N(μ=10, σ=3) 重複抽 100 個樣本 (每次 n 筆)，畫出每次的 95% CI。紅色表示該次 CI 沒蓋住真值──理論上長期約 5% 會紅。改變 n 觀察區間如何收縮但覆蓋率保持 ≈ 95%。

Draw 100 samples (n each) from N(μ=10, σ=3) and plot each 95% CI. Red = the CI missed the truth — about 5% in the long run. Vary n to see how intervals shrink while coverage stays ≈ 95%.

— %width ≈ —n = 20

樣本數 n 20

X：模擬編號；Y：CI 範圍。紅線：真值。

方法比較

三、CI 方法選擇

場景	推薦	原因
大樣本均值 / 迴歸係數	Wald	快速、足夠準	Large-sample mean / regression β	Wald	fast and accurate
比例接近 0 或 1	Wilson / Clopper-Pearson	Wald 嚴重低估邊界區間	Proportion near 0 or 1	Wilson / Clopper-Pearson	Wald collapses at boundaries
GLM / 非線性參數	Profile likelihood	不對稱形狀比 Wald 對	GLM / nonlinear param	Profile likelihood	captures asymmetry
複雜統計量 (中位數、GSEA、網路度)	BCa bootstrap	無需解析式	Complex statistics (median, GSEA, network metric)	BCa bootstrap	no closed-form needed
時間序列 / 相關樣本	Block bootstrap	保留依賴結構	Time series / correlated	Block bootstrap	preserves dependence
下一筆觀測值的不確定性	Prediction interval	CI 只描述參數，不含 ε	Uncertainty of next obs	Prediction interval	CI omits ε

🚫

常見錯誤：「p < 0.05，所以 95% CI 不含 0。」邏輯倒置。正確方向是「CI 不含 H₀ 值 → p 在該 α 下 < α」。CI 是更豐富的資訊，效果量+精度一次提供，論文應同時報 CI。 Common mistake: "p < 0.05, so the 95% CI excludes 0." Logic backwards. Correct: "the CI excludes the H₀ value → p < α at that level". A CI is the richer output: effect size + precision in one. Always report CIs alongside p-values.

程式碼

實作：CI 與 bootstrap

# --- R --- CI 三種寫法 + bootstrap
# 1) Wald CI (lm / glm)
fit <- lm(y ~ x, data=d)
confint(fit)                              # lm: Wald (基於 t-分布)

library(MASS); g <- glm(y ~ x, family=poisson, data=d)
confint(g)                                # glm: profile-likelihood (預設)
confint.default(g)                        # glm: Wald (顯式)

# 2) Proportion CI
library(binom)
binom.confint(3, 20, methods=c("asymptotic", "wilson", "exact"))

# 3) Bootstrap CI
library(boot)
boot_fn <- function(data, idx) median(data[idx])
b <- boot(x, boot_fn, R=2000)
boot.ci(b, type=c("perc", "basic", "bca"))

# 4) 覆蓋率驗證 (一次性檢查 CI 是否真正 95%)
covered <- replicate(2000, {
  x <- rnorm(20, mean=10, sd=3)
  ci <- t.test(x)$conf.int
  ci[1] < 10 && ci[2] > 10
})
mean(covered)                              # 應接近 0.95

# --- Python --- CI 三種寫法 + bootstrap
import numpy as np
from scipy import stats
import statsmodels.api as sm
from statsmodels.stats.proportion import proportion_confint

# 1) Wald CI (OLS / GLM)
ols = sm.OLS(y, sm.add_constant(X)).fit()
ols.conf_int(alpha=0.05)                     # Wald-t

# 2) Proportion CI
proportion_confint(3, 20, method="wilson")
proportion_confint(3, 20, method="beta")    # Clopper-Pearson

# 3) Bootstrap CI
res = stats.bootstrap((x,), np.median, n_resamples=2000,
                       method="BCa", confidence_level=0.95)
res.confidence_interval

# 4) 覆蓋率驗證
covered = []
for _ in range(2000):
    x = np.random.normal(10, 3, 20)
    lo, hi = stats.t.interval(0.95, len(x)-1, np.mean(x), stats.sem(x))
    covered.append(lo < 10 < hi)
np.mean(covered)                            # ≈ 0.95

📝 自我檢測

1. 對 95% CI [3.1, 5.2] 的正確頻率派詮釋是？

1. The correct frequentist reading of a 95% CI [3.1, 5.2] is?

A. 真值落在 [3.1, 5.2] 的機率是 0.95A. The true value lies in [3.1, 5.2] with probability 0.95

B. 95% 的觀測值落在 [3.1, 5.2]B. 95% of observations fall in [3.1, 5.2]

C. 如果重複實驗無限次，依此方法構造的區間中 95% 會包含真值C. If we repeated the experiment, 95% of intervals constructed this way would contain the truth

D. 95% 的研究人員會同意這個區間D. 95% of researchers would agree with this interval

2. BCa 相對於 percentile bootstrap 主要修正什麼？

2. BCa improves over percentile bootstrap mainly by correcting what?

A. 計算速度A. Computation speed

B. 估計量的偏誤與抽樣分布的不對稱性B. Bias of the estimator and asymmetry of the sampling distribution

C. 樣本獨立性假設C. The independence assumption

D. 把抽樣方式換成放回 → 不放回D. Switching from with-replacement to without-replacement

3. 為什麼 naive bootstrap 對「樣本最大值 max(X)」會失效？

3. Why does naive bootstrap fail for the sample max?

A. max 不可微A. max is non-differentiable

B. 因為 max 的 SE 太小B. Because the SE of max is too small

C. Bootstrap 樣本中原始 max 的出現機率 ≈ 1−(1−1/n)ⁿ ≈ 0.632，分布非連續、漸近不收斂到極值分布C. The original max appears in each bootstrap sample with prob ≈ 0.632, making the distribution non-continuous and not converging to the right extreme-value law

D. Bootstrap 是貝氏方法D. Bootstrap is a Bayesian method