How do you actually read a posterior?
The posterior contains all the information about θ after seeing the data. In practice we compress it into a few summaries: point estimates, intervals, direction probabilities, and model comparisons. Each answers a different question, and mixing them up leads to misinterpretation.
Key ideas in this chapter: (1) a credible interval (CrI) is a genuine "θ lies in this range with 95% probability" statement (given the model and prior), which is the interpretation people wrongly attach to frequentist confidence intervals; (2) for skewed posteriors, the HDI is preferable to the ETI; (3) the direction probability P(θ > 0 | y) is the natural Bayesian replacement for a p-value; (4) Bayes factors compare models but are highly sensitive to the priors; (5) posterior predictive checks (PPC) and LOO-CV are the tools that actually catch a wrong model.
I. Four kinds of posterior summary
ETI vs HDI intervals
- ETI (equal-tailed interval): take the 2.5% and 97.5% percentiles, cutting 2.5% off each tail. Symmetric in probability and trivial to compute.
- HDI (highest density interval): the narrowest 95% interval; every point inside has density ≥ every point outside.
- For symmetric posteriors, ETI = HDI; for skewed or bounded posteriors, the HDI tracks the high-density mass and is easier to interpret.
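A minimal numpy-only sketch of both intervals computed from posterior draws, assuming a unimodal posterior (the sorted-window HDI below finds a single interval, so it does not handle multimodal posteriors). The gamma draws are a simulated stand-in for a right-skewed posterior, not the output of a real fit; in practice `az.hdi` or bayestestR does this for you.

```python
import numpy as np

def eti(draws, prob=0.95):
    """Equal-tailed interval: cut (1 - prob) / 2 off each tail."""
    alpha = (1.0 - prob) / 2.0
    lo, hi = np.quantile(draws, [alpha, 1.0 - alpha])
    return float(lo), float(hi)

def hdi(draws, prob=0.95):
    """Narrowest interval covering `prob` of the draws (assumes a unimodal posterior)."""
    x = np.sort(np.asarray(draws))
    n = x.size
    k = int(np.ceil(prob * n))           # number of draws the interval must cover
    widths = x[k - 1:] - x[: n - k + 1]  # width of every window of k consecutive draws
    i = int(np.argmin(widths))           # the narrowest window is the HDI
    return float(x[i]), float(x[i + k - 1])

rng = np.random.default_rng(1)
draws = rng.gamma(shape=2.0, scale=1.0, size=20_000)  # right-skewed stand-in posterior
sym = rng.normal(size=20_000)                          # symmetric stand-in posterior

e, h = eti(draws), hdi(draws)
print("skewed    ETI:", e, "width", e[1] - e[0])
print("skewed    HDI:", h, "width", h[1] - h[0])
print("symmetric ETI:", eti(sym), " HDI:", hdi(sym))
```

For the skewed draws the HDI comes out narrower and shifted toward the mode; for the symmetric draws the two intervals essentially coincide, as the bullets above state.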
Direction probability and ROPE
- P(θ > 0 | y): the posterior probability that the effect is positive; far easier to read than a p-value.
- ROPE (Region of Practical Equivalence): define a range that counts as "practically zero" (e.g. |β| < 0.1), then compute P(θ ∈ ROPE | y).
- Tools: bayestestR::p_direction and bayestestR::rope in R; ArviZ has no built-in ROPE function, but the probability is easy to compute directly from the posterior draws.
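Computing these quantities from draws is a one-liner each. A sketch, assuming a coefficient's posterior draws are available as a flat array; the draws here are simulated stand-ins, and the `pd` line follows bayestestR's documented definition (the share of the posterior with the same sign as the median), which is why it is never below 0.5.

```python
import numpy as np

def direction_and_rope(draws, rope=(-0.1, 0.1)):
    """Return P(theta > 0 | y), pd, and P(theta in ROPE | y) from posterior draws."""
    draws = np.asarray(draws)
    p_pos = float(np.mean(draws > 0))                 # P(theta > 0 | y)
    pd = max(p_pos, float(np.mean(draws < 0)))        # share with the median's sign
    p_rope = float(np.mean((draws > rope[0]) & (draws < rope[1])))  # full-posterior ROPE mass
    return p_pos, pd, p_rope

rng = np.random.default_rng(7)
beta = rng.normal(loc=0.25, scale=0.15, size=40_000)  # hypothetical posterior draws
p_pos, pd, p_rope = direction_and_rope(beta)
print(f"P(beta > 0 | y) = {p_pos:.3f}, pd = {pd:.3f}, P(beta in ROPE | y) = {p_rope:.3f}")
```

Note that a high P(β > 0 | y) and a non-trivial P(β ∈ ROPE | y) can coexist: the effect is probably positive but may still be too small to matter.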
Bayes factors
- BF₁₀ = p(y | M₁) / p(y | M₀): the ratio of the two models' marginal likelihoods.
- Rough scale (after Jeffreys): BF > 3 substantial; > 10 strong; > 100 decisive.
- Highly sensitive to the priors (Lindley's paradox): widening the prior under the richer model spreads its predictions more thinly, which penalizes it and can push the BF toward H₀ without bound.
- Marginal likelihoods p(y | M) = ∫ p(y | θ, M) p(θ | M) dθ are hard integrals; common approaches are bridge sampling and, for nested models, the Savage-Dickey density ratio.
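Both points can be seen in a model where everything is analytic. A sketch under assumed conditions: yᵢ ~ N(θ, σ²) with σ known, H₀: θ = 0, and H₁: θ ~ N(0, τ²). Because ȳ is sufficient, BF₁₀ is a ratio of two normal densities of ȳ, and the Savage-Dickey ratio (prior density at 0 divided by posterior density at 0 under H₁) gives the same number. The data summary (n = 25, σ = 1, ȳ = 0.5, so z = 2.5) is hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def bf10_analytic(ybar, n, sigma, tau):
    """Marginal likelihood ratio: ybar ~ N(0, tau^2 + sigma^2/n) under H1, N(0, sigma^2/n) under H0."""
    se2 = sigma ** 2 / n
    return normal_pdf(ybar, 0.0, tau ** 2 + se2) / normal_pdf(ybar, 0.0, se2)

def bf10_savage_dickey(ybar, n, sigma, tau):
    """Savage-Dickey: BF10 = p(theta = 0 | H1) / p(theta = 0 | y, H1), conjugate normal posterior."""
    post_prec = 1.0 / tau ** 2 + n / sigma ** 2
    post_mean = (n * ybar / sigma ** 2) / post_prec
    prior_at_0 = normal_pdf(0.0, 0.0, tau ** 2)
    post_at_0 = normal_pdf(0.0, post_mean, 1.0 / post_prec)
    return prior_at_0 / post_at_0

# Same data, increasingly vague prior under H1.
for tau in (0.5, 1.0, 10.0, 100.0):
    print(f"tau = {tau:>6}: BF10 = {bf10_analytic(0.5, 25, 1.0, tau):.3f}")
```

With these numbers BF₁₀ is about 5.5 at τ = 0.5 but about 0.05 at τ = 100, i.e. the same data move from favoring H₁ to favoring H₀ purely because the prior under H₁ was widened.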
PPC and LOO-CV
- PPC: draw y_rep from the posterior predictive distribution and compare its distributional features with the observed y; the single best tool for detecting misfit.
- Pick test statistics that match the question: for count models, the share of zeros, the maximum, and Var/Mean.
- LOO-CV (PSIS): Pareto-smoothed importance sampling approximates the leave-one-out expected log predictive density (elpd) without refitting the model n times.
- k_hat > 0.7 flags an observation so influential that the importance-sampling approximation for it is unreliable; refit exactly for those points (reloo) or revise the model.
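To see what PSIS is approximating, here is a sketch of plain importance-sampling LOO on a conjugate model where exact LOO is also available. Assumptions: yᵢ ~ N(θ, 1) with a N(0, 10²) prior on θ and simulated data. This is the raw estimator p(yᵢ | y₋ᵢ) ≈ 1 / mean_s[1 / p(yᵢ | θₛ)], not PSIS itself: PSIS additionally fits a generalized Pareto distribution to the largest importance ratios and smooths them, and the fitted shape is the k_hat diagnostic.

```python
import numpy as np

def log_normal_pdf(x, mean, var):
    return -0.5 * np.log(2.0 * np.pi * var) - (x - mean) ** 2 / (2.0 * var)

def logsumexp(a, axis=0):
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

rng = np.random.default_rng(3)
n, S, prior_var = 20, 4000, 100.0
y = rng.normal(1.0, 1.0, size=n)  # simulated data, unit noise variance assumed known

# Exact conjugate posterior for theta, then S posterior draws.
post_prec = 1.0 / prior_var + n
theta = rng.normal(y.sum() / post_prec, np.sqrt(1.0 / post_prec), size=S)

# Pointwise log-likelihood matrix (S x n): the input az.loo / loo::loo work from.
log_lik = log_normal_pdf(y[None, :], theta[:, None], 1.0)

# Raw IS-LOO with ratios 1 / p(y_i | theta_s), computed stably on the log scale.
elpd_is = -(logsumexp(-log_lik, axis=0) - np.log(S))

# Exact LOO: posterior without y_i, then its posterior predictive density at y_i.
prec_i = 1.0 / prior_var + (n - 1)
mean_i = (y.sum() - y) / prec_i
elpd_exact = log_normal_pdf(y, mean_i, 1.0 / prec_i + 1.0)

print("elpd_loo (raw IS):", elpd_is.sum())
print("elpd_loo (exact): ", elpd_exact.sum())
```

On well-behaved data like this the two agree closely; when one observation dominates the ratios, the raw estimate becomes unstable, which is the situation k_hat is designed to flag.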
PPC example: Normal vs NB
The observed data are overdispersed counts (drawn from an NB distribution). A misspecified Normal model generates symmetric, possibly negative y_rep that clearly clash with the observed histogram, whereas an NB model covers it well. This is exactly the situation in which a PPC flags a problem at a glance.
Figure: gray bars show the observed y; colored lines show 8 y_rep draws.
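The Normal half of this scenario can be reproduced without a sampler. A sketch under stated assumptions: the counts are simulated from a hypothetical NB with mean 3 and P(y = 0) = 0.25, and the Normal model uses the standard noninformative prior p(μ, σ²) ∝ 1/σ², whose posterior can be drawn exactly. Fitting the NB side would need PyMC or brms and is omitted.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical overdispersed counts: NB(n=1, p=0.25), mean 3, P(y = 0) = 0.25.
y = rng.negative_binomial(1, 0.25, size=200)
n = y.size

# Misspecified Normal model, prior p(mu, sigma^2) proportional to 1/sigma^2. Exact posterior:
# sigma^2 | y ~ scaled-Inv-chi^2(n - 1, s^2), mu | sigma^2, y ~ N(ybar, sigma^2 / n).
S = 1000
ybar, s2 = y.mean(), y.var(ddof=1)
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=S)
mu = rng.normal(ybar, np.sqrt(sigma2 / n))

# Posterior predictive replicates, shape (S, n).
y_rep = rng.normal(mu[:, None], np.sqrt(sigma2)[:, None], size=(S, n))

# Test statistics aligned with count data; p = P(T(y_rep) >= T(y)).
test_stats = {
    "min": (y.min(), y_rep.min(axis=1)),
    "share of zeros": (np.mean(y == 0), np.mean(y_rep == 0, axis=1)),
    "Var/Mean": (y.var() / y.mean(), y_rep.var(axis=1) / y_rep.mean(axis=1)),
}
p_values = {}
for name, (t_obs, t_rep) in test_stats.items():
    p_values[name] = float(np.mean(t_rep >= t_obs))
    print(f"{name:>15}: observed {float(t_obs):.3f}, posterior predictive p = {p_values[name]:.3f}")
```

The minimum and the share of zeros give extreme p-values because the Normal model puts substantial mass below zero and never produces exact zeros. Var/Mean, by contrast, should look unremarkable here because the Normal model has a free variance parameter, which illustrates why the test statistic has to target the suspected misfit.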
II. Which summary or comparison should you report?
🌳 Posterior-summary decision guide

| Question | Summary / tool | Caveat |
|---|---|---|
| Direction of effect | P(θ > 0 \| y), posterior median | do not conflate with a p-value |
| Practically zero? | P(θ ∈ ROPE \| y) | state the ROPE bounds used |
| Effect-size range | 95% HDI / ETI | use the HDI for skewed posteriors |
| Model comparison (predictive) | LOO / WAIC + SE | refit with reloo where k_hat > 0.7 |
| Model comparison (hypothesis) | Bayes factor (bridge sampling / Savage-Dickey) | prior-sensitive; Lindley's paradox |
| Goodness of fit | PPC + targeted test statistic | choose test statistics that match the question |
Implementation: HDI, PPC, LOO, Bayes factor
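```r
# --- R: brms / tidybayes / loo / bayestestR ---
# Assumes existing brms fits: `fit` (with a coefficient named x), `fit_normal` and
# `fit_nb` (same data, two likelihoods), and `fit1` / `fit2` fitted with
# save_pars = save_pars(all = TRUE), which bridge sampling requires; data in `df`.
library(brms); library(tidybayes); library(bayesplot)
library(loo); library(bayestestR); library(bridgesampling)

# 1) Posterior summaries: median + HDI
posterior_summary(fit)                          # quantile-based (ETI) intervals by default
fit |>
  spread_draws(b_x) |>
  median_hdci(b_x, .width = .95)                # median with highest-density interval

# 2) Direction probability and ROPE
p_direction(fit)
rope(fit, range = c(-0.1, 0.1), ci = 1)         # ci = 1: share of the full posterior in the ROPE

# 3) PPC: density overlay + test statistics
y    <- df$y
yrep <- posterior_predict(fit, ndraws = 100)
ppc_dens_overlay(y, yrep[1:50, ])
ppc_stat(y, yrep, stat = function(x) mean(x == 0))   # share of zeros
ppc_stat(y, yrep, stat = "max")

# 4) LOO and model comparison
loo1 <- loo(fit_normal)
loo2 <- loo(fit_nb)
loo_compare(loo1, loo2)                         # elpd difference and its SE
pareto_k_table(loo2)                            # counts of observations by Pareto k range

# 5) Bayes factor (bridge sampling under the hood)
bf_obj <- bayesfactor_models(fit1, fit2, denominator = 2)
```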
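```python
# --- Python: PyMC + ArviZ ---
# Assumes `model` is a PyMC model with a coefficient "beta" and an observed variable "y",
# and that `idata_nb` / `idata_normal` were produced the same way for two competing models.
import pymc as pm, arviz as az, numpy as np
from scipy import stats

with model:
    # PyMC does not store pointwise log-likelihoods by default; az.loo needs them.
    idata = pm.sample(idata_kwargs={"log_likelihood": True})
    pm.sample_posterior_predictive(idata, extend_inferencedata=True)

# 1) HDI / summary
az.summary(idata, hdi_prob=0.95)
hdi = az.hdi(idata, hdi_prob=0.95)

# 2) Direction probability and ROPE (computed by hand)
draws = idata.posterior["beta"].values.flatten()
p_dir = (draws > 0).mean()
p_rope = ((draws > -0.1) & (draws < 0.1)).mean()

# 3) PPC
az.plot_ppc(idata, num_pp_samples=50)
# Custom test statistic: share of zeros
y = idata.observed_data["y"].values
y_rep = idata.posterior_predictive["y"].values.reshape(-1, y.size)
zero_obs = (y == 0).mean()
zero_rep = (y_rep == 0).mean(axis=1)
p_bayes_p = (zero_rep >= zero_obs).mean()   # posterior predictive (Bayesian) p-value

# 4) LOO and comparison
loo_nb = az.loo(idata_nb)
loo_normal = az.loo(idata_normal)
az.compare({"NB": idata_nb, "Normal": idata_normal}, ic="loo")

# 5) Bayes factor via the Savage-Dickey ratio at beta = 0 (nested models only):
# BF10 = prior density at 0 / posterior density at 0.
# Assumes a Normal(0, 1) prior on beta; substitute the model's actual prior.
prior_density_at_0 = stats.norm(0, 1).pdf(0)
posterior_density_at_0 = stats.gaussian_kde(draws)(0)[0]
bf10 = prior_density_at_0 / posterior_density_at_0
```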
📝 Self-check
1. What is the correct interpretation of a 95% credible interval?
2. Why is the Bayes factor so sensitive to the prior, and is that a feature or a bug?
3. What does a posterior predictive check catch that R-hat does not?