α、β、n、效應量的四角關係
檢定力分析(power analysis)是統計設計的第一步,不是事後補的程序。它把四個量綁在一張紙上:顯著水準 α(第一型錯誤率)、第二型錯誤率 β(漏掉真效應的機率,power = 1 − β)、樣本數 n、效應量 effect size(你想偵測的最小差異)。這四個量像四面牆——固定其中三面,第四面自動站直。
研究者最常問「n 要多少?」其實是把 α=0.05、power=0.80、最小有意義效應量這三面牆豎好之後,把 n 解出來。如果 effect size 不講,那不是統計問題,是臨床/生物學問題——你必須先決定「多大的差別才值得偵測」。這就是 SESOI(Smallest Effect Size of Interest)的概念(Lakens 2018; Giner-Sorolla 2024)。
為什麼 post-hoc power(事後 power)是統計禁忌?Hoenig & Heisey (2001 The American Statistician) 證明:用觀察到的 effect size 反算 power,得到的「觀察 power」(observed power)和 p-value 是一對一的單調函數——p < 0.05 一定對應 observed power > 50%,p = 0.5 一定對應 power = 50%。它沒有提供 p-value 以外的任何資訊,卻常被審稿人要求補充。這份教學會教你怎麼禮貌而堅定地拒絕。
Power analysis is the first step of statistical design — not a salvage operation. It pins four quantities to a single sheet of paper: significance level α (type I error rate), type II error rate β (missing a real effect, where power = 1 − β), sample size n, and the effect size you want to detect. The four are walls of a box — fix three and the fourth stands up by itself.
Researchers usually ask "what n do I need?" — that is just solving for n after fixing α = 0.05, power = 0.80, and the minimum meaningful effect. If you cannot state the effect size, that is not a statistics question — it is a clinical or biological one: you have to decide what difference is worth detecting. This is the Smallest Effect Size of Interest (SESOI) idea (Lakens 2018; Giner-Sorolla 2024).
Why is post-hoc power taboo? Hoenig & Heisey (2001 The American Statistician) proved that "observed power" computed from the observed effect is a one-to-one monotone function of the p-value — p < 0.05 always corresponds to observed power > 50%, p = 0.5 always gives power = 50%. It carries zero information beyond the p-value, yet reviewers still request it. This chapter teaches you how to refuse politely and firmly.
一、四個量的鎖鏈關係
把樣本的均值分布想成兩座山——H₀ 山位於 μ₀,H₁ 山位於 μ₀ + δ。我們在 H₀ 山的右側設立 critical value(α 的閾值);α 是 H₀ 山右側超過閾值的尾巴面積(誤殺率),β 是 H₁ 山左側落在閾值以下的尾巴面積(漏抓率)。Power = 1 − β = H₁ 山右側超過閾值的面積。
當 n 增大 → 兩座山變瘦 → 重疊變少 → β 變小 → power 變大。當 effect size 增大 → 兩座山被推開 → 重疊變少 → power 變大。當 α 放寬 → 閾值左移 → β 變小但代價是更多誤殺。這就是四角的物理直覺。
Picture the sampling distribution of the mean as two hills — the H₀ hill at μ₀ and the H₁ hill at μ₀ + δ. The critical value sits on the right of H₀ at the α threshold. α is the right tail of H₀ above the threshold (false alarms); β is the left tail of H₁ below the threshold (missed detections). Power = 1 − β = the right tail of H₁ above the threshold.
Increase n → both hills narrow → less overlap → β shrinks → power rises. Increase the effect → the hills move apart → less overlap → power rises. Loosen α → threshold moves left → β shrinks but at the cost of more false alarms. That is the physical intuition behind the four-way lock.
α — Type I error
誤殺 H₀ 的機率。生物醫學慣用 α = 0.05(單次檢定)。多重檢定要分配 α(見 Step 12 FDR)。GWAS 用 α = 5×10⁻⁸(Bonferroni at ~10⁶ SNPs;Pe'er 2008 Genet Epidemiol)。
Probability of a false alarm. Biomedical convention is α = 0.05 per test. Multiple testing splits α (see Step 12, FDR). GWAS uses α = 5×10⁻⁸ (Bonferroni over ~10⁶ SNPs; Pe'er 2008 Genet Epidemiol).
β — Type II error
漏抓真效應的機率。Power = 1 − β 是「偵測力」。學界慣用 power = 0.80(β = 0.20,1/5 機率漏抓)。臨床三期 RCT 常要求 0.90。Cohen (1988) 提出 α : β = 1 : 4 的成本比例為慣例起源。
Probability of missing a real effect. Power = 1 − β. The standard target is 0.80 (β = 0.20, 1-in-5 miss rate); phase-III RCTs often demand 0.90. Cohen (1988) suggested a 1:4 cost ratio between α and β, the origin of the convention.
n — sample size
每組 n(two-sample)或總 n(regression)。近似公式:兩樣本 t 檢定,每組 n ≈ 2·((z1−α/2 + z1−β)/d)²。當 d = 0.5、α = 0.05、power = 0.80 → 每組 n ≈ 64。Cohen 表 2.4.1 給出大量場景的 n 對照表。
n per group (two-sample) or total n (regression). Approx formula: two-sample t, n per group ≈ 2·((z1−α/2 + z1−β)/d)². For d = 0.5, α = 0.05, power = 0.80 → n ≈ 64 per group. Cohen's Table 2.4.1 lists n for many scenarios.
Effect size
偵測對象的「真實差異」標準化後的尺度。連續資料用 Cohen's d(小 0.2 / 中 0.5 / 大 0.8;Cohen 1988)、ANOVA 用 η² 或 f、相關用 r、二分類用 Cohen's h 或 OR、存活用 HR。絕不要用「pilot 觀察到的 effect」算正式 n——pilot 是 noisy 的,會嚴重低估真實 n(Kraemer 2006 Arch Gen Psychiatry)。
The standardised true difference you intend to detect. Continuous → Cohen's d (small 0.2 / medium 0.5 / large 0.8; Cohen 1988); ANOVA → η² or f; correlation → r; binary → Cohen's h or OR; survival → HR. Never plug a pilot's observed effect into the formal n calculation — pilots are noisy and systematically underestimate the required n (Kraemer 2006 Arch Gen Psychiatry).
Power 計算器
選擇要求解的量(α、n、effect、power),剩下三個用滑桿給定。下方曲線顯示 power 隨 n 的成長(在當前 d 與 α 下)。注意 power 不是線性——當 n 從 10 → 30 進步神速,從 100 → 300 邊際遞減。這正是「n=50–100 是甜點」的數學原因。
Pick which quantity to solve for (α, n, effect, power); set the other three with sliders. The curve below traces power vs n at the current d and α. Notice power is not linear — n = 10 → 30 buys a lot; n = 100 → 300 has steeply diminishing returns. That is the math behind the rule of thumb "n = 50–100 is the sweet spot".
藍線 = power 隨 n 變化(當前 d, α)· 紅虛線 = 0.80 慣例 · 綠點 = 目前滑桿位置Blue = power vs n at current d, α · Red dashed = 0.80 convention · Green dot = current slider position
二、效應量家族與工具
效應量家族
連續資料:Cohen's d = (μ₁−μ₂)/σ,配對 t 用 dz;Hedges' g = d 的小樣本校正(n < 20 推薦;Hedges 1981)。
ANOVA / 迴歸:η²(解釋變異比例)、Cohen's f = √(η²/(1−η²))、f²(多元迴歸;小 0.02 / 中 0.15 / 大 0.35)。
相關:r(Pearson;小 0.1 / 中 0.3 / 大 0.5)。
二分類:Cohen's h = 2(arcsin√p₁ − arcsin√p₂);OR(小 1.5 / 中 2 / 大 3);RR 也可。
存活:HR(Hazard Ratio)+ 預期事件數。
Continuous: Cohen's d = (μ₁−μ₂)/σ; paired uses dz; Hedges' g = small-n corrected d (recommended for n < 20; Hedges 1981).
ANOVA / regression: η², Cohen's f = √(η²/(1−η²)), f² for multiple regression (small 0.02 / medium 0.15 / large 0.35).
Correlation: r (Pearson; 0.1 / 0.3 / 0.5).
Binary: Cohen's h = 2(arcsin√p₁ − arcsin√p₂); OR (1.5 / 2 / 3); RR also fine.
Survival: HR + expected events.
工具地圖
GUI:G*Power 3(Faul 2007 Behav Res Methods;Faul 2009 BRM 41:1149;引用 80,000+ 次;跨 macOS/Win/Linux 免費)。視覺化中央分布與非中央分布的重疊。
R:pwr(Champely)涵蓋 t / ANOVA / r / χ² / 比例 / GLM f²;longpower(Donohue)做 longitudinal LMM;gsDesign(Anderson)做臨床試驗群體設計;survival::nSurv、TrialSize。
Python:statsmodels.stats.power(TTestIndPower、FTestAnovaPower、NormalIndPower、GofChisquarePower);scipy.stats 無內建 power。
商用:PASS(NCSS,涵蓋 1000+ 場景,FDA 認可)、nQuery、Stata power 指令。
GUI: G*Power 3 (Faul 2007 Behav Res Methods; Faul 2009 BRM 41:1149; 80,000+ citations; free on macOS/Win/Linux). Visualises central + noncentral distributions.
R: pwr (Champely) covers t / ANOVA / r / χ² / proportions / GLM f²; longpower (Donohue) for longitudinal LMM; gsDesign (Anderson) for group-sequential trials; survival::nSurv, TrialSize.
Python: statsmodels.stats.power (TTestIndPower, FTestAnovaPower, NormalIndPower, GofChisquarePower); scipy.stats has no built-in power.
Commercial: PASS (NCSS, 1000+ scenarios, FDA-trusted), nQuery, Stata power.
特殊設計
Cluster RCT
當隨機化單位是「群(學校、診所、村莊)」而非個人時,群內個體不獨立。設計效應 DE = 1 + (m − 1)·ICC,m = 平均群大小,ICC = 群內相關係數(intraclass correlation)。有效 n = ntotal / DE。例:m = 30、ICC = 0.05 → DE = 2.45 → 算出的個人 n 要 × 2.45。Hayes & Moulton 2017 "Cluster Randomised Trials" 2nd ed 是聖經。
When the unit of randomisation is a cluster (school, clinic, village), members within a cluster are not independent. Design effect DE = 1 + (m − 1)·ICC, m = average cluster size, ICC = intraclass correlation. Effective n = ntotal / DE. Example: m = 30, ICC = 0.05 → DE = 2.45 → multiply individual-level n by 2.45. Hayes & Moulton 2017 "Cluster Randomised Trials" 2nd ed is the bible.
存活分析
存活的 power 取決於事件數,不是樣本數(Schoenfeld 1981 Biometrics)。先算所需事件 d = ((z1−α/2+z1−β)·(1+k))² / (k·(ln HR)²);再用預期事件率反推 n 與追蹤期。R: survival::nSurv()、gsDesign::nSurv;Python: lifelines 沒有直接 power 函式,但可手算。典型錯誤:「我收了 200 人,每組 100」但事件只發生 30 次——power 由 30 決定,不是 200。
Survival power is driven by event count, not sample size (Schoenfeld 1981 Biometrics). First compute required events d = ((z1−α/2+z1−β)·(1+k))² / (k·(ln HR)²); then back out n and follow-up using the expected event rate. R: survival::nSurv(), gsDesign::nSurv. Python's lifelines has no built-in power function — compute by hand. Classic mistake: "I recruited 200 (100 per arm)" but only 30 events accrue — power is driven by 30, not 200.
不等配對
2:1 allocation 比 1:1 損失 power——以 2:1 為例,效率只有 88%(n_total 要多 12%)。為何還用?實務上:(1) 倫理(讓更多人接受新療法);(2) 預算(控制組已知資料便宜);(3) 不平衡組變異(heteroscedastic)。算式:pwr.t2n.test(n1, n2, d, sig.level)。
2:1 allocation loses power vs 1:1 — at 2:1 you keep only 88% efficiency (need 12% more total n). Why use it? (1) ethics (more patients on the new arm); (2) cost (existing controls are cheap); (3) unequal variance. Use pwr.t2n.test(n1, n2, d, sig.level).
多 endpoint α
RCT 預設 ≥ 2 個 primary endpoint:α 要分割(Bonferroni: α/k)或 hierarchical testing(Maurer 2011)。Power 算每個 endpoint 用分割後的 α。FDA / EMA 對 confirmatory trial 要求嚴格事先 SAP(statistical analysis plan)。
RCTs with ≥ 2 primary endpoints must split α (Bonferroni α/k) or use hierarchical / gatekeeping testing (Maurer 2011). Power each endpoint at the split α. FDA / EMA require a pre-specified SAP for confirmatory trials.
Post-hoc power 假象
拖動觀察 p-value 滑桿。下方顯示用「觀察到的 effect size」反算的 observed power。畫面上的曲線就是 Hoenig & Heisey (2001) 的關鍵發現:observed power 完全由 p-value 決定——p < 0.05 必然有 observed power > 50%,p = 0.5 必然有 observed power = 50%。它沒有提供任何新資訊。當審稿人要求「補 post-hoc power」,正確回應是「請改要 confidence interval 或 sensitivity analysis」。
Slide the observed p-value. The "observed power" computed from the observed effect is plotted. The curve below is exactly Hoenig & Heisey's (2001) key finding: observed power is completely determined by the p-value — p < 0.05 always implies observed power > 50%, p = 0.5 always gives 50%. It carries no new information. When a reviewer demands "post-hoc power", the right answer is: "let me give you a confidence interval or a sensitivity analysis instead".
紫線 = observed power 隨 p 變化 · 紅點 = p=0.05 對應 power≈50%(Hoenig 2001)Purple = observed power vs p · Red dot = p=0.05 ↔ power ≈ 50% (Hoenig 2001)
三、四種 Power 分析的決策樹
🌳 Power 分析時機
四、工具比較
| 工具 | 平台 | 涵蓋面 | 優點 | 限制 | |||
|---|---|---|---|---|---|---|---|
| G*Power 3 (Faul 2007)(Faul 2007) | Win / macOS / Linux GUI | t / F / χ² / z / 相關,多種 ANOVA / GLM | t / F / χ² / z / correlation, many ANOVA / GLM | 免費、GUI 直觀、有圖示中央 / 非中央分布 | Free, intuitive GUI, plots central / noncentral | 無 survival、無 cluster、不能批次 | No survival, no cluster, no batch |
| R pwr (Champely)(Champely) | R package | t、ANOVA (1-way)、r、χ²、比例、f² | t, 1-way ANOVA, r, χ², proportions, f² | script 可重現、可組合 simulation | Scriptable, reproducible, simulation-friendly | 無 survival(用 longpower、gsDesign) | No survival (use longpower, gsDesign) |
| statsmodels (Python)(Python) | Python | TTestIndPower / FTestAnovaPower / NormalIndPower / GofChisquarePower | TTestIndPower / FTestAnovaPower / NormalIndPower / GofChisquarePower | 融入 ML pipeline,solve_power 可求任一量 | Fits ML pipeline, solve_power back-solves any quantity | 無 survival、文檔較少 | No survival, sparser docs |
| PASS (NCSS)(NCSS) | Win GUI | 1000+ 場景:survival, cluster, equivalence, non-inferiority | 1000+ scenarios: survival, cluster, equivalence, non-inferiority | FDA 認可、報告品質高、附 PDF references | FDA-trusted, publication-ready, cites references in PDF | 商用付費、GUI、不容易自動化 | Paid, GUI-only, hard to script |
| Stata power | Stata | 完整:survival、cluster、equivalence、cross-over | Comprehensive: survival, cluster, equivalence, cross-over | 語法統一、文件詳盡、官方支援 | Unified syntax, thorough docs, official support | 付費;script 與 R/Python 整合差 | Paid; weaker integration with R/Python pipelines |
gsDesign、longpower、nSurv。關鍵:把 power 計算的所有參數寫進 protocol / pre-registration / Methods,這是 ARRIVE 2.0 與 CONSORT 都要求的。
Practical recipe: Academic papers: G*Power for the first pass + R/Python to reproduce. Phase-III RCT: PASS or Stata (FDA-preferred). Survival/cluster: gsDesign, longpower, nSurv. Critical: write all power parameters into the protocol / pre-registration / Methods — required by ARRIVE 2.0 and CONSORT.
五、實作範例
# ============= R: pwr + survival/longpower ============= library(pwr) library(longpower) library(gsDesign) # --- 1. Two-sample t-test: a priori (find n) --- pwr.t.test(d = 0.5, power = 0.80, sig.level = 0.05, type = "two.sample", alternative = "two.sided") # n per group ≈ 64 # --- 2. Sensitivity (find min detectable d given n) --- pwr.t.test(n = 30, power = 0.80, sig.level = 0.05, type = "two.sample") # d ≈ 0.74 (Note: NOT post-hoc; this is sensitivity) # --- 3. ANOVA (1-way, k groups, Cohen's f) --- pwr.anova.test(k = 4, f = 0.25, sig.level = 0.05, power = 0.80) # --- 4. Correlation r --- pwr.r.test(r = 0.3, power = 0.80, sig.level = 0.05) # --- 5. Two-proportion (Cohen's h) --- h <- ES.h(0.40, 0.30) pwr.2p.test(h = h, power = 0.80, sig.level = 0.05) # --- 6. Unequal allocation (2:1) --- pwr.t2n.test(n1 = 60, n2 = 30, d = 0.5, sig.level = 0.05) # --- 7. Survival: events required (Schoenfeld 1981) --- # HR=0.7, two-sided α=0.05, power=0.80, equal allocation gsDesign::nSurv(lambdaC = 0.05, hr = 0.7, eta = 0.02, T = 36, minfup = 12, alpha = 0.025, beta = 0.20) # --- 8. Longitudinal LMM --- longpower::lmmpower(delta = 0.1, t = seq(0, 1.5, 0.25), sig2.i = 55, sig2.s = 24, sig2.e = 10, cor.s.i = 0.5, power = 0.80) # --- 9. Cluster RCT: inflate by Design Effect --- ICC <- 0.05; m <- 30 DE <- 1 + (m - 1) * ICC # = 2.45 n_indiv <- ceiling(pwr.t.test(d=0.3, power=0.80)$n) n_cluster_total <- ceiling(n_indiv * DE) cat("Individuals per arm:", n_cluster_total, "\n")
import numpy as np from statsmodels.stats.power import ( TTestIndPower, FTestAnovaPower, NormalIndPower, GofChisquarePower ) from statsmodels.stats.proportion import proportion_effectsize from scipy.stats import norm # --- 1. Two-sample t: a priori (find n) --- ap = TTestIndPower() n = ap.solve_power(effect_size=0.5, alpha=0.05, power=0.80, alternative='two-sided') print(f"n per group ≈ {n:.1f}") # ≈ 63.8 # --- 2. Sensitivity (find min detectable d at n=30) --- d_min = ap.solve_power(nobs1=30, alpha=0.05, power=0.80) print(f"Min detectable d ≈ {d_min:.3f}") # --- 3. ANOVA (1-way) --- fa = FTestAnovaPower() n_total = fa.solve_power(effect_size=0.25, k_groups=4, alpha=0.05, power=0.80) print(f"Total n ≈ {n_total:.1f}") # --- 4. Two-proportion (using Cohen's h) --- h = proportion_effectsize(0.40, 0.30) ni = NormalIndPower().solve_power(effect_size=h, alpha=0.05, power=0.80) print(f"n per group ≈ {ni:.1f}") # --- 5. Survival (Schoenfeld 1981, by hand) --- def schoenfeld_events(HR, alpha=0.05, power=0.80, k=1): """k = allocation ratio (n_treat/n_ctrl). Returns required events.""" za = norm.ppf(1 - alpha/2) zb = norm.ppf(power) return ((za + zb) ** 2) * (1 + k) ** 2 / (k * (np.log(HR)) ** 2) events = schoenfeld_events(HR=0.7) print(f"Required events: {events:.0f}") # ≈ 247 # --- 6. Cluster RCT design effect --- ICC, m = 0.05, 30 DE = 1 + (m - 1) * ICC n_indiv = ap.solve_power(effect_size=0.3, alpha=0.05, power=0.80) n_cluster = np.ceil(n_indiv * DE) print(f"DE = {DE:.2f}, individuals per arm = {n_cluster:.0f}") # --- 7. NEVER do this: post-hoc power --- # observed_d = (mean1 - mean2) / pooled_sd # NO! # ap.power(effect_size=observed_d, nobs1=n, alpha=0.05) # NO! # Report a 95% CI instead — that IS the right "post-hoc" answer.
六、常見錯誤
❌ Post-hoc power
用觀察 effect 反算 power——它和 p-value 是同一個東西的兩種寫法(Hoenig 2001)。審稿人要求時,回信:「Post-hoc power is a deterministic function of the p-value (Hoenig & Heisey 2001 Am Stat 55:19, DOI:10.1198/000313001300339897). We instead report the 95% CI, which directly conveys the precision of our estimate.」
Computing power from the observed effect — it's a deterministic function of the p-value (Hoenig 2001). When reviewers ask, reply: "Post-hoc power is a deterministic function of the p-value (Hoenig & Heisey 2001 Am Stat 55:19, DOI:10.1198/000313001300339897). We instead report the 95% CI, which directly conveys the precision of our estimate."
❌ Pilot effect 算 n
Pilot 樣本太小(典型 n ≤ 30)→ 觀察 effect 變異大且偏向過大("winner's curse")→ 算出來的 n 嚴重低估。Kraemer 2006 Arch Gen Psychiatry:使用 SESOI、文獻 meta-analysis、或 pilot d − 1 SE 作為 effect input。
Pilot n is small (typically ≤ 30) → observed effect is noisy and upwardly biased ("winner's curse") → formal n is severely underestimated. Kraemer 2006 Arch Gen Psychiatry: use SESOI, a meta-analytic effect, or pilot d − 1 SE as input.
❌ 忽略多重檢定
RCT 預設 5 個 primary endpoint 全部用 α = 0.05 → family-wise error 大約 0.23。算 n 時必須用調整後 α(Bonferroni α/k 或 hierarchical α-spending;見 Step 12)。GWAS、proteomics 也適用。
An RCT with 5 primary endpoints all at α = 0.05 has family-wise error ≈ 0.23. n must be powered at the adjusted α (Bonferroni α/k or hierarchical α-spending; see Step 12). Same for GWAS and proteomics.
❌ 忽略 ICC
群集 RCT 用個人 n 算 power → 嚴重高估。記得乘以 DE = 1 + (m − 1)·ICC。Hayes & Moulton 2017 警告:典型醫療 ICC = 0.01–0.10;忽略 ICC 是 cluster trial 失敗的頭號原因。
Computing cluster-RCT power on individual n → severe overestimate. Multiply by DE = 1 + (m − 1)·ICC. Hayes & Moulton 2017: typical health ICC = 0.01–0.10; ignoring ICC is the top cause of failed cluster trials.
❌ Survival 用 n 不用事件
log-rank 的 power 取決於事件數,不是樣本數。「收 500 人」無關緊要,重點是「事件 ≥ 多少」。Schoenfeld 1981 公式直接算事件 d;再用預期事件率反推 n 與追蹤期。R: gsDesign::nSurv。
log-rank power depends on events, not n. "500 enrolled" is irrelevant — only "events accrued" matters. Schoenfeld's 1981 formula gives events d directly; back out n and follow-up from the expected event rate. R: gsDesign::nSurv.
❌ 效應量太樂觀
研究者最常為了得到「合理」的 n,把 d 從 0.3 改成 0.8——這叫 effect inflation。論文 Methods 必須寫明 effect size 的依據(meta-analysis、SESOI、臨床指引)。Albers & Lakens 2018 PLOS ONE:超過 70% 的 power analyses 用無根據的 effect。
The most common shortcut: inflate d from 0.3 to 0.8 just to make n "feasible" — effect inflation. Methods sections must justify the effect (meta-analysis, SESOI, clinical guideline). Albers & Lakens 2018 PLOS ONE: > 70% of published power analyses use unjustified effects.
七、實戰場景
情境 A:降血壓藥 RCT
目標:偵測新藥 vs 安慰劑 SBP 差 ≥ 5 mmHg。已知 σ ≈ 12 mmHg → d = 5/12 = 0.42。α = 0.025(雙側 0.05 + 雙臂 0.025)、power = 0.90。
R:pwr.t.test(d=0.42, power=0.9, sig.level=0.025) → n ≈ 152/arm。加 15% 失訪 → 175/arm。Pre-specified in protocol, SAP fixed.
Goal: detect drug vs placebo SBP gap ≥ 5 mmHg. Known σ ≈ 12 → d = 5/12 = 0.42. α = 0.025, power = 0.90.
R: pwr.t.test(d=0.42, power=0.9, sig.level=0.025) → n ≈ 152/arm. Add 15% dropout → 175/arm. Pre-specified in protocol, SAP locked.
情境 B:劑量反應實驗
目標:偵測 5 個劑量間 IC50 差異(One-way ANOVA)。文獻 Cohen's f = 0.40(大效應)、α = 0.05、power = 0.80。
R:pwr.anova.test(k=5, f=0.40, power=0.8) → n ≈ 16/group → 總 80。技術重複 ≥ 3 wells/group,明確分層分析(Step 13 mixed model)。
Goal: detect IC50 differences across 5 doses (1-way ANOVA). Literature Cohen's f = 0.40 (large), α = 0.05, power = 0.80.
R: pwr.anova.test(k=5, f=0.40, power=0.8) → n ≈ 16/group → total 80. Technical reps ≥ 3 wells/group; analyse hierarchically (Step 13 mixed model).
情境 C:GWAS
目標:偵測 OR = 1.20 的 SNP(MAF = 0.30)。αGWAS = 5×10⁻⁸(Pe'er 2008)、power = 0.80。
Genetic Power Calculator 或 Quanto:≈ 12,000 cases + 12,000 controls。為何要這麼大?因為 α 嚴格到 5×10⁻⁸ 對應 z1−α/2 ≈ 5.45(vs α=0.05 的 1.96)——光是 (5.45/1.96)² ≈ 7.7 倍的 n。
Goal: detect SNP at OR = 1.20 (MAF = 0.30). αGWAS = 5×10⁻⁸ (Pe'er 2008), power = 0.80.
Genetic Power Calculator or Quanto: ≈ 12,000 cases + 12,000 controls. Why so big? α = 5×10⁻⁸ ↔ z1−α/2 ≈ 5.45 (vs 1.96 at α = 0.05) — that alone is (5.45/1.96)² ≈ 7.7× the n.
📝 自我檢測
1. 審稿人寫信:「請補上 post-hoc power 分析。」最佳回應是?
1. A reviewer asks: "Please add a post-hoc power analysis." Best response?
2. 你規劃一個 cluster RCT(學校為單位),平均每校 m = 25 名學生,預期 ICC = 0.04。個人層級算出每組需 100 人——實際每組要多少?
2. You plan a cluster RCT (schools as clusters), m = 25 students/school, ICC = 0.04. Individual-level power says 100 per arm — what do you actually need per arm?
3. 你被預算限制 n = 40/組。最合適的 power 分析類型是?
3. Your budget caps n at 40/group. Which power-analysis type fits?
4. 存活分析(log-rank)power 主要取決於下列哪一項?
4. Survival (log-rank) power mainly depends on which of the following?