STEP 4 / 15

Normalization: putting spots back on the same starting line

When a spot contains multiple cells, differences in sequencing depth become entangled with true differences in cell density.

I. The special challenge of ST normalization

scRNA normalization only has to correct library-size differences between cells. In ST, library-size differences may themselves be signal: densely packed regions really do yield more RNA than stromal regions. Unconditionally scaling every spot to the same total wipes out that true density information.

So ST normalization adds two considerations:

  • Should cell-density differences be preserved? That depends on the downstream task (spatial domains usually want them kept; deconvolution not necessarily).
  • When spots contain mixed cells, SCTransform's negative binomial (NB) assumption may be inaccurate, because a mixture of cell types need not follow the NB mean–variance relationship.
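To make the density argument concrete, here is a minimal NumPy sketch. The per-cell profile and the cell numbers are invented for illustration; the only point is that two spots with identical composition but different cell counts become indistinguishable once both are scaled to the same total.

```python
import numpy as np

# Hypothetical expression profile shared by every cell (UMIs per cell, 3 genes).
per_cell = np.array([6.0, 3.0, 1.0])

# Two spots with identical composition but different numbers of cells.
cells_per_spot = np.array([2, 10])            # sparse stroma vs. dense region
counts = cells_per_spot[:, None] * per_cell   # shape: (2 spots, 3 genes)

totals = counts.sum(axis=1)                   # library size per spot
scaled = counts / totals[:, None] * 1e4       # force every spot to 10,000

density_ratio_before = totals[1] / totals[0]
rows_identical_after = np.allclose(scaled[0], scaled[1])

print("library-size ratio before scaling:", density_ratio_before)
print("spots identical after scaling:", rows_identical_after)
```

Whether this loss matters depends on the downstream task, as the list above notes: it is harmless if only composition is of interest, and costly if density itself is part of the signal.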

II. Three mainstream methods

| Method | Principle | Advantages for ST | Drawbacks |
|---|---|---|---|
| LogNormalize | Divide each spot by its total UMI count, multiply by 10,000, then apply log1p | Simplest, fastest, most stable | Does not address mean–variance bias |
| SCTransform | Negative binomial (NB) regression with library size as a covariate | Normalization and HVG selection in one step; downstream PCA is usually cleaner | NB assumption breaks down for mixed-cell spots; slow |
| scran size factors | Coarse pre-clustering, then deconvolution-based size-factor estimation | Robust to zero inflation and heterogeneity | Requires pre-clustering; more complex to implement |
💡 Post-2024 scverse / Bioconductor consensus: for spot-based ST, LogNormalize is a sensible default; consider SCTransform only if the downstream PCA structure looks noisy. Image-based (single-cell resolution) data follows scRNA logic again, so SCTransform is the better fit there.
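The LogNormalize and scran rows of the table differ mainly in how the per-spot scaling factor is chosen. The sketch below contrasts a library-size factor with a median-of-ratios factor on invented counts. Note the assumption: median-of-ratios is used here only as a simplified stand-in for the idea behind scran, not as scran's actual pooling and deconvolution procedure.

```python
import numpy as np

# Toy counts, 2 spots x 5 genes. Both spots are assumed to be sequenced
# equally deeply, but gene 0 is strongly up-regulated in spot 1.
counts = np.array([
    [100.0, 20.0, 30.0, 40.0, 10.0],
    [900.0, 20.0, 30.0, 40.0, 10.0],
])

# Library-size factors (the LogNormalize approach): total UMI per spot,
# rescaled so that the factors average to 1.
lib = counts.sum(axis=1)
lib_sf = lib / lib.mean()

# Median-of-ratios factors against an average reference profile.
# The median ignores the single dominating gene.
reference = counts.mean(axis=0)
ratio_sf = np.median(counts / reference, axis=1)

# LogNormalize as in the table: scale to 10,000 per spot, then log1p.
lognorm = np.log1p(counts / lib[:, None] * 1e4)

print("library-size factors:   ", lib_sf)
print("median-of-ratios factors:", ratio_sf)
print("gene 1 after LogNormalize:", lognorm[:, 1])
```

Gene 1 has the same raw count in both spots, yet it ends up looking lower in spot 1 after LogNormalize, because gene 0 inflates that spot's total. The median-based factors treat the two spots as equally deep.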

Interactive: how normalization affects the mean–variance relationship

The same simulated data under different normalizations. What to look for: a good normalization keeps variance roughly flat across the whole expression range, so that highly expressed genes do not dominate.

X: log10(mean count); Y: log10(variance)
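The same comparison can be approximated offline. The sketch below is not the page's simulation: the Poisson model, the depth distribution, and the helper `trend_slope` are assumptions made for illustration. It summarizes the plot by the slope of log10(variance) against log10(mean count) before and after LogNormalize.

```python
import numpy as np

rng = np.random.default_rng(0)
n_spots, n_genes = 500, 200

# Assumed simulation: gene means span about 2.5 orders of magnitude and each
# spot has its own depth factor; counts are Poisson around depth * mean.
gene_mean = 10 ** rng.uniform(-0.5, 2.0, size=n_genes)
depth = rng.lognormal(mean=0.0, sigma=0.3, size=n_spots)
counts = rng.poisson(depth[:, None] * gene_mean[None, :]).astype(float)

def trend_slope(values, raw_counts):
    """Slope of log10(variance of values) vs. log10(mean raw count) across genes."""
    mean = raw_counts.mean(axis=0)
    var = values.var(axis=0)
    keep = (mean > 0) & (var > 0)   # avoid log10(0)
    slope, _intercept = np.polyfit(np.log10(mean[keep]), np.log10(var[keep]), 1)
    return slope

# LogNormalize: scale each spot to 10,000, then log1p.
lognorm = np.log1p(counts / counts.sum(axis=1, keepdims=True) * 1e4)

slope_raw = trend_slope(counts, counts)
slope_lognorm = trend_slope(lognorm, counts)
print(f"raw slope: {slope_raw:.2f}, LogNormalize slope: {slope_lognorm:.2f}")
```

On raw counts the variance climbs steeply with the mean, so the fitted slope should be well above that of the log-normalized data. The trend after LogNormalize is not guaranteed to be perfectly flat, which is the bias that variance-stabilizing methods aim to reduce further.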

Implementation

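# ---- R (Seurat / Bioconductor) ----
# The three methods are alternatives; each starts from raw counts.
# Assumed inputs: `vis` is a Seurat Visium object, `spe` a SpatialExperiment.
library(Seurat)

# Method 1: LogNormalize
vis <- NormalizeData(vis, normalization.method = "LogNormalize", scale.factor = 10000)

# Method 2: SCTransform (recommended for a single section or single-cell-resolution data)
vis <- SCTransform(vis, assay = "Spatial", verbose = FALSE)

# Method 3: scran size factors (Bioconductor)
library(scran)
library(scuttle)  # provides logNormCounts()
qclust <- quickCluster(spe, min.size = 100)       # coarse pre-clustering
spe <- computeSumFactors(spe, clusters = qclust)  # pooled, deconvolved size factors
spe <- logNormCounts(spe)                         # log-normalize with those factors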
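
# ---- Python (Scanpy) ----
# Pick one option; each expects raw counts in adata.X.
import scanpy as sc
from scipy import sparse

# Option 1: LogNormalize (Scanpy default)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

# Option 2: scran-style size factors. Scanpy has no built-in equivalent; the
# factors are typically computed with scran in R (e.g. via rpy2 + anndata2ri).
# Assumption: they are already stored, one per spot, in adata.obs["size_factors"].
sf = adata.obs["size_factors"].to_numpy()
adata.X = sparse.diags(1.0 / sf) @ adata.X  # divide each spot (row) by its size factor
sc.pp.log1p(adata)

# Option 3: SCTransform-style normalization (analytic Pearson residuals)
sc.experimental.pp.normalize_pearson_residuals(adata)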
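For intuition about what the Pearson-residual option computes, here is a NumPy sketch of analytic Pearson residuals under an NB model with a fixed overdispersion. The function name `pearson_residuals`, the value `theta=100`, the clipping at the square root of the number of spots, and the toy data are all assumptions made for this sketch. It is not Scanpy's implementation, and it assumes every gene has at least one count.

```python
import numpy as np

def pearson_residuals(counts, theta=100.0):
    """Analytic Pearson residuals under an NB model with fixed overdispersion."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    # Expected count if spots differed only by depth and genes only by abundance.
    mu = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / total
    resid = (counts - mu) / np.sqrt(mu + mu**2 / theta)
    # Clip to +/- sqrt(number of spots) so single outliers cannot dominate.
    clip = np.sqrt(counts.shape[0])
    return np.clip(resid, -clip, clip)

# Toy data: 300 spots with varying depth, 3 genes with very different means.
rng = np.random.default_rng(1)
depth = rng.uniform(0.5, 2.0, size=300)
gene_mean = np.array([0.5, 5.0, 50.0])
toy = rng.poisson(depth[:, None] * gene_mean[None, :])

res = pearson_residuals(toy)
print("raw variance per gene:     ", toy.var(axis=0))
print("residual variance per gene:", res.var(axis=0))
```

The raw variances differ by orders of magnitude between the three genes, while the residual variances should sit much closer together, which is the variance-stabilizing behavior that SCTransform-style methods target.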

📝 Self-check

1. Why can forcing every spot to the same total lose information in spot-based ST?

A. Because NGS is inaccurate
B. Because the log transform is lossy
C. Because a spot's total UMI count partly reflects true cell density
D. Because Visium does not allow normalization

2. For Visium spots with heavy cell mixing, which statement is more reasonable?

A. SCTransform remains fully appropriate
B. The NB assumption may break down; LogNormalize is usually a stable starting point
C. No normalization is needed
D. Only raw counts can be used

3. Why are scran size factors more stable than simple library sizes?

A. It pre-clusters and then uses deconvolution, so highly expressed genes do not dominate the estimate
B. It simply erases library size
C. It only works on image-based platforms
D. It is unrelated to scRNA