Step 5: Dimensionality Reduction — Spatial Transcriptomics Tutorial

Two paths

1. scRNA-style vs. spatially aware dimensionality reduction

The classic ST pipeline mirrors scRNA-seq: HVG → PCA → UMAP / Leiden. It ignores space. Clusters are computed from expression alone and are only afterwards mapped back onto the tissue.

- Pros: simple, reproducible, and fully compatible with the scRNA-seq toolchain.
- Cons: the similarity between neighboring spots is never modeled, so spatial structure becomes visible only post hoc.

Spatially aware methods build the neighborhood into the representation itself:

- BANKSY augments each spot's own expression with the distance-weighted mean expression of its spatial neighbors.
- STAGATE and SpaceFlow instead learn embeddings with graph neural networks over a spatial neighbor graph.

Either way, the embedding reflects both "what I express" and "what my neighbors express". As a result, downstream clusters tend to form spatially contiguous regions.
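The concatenation idea can be made concrete with a minimal NumPy sketch of a BANKSY-style neighbor-augmented matrix. It is a simplification. It assumes an unweighted mean over the k nearest spots, whereas BANKSY itself uses distance-decaying weights, optional azimuthal features and per-block scaling. The helper name `neighbor_augmented_features` is hypothetical.

```python
import numpy as np


def neighbor_augmented_features(X, coords, k=6):
    """Hypothetical helper: return [own expression | mean of k nearest spatial neighbors].

    X      : (n_spots, n_genes) normalized expression matrix
    coords : (n_spots, 2) spatial coordinates of the spots
    """
    X = np.asarray(X, dtype=float)
    coords = np.asarray(coords, dtype=float)
    # Dense pairwise distances: fine for a toy example; real data would use
    # a sparse k-nearest-neighbor graph (e.g. built with a KD-tree) instead.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)          # a spot is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :k]  # k nearest spots for every spot
    nbr_mean = X[nbrs].mean(axis=1)         # unweighted neighborhood mean per gene
    return np.hstack([X, nbr_mean])


# Toy example: 5 spots on a line, one gene that switches on at spot 3.
coords = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [4, 0]])
X = np.array([[0.0], [0.0], [0.0], [10.0], [10.0]])
print(neighbor_augmented_features(X, coords, k=2))
```

With k = 2, spot 2 keeps its own value of 0 but gains a neighbor mean of 5. In the augmented space it therefore sits between the "off" and "on" spots, which is how boundary information enters the embedding.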

HVG vs SVG

2. Feature selection: HVG or SVG?

HVG (Highly Variable Genes)

Genes with the largest variance across spots, relative to what their mean expression level would predict. Location is ignored. Common methods are the mean–variance trend (Seurat's vst) and Pearson residuals.

Use when: you want a workflow consistent with scRNA-seq, or a quick first exploration.
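The Pearson-residuals approach fits in a few lines of NumPy. This is a toy re-implementation under stated assumptions: raw counts, a negative-binomial null with θ = 100, and residuals clipped at ±√n. It is not Seurat's or Scanpy's code. In Scanpy the corresponding routine is `sc.experimental.pp.highly_variable_genes(flavor="pearson_residuals")`. The helper name is hypothetical.

```python
import numpy as np


def pearson_residual_hvgs(counts, n_top=2000, theta=100.0):
    """Hypothetical helper: rank genes by the variance of analytic Pearson residuals.

    counts : (n_spots, n_genes) raw UMI counts, all-zero genes already removed
    theta  : overdispersion of the negative-binomial null (100 is a common default)
    Returns (indices of the top genes, residual variance of every gene).
    """
    counts = np.asarray(counts, dtype=float)
    if np.any(counts.sum(axis=0) == 0):
        raise ValueError("remove all-zero genes before computing residuals")
    n_spots = counts.shape[0]
    spot_totals = counts.sum(axis=1, keepdims=True)   # (n_spots, 1)
    gene_totals = counts.sum(axis=0, keepdims=True)   # (1, n_genes)
    # Expected counts if every gene took a fixed share of each spot's total
    mu = spot_totals @ gene_totals / counts.sum()
    resid = (counts - mu) / np.sqrt(mu + mu ** 2 / theta)
    resid = np.clip(resid, -np.sqrt(n_spots), np.sqrt(n_spots))
    resid_var = resid.var(axis=0)
    top = np.argsort(resid_var)[::-1][:n_top]
    return top, resid_var


# Toy data: 200 spots, 5 genes. Genes 0 and 1 differ between the two halves;
# genes 2-4 are Poisson noise around the same rate everywhere.
rng = np.random.default_rng(0)
rates = np.full((200, 5), 10.0)
rates[:100, 0], rates[100:, 0] = 5.0, 50.0
rates[:100, 1], rates[100:, 1] = 50.0, 5.0
counts = rng.poisson(rates)
top, resid_var = pearson_residual_hvgs(counts, n_top=2)
print(top, np.round(resid_var, 2))
```

Genes 0 and 1 end up with a residual variance far above the roughly 1 of the noise-only genes. Note that nothing in this calculation looks at spot coordinates.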

SVG (Spatially Variable Genes)

Genes whose expression depends on location, i.e. shows spatial autocorrelation. The two sets overlap but are not the same:

- An HVG need not be spatial, e.g. a noisy housekeeping gene that fluctuates everywhere in the tissue.
- An SVG need not be highly variable: a weak signal with clear spatial structure still counts.

Use when: identifying spatial domains. SVG detection is covered in detail in Step 7.
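The HVG/SVG distinction shows up clearly on toy data with Moran's I. Moran's I is one common spatial-autocorrelation statistic; the SVG methods in Step 7 may use others. This NumPy sketch assumes a square grid of spots with binary rook (4-neighbor) weights, and the function names are hypothetical. A weak left-to-right gradient has small variance but a Moran's I close to 1. Pure noise has large variance but a Moran's I near 0.

```python
import numpy as np


def morans_i(values, W):
    """Moran's I of one gene, given a symmetric spatial weight matrix W (zero diagonal)."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    n = z.size
    return (n / W.sum()) * (z @ W @ z) / (z @ z)


def grid_rook_weights(side):
    """Hypothetical helper: binary weights linking each spot on a side x side grid
    to its up/down/left/right neighbors (spot index = row * side + col)."""
    n = side * side
    W = np.zeros((n, n))
    for r in range(side):
        for c in range(side):
            i = r * side + c
            if c + 1 < side:
                W[i, i + 1] = W[i + 1, i] = 1.0
            if r + 1 < side:
                W[i, i + side] = W[i + side, i] = 1.0
    return W


side = 20
W = grid_rook_weights(side)
rng = np.random.default_rng(0)
cols = np.tile(np.arange(side), side)                         # column (x) of each spot
structured = 0.1 * cols + rng.normal(0, 0.05, side * side)    # weak gradient: an SVG, not an HVG
noisy = rng.normal(5.0, 3.0, side * side)                     # high variance, no spatial pattern

for name, g in [("structured", structured), ("noisy", noisy)]:
    print(f"{name:10s} variance={g.var():6.2f}  Moran's I={morans_i(g, W):5.2f}")
```

Ranking by variance would keep `noisy` and drop `structured`. Ranking by Moran's I does the opposite.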

💡 In practice: start with about 2,000 HVGs for PCA. When you specifically want clusters that follow spatial structure, re-running PCA on SVGs usually gives cleaner results.
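Re-running PCA on SVGs only changes which genes enter the PCA. In Seurat, that is the `features` argument of `RunPCA`. The NumPy sketch below shows the underlying operation. It assumes a precomputed per-gene score, which could come from any HVG or SVG method. The helper name is hypothetical.

```python
import numpy as np


def pca_on_top_genes(X, gene_scores, n_genes=2000, n_comps=30):
    """Hypothetical helper: PCA on the n_genes genes with the highest score.

    X           : (n_spots, n_genes_total) normalized expression
    gene_scores : one score per gene (e.g. an HVG statistic, or an SVG
                  statistic such as Moran's I); higher means "keep"
    Returns (PC scores of each spot, indices of the genes used).
    """
    X = np.asarray(X, dtype=float)
    keep = np.argsort(gene_scores)[::-1][:n_genes]
    sub = X[:, keep]
    # Center and scale each gene, as ScaleData / sc.pp.scale do before PCA
    sub = (sub - sub.mean(axis=0)) / (sub.std(axis=0) + 1e-8)
    n_comps = min(n_comps, *sub.shape)
    U, S, _ = np.linalg.svd(sub, full_matrices=False)
    return U[:, :n_comps] * S[:n_comps], keep


# Toy usage: 100 spots x 50 genes, with scores that simply favor later genes
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
scores = np.arange(50, dtype=float)
emb, keep = pca_on_top_genes(X, scores, n_genes=20, n_comps=10)
print(emb.shape, sorted(keep.tolist())[:3])
```

Swapping the HVG score for an SVG score changes only `gene_scores`. The PCA itself is identical.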

Interactive simulation

Interactive: how BANKSY's neighborhood weight changes the embedding

BANKSY's core parameter λ (lambda, between 0 and 1) sets the relative weight of a spot's own expression versus its neighborhood-mean expression. λ = 0 ignores space; larger λ relies more on the neighborhood. In the interactive panel, watch how the clusters turn from tangled into spatially contiguous patches as λ increases.

[Interactive figure: BANKSY λ slider, initially 0.0. Color = cluster label; background shading = the two biological regions.]
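A simplified view of what λ does follows. It assumes only BANKSY's neighborhood-mean term, with no azimuthal Gabor filter term, and ignores the per-block scaling the method applies. Each spot's own expression $x_i$ is weighted by $\sqrt{1-\lambda}$ and its neighborhood mean $\bar{x}_i$ by $\sqrt{\lambda}$. Squared distances in the augmented space then split into two parts:

```latex
\tilde{x}_i =
\begin{bmatrix} \sqrt{1-\lambda}\; x_i \\ \sqrt{\lambda}\; \bar{x}_i \end{bmatrix},
\qquad
\lVert \tilde{x}_i - \tilde{x}_j \rVert^2
  = (1-\lambda)\,\lVert x_i - x_j \rVert^2
  + \lambda\,\lVert \bar{x}_i - \bar{x}_j \rVert^2
```

At λ = 0 only a spot's own expression counts, which reproduces the scRNA-style route. As λ approaches 1, two spots look alike whenever their neighborhoods do, which smooths the clusters and can blur boundaries.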

Code

Implementation

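# --- R (Seurat) ---
library(Seurat)

# Standard scRNA-style route
# (assumes vis was normalized in an earlier step; the BANKSY call below expects an SCTransform "SCT" assay)
vis <- FindVariableFeatures(vis, nfeatures = 2000)
vis <- ScaleData(vis)
vis <- RunPCA(vis, npcs = 30)
vis <- FindNeighbors(vis, dims = 1:30)
vis <- FindClusters(vis, resolution = 0.6)
vis <- RunUMAP(vis, dims = 1:30)
DimPlot(vis); SpatialDimPlot(vis)   # clusters in UMAP space vs. on the tissue

# Spatially aware: BANKSY via SeuratWrappers (requires the Banksy package; not built into Seurat)
library(SeuratWrappers)
library(Banksy)
# assumes spot coordinates are stored in meta.data columns "x" and "y"
# the BANKSY authors suggest lambda ~0.2 for cell typing, ~0.8 for domain segmentation
vis <- RunBanksy(vis, lambda = 0.2, dimx = "x", dimy = "y", assay = "SCT")
vis <- RunPCA(vis, assay = "BANKSY", features = rownames(vis[["BANKSY"]]), npcs = 30)
vis <- FindNeighbors(vis, reduction = "pca", dims = 1:30) |> FindClusters(resolution = 0.6)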

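# --- Python (Scanpy) ---
import scanpy as sc

# Standard route (assumes adata.X is already normalized and log-transformed)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
sc.pp.scale(adata, max_value=10)
sc.tl.pca(adata, n_comps=30)   # uses adata.var["highly_variable"] when present
sc.pp.neighbors(adata, n_pcs=30); sc.tl.umap(adata)
sc.tl.leiden(adata, resolution=0.6)

# BANKSY (Banksy_py)
# NOTE: this entry point and its argument names are illustrative; check the
# Banksy_py documentation for the current API before running.
import banksy_py as bp
adata = bp.banksy.compute_banksy_matrix(adata, lambda_=0.2, k_geom=15)
sc.tl.pca(adata, layer="banksy", n_comps=30)
sc.pp.neighbors(adata, n_pcs=30); sc.tl.leiden(adata, resolution=0.6)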

📝 Self-check

1. Why does plain PCA + Leiden often produce spatially discontinuous clusters in ST?

A. PCA isn't suitable for ST

B. It ignores spatial information entirely and considers only expression

C. ST data contain no spatial information

D. BANKSY is wrong

2. What happens if BANKSY's λ is set very high?

A. The embedding follows the neighborhood heavily; clusters become smoother, and boundaries between distinct regions may blur

B. It becomes equivalent to plain PCA

C. The number of clusters always increases

D. It has no effect on the results

3. What is the key difference between SVGs and HVGs?

A. SVGs can only be computed in R

B. SVG detection accounts for spatial autocorrelation; HVG selection looks only at overall variance

C. There are always more SVGs than HVGs

D. SVG and HVG are synonyms