1. scRNA-style vs. spatially aware dimensionality reduction
The classic ST pipeline mirrors scRNA-seq: HVG → PCA → UMAP / Leiden. This approach ignores space: clusters are computed from expression alone and only afterwards overlaid on the spatial map. Pros: simple, reproducible, fully compatible with the scRNA toolchain. Cons: the similarity between adjacent spots is never modeled, so spatial structure only becomes visible post hoc.
Spatially aware methods (BANKSY, SpaceFlow, STAGATE) build neighborhood information into the embedding itself. BANKSY does this most directly: it appends each spot's neighborhood-mean expression to the spot's own expression vector as extra features. SpaceFlow and STAGATE instead learn the embedding with a graph neural network encoder over the spatial neighbor graph. In either case the embedding reflects both "what I express" and "what my neighbors express," so downstream clusters tend to form spatially contiguous regions.
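The neighborhood-mean idea can be sketched in a few lines of NumPy. This is a simplified illustration, not BANKSY's actual implementation: it assumes an unweighted mean over the k nearest spots, and the helper names `knn_indices` and `neighbor_augmented` are hypothetical.

```python
import numpy as np

def knn_indices(coords, k):
    """Indices of the k nearest spatial neighbours of each spot (self excluded)."""
    # Pairwise squared Euclidean distances between spot coordinates.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a spot is not its own neighbour
    return np.argsort(d2, axis=1)[:, :k]

def neighbor_augmented(expr, coords, k=4):
    """Concatenate own expression with the mean expression of k spatial neighbours."""
    idx = knn_indices(coords, k)
    nbr_mean = expr[idx].mean(axis=1)   # shape: (n_spots, n_genes)
    return np.hstack([expr, nbr_mean])  # shape: (n_spots, 2 * n_genes)

# Toy data (made up): 6 spots in two well-separated groups, 2 genes.
coords = np.array([[0, 0], [1, 0], [2, 0], [10, 0], [11, 0], [12, 0]], dtype=float)
expr = np.array([[5, 0], [4, 1], [5, 0], [0, 5], [1, 4], [0, 5]], dtype=float)

aug = neighbor_augmented(expr, coords, k=2)
print(aug.shape)
print(aug[0])  # own expression followed by the neighbourhood mean
```

Running PCA and clustering on `aug` instead of `expr` is what makes neighboring spots with similar surroundings land close together in the embedding.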
2. Feature selection: HVG or SVG?
HVG (Highly Variable Genes)
Genes with the largest overall variance across spots, with no reference to spatial position. Common selection methods: mean–variance trend (Seurat vst), Pearson residuals.
Use: scRNA-aligned workflows, initial exploration.
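To make "largest overall variance" concrete, here is a minimal sketch that ranks genes by a plain dispersion score (variance divided by mean). This is a deliberately simpler stand-in for the methods named above, not Seurat's vst or Pearson residuals, and `top_dispersion_genes` is a hypothetical helper.

```python
import numpy as np

def top_dispersion_genes(counts, n_top):
    """Return indices of the n_top genes with the highest variance / mean ratio.

    counts: array of shape (n_spots, n_genes). Spot coordinates are never used,
    which is exactly why HVG selection is spatial-agnostic.
    """
    mean = counts.mean(axis=0)
    var = counts.var(axis=0, ddof=1)
    # Genes with zero mean get dispersion 0 instead of a division error.
    dispersion = np.where(mean > 0, var / np.maximum(mean, 1e-12), 0.0)
    return np.argsort(dispersion)[::-1][:n_top]

# Toy counts (made up): 4 spots x 3 genes.
counts = np.array([
    [1, 0, 4],
    [1, 10, 6],
    [1, 0, 4],
    [1, 10, 6],
], dtype=float)

top = top_dispersion_genes(counts, n_top=2)
print(top)  # gene 0 is constant, so it is ranked last and dropped
```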
SVG (Spatially Variable Genes)
Genes whose expression depends on spatial location. HVGs are not always spatial (e.g. a housekeeping gene that is noisy across the whole tissue), and SVGs are not always highly variable (a weak but spatially structured signal still counts).
Use: spatial domain identification. SVGs are covered in detail in Step 7.
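One common way to score spatial dependence is Moran's I, a spatial autocorrelation statistic. The sketch below is an illustration only: it assumes binary weights between spots within a fixed radius, and `morans_i` is a hypothetical helper rather than any package's API. The two toy genes have identical variance, so an HVG ranking cannot tell them apart, yet their spatial scores differ sharply.

```python
import numpy as np

def morans_i(values, coords, radius=1.0):
    """Moran's I with binary weights: w_ij = 1 if 0 < dist(i, j) <= radius."""
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
    w = ((d > 0) & (d <= radius)).astype(float)
    z = values - values.mean()
    # I = (n / sum(w)) * sum_ij w_ij z_i z_j / sum_i z_i^2
    return (len(values) / w.sum()) * (z @ w @ z) / (z @ z)

# Toy data (made up): 8 spots on a line, unit spacing.
coords = np.arange(8, dtype=float).reshape(-1, 1)
structured = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)   # two clean domains
alternating = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)  # same values, no domains

print(structured.var() == alternating.var())  # identical overall variance
print(morans_i(structured, coords))           # positive: neighbours agree
print(morans_i(alternating, coords))          # negative: neighbours disagree
```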
Interactive: how BANKSY's neighborhood weight changes the embedding
BANKSY's core parameter λ (lambda), which ranges from 0 to 1, sets the weight of a spot's own expression versus its neighborhood-mean expression. λ = 0 is equivalent to ignoring space; the larger λ is, the more the embedding leans on the spatial neighborhood. Watch how the cluster structure on the right changes from intermixed to spatially contiguous patches as λ increases.
Color: cluster label; background shading: the two biological regions
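The effect of λ can be checked numerically. The sketch below assumes the square-root weighting described for BANKSY, where own expression is scaled by √(1−λ) and the neighborhood mean by √λ. It leaves out BANKSY's feature scaling and distance-weighted neighbor kernel, and `banksy_like_features` is a hypothetical helper. Two spots with identical own expression but different neighborhoods are indistinguishable at λ = 0 and move apart as λ grows.

```python
import numpy as np

def banksy_like_features(own, nbr_mean, lam):
    """Stack sqrt(1 - lam) * own expression with sqrt(lam) * neighbourhood mean."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return np.hstack([np.sqrt(1.0 - lam) * own, np.sqrt(lam) * nbr_mean])

# Toy data (made up): two spots, two genes.
own = np.array([[3.0, 1.0], [3.0, 1.0]])       # identical own expression
nbr_mean = np.array([[4.0, 0.0], [0.0, 4.0]])  # very different neighbourhoods

dists = {}
for lam in (0.0, 0.2, 0.8):
    f = banksy_like_features(own, nbr_mean, lam)
    dists[lam] = float(np.linalg.norm(f[0] - f[1]))
    print(f"lambda={lam:.1f}  distance={dists[lam]:.3f}")
```

With this weighting, the squared distance between two spots is (1−λ) times the squared own-expression distance plus λ times the squared neighborhood distance, which is why λ acts as a dial between the two.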
Implementation
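# Standard scRNA-style route (assumes `vis` is already normalized)
vis <- FindVariableFeatures(vis, nfeatures = 2000)
vis <- ScaleData(vis)
vis <- RunPCA(vis, npcs = 30)
vis <- RunUMAP(vis, dims = 1:30)
DimPlot(vis); SpatialDimPlot(vis)

# Spatially aware route: BANKSY (RunBanksy is provided by SeuratWrappers, not by Seurat itself)
library(Banksy)
library(SeuratWrappers)
# dimx / dimy name the metadata columns that hold the spot coordinates
vis <- RunBanksy(vis, lambda = 0.2, dimx = "x", dimy = "y", assay = "SCT")
# The BANKSY assay has no variable features set, so pass all of its features to PCA
vis <- RunPCA(vis, assay = "BANKSY", features = rownames(vis[["BANKSY"]]), npcs = 30)
vis <- FindNeighbors(vis, reduction = "pca", dims = 1:30) |> FindClusters(resolution = 0.6)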
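# Standard route (assumes `adata` is already normalized and log-transformed)
import scanpy as sc
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
sc.pp.scale(adata, max_value=10)
sc.tl.pca(adata, n_comps=30)
sc.pp.neighbors(adata, n_pcs=30); sc.tl.umap(adata)

# BANKSY (Banksy_py)
# NOTE: the module path and function call below are kept as written in this tutorial;
# check them against the Banksy_py repository before running.
import banksy_py as bp
adata = bp.banksy.compute_banksy_matrix(adata, lambda_=0.2, k_geom=15)
sc.tl.pca(adata, layer="banksy", n_comps=30)
sc.pp.neighbors(adata, n_pcs=30); sc.tl.leiden(adata, resolution=0.6)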
📝 Self-check
1. Why does plain PCA + Leiden often produce spatially discontinuous clusters in ST?
2. What happens if BANKSY's λ is set very high?
3. What is the key difference between SVG and HVG?