Step 7: Spatially Variable Genes — Spatial Transcriptomics Tutorial

Concepts

1. The mathematical intuition behind SVGs

Given a gene g with expression e_i at spot i, located at coordinates (x_i, y_i), an SVG test asks whether e shows spatial autocorrelation: do spots that are close in (x, y) have more similar expression than expected by chance?
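To make "more than expected by chance" concrete, here is a minimal Python sketch of a permutation test built on Moran's I. Shuffling expression across spots keeps its distribution but breaks any link to location, so the shuffled statistics form a null distribution. The spot coordinates and genes are simulated, and the helpers `knn_weights`, `morans_i`, and `moran_permutation_test`, as well as the choice of symmetric 6-nearest-neighbor weights, are hypothetical illustrations rather than any library's API.

```python
import numpy as np


def knn_weights(coords: np.ndarray, k: int = 6) -> np.ndarray:
    """Symmetric binary k-nearest-neighbor weights from (x, y) spot coordinates (assumed scheme)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # a spot is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]        # indices of the k closest spots
    w = np.zeros_like(d)
    w[np.repeat(np.arange(len(coords)), k), nn.ravel()] = 1.0
    return np.maximum(w, w.T)                # i ~ j if either lists the other


def morans_i(e: np.ndarray, w: np.ndarray) -> float:
    """Moran's I of expression vector e under spatial weights w."""
    z = e - e.mean()
    return float(len(e) / w.sum() * (z @ w @ z) / (z @ z))


def moran_permutation_test(e, w, n_perms=999, seed=0):
    """Observed Moran's I and a one-sided permutation p-value.

    Shuffling e across spots keeps its distribution (and variance) fixed
    but destroys any relationship between expression and location.
    """
    rng = np.random.default_rng(seed)
    observed = morans_i(e, w)
    null = np.array([morans_i(rng.permutation(e), w) for _ in range(n_perms)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perms)
    return observed, float(p_value)


rng = np.random.default_rng(42)
coords = rng.uniform(0, 10, size=(300, 2))           # hypothetical spot positions
w = knn_weights(coords, k=6)
spatial_gene = coords[:, 0] + rng.normal(0, 1, 300)  # rises from left to right
noise_gene = rng.normal(0, 1, 300)                   # no spatial structure

for name, e in [("spatial_gene", spatial_gene), ("noise_gene", noise_gene)]:
    i_obs, p = moran_permutation_test(e, w)
    print(f"{name}: I = {i_obs:.3f}, permutation p = {p:.3f}")
```

The spatial gene should come out with I close to 1 and the smallest attainable p-value (1/1000 with 999 permutations), while the noise gene should sit near I ≈ 0. The `n_perms` argument in the Squidpy call in the Code section plays a similar role.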

The most common approaches:

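Moran's I: the classic spatial autocorrelation statistic; compares each spot's deviation from the mean with the weighted deviations of its neighbors. Fast and interpretable.
Geary's C: similar to Moran's I, but built from squared differences between neighboring spots, so it is more sensitive to local differences.
Gaussian process family (SpatialDE, nnSVG): models expression as a Gaussian process (GP) and estimates a lengthscale and a spatial variance component.
SPARK / SPARK-X: SPARK tests with a spatial mixed-effects model over multiple kernels; SPARK-X replaces it with a non-parametric covariance test that scales to very large datasets.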
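For reference, these are the textbook definitions of the two autocorrelation statistics for one gene, with e_i its expression at spot i, \bar{e} the mean over N spots, and w_ij a spatial weight that is nonzero when spots i and j are neighbors. How w_ij is built (k nearest neighbors, Delaunay triangulation, a distance cutoff) and whether it is normalized varies between tools, so treat this as the standard form rather than any specific package's implementation.

```latex
I = \frac{N}{W}\,
    \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij}\,(e_i - \bar{e})(e_j - \bar{e})}
         {\sum_{i=1}^{N} (e_i - \bar{e})^2},
\qquad
C = \frac{N-1}{2W}\,
    \frac{\sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij}\,(e_i - e_j)^2}
         {\sum_{i=1}^{N} (e_i - \bar{e})^2},
\qquad
W = \sum_{i=1}^{N}\sum_{j=1}^{N} w_{ij}
```

With no spatial structure, E[I] = -1/(N-1), which is close to 0, and E[C] = 1. Positive autocorrelation (neighbors look alike) pushes I above its null expectation and C below 1.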

Method comparison

2. Five mainstream methods

Method	Principle	2025 benchmark	Speed
SPARK-X	Multi-kernel, non-parametric	Avg. correlation 0.88; top overall	★★★★★
SpatialDE2	Gaussian process	Avg. correlation 0.81	★★
nnSVG	Nearest-neighbor GP; scales to large datasets	Avg. correlation 0.80; spatially aware	★★★
Moran's I	Classical spatial autocorrelation	Avg. correlation 0.76; strong baseline	★★★★★
SpatialDE	Original GP	Memory hungry; historical reference	★

Practical tip: for large datasets → SPARK-X; for intuitive interpretation → Moran's I (built into Squidpy); to inspect the spatial lengthscale structure → nnSVG.

Interactive simulation

Interactive: which patterns are called SVGs?

The widget switches among four simulated patterns, showing spatial expression on the left and Moran's I on the top right (larger I = stronger spatial structure). Uniform noise gives I ≈ 0; a smooth gradient gives a high I; random hotspots give a moderate I.
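The three patterns described above can be sketched in Python on a 30 × 30 grid with rook (shared-edge) neighbors. The grid size, noise level, and the number, width, and amplitude of the hotspot blobs are arbitrary assumptions, not the widget's actual settings, and `rook_weights`, `morans_i`, and `gearys_c` are hypothetical helpers written for this example. Geary's C is printed alongside Moran's I for comparison.

```python
import numpy as np


def rook_weights(n: int) -> np.ndarray:
    """Binary rook adjacency (shared edge) for an n x n grid, row-major order."""
    idx = np.arange(n * n).reshape(n, n)
    w = np.zeros((n * n, n * n))
    # horizontal neighbor pairs, then vertical neighbor pairs
    for a, b in [(idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])]:
        w[a.ravel(), b.ravel()] = 1.0
        w[b.ravel(), a.ravel()] = 1.0
    return w


def morans_i(e: np.ndarray, w: np.ndarray) -> float:
    z = e - e.mean()
    return float(len(e) / w.sum() * (z @ w @ z) / (z @ z))


def gearys_c(e: np.ndarray, w: np.ndarray) -> float:
    z = e - e.mean()
    diff_sq = (e[:, None] - e[None, :]) ** 2   # squared differences for all spot pairs
    return float((len(e) - 1) / (2 * w.sum()) * np.sum(w * diff_sq) / (z @ z))


n = 30
rng = np.random.default_rng(0)
rows, cols = np.mgrid[0:n, 0:n]
noise = rng.normal(0, 0.5, (n, n))

patterns = {
    "uniform noise": noise,
    "smooth gradient": 5 * cols / (n - 1) + noise,
}
# random hotspots: five small Gaussian blobs at random centers, plus noise
hot = np.zeros((n, n))
for r0, c0 in rng.uniform(0, n, size=(5, 2)):
    hot += 3 * np.exp(-((rows - r0) ** 2 + (cols - c0) ** 2) / (2 * 1.5 ** 2))
patterns["random hotspots"] = hot + noise

w = rook_weights(n)
for name, grid in patterns.items():
    e = grid.ravel()
    print(f"{name:16s} Moran's I = {morans_i(e, w):.3f}   Geary's C = {gearys_c(e, w):.3f}")
```

Expect Moran's I near 0 and Geary's C near 1 for the noise, a high I and a C well below 1 for the gradient, and values in between for the hotspots. The hotspots score lower than the gradient because most spots sit in the noisy background, far from any blob.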


Code

Implementation

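# nnSVG (Bioconductor, R): nearest-neighbor GP fitted per gene
# assumes `spe` is a SpatialExperiment with low-expressed genes already
# filtered and a "logcounts" assay already computed
library(SpatialExperiment)
library(nnSVG)
spe <- nnSVG(spe, assay_name = "logcounts")
top <- rowData(spe)$gene_name[order(rowData(spe)$padj)][1:20]  # top 20 by adjusted p-value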

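# Seurat (R): Moran's I on the SCT-normalized variable features
vis <- FindSpatiallyVariableFeatures(vis, assay = "SCT",
        features = VariableFeatures(vis), selection.method = "moransi")
# request the Moran's I ranking explicitly: Seurat v4 defaults to "markvariogram"
# (Seurat v5 names this argument `method`)
top <- SpatiallyVariableFeatures(vis, selection.method = "moransi")[1:20]
SpatialFeaturePlot(vis, features = top[1:6])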

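# Squidpy Moran's I (Python)
import squidpy as sq
# neighbor graph from a Delaunay triangulation of generic (x, y) coordinates
sq.gr.spatial_neighbors(adata, coord_type="generic", delaunay=True)
# Moran's I per gene, with 100 permutations for the null distribution
sq.gr.spatial_autocorr(adata, mode="moran", n_perms=100, n_jobs=4)
# adata.uns["moranI"] is sorted by I, highest first
top = adata.uns["moranI"].head(20).index.tolist()
sq.pl.spatial_scatter(adata, color=top[:6])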

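# SPARK-X runs in R (directly, or from Python via rpy2).
# A Python alternative is SpatialDE2, which is imported as `SpatialDE`.
import SpatialDE
res = SpatialDE.test(adata, layer="logcounts")
# keep significant genes, ranked by fraction of spatial variance (FSV)
sig = res[res.padj < 0.05].sort_values("FSV", ascending=False)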

📝 Self-check

1. Is every highly variable gene (HVG) an SVG?

A. Yes, the two are equivalent

B. Yes, HVGs always have spatial structure

C. Not necessarily: high variance can be noise, or can be spread across the tissue with no spatial structure

D. The reverse holds: every SVG is an HVG

2. In the 2025 benchmark, which method led on both speed and accuracy?

A. SPARK-X

B. SpatialDE (original)

C. K-means

D. Linear regression

3. What is the core idea of Moran's I?

A. It computes each gene's mean expression

B. It quantifies whether neighboring spots have more similar expression than expected by chance

C. It automatically estimates the number of clusters

D. It has nothing to do with space