1. The mathematical intuition behind SVGs
Given a gene g with expression e_i at the spot with coordinates (x_i, y_i), a test for spatially variable genes (SVGs) asks whether the expression values e show more spatial autocorrelation across the coordinates (x, y) than expected by chance. In other words, it asks whether nearby spots have more similar expression than randomly paired spots.
The most common scores:
- Moran's I: the classic spatial statistic. It is a neighbor-weighted average of the products of mean-centered expression at pairs of spots, normalized by the overall variance. Fast and interpretable.
- Geary's C: similar to Moran's I, but built on squared differences between neighboring spots, so it is more sensitive to local differences.
- Gaussian process family (SpatialDE, nnSVG): model expression as a GP and estimate its lengthscale and variance.
- SPARK / SPARK-X: test against multiple spatial kernels. SPARK uses a mixed-effects model; SPARK-X uses a non-parametric test and scales to large datasets.
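The first two scores in the list are simple enough to write out directly. The following is a minimal NumPy sketch, not the implementation used by any of the packages named here. The function names (`knn_weights`, `morans_i`, `gearys_c`), the choice of binary k-nearest-neighbor weights with k = 6, and the toy 20 × 20 grid are all assumptions made for illustration.

```python
import numpy as np


def knn_weights(coords: np.ndarray, k: int = 6) -> np.ndarray:
    """Binary weight matrix: row i marks the k spots nearest to spot i."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a spot is not its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    w = np.zeros_like(dist)
    w[np.arange(len(coords))[:, None], nearest] = 1.0
    return w


def morans_i(e: np.ndarray, w: np.ndarray) -> float:
    """I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = e - mean(e)."""
    z = e - e.mean()
    return float(len(e) / w.sum() * (z @ w @ z) / (z @ z))


def gearys_c(e: np.ndarray, w: np.ndarray) -> float:
    """C = (n - 1) * sum_ij w_ij (e_i - e_j)^2 / (2 W * sum_i z_i^2)."""
    z = e - e.mean()
    sq_diff = (e[:, None] - e[None, :]) ** 2
    return float((len(e) - 1) / (2 * w.sum()) * (w * sq_diff).sum() / (z @ z))


# Toy data: a 20 x 20 grid of spots (hypothetical, for illustration only).
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:20, 0:20]
coords = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)
w = knn_weights(coords, k=6)

gradient = coords[:, 0].copy()        # expression increases along x
noise = rng.normal(size=len(coords))  # no spatial structure

print("gradient: I =", round(morans_i(gradient, w), 3), " C =", round(gearys_c(gradient, w), 3))
print("noise:    I =", round(morans_i(noise, w), 3), " C =", round(gearys_c(noise, w), 3))
```

The two statistics move in opposite directions: strong positive autocorrelation pushes Moran's I toward 1 and Geary's C toward 0, while a pattern with no spatial structure gives I near 0 and C near 1.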
2. Five mainstream methods
| Method | Principle | 2025 benchmark | Speed |
|---|---|---|---|
| SPARK-X | Multi-kernel non-parametric | Avg correlation 0.88, top method overall | ★★★★★ |
| SpatialDE2 | Gaussian process | Avg correlation 0.81 | ★★ |
| nnSVG | Nearest-neighbor GP, scales to large datasets | Avg correlation 0.80; spatially aware | ★★★ |
| Moran's I | Classical spatial autocorrelation | Avg correlation 0.76, strong baseline | ★★★★★ |
| SpatialDE | Original GP | Memory hungry; representative of early work | ★ |
Interactive: what kinds of patterns are called SVGs?
Switch between four simulated patterns. The left panel shows the spatial expression; the top-right panel shows Moran's I (larger means more spatially structured). What to look for: uniform noise → I≈0; smooth gradient → high I; random hotspots → moderate I.
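The three patterns named above can also be reproduced offline. This is a rough sketch under stated assumptions: a 30 × 30 grid, rook (4-neighbor) adjacency, Gaussian noise for the "uniform noise" case, and eight Gaussian bumps for the "hotspots" case. The helper `morans_i_grid` and all simulation parameters are hypothetical and are not taken from the interactive widget.

```python
import numpy as np


def morans_i_grid(img: np.ndarray) -> float:
    """Moran's I on a regular grid with rook (4-neighbor) adjacency."""
    z = img - img.mean()
    # Each horizontally or vertically adjacent pair is counted in both directions.
    cross = 2.0 * ((z[:, :-1] * z[:, 1:]).sum() + (z[:-1, :] * z[1:, :]).sum())
    n_pairs = 2.0 * (z[:, :-1].size + z[:-1, :].size)
    return float(z.size / n_pairs * cross / (z ** 2).sum())


rng = np.random.default_rng(0)
side = 30
yy, xx = np.mgrid[0:side, 0:side]

# Pattern 1: independent noise at every spot, no spatial structure.
noise = rng.normal(size=(side, side))

# Pattern 2: expression rises steadily along x, with some noise on top.
gradient = xx + rng.normal(size=(side, side))

# Pattern 3: a few small Gaussian bumps at random interior positions, plus noise.
centers = rng.integers(3, side - 3, size=(8, 2))
hotspots = rng.normal(scale=0.2, size=(side, side))
for cy, cx in centers:
    hotspots += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * 1.5 ** 2))

scores = {
    "uniform noise": morans_i_grid(noise),
    "smooth gradient": morans_i_grid(gradient),
    "random hotspots": morans_i_grid(hotspots),
}
for name, score in scores.items():
    print(f"{name:>16}: I = {score:.2f}")
```

The ordering matters more than the exact values, which shift with the grid size, the neighborhood definition, and the noise level chosen here.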
Implementation
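    # nnSVG (Bioconductor); spe is a SpatialExperiment with a "logcounts" assay
    library(SpatialExperiment)
    library(nnSVG)
    spe <- nnSVG(spe, assay_name = "logcounts")
    # Rank genes by adjusted p-value and keep the 20 most significant
    top_nnsvg <- head(rowData(spe)$gene_name[order(rowData(spe)$padj)], 20)

    # Seurat Moran's I; vis is a Seurat spatial object with an "SCT" assay
    library(Seurat)
    vis <- FindSpatiallyVariableFeatures(vis, assay = "SCT",
                                         features = VariableFeatures(vis),
                                         selection.method = "moransi")
    # Name the method explicitly so the Moran's I ranking is the one returned
    top_moran <- head(SpatiallyVariableFeatures(vis, selection.method = "moransi"), 20)
    SpatialFeaturePlot(vis, features = top_moran[1:6])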
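    import squidpy as sq

    # Squidpy Moran's I
    sq.gr.spatial_neighbors(adata, coord_type="generic", delaunay=True)  # build the spatial graph
    sq.gr.spatial_autocorr(adata, mode="moran", n_perms=100, n_jobs=4)
    # Sort explicitly by the statistic before taking the top 20 genes
    top = adata.uns["moranI"].sort_values("I", ascending=False).head(20).index.tolist()
    sq.pl.spatial_scatter(adata, color=top[:6])

    # SPARK-X: call the R package through rpy2, or run it directly in R.
    # Python alternative: SpatialDE2
    import SpatialDE
    res = SpatialDE.test(adata, layer="logcounts")
    # Some SpatialDE2 versions return a tuple; the per-gene table is the first element
    res = res[0] if isinstance(res, tuple) else res
    # Assumes the result table has padj and FSV columns; check your installed version
    sig = res[res.padj < 0.05].sort_values("FSV", ascending=False)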
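The `n_perms=100` argument in the Squidpy call requests a permutation-based p-value. The idea can be sketched in plain NumPy: shuffle the expression values across spots, recompute Moran's I each time, and see how often a shuffled value reaches the observed one. This is an illustration only, not Squidpy's code. The function names, the rook adjacency on a 15 × 15 toy grid, and the add-one p-value convention are assumptions.

```python
import numpy as np


def morans_i(e: np.ndarray, w: np.ndarray) -> float:
    """Moran's I for expression vector e and spot-by-spot weight matrix w."""
    z = e - e.mean()
    return float(len(e) / w.sum() * (z @ w @ z) / (z @ z))


def permutation_pvalue(e, w, n_perms=100, seed=0):
    """One-sided permutation p-value for positive spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(e, w)
    # Shuffling breaks any link between expression and location.
    null = np.array([morans_i(rng.permutation(e), w) for _ in range(n_perms)])
    # Add one to numerator and denominator so the p-value is never exactly zero.
    p = (1 + (null >= observed).sum()) / (n_perms + 1)
    return observed, float(p)


# Toy 15 x 15 grid with rook adjacency (hypothetical data).
side = 15
yy, xx = np.mgrid[0:side, 0:side]
coords = np.column_stack([xx.ravel(), yy.ravel()])
# Spots exactly one grid step apart (Manhattan distance 1) are neighbors.
manhattan = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=-1)
w = (manhattan == 1).astype(float)

gradient = xx.ravel().astype(float)
noise = np.random.default_rng(1).normal(size=side * side)

obs_g, p_g = permutation_pvalue(gradient, w)
obs_n, p_n = permutation_pvalue(noise, w)
print(f"gradient: I = {obs_g:.2f}, p = {p_g:.3f}")
print(f"noise:    I = {obs_n:.2f}, p = {p_n:.3f}")
```

With 100 permutations the smallest attainable p-value under this convention is 1/101, so more permutations are needed if many genes must survive multiple-testing correction.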
📝 Self-check
1. Is a highly variable gene (HVG) always an SVG?
2. Which method leads on both speed and accuracy in the 2025 benchmark?
3. What is the core idea of Moran's I?