STEP 7 / 15

Spatially Variable Genes (SVGs): finding genes whose expression depends on location

Not just highly variable: the expression is significantly associated with the spatial coordinates, i.e. the variance is structured in space.

I. The mathematical intuition behind SVGs

Given a gene g with expression e_i at spot i and spot coordinates (x_i, y_i), an SVG test asks: does e show more spatial autocorrelation across (x, y) than expected by chance?

The most common scores:

  • Moran's I: a classic spatial statistic; the weighted average, over neighboring spot pairs, of the product of their deviations from the mean expression, normalized by the variance. Fast and interpretable.
  • Geary's C: similar to Moran's I but built from squared differences between neighbors, so it is more sensitive to local differences.
  • Gaussian Process family (SpatialDE, nnSVG): models expression as a GP and estimates its lengthscale and variance.
  • SPARK / SPARK-X: tests built on multiple spatial kernels. SPARK fits a mixed-effects (generalized linear spatial) model; SPARK-X is a non-parametric covariance test that scales to large datasets.
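
To make the two autocorrelation scores concrete, here is a minimal NumPy sketch of Moran's I and Geary's C. It is an illustration under stated assumptions, not the implementation used by Squidpy or Seurat: the binary k-nearest-neighbor weights, the helper names (`knn_weights`, `morans_i`, `gearys_c`) and the toy data are all hypothetical choices made for this example.

```python
import numpy as np

def knn_weights(coords, k=6):
    """Binary weight matrix: w[i, j] = 1 if j is one of the k nearest spots of i."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # a spot is not its own neighbor
    nearest = np.argsort(d, axis=1)[:, :k]      # indices of the k closest spots
    w = np.zeros(d.shape)
    w[np.arange(len(coords))[:, None], nearest] = 1.0
    return w

def morans_i(e, w):
    """I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = e - mean(e)."""
    z = e - e.mean()
    return len(e) / w.sum() * (z @ w @ z) / (z @ z)

def gearys_c(e, w):
    """C = ((n - 1) / (2 W)) * sum_ij w_ij (e_i - e_j)^2 / sum_i z_i^2."""
    z = e - e.mean()
    sq_diff = (e[:, None] - e[None, :]) ** 2
    return (len(e) - 1) / (2.0 * w.sum()) * (w * sq_diff).sum() / (z @ z)

# Toy data (assumption): 400 spots at random positions in the unit square.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(400, 2))
w = knn_weights(coords, k=6)

gradient = coords[:, 0] + 0.1 * rng.normal(size=400)   # expression follows x
shuffled = rng.permutation(gradient)                    # same values, positions scrambled

for name, e in [("gradient", gradient), ("shuffled", shuffled)]:
    print(f"{name:>8s}: Moran's I = {morans_i(e, w):+.2f}, Geary's C = {gearys_c(e, w):.2f}")
```

The two scores move in opposite directions: spatial structure pushes Moran's I up toward 1 and Geary's C down toward 0, while scrambled positions leave I near 0 and C near 1.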

II. Five mainstream methods

Method      Principle                          2025 benchmark                             Speed
SPARK-X     Multi-kernel non-parametric        Avg correlation 0.88, top method overall   ★★★★★
SpatialDE2  Gaussian Process                   Avg correlation 0.81                       ★★
nnSVG       Nearest-neighbor GP, scalable      Avg correlation 0.80; spatially-aware      ★★★
Moran's I   Classical spatial autocorrelation  Avg correlation 0.76, strong baseline      ★★★★★
SpatialDE   Original GP                        Memory hungry; historical reference        not rated
💡 Practical tip: for large datasets, use SPARK-X; for intuitive interpretation, use Moran's I (built into Squidpy); to inspect lengthscale structure, use nnSVG.

Interactive: which patterns get called SVGs?

Try four simulated patterns. The left panel shows the spatial expression; the top-right shows Moran's I (larger means more spatially structured). Observe: uniform noise → I≈0; smooth gradient → high I; random hotspots → moderate I.
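
The same comparison can be reproduced offline with a small simulation. The sketch below is an assumption-laden stand-in for the demo, not its actual code: the 30×30 grid, the rook (4-neighbor) weights, the noise levels and the blob width are arbitrary choices, and the "shuffled gradient" pattern is added here for illustration (it has exactly the same variance as the gradient but no spatial structure).

```python
import numpy as np

def morans_i_grid(img):
    """Moran's I on a 2-D grid with binary rook (4-neighbor) weights."""
    z = img - img.mean()
    # Each horizontal or vertical neighbor pair is counted twice: (i, j) and (j, i).
    cross = 2.0 * ((z[:, :-1] * z[:, 1:]).sum() + (z[:-1, :] * z[1:, :]).sum())
    n_rows, n_cols = img.shape
    w_total = 2.0 * (n_rows * (n_cols - 1) + (n_rows - 1) * n_cols)
    return img.size / w_total * cross / (z ** 2).sum()

rng = np.random.default_rng(42)
n = 30
yy, xx = np.mgrid[0:n, 0:n]

# Pattern 1: uniform noise, no spatial structure.
noise = rng.normal(size=(n, n))

# Pattern 2: smooth left-to-right gradient with a little noise.
gradient = xx / (n - 1) + 0.05 * rng.normal(size=(n, n))

# Pattern 3: five Gaussian hotspots at random interior positions, on a noisy background.
hotspots = 0.3 * rng.normal(size=(n, n))
for cy, cx in rng.integers(5, n - 5, size=(5, 2)):
    hotspots += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * 2.0 ** 2))

# Pattern 4 (illustrative addition): the gradient values with positions scrambled.
shuffled = rng.permutation(gradient.ravel()).reshape(n, n)

patterns = [("uniform noise", noise), ("smooth gradient", gradient),
            ("random hotspots", hotspots), ("shuffled gradient", shuffled)]
scores = {name: morans_i_grid(img) for name, img in patterns}
for name, value in scores.items():
    print(f"{name:>18s}: Moran's I = {value:+.2f}")
```

The shuffled gradient is the useful contrast: a gene can be highly variable and still score near zero, because Moran's I only rewards variance that is arranged in space.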


Implementation

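# --- R ---

# nnSVG (Bioconductor)
library(SpatialExperiment)  # provides rowData() for the spe object
library(nnSVG)
spe <- nnSVG(spe, assay_name = "logcounts")
# 20 genes with the smallest adjusted p-values (assumes rowData has a gene_name column)
top <- rowData(spe)$gene_name[order(rowData(spe)$padj)][1:20]

# Seurat Moran's I
library(Seurat)
vis <- FindSpatiallyVariableFeatures(vis, assay = "SCT",
        features = VariableFeatures(vis), selection.method = "moransi")
# Request the Moran's I ranking explicitly; older Seurat versions default to "markvariogram"
top <- SpatiallyVariableFeatures(vis, selection.method = "moransi")[1:20]
SpatialFeaturePlot(vis, features = top[1:6])

# --- Python ---

# Squidpy Moran's I
import squidpy as sq
sq.gr.spatial_neighbors(adata, coord_type="generic", delaunay=True)  # build the spatial graph
sq.gr.spatial_autocorr(adata, mode="moran", n_perms=100, n_jobs=4)   # results go to adata.uns["moranI"]
top = adata.uns["moranI"].head(20).index.tolist()
sq.pl.spatial_scatter(adata, color=top[:6])

# SPARK-X is an R package (call it via rpy2 or run it directly in R); a Python alternative is SpatialDE2
import SpatialDE
res = SpatialDE.test(adata, layer="logcounts")
# The return type and column names (padj, FSV) may differ between SpatialDE versions; check before filtering
sig = res[res.padj < 0.05].sort_values("FSV", ascending=False)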
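
The `n_perms` argument in the Squidpy call asks for permutation-based significance. As a conceptual sketch only (this is not Squidpy's code; the radius-based weights, function names and toy data are assumptions for illustration), a permutation test shuffles the expression values across spots to build a null distribution of Moran's I and reports how often the null reaches the observed score.

```python
import numpy as np

def radius_weights(coords, radius):
    """Binary weights: 1 for distinct spots closer than `radius`, else 0."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return ((d < radius) & (d > 0)).astype(float)

def morans_i(e, w):
    """Moran's I for expression vector e and weight matrix w."""
    z = e - e.mean()
    return len(e) / w.sum() * (z @ w @ z) / (z @ z)

def permutation_pvalue(e, w, n_perms=100, seed=0):
    """One-sided p-value: share of shuffled datasets with I >= the observed I."""
    rng = np.random.default_rng(seed)
    observed = morans_i(e, w)
    null = np.array([morans_i(rng.permutation(e), w) for _ in range(n_perms)])
    # The +1 terms count the observed value itself, so p is never exactly 0.
    p = (1 + np.sum(null >= observed)) / (n_perms + 1)
    return observed, p

# Toy data (assumption): 200 spots, one structured gene and one pure-noise gene.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 1, size=(200, 2))
w = radius_weights(coords, radius=0.1)
structured = coords[:, 0] + 0.1 * rng.normal(size=200)
unstructured = rng.normal(size=200)

i_s, p_s = permutation_pvalue(structured, w)
i_u, p_u = permutation_pvalue(unstructured, w)
print(f"structured:   I = {i_s:+.2f}, p = {p_s:.3f}")
print(f"unstructured: I = {i_u:+.2f}, p = {p_u:.3f}")
```

With `n_perms=100` the smallest attainable p-value is 1/101, so a larger number of permutations is needed when many genes are tested and p-values are adjusted for multiple testing.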

📝 Self-check

1. Is a highly variable gene (HVG) always an SVG?

A. Yes, they are equivalent
B. Yes, HVGs always have spatial structure
C. Not necessarily: high variance can be noise, or be scattered across the whole tissue without spatial structure
D. The reverse holds (SVG ⊂ HVG)

2. Which method leads on both speed and accuracy in the 2025 benchmark?

A. SPARK-X
B. SpatialDE (original)
C. K-means
D. Linear regression

3. What is the core idea of Moran's I?

A. It computes each gene's mean expression
B. It quantifies whether expression in neighboring spots is more similar than expected at random
C. It automatically estimates the number of clusters
D. It has nothing to do with space