Step 8: Deconvolution — Spatial Transcriptomics Tutorial

1. Why do we need deconvolution?

On spot-based platforms (e.g. Visium), each spot captures several cells. A spot's expression vector is therefore a linear mixture of the expression of the cells it covers. Clustering spots directly on their expression yields "impure" clusters. For example, a spot at a tumor margin may mix tumor cells, macrophages, and endothelial cells.

Deconvolution starts from cell-type expression signatures learned from a single-cell (scRNA-seq) reference. From these, it estimates the proportion (or abundance) of each cell type in every spot. The output is usually a spot × cell-type matrix.
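The mixing assumption can be written out as follows. The notation is introduced here for illustration and is not taken from any one method:

- $y_{s,g}$ is the expression of gene $g$ in spot $s$.
- $\mu_{k,g}$ is the reference signature of cell type $k$.
- $w_{s,k}$ is the weight of type $k$ in spot $s$.
- $N_s$ is the number of cells in spot $s$.

Real methods add spot- and gene-level scaling terms and a count noise model on top of this core.

```latex
\begin{aligned}
y_{s,g} &\approx \sum_{k=1}^{K} w_{s,k}\,\mu_{k,g}, \qquad w_{s,k} \ge 0 \\
\text{proportion: } &\sum_{k=1}^{K} w_{s,k} = 1 \\
\text{abundance: } &\sum_{k=1}^{K} w_{s,k} = N_s, \qquad p_{s,k} = w_{s,k} / N_s
\end{aligned}
```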

2. Five mainstream methods

Method	Principle	Platform fit / notes	GPU
cell2location	Bayesian hierarchical probabilistic model	Visium / Slide-seq / Stereo-seq; often ranked first across benchmarks	✓
RCTD	Probabilistic (Poisson-lognormal likelihood)	Visium / Slide-seq; fast, CPU-parallel	✗
CARD	Conditional autoregressive (CAR) model with spatial smoothing	Ranked first on Visium in a 2023 Nat Commun benchmark	✗
SPOTlight	Seeded NMF	Strong on Visium; easy to interpret	✗
Tangram	Deep learning; maps sc cells onto spots	Suited to image-based data; jointly maps and imputes	✓

💡 Suggested starting point: RCTD (CPU-only, fast, robust). With a GPU available, move to cell2location for its richer hierarchical model. For Slide-seqV2 / Stereo-seq, use cell2location or STdeconvolve (a reference-free, LDA-based method).

Interactive: decomposing a raw spot into cell-type proportions

Left: the mixed spot expression. Right: the deconvolved cell-type proportions (pie chart). Moving the slider adds noise to the reference markers. This shows how the estimate degrades when the reference is of low quality.

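[Interactive widget: reference noise slider (default 0.1); true composition 50:30:20; left panel: spot expression; right panel: estimated proportions]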
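The slider experiment can be approximated offline. The sketch below is a toy simulation, not the page's widget code. It makes these assumptions:

- There are three cell types with 10 marker genes each.
- The spot is noise-free, with the 50:30:20 composition.
- The reference is perturbed by multiplicative log-normal noise of strength `noise`.
- Proportions are recovered with clipped least squares, a simple stand-in for the likelihood models of real methods.

All function names are hypothetical.

```python
import numpy as np

def deconvolve(signatures, spot):
    """Hypothetical helper: least-squares fit of spot ≈ signatures @ w, clipped to w >= 0, rescaled to sum to 1."""
    w, *_ = np.linalg.lstsq(signatures, spot, rcond=None)
    w = np.clip(w, 0.0, None)
    return w / w.sum() if w.sum() > 0 else w

def make_signatures(n_types=3, markers_per_type=10, base=1.0, marker=10.0):
    """Genes × types matrix in which each type has its own block of marker genes."""
    sig = np.full((n_types * markers_per_type, n_types), base)
    for k in range(n_types):
        sig[k * markers_per_type:(k + 1) * markers_per_type, k] = marker
    return sig

def simulate(noise, true_props=(0.5, 0.3, 0.2), seed=0):
    """Mix a spot from clean signatures, deconvolve it with a noisy reference, return (estimate, L1 error)."""
    rng = np.random.default_rng(seed)
    true_props = np.asarray(true_props)
    sig = make_signatures(n_types=len(true_props))
    spot = sig @ true_props                                     # noise-free mixed spot
    ref = sig * np.exp(noise * rng.standard_normal(sig.shape))  # log-normal noise on the reference
    est = deconvolve(ref, spot)
    return est, np.abs(est - true_props).sum()

for noise in (0.0, 0.1, 0.5, 1.0):
    est, err = simulate(noise)
    print(f"noise={noise:.1f}  estimate={np.round(est, 2)}  L1 error={err:.3f}")
```

At `noise=0` the 50:30:20 composition is recovered exactly. As the noise grows, the error grows with it, which is the degradation the widget illustrates.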

3. Common pitfalls

The reference must come from a similar tissue. Deconvolving a brain section with a cardiac scRNA-seq reference is a disaster.
Cell types absent from the reference are invisible. Deconvolution can only output types present in the reference. Signal from a missing type is attributed to the most similar reference types, which produces false positives.
Distinguish "proportion" from "abundance." cell2location outputs abundance (estimated cells per spot), while RCTD and SPOTlight typically output proportions. Keep the two apart when correlating with marker expression downstream.
Rare cells are easily zeroed out. Before concluding that a rare type is truly absent from a region, cross-check against spatial domains, or use Tangram to map single cells directly onto the tissue.
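The toy sketch below shows why a missing type produces false positives rather than an error. It uses hypothetical 4-gene signatures, and clipped least squares stands in for the real likelihood models. The spot truly contains tumor, macrophage, and endothelial cells at 50:30:20. The reference, however, lacks the endothelial type, whose profile was chosen to resemble macrophages.

```python
import numpy as np

def deconvolve(signatures, spot):
    """Hypothetical helper: least-squares fit of spot ≈ signatures @ w, clipped to w >= 0, rescaled to sum to 1."""
    w, *_ = np.linalg.lstsq(signatures, spot, rcond=None)
    w = np.clip(w, 0.0, None)
    return w / w.sum() if w.sum() > 0 else w

# Toy signatures over 4 genes (one value per gene) for three cell types
tumor = np.array([10.0, 1.0, 1.0, 1.0])
macrophage = np.array([1.0, 10.0, 1.0, 1.0])
endothelial = np.array([1.0, 8.0, 3.0, 1.0])  # deliberately similar to macrophage

true_props = np.array([0.5, 0.3, 0.2])
spot = np.column_stack([tumor, macrophage, endothelial]) @ true_props

# Complete reference: the composition is recovered
est_full = deconvolve(np.column_stack([tumor, macrophage, endothelial]), spot)
print(" ".join(f"{x:.2f}" for x in est_full))  # → 0.50 0.30 0.20

# Reference missing the endothelial type: its share leaks into the most similar type
est_missing = deconvolve(np.column_stack([tumor, macrophage]), spot)
print(f"tumor={est_missing[0]:.2f}, macrophage={est_missing[1]:.2f}")  # → tumor=0.52, macrophage=0.48
```

Most of the endothelial share is absorbed by the macrophage column. As a result, the macrophage proportion is inflated from 0.30 to about 0.48, and nothing in the output signals that a type was missing.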

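Implementation

```r
# RCTD (spacexr)
library(spacexr)
# sc_counts: genes × cells count matrix; sc_cell_types: factor named by cell barcode
ref <- Reference(sc_counts, sc_cell_types)
# coords: data.frame with x, y columns (rownames = spot barcodes); vis_counts: genes × spots
puck <- SpatialRNA(coords, vis_counts)
rctd <- create.RCTD(puck, ref, max_cores = 8)
rctd <- run.RCTD(rctd, doublet_mode = "full")  # "full": no limit on cell types per spot
weights <- rctd@results$weights                # spot × cell-type, not yet normalized
props <- normalize_weights(weights)            # rescale so each spot sums to 1

# CARD
library(CARD)
# sc_meta: rownames = cell barcodes, with cellType and sampleID columns
card <- createCARDObject(sc_count = sc_counts, sc_meta = sc_meta,
        spatial_count = vis_counts, spatial_location = coords,
        ct.varname = "cellType", ct.select = unique(sc_meta$cellType),
        sample.varname = "sampleID")
card <- CARD_deconvolution(card)
card_props <- card@Proportion_CARD             # spot × cell-type proportions
```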

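```python
import numpy as np
import cell2location as c2l
import squidpy as sq

# 1. Learn cell-type signatures from the scRNA-seq reference (adata_sc.X must hold raw counts)
c2l.models.RegressionModel.setup_anndata(adata_sc, labels_key="cell_type", batch_key="sample")
mod = c2l.models.RegressionModel(adata_sc)
mod.train(max_epochs=300)
adata_sc = mod.export_posterior(adata_sc)

# Per-cell-type mean expression (genes × cell types), columns renamed to cell types
factor_names = adata_sc.uns["mod"]["factor_names"]
inf_aver = adata_sc.varm["means_per_cluster_mu_fg"][
    [f"means_per_cluster_mu_fg_{ct}" for ct in factor_names]].copy()
inf_aver.columns = factor_names
# Keep only genes shared by the spatial data and the signatures
shared = np.intersect1d(adata_st.var_names, inf_aver.index)
adata_st, inf_aver = adata_st[:, shared].copy(), inf_aver.loc[shared, :].copy()

# 2. Decompose each spot with the learned signatures
c2l.models.Cell2location.setup_anndata(adata_st, batch_key="sample")
mod = c2l.models.Cell2location(adata_st, cell_state_df=inf_aver,
                                N_cells_per_location=8,  # expected cells per spot; tissue-dependent
                                detection_alpha=20)
mod.train(max_epochs=3000)  # the official tutorial trains longer (e.g. 30000 epochs)
adata_st = mod.export_posterior(adata_st)
# q05 abundances are stored in .obsm (spot × cell type); copy them to .obs to plot
adata_st.obs[factor_names] = adata_st.obsm["q05_cell_abundance_w_sf"]
sq.pl.spatial_scatter(adata_st, color=list(factor_names))
```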
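cell2location outputs abundance (estimated cells per spot), whereas RCTD and CARD report proportions. To put them on the same scale, divide each spot's abundances by the spot total. The sketch below uses a hypothetical abundance matrix in place of `adata_st.obsm["q05_cell_abundance_w_sf"]`. Note that this step discards cell density, which is exactly what distinguishes abundance from proportion.

```python
import numpy as np

# Hypothetical abundances (cells per spot), spots × cell types
abundance = np.array([
    [4.0, 2.4, 1.6],  # 8 cells, 50:30:20
    [8.0, 4.8, 3.2],  # 16 cells, same 50:30:20 mix at twice the density
    [0.5, 0.3, 0.2],  # 1 cell, same mix
    [0.0, 0.0, 0.0],  # empty spot
])

totals = abundance.sum(axis=1, keepdims=True)
# Divide each row by its total; leave empty spots at 0 instead of dividing by zero
proportions = np.divide(abundance, totals, out=np.zeros_like(abundance), where=totals > 0)
print(np.round(proportions, 2))
```

The first three spots differ 16-fold in cell count but have identical proportions. For the pandas DataFrame that cell2location stores in `.obsm`, the same step is `df.div(df.sum(axis=1), axis=0)`.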

📝 Self-check

1. What happens if a spot's true cell type is missing from the reference?

A. The algorithm discovers a new type automatically

B. The algorithm errors out

C. Its signal is allocated to the most similar reference type, producing false positives

D. The spot is removed

2. Which method is the suggested CPU-only, fast, and robust starting point?

A. RCTD

B. cell2location

C. Tangram

D. STdeconvolve

3. What is the key difference between the outputs of cell2location and RCTD?

A. Both output cell counts

B. cell2location outputs abundance; RCTD typically outputs proportion

C. They are identical

D. cell2location is R-only