STEP 8 / 15

Cell-type deconvolution: unmixing spots back into cells

Each Visium spot covers roughly 1–10 cells, so its expression profile is a mixture that has to be unmixed with the help of an scRNA-seq reference.

1. Why deconvolution?

On spot-based platforms, a spot's expression vector is a linear mixture of the expression of the cells it covers. Clustering spot expression directly therefore yields "impure" clusters: at a tumor margin, for example, a single spot can mix tumor cells, macrophages, and endothelial cells.

The goal of deconvolution is to estimate, given known scRNA-seq cell-type signatures, how much of each cell type is present in each spot (as a proportion or an abundance). The output is usually a spot × cell-type matrix.
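As a toy illustration of the linear-mixture view, the sketch below builds one synthetic spot from a made-up signature matrix and recovers the proportions with ordinary least squares followed by clipping and renormalization. All numbers and names (`signatures`, `estimate_proportions`, `n_cells`) are hypothetical, and the solver is a deliberately simple stand-in: tools such as RCTD or cell2location fit count-based statistical models rather than plain least squares.

```python
import numpy as np

# Hypothetical signature matrix: rows = genes, columns = cell types.
# Values are made-up mean expression levels, two marker genes per type.
cell_types = ["tumor", "macrophage", "endothelial"]
signatures = np.array([
    [9.0, 0.5, 0.2],
    [7.0, 1.0, 0.3],
    [0.4, 8.0, 0.6],
    [0.2, 6.0, 1.0],
    [0.3, 0.5, 9.0],
    [0.1, 0.8, 5.0],
])


def estimate_proportions(spot, sig):
    """Least-squares unmixing, then clip negatives and renormalize to sum to 1."""
    weights, *_ = np.linalg.lstsq(sig, spot, rcond=None)
    weights = np.clip(weights, 0.0, None)
    return weights / weights.sum()


true_props = np.array([0.6, 0.3, 0.1])
n_cells = 5  # assumed number of cells under the spot

# Noise-free linear mixture: each cell type contributes its signature
# scaled by the number of cells of that type.
spot_expr = signatures @ (true_props * n_cells)

est_props = estimate_proportions(spot_expr, signatures)
for name, p in zip(cell_types, est_props):
    print(f"{name}: {p:.2f}")
```

Because the synthetic spot is noise-free and every cell type is in the signature matrix, the estimate matches the true proportions. Real spots add sampling noise, platform effects, and missing cell types, which is why dedicated methods are needed.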

Interactive demo: unmixing a raw spot into cell-type proportions

[Interactive figure. Left: mixed spot expression. Right: estimated cell-type proportions after deconvolution (pie chart). A slider adds marker noise to show how stable the estimates remain when the reference is of low quality.]
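The effect of marker noise can also be approximated offline. The sketch below perturbs a made-up reference signature matrix with multiplicative log-normal noise and reports how far the estimated proportions drift from the truth. The signature values, the noise model, and the least-squares solver are all assumptions chosen for illustration; they are not the method behind the interactive figure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" signatures: 30 genes x 3 cell types,
# with 10 marker genes per type on a low background.
n_genes, n_types = 30, 3
true_sig = np.full((n_genes, n_types), 0.5)
for k in range(n_types):
    true_sig[k * 10:(k + 1) * 10, k] = 8.0

true_props = np.array([0.6, 0.3, 0.1])
spot_expr = true_sig @ true_props  # the spot is generated from the true signatures


def estimate_proportions(spot, sig):
    """Least-squares unmixing, then clip negatives and renormalize."""
    w, *_ = np.linalg.lstsq(sig, spot, rcond=None)
    w = np.clip(w, 0.0, None)
    return w / w.sum()


def mean_abs_error(noise_sd, n_rep=200):
    """Average proportion error when the reference is corrupted by noise."""
    errs = []
    for _ in range(n_rep):
        # Multiplicative log-normal noise on every reference entry.
        noise = np.exp(noise_sd * rng.standard_normal(true_sig.shape))
        est = estimate_proportions(spot_expr, true_sig * noise)
        errs.append(np.abs(est - true_props).mean())
    return float(np.mean(errs))


errors = {sd: mean_abs_error(sd) for sd in (0.0, 0.2, 0.5, 1.0)}
for sd, err in errors.items():
    print(f"noise sd={sd:.1f}  mean abs error={err:.3f}")
```

With `noise_sd = 0` the reference is exact and the error is essentially zero; any nonzero noise level moves the estimates away from the true proportions.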

3. Common pitfalls

  • The reference must come from a similar tissue. Deconvolving a brain section with a cardiac scRNA-seq reference is a disaster.
  • A cell type missing from the reference is invisible. Deconvolution can only output types present in the reference.
  • Distinguish "proportion" from "abundance". cell2location outputs abundance (estimated number of cells per spot), whereas RCTD and SPOTlight typically output proportions. Keep the two apart when correlating with marker expression downstream.
  • Rare cell types are easily zeroed out. To check whether a rare type is truly absent from a region, cross-check against spatial domains, or use Tangram to map single cells directly onto the section.
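The missing-cell-type pitfall can be made concrete with a toy calculation. The sketch below builds a spot from tumor cells and monocytes, then unmixes it twice: once with a reference that contains all three types, and once with monocytes removed. The signature values, cell-type names, and least-squares solver are illustrative assumptions, not the behavior of any particular package.

```python
import numpy as np

# Hypothetical signatures (genes x cell types). "monocyte" is deliberately
# similar to "macrophage"; "tumor" is distinct from both.
cell_types = ["tumor", "macrophage", "monocyte"]
sig = np.array([
    [9.0, 0.3, 0.3],
    [8.0, 0.5, 0.4],
    [0.3, 7.0, 5.0],
    [0.2, 6.0, 6.5],
    [0.4, 5.0, 2.0],
    [0.1, 1.0, 4.0],
])


def unmix(spot, s):
    """Least squares clipped at zero; the unnormalized weights act like abundances."""
    w, *_ = np.linalg.lstsq(s, spot, rcond=None)
    return np.clip(w, 0.0, None)


def fmt(names, values):
    return ", ".join(f"{n}={v:.2f}" for n, v in zip(names, values))


# A spot containing 2 tumor cells and 3 monocytes, and no macrophages.
true_abundance = np.array([2.0, 0.0, 3.0])
spot = sig @ true_abundance

full = unmix(spot, sig)            # reference contains all three types
reduced = unmix(spot, sig[:, :2])  # monocytes missing from the reference

print("full reference:   ", fmt(cell_types, full))
print("reduced reference:", fmt(cell_types[:2], reduced))
```

With the full reference the macrophage estimate is zero, as it should be. With the reduced reference the monocyte signal has nowhere to go and is absorbed by the macrophage column, producing a macrophage call in a spot that contains none.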

Implementation

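# RCTD (spacexr), R
library(spacexr)
# sc_counts: genes x cells count matrix; sc_cell_types: named factor (names = cell barcodes)
ref <- Reference(sc_counts, sc_cell_types)
# coords: data.frame of x/y positions with spot barcodes as row names; vis_counts: genes x spots
puck <- SpatialRNA(coords, vis_counts)
rctd <- create.RCTD(puck, ref, max_cores = 8)
rctd <- run.RCTD(rctd, doublet_mode = "full")  # "full": any number of cell types per spot
weights <- normalize_weights(rctd@results$weights)  # spot × cell-type, rows sum to 1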

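# CARD, R
library(CARD)
# sc_meta must contain the columns named in ct.varname and sample.varname
card <- createCARDObject(sc_count = sc_counts, sc_meta = sc_meta,
        spatial_count = vis_counts, spatial_location = coords,
        ct.varname = "cellType", ct.select = unique(sc_meta$cellType),
        sample.varname = "sampleID")
card <- CARD_deconvolution(CARD_object = card)
proportions <- card@Proportion_CARD  # spot × cell-type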
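
# cell2location (Python)
import cell2location as c2l
import squidpy as sq

# 1. Learn cell-type signatures from the scRNA-seq reference
c2l.models.RegressionModel.setup_anndata(adata_sc, labels_key="cell_type", batch_key="sample")
mod = c2l.models.RegressionModel(adata_sc)
mod.train(max_epochs=300)
adata_sc = mod.export_posterior(adata_sc)

# 2. Use the signatures to decompose each spot
factor_names = adata_sc.uns["mod"]["factor_names"]
inf_aver = adata_sc.varm["means_per_cluster_mu_fg"][
    [f"means_per_cluster_mu_fg_{ct}" for ct in factor_names]].copy()
inf_aver.columns = factor_names
# Restrict the spatial data and the signatures to their shared genes
shared = adata_st.var_names.intersection(inf_aver.index)
adata_st = adata_st[:, shared].copy()
inf_aver = inf_aver.loc[shared, :]

c2l.models.Cell2location.setup_anndata(adata_st, batch_key="sample")
mod = c2l.models.Cell2location(adata_st, cell_state_df=inf_aver,
                                N_cells_per_location=8,  # expected cells per spot; tissue-dependent
                                detection_alpha=20)
mod.train(max_epochs=3000)  # train longer if the loss has not plateaued
adata_st = mod.export_posterior(adata_st)

# Abundances are stored in .obsm; copy them to .obs so they can be plotted by name
st_types = list(adata_st.uns["mod"]["factor_names"])
adata_st.obs[st_types] = adata_st.obsm["q05_cell_abundance_w_sf"]
sq.pl.spatial_scatter(adata_st, color=st_types[:4])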
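
Before comparing results across tools, it helps to bring them onto the same scale. The sketch below uses mock arrays in place of real outputs: `c2l_abundance` stands in for a cell2location abundance matrix and `rctd_weights` for unnormalized RCTD weights. Both are generated here from the same made-up proportions, so the names, shapes, and values are assumptions for illustration only. Each matrix is row-normalized to proportions and the two are then correlated per cell type.

```python
import numpy as np

rng = np.random.default_rng(1)
cell_types = ["tumor", "macrophage", "endothelial"]
n_spots = 50

# Mock stand-ins for real tool outputs (not produced by the tools themselves).
true_props = rng.dirichlet(alpha=[2.0, 1.0, 1.0], size=n_spots)
cells_per_spot = rng.integers(3, 11, size=(n_spots, 1))   # 3 to 10 cells
c2l_abundance = true_props * cells_per_spot               # cells per spot
rctd_weights = true_props * rng.uniform(0.8, 1.2, size=(n_spots, 1))


def to_proportions(mat):
    """Row-normalize so that each spot sums to 1."""
    return mat / mat.sum(axis=1, keepdims=True)


c2l_props = to_proportions(c2l_abundance)
rctd_props = to_proportions(rctd_weights)

# Per-cell-type Pearson correlation across spots between the two methods.
correlations = {}
for k, name in enumerate(cell_types):
    correlations[name] = float(np.corrcoef(c2l_props[:, k], rctd_props[:, k])[0, 1])
    print(f"{name}: r = {correlations[name]:.3f}")
```

The raw matrices differ in scale even though they encode the same composition, so correlating them directly with marker expression would mix composition with cell density. After row normalization the two agree in this mock setup; with real data, disagreement at this stage points to a genuine difference between methods rather than a difference in units.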

📝 Self-check

1. What happens if a cell type present in a spot is missing from the reference?

A. The algorithm discovers the new type automatically
B. The algorithm raises an error
C. Its signal is assigned to the most similar type in the reference → false positives
D. The spot is removed

2. Which method is recommended as a CPU-only, fast, and robust starting point?

A. RCTD
B. cell2location
C. Tangram
D. STdeconvolve

3. What is the key difference between the main outputs of cell2location and RCTD?

A. Both output cell counts
B. cell2location outputs abundance; RCTD typically outputs proportions
C. They are identical
D. cell2location is R-only