STEP 7 / 9

UMAP Visualization

Project high-dimensional data into 2D to inspect how cell populations relate to one another, but do not over-interpret the result.

What is UMAP?

UMAP (Uniform Manifold Approximation and Projection) is a non-linear dimensionality reduction method that projects the cell relationships found in PCA space into 2D. It is not a core analysis step, since clustering is done in PCA space; it is a visualization tool.

🚨
Common misinterpretation warning:
① Distances between clusters on a UMAP plot do NOT reliably represent biological similarity
② The area a cluster occupies on the plot does NOT represent its cell count
③ Cluster shapes and positions may change with a different random seed
These are artifacts of the non-linear projection and should not be used for biological inference.
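One way to act on the warnings above is to measure cluster relationships in PCA space rather than reading them off the plot. The following is a minimal sketch, not part of the original tutorial: the helper names `cluster_centroids` and `centroid_distances` are hypothetical, the toy coordinates are invented for illustration, and Euclidean distance between centroids is only one rough proxy for similarity.

```python
from itertools import combinations
from math import dist


def cluster_centroids(coords, labels):
    """Average the coordinates of the cells in each cluster."""
    sums, counts = {}, {}
    for point, label in zip(coords, labels):
        if label not in sums:
            sums[label] = [0.0] * len(point)
            counts[label] = 0
        sums[label] = [s + float(x) for s, x in zip(sums[label], point)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def centroid_distances(coords, labels):
    """Euclidean distance between every pair of cluster centroids."""
    centroids = cluster_centroids(coords, labels)
    return {
        (a, b): dist(centroids[a], centroids[b])
        for a, b in combinations(sorted(centroids), 2)
    }


# Toy data (invented): three clusters of two cells each in a 3-dimensional "PCA" space.
toy_pca = [(0, 0, 0), (0, 2, 0), (3, 1, 4), (3, 1, 4), (0, 1, 10), (0, 1, 10)]
toy_labels = ["A", "A", "B", "B", "C", "C"]
toy_distances = centroid_distances(toy_pca, toy_labels)
```

With a Scanpy object, the same helper could be called once with `adata.obsm["X_pca"]` and once with `adata.obsm["X_umap"]`, both paired with `adata.obs["leiden"]`, assuming those keys exist from the earlier steps. If the ranking of cluster pairs differs between the two results, the PCA-space ranking is the one to trust.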

UMAP vs t-SNE

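                   UMAP                             t-SNE
Speed              Faster                           Slower
Global structure   Better preserved                 Mostly preserves local structure
Reproducibility    Reproducible with a fixed seed   Less stable
Status             Standard choice                  Gradually being replaced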

Effect of UMAP Parameters

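n_neighbors controls the balance between preserving local and global structure: smaller values emphasize fine local detail, larger values emphasize the broader layout. min_dist sets the minimum spacing between points in the embedding, so smaller values pack points more tightly. Try dragging both sliders to see the effect.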
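The effect of the two parameters can also be explored in code by computing one embedding per combination. The sketch below makes several assumptions: the helper names `umap_parameter_grid` and `sweep_umap` are hypothetical, `adata` is an AnnData object that already has `adata.obsm["X_pca"]` with at least 15 components, and the installed Scanpy release supports `key_added` in `sc.pp.neighbors` and `neighbors_key` in `sc.tl.umap`. In Scanpy, n_neighbors is set when the neighbor graph is built, not in `sc.tl.umap` itself.

```python
from itertools import product


def umap_parameter_grid(n_neighbors_values, min_dist_values):
    """Return every (n_neighbors, min_dist) combination as a list of dicts."""
    return [
        {"n_neighbors": n, "min_dist": d, "key": f"X_umap_nn{n}_md{d}"}
        for n, d in product(n_neighbors_values, min_dist_values)
    ]


def sweep_umap(adata, grid, n_pcs=15, random_state=0):
    """Compute one UMAP embedding per grid entry and store each under its own key."""
    import scanpy as sc  # imported lazily so the grid helper works without Scanpy

    for params in grid:
        graph_key = f"neighbors_nn{params['n_neighbors']}"
        # Build a separate neighbor graph so the default graph used for clustering is untouched.
        sc.pp.neighbors(
            adata,
            n_neighbors=params["n_neighbors"],
            n_pcs=n_pcs,
            key_added=graph_key,
        )
        sc.tl.umap(
            adata,
            min_dist=params["min_dist"],
            neighbors_key=graph_key,
            random_state=random_state,
        )
        # sc.tl.umap writes to adata.obsm["X_umap"]; copy it before the next run overwrites it.
        adata.obsm[params["key"]] = adata.obsm["X_umap"].copy()
    return adata


# Example grid: three neighborhood sizes crossed with two min_dist values.
grid = umap_parameter_grid([5, 15, 50], [0.1, 0.5])
```

After `sweep_umap(adata, grid)`, each embedding could be plotted with `sc.pl.embedding(adata, basis=params["key"], color="leiden")`. Note that `adata.obsm["X_umap"]` is left holding the last grid entry, so rerun `sc.tl.umap` with the preferred settings afterwards.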

Implementation Example

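R (Seurat):
library(Seurat)
# `pbmc` is the Seurat object from the previous steps (PCA already run)
pbmc <- RunUMAP(pbmc, dims = 1:15)
DimPlot(pbmc, reduction = "umap", label = TRUE)
# Visualize expression of specific genes
FeaturePlot(pbmc, features = c("CD3D", "MS4A1", "CD14", "GNLY"))
# Adjust min.dist
pbmc <- RunUMAP(pbmc, dims = 1:15, min.dist = 0.5)

Python (Scanpy):
import scanpy as sc
# `adata` is the AnnData object from the previous steps (neighbor graph and Leiden clusters already computed)
sc.tl.umap(adata)
sc.pl.umap(adata, color=["leiden"])
sc.pl.umap(adata, color=["CD3D", "MS4A1", "CD14"])
# Adjust min_dist (Scanpy's default is 0.5, so pick a different value to see a change)
sc.tl.umap(adata, min_dist=0.3)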

📝 Self-check

Two clusters sit close together on a UMAP plot. What does that mean?

A. They are biologically very similar
B. They share many marker genes
C. Not necessarily: UMAP distances should not be read directly as similarity
D. They should be merged into one cluster