What is UMAP?
UMAP (Uniform Manifold Approximation and Projection) is a non-linear dimensionality reduction method that projects the cell-to-cell relationships found in PCA space down to 2D. It is not a core analysis step (clustering is done in PCA space); it is a visualization tool.
Common misinterpretation warning:
① Distances between clusters on a UMAP plot do NOT represent biological similarity
② The size of a cluster on the plot does NOT represent its cell count
③ Cluster shapes may change with different random seeds
These are all artifacts of the non-linear projection and should not be used for biological inference.
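One way to act on the warnings above is to read cluster sizes from the labels and to compare clusters in PCA space, where the clustering was done, rather than judging either from the 2D plot. The following is a minimal sketch in plain Python. The labels and coordinates are hypothetical toy values, not real data, and a centroid distance in PCA space is only a rough proxy for similarity, not a definitive measure.

```python
import math
from collections import Counter

# Toy data (hypothetical): one cluster label and one 2D "PCA" coordinate per cell.
labels = ["T", "T", "T", "B", "B", "Mono"]
pca = [(1.0, 0.0), (1.0, 2.0), (1.0, 1.0), (4.0, 1.0), (4.0, 5.0), (9.0, 1.0)]


def cluster_sizes(labels):
    """Count cells per cluster directly from the labels, not from plot area."""
    return Counter(labels)


def centroids(labels, coords):
    """Return the mean coordinate of each cluster."""
    sums = {}
    for lab, xy in zip(labels, coords):
        acc = sums.setdefault(lab, [0.0] * len(xy))
        for d, value in enumerate(xy):
            acc[d] += value
    sizes = Counter(labels)
    return {lab: tuple(v / sizes[lab] for v in acc) for lab, acc in sums.items()}


sizes = cluster_sizes(labels)
cents = centroids(labels, pca)

# Pairwise centroid distances measured in PCA space, not in UMAP space.
names = sorted(cents)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(math.dist(cents[a], cents[b]), 2))
```

With real data, the coordinates would come from the PCA embedding stored in the analysis object and would have many more than two dimensions; the same functions work for any number of dimensions.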
UMAP vs t-SNE
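| | UMAP | t-SNE |
|---|---|---|
| Speed | Faster | Slower |
| Global structure | Better preserved | Mainly preserves local structure |
| Reproducibility | Reproducible with a fixed seed | Less stable |
| Status | Standard choice | Gradually being replaced |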
Effect of UMAP parameters
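n_neighbors controls the balance between preserving local and global structure: smaller values emphasize local structure, larger values emphasize global structure. min_dist controls the minimum distance between points in the embedding: smaller values pack points more tightly. Try varying both and observe the effect.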
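To make the role of n_neighbors more concrete: UMAP begins by building a k-nearest-neighbor graph, with k set by n_neighbors. The sketch below is not UMAP itself. It only builds that kind of graph on hypothetical toy points and counts how many edges link the two groups, to show why a small k sees only local structure while a larger k starts to connect distant groups. The helper names `knn_edges` and `cross_group_edges` are made up for this illustration.

```python
import math


def knn_edges(points, k):
    """Return directed edges (i, j) where j is one of the k nearest neighbors of i."""
    edges = set()
    for i, p in enumerate(points):
        # Rank every other point by Euclidean distance to point i.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        edges.update((i, j) for j in others[:k])
    return edges


def cross_group_edges(points, groups, k):
    """Count kNN edges whose endpoints belong to different groups."""
    return sum(1 for i, j in knn_edges(points, k) if groups[i] != groups[j])


# Toy data (hypothetical): two well-separated groups of four points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 0.0), (10.0, 1.0), (11.0, 0.0), (11.0, 1.0)]
groups = ["A"] * 4 + ["B"] * 4

for k in (2, 3, 5):
    print(f"k={k}: {cross_group_edges(points, groups, k)} cross-group edges")
```

In this toy setup each point has only three neighbors in its own group, so k = 2 and k = 3 give no cross-group edges, while k = 5 forces every point to reach into the other group. In real data the transition is gradual, but the trade-off is the same.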
Implementation example
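R (Seurat):

    library(Seurat)
    # Compute the UMAP embedding from the first 15 principal components
    pbmc <- RunUMAP(pbmc, dims = 1:15)
    DimPlot(pbmc, reduction = "umap", label = TRUE)
    # Visualize expression of specific genes
    FeaturePlot(pbmc, features = c("CD3D", "MS4A1", "CD14", "GNLY"))
    # Adjust min.dist
    pbmc <- RunUMAP(pbmc, dims = 1:15, min.dist = 0.5)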
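Python (Scanpy):

    import scanpy as sc
    # Assumes sc.pp.neighbors and sc.tl.leiden have already been run on adata
    sc.tl.umap(adata)
    sc.pl.umap(adata, color=["leiden"])
    sc.pl.umap(adata, color=["CD3D", "MS4A1", "CD14"])
    # Adjust min_dist
    sc.tl.umap(adata, min_dist=0.5)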
📝 Self-check
Two clusters sit close together on the UMAP plot. What does that mean?