I. Two main scenarios
Finding markers
Compare one cluster against all other cells to find the genes specific to that cluster. This is the foundation of cell annotation.
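As a rough illustration of the "one cluster vs. all other cells" comparison, the sketch below runs a Wilcoxon rank-sum test per gene using only the Python standard library. The gene names, cluster labels, and expression values are made up for illustration, and the p-value uses a plain normal approximation without tie correction, so it will not match a production implementation exactly.

```python
from statistics import NormalDist

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(in_cluster, rest):
    """Wilcoxon rank-sum (Mann-Whitney U), normal approximation, no tie correction."""
    n1, n2 = len(in_cluster), len(rest)
    ranks = average_ranks(list(in_cluster) + list(rest))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u1 - mean_u) / sd_u
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p

# Hypothetical data: 6 cells in cluster "c0", 8 cells elsewhere
cell_labels = ["c0"] * 6 + ["other"] * 8
expr = {
    "GENE_A": [5, 6, 7, 8, 9, 10, 0, 0, 1, 1, 2, 2, 3, 4],  # high in c0
    "GENE_B": [1, 2, 0, 3, 1, 2, 2, 1, 0, 3, 1, 2, 0, 3],   # similar everywhere
}

results = {}
for gene, values in expr.items():
    inside = [v for v, lab in zip(values, cell_labels) if lab == "c0"]
    outside = [v for v, lab in zip(values, cell_labels) if lab != "c0"]
    results[gene] = rank_sum_test(inside, outside)
    print(gene, results[gene])
```

A gene such as the hypothetical GENE_A, whose values in the cluster all exceed those in the remaining cells, gets the largest possible U statistic and a small p-value; GENE_B does not separate the two groups.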
Cross-condition comparison
Compare the same cell type across experimental conditions (e.g., disease vs. healthy, drug vs. control) to find expression differences. This is central to uncovering disease mechanisms.
II. Volcano plot simulator
Adjust the log2FC and p-value thresholds and observe how many genes are called significantly differentially expressed. Red = upregulated, blue = downregulated, gray = not significant.
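The thresholding behind a volcano plot can be sketched in a few lines. The example below is a minimal illustration, not the simulator's actual code: the cutoffs (|log2FC| ≥ 1, adjusted p < 0.05), the function name `classify_genes`, and the gene names and values are all assumptions chosen for the demo.

```python
import math

def classify_genes(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Label each gene 'up', 'down', or 'ns' from its log2FC and adjusted p-value."""
    labels = {}
    for gene, (log2fc, p_adj) in results.items():
        if p_adj < p_cutoff and log2fc >= lfc_cutoff:
            labels[gene] = "up"    # drawn in red
        elif p_adj < p_cutoff and log2fc <= -lfc_cutoff:
            labels[gene] = "down"  # drawn in blue
        else:
            labels[gene] = "ns"    # drawn in gray
    return labels

# Hypothetical DE results: gene -> (log2 fold change, adjusted p-value)
toy_results = {
    "GENE_A": (2.3, 1e-8),
    "GENE_B": (-1.7, 3e-4),
    "GENE_C": (0.4, 1e-6),  # significant p-value but small effect
    "GENE_D": (3.0, 0.20),  # large effect but not significant
}

labels = classify_genes(toy_results)
for gene, (log2fc, p_adj) in toy_results.items():
    # The y-axis of a volcano plot is -log10(p)
    print(gene, labels[gene], round(-math.log10(p_adj), 2))
```

Loosening `lfc_cutoff` moves genes like the hypothetical GENE_C from gray into a colored class, which is the effect the sliders are meant to show.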
III. DE methods

| Method | Description | Best suited for |
|---|---|---|
| Wilcoxon | Non-parametric rank test; simple and robust | Finding cluster markers (fast) |
| MAST | Hurdle model designed for scRNA-seq; handles zero inflation | Cross-condition comparison (single sample) |
| Pseudobulk + DESeq2 | Aggregate the cells of each sample first, then apply bulk methods | Cross-condition comparison (current best practice) |
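The "aggregate cells per sample first" step in the last row can be sketched as follows. This is a plain-Python illustration of summing raw counts per (sample, cell type) pair; the field names `sample_id`, `cell_type`, and `counts`, and the toy cells, are assumptions for the demo rather than any library's data model.

```python
from collections import defaultdict

def pseudobulk(cells):
    """Sum raw counts over all cells sharing the same (sample, cell type)."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        key = (cell["sample_id"], cell["cell_type"])
        for gene, count in cell["counts"].items():
            totals[key][gene] += count
    return {key: dict(genes) for key, genes in totals.items()}

# Hypothetical cells from two samples
cells = [
    {"sample_id": "s1", "cell_type": "T", "counts": {"CD3E": 4, "LYZ": 0}},
    {"sample_id": "s1", "cell_type": "T", "counts": {"CD3E": 2, "LYZ": 1}},
    {"sample_id": "s1", "cell_type": "Mono", "counts": {"CD3E": 0, "LYZ": 9}},
    {"sample_id": "s2", "cell_type": "T", "counts": {"CD3E": 5, "LYZ": 0}},
]

profiles = pseudobulk(cells)
for key in sorted(profiles):
    print(key, profiles[key])
```

Each resulting profile plays the role of one bulk RNA-seq sample, so the number of observations in the downstream test equals the number of biological samples, not the number of cells.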
🌳 Which one should you choose?
Critical common error: using FindMarkers directly for a cross-condition comparison, which treats every cell as an independent observation. This severely inflates the effective sample size and produces many false positives. Cells from the same individual are not independent observations; the correct approach is to aggregate to pseudobulk first and use biological replicates as the unit of analysis.
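The inflation of false positives can be seen in a small simulation. The sketch below is a simplified Gaussian toy model, not DESeq2 or FindMarkers: it assumes 3 donors per condition, 100 cells per donor, donor-to-donor variability of the same size as cell-level noise, and no true condition effect at all. Under those assumptions, any "significant" gene is a false positive.

```python
import random
import statistics

def simulate_null_gene(rng, donors_per_group=3, cells_per_donor=100):
    """Simulate one gene with NO condition effect; values are grouped by donor."""
    groups = []
    for _ in range(2):  # two conditions
        donors = []
        for _ in range(donors_per_group):
            donor_shift = rng.gauss(0, 1)  # donor-to-donor variability
            donors.append([donor_shift + rng.gauss(0, 1) for _ in range(cells_per_donor)])
        groups.append(donors)
    return groups

def cell_level_significant(groups):
    """Two-sample z-test that (wrongly) treats every cell as independent."""
    a = [x for donor in groups[0] for x in donor]
    b = [x for donor in groups[1] for x in donor]
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return abs(z) > 1.96  # two-sided 5% normal cutoff

def pseudobulk_significant(groups):
    """Student t-test on one mean per donor (3 vs. 3 donors, so 4 degrees of freedom)."""
    a = [statistics.mean(d) for d in groups[0]]
    b = [statistics.mean(d) for d in groups[1]]
    pooled = (statistics.variance(a) + statistics.variance(b)) / 2  # equal group sizes
    se = (pooled * (1 / len(a) + 1 / len(b))) ** 0.5
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return abs(t) > 2.776  # two-sided 5% cutoff for t with 4 df

rng = random.Random(0)
n_genes = 300
sims = [simulate_null_gene(rng) for _ in range(n_genes)]
cell_fpr = sum(cell_level_significant(g) for g in sims) / n_genes
pseudo_fpr = sum(pseudobulk_significant(g) for g in sims) / n_genes
print(f"cell-level false-positive rate: {cell_fpr:.2f}")
print(f"pseudobulk false-positive rate: {pseudo_fpr:.2f}")
```

The cell-level test sees hundreds of observations per condition and mistakes donor differences for condition differences, while the donor-level test stays close to the nominal 5% rate. The hard-coded cutoff 2.776 only applies to the 3 vs. 3 design assumed here.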
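IV. Worked examples

R (Seurat):

    library(Seurat)

    # Scenario 1: find markers for cluster 0
    cl0_markers <- FindMarkers(pbmc, ident.1 = 0, only.pos = TRUE)

    # Scenario 2: pseudobulk cross-condition comparison (the correct approach)
    # "condition" must be in group.by so the label survives aggregation
    pseudo <- AggregateExpression(pbmc,
                                  group.by = c("condition", "cell_type", "sample_id"),
                                  return.seurat = TRUE)
    # Subset `pseudo` to one cell type first so the contrast stays within that cell type
    de <- FindMarkers(pseudo, ident.1 = "disease", ident.2 = "control",
                      group.by = "condition", test.use = "DESeq2")

    # Volcano plot
    library(EnhancedVolcano)
    EnhancedVolcano(de, lab = rownames(de), x = "avg_log2FC", y = "p_val_adj")

Python (Scanpy + decoupler):

    import scanpy as sc
    import decoupler as dc
    from pydeseq2.dds import DeseqDataSet
    from pydeseq2.ds import DeseqStats

    # Scenario 1: find markers for every cluster
    sc.tl.rank_genes_groups(adata, groupby="leiden", method="wilcoxon")
    sc.pl.rank_genes_groups(adata, n_genes=10)

    # Scenario 2: pseudobulk (decoupler recommended; 1.x API shown)
    pdata = dc.get_pseudobulk(adata, sample_col="sample_id", groups_col="cell_type", min_cells=10)

    # DESeq2 step via PyDESeq2 on one cell type; the `design` formula needs a recent release
    dds = DeseqDataSet(adata=pdata, design="~condition")
    dds.deseq2()
    stats = DeseqStats(dds, contrast=["condition", "disease", "control"])
    stats.summary()
    de = stats.results_df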
📝 Self-check
Why shouldn't you run a cross-condition test directly on all individual cells?