differential-expression — scRNA-seq Tutorial

Two Applications

1. Two Main Scenarios

🏷️

Finding Markers

Compare one cluster against all other cells to find genes specific to that cluster. These markers are the basis for cell annotation.

⚗️

Cross-Condition Comparison

Compare expression in the same cell type under different experimental conditions (e.g., disease vs. healthy, drug vs. control). This is the core of disease mechanism discovery.

Interactive Simulation

2. Volcano Plot Simulator

Adjust the |log2FC| and adjusted p-value thresholds and watch how many genes are called significantly differentially expressed. Red = upregulated, blue = downregulated, gray = not significant.

|log2FC| threshold: 0.5

-log10(p_adj) threshold: 1.3 (p_adj ≈ 0.05)
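The thresholding rule behind the plot can be sketched in a few lines of plain Python. This is a minimal illustration, not the simulator's own code: the function name `classify_gene`, the gene names, and the numbers are all hypothetical, and treating a value exactly on a threshold as significant is an assumption.

```python
import math

def classify_gene(log2fc, p_adj, fc_threshold=0.5, neglog10_threshold=1.3):
    """Return 'up', 'down', or 'ns' (not significant) for one gene."""
    # Guard against p_adj == 0, where -log10 is undefined (treated as infinite)
    neglog10_p = float("inf") if p_adj <= 0 else -math.log10(p_adj)
    # A gene must pass BOTH thresholds to be called significant
    if neglog10_p < neglog10_threshold or abs(log2fc) < fc_threshold:
        return "ns"
    return "up" if log2fc > 0 else "down"

# Hypothetical DE results: gene -> (log2FC, adjusted p-value)
results = {
    "GENE_A": (1.8, 1e-6),   # large effect, very significant
    "GENE_B": (-0.9, 0.003), # downregulated, significant
    "GENE_C": (0.2, 1e-4),   # significant p-value but tiny effect
    "GENE_D": (1.1, 0.20),   # large effect but not significant
}

calls = {gene: classify_gene(fc, p) for gene, (fc, p) in results.items()}
counts = {label: list(calls.values()).count(label) for label in ("up", "down", "ns")}
print(calls)
print(counts)
```

Raising either threshold can only move genes from "up" or "down" into "ns", which is why the colored points thin out as the sliders increase.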


Method Comparison

3. DE Methods

Method	Description	Best for
Wilcoxon	Non-parametric rank-sum test; simple and robust	Finding cluster markers (fast)
MAST	Hurdle model designed for scRNA-seq; handles zero inflation	Cross-condition comparison (single sample)
Pseudobulk + DESeq2	Aggregates cells per sample first, then applies a bulk method	Cross-condition comparison (current best practice)
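To make "rank test" in the Wilcoxon row concrete, here is a minimal pure-Python sketch of a two-sided rank-sum test for one gene. It uses the normal approximation with a tie correction and no continuity correction; library implementations may differ in these details. The function names and the expression values are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def rank_sum_test(x, y):
    """Return (U statistic for x, two-sided p-value) via normal approximation."""
    pooled = list(x) + list(y)
    n1, n2, n = len(x), len(y), len(pooled)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: each group of t tied values reduces the variance
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t**3 - t for t in tie_sizes.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u1, p

# Hypothetical expression of one gene: cells in the cluster vs. all other cells
in_cluster = [5, 7, 8, 6, 9, 4]
other_cells = [0, 0, 1, 0, 2, 0, 1, 3]
u1, p = rank_sum_test(in_cluster, other_cells)
print(u1, p)
```

Only the ordering of the values matters, not their magnitudes, so the test makes no distributional assumption about the many zeros in scRNA-seq counts.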

🌳 Which one should you choose?

Case 1:

Finding cluster markers (for annotation) → Wilcoxon; fast and sufficient.

Case 2:

Cross-condition comparison with multiple biological replicates → Pseudobulk + DESeq2 (current best practice).

Case 3:

Cross-condition comparison with only one sample → MAST, but statistical power is limited.
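The three cases can be written down as a small helper. This is only a restatement of the decision tree as code: the function name, the argument names, and the reading of "multiple biological replicates" as at least two per condition are assumptions made for illustration.

```python
def choose_de_method(goal, replicates_per_condition=1):
    """Map an analysis goal to the method suggested by the three cases.

    goal: "markers" or "cross_condition" (hypothetical labels).
    replicates_per_condition: biological replicates per condition;
    "multiple" is interpreted here as >= 2, which is an assumption.
    """
    if goal == "markers":
        return "Wilcoxon"
    if goal == "cross_condition":
        if replicates_per_condition >= 2:
            return "Pseudobulk + DESeq2"
        return "MAST (limited power)"
    raise ValueError(f"unknown goal: {goal!r}")

print(choose_de_method("markers"))
print(choose_de_method("cross_condition", replicates_per_condition=3))
print(choose_de_method("cross_condition", replicates_per_condition=1))
```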

🚨

Common critical error: running FindMarkers directly on single cells for a cross-condition comparison, which treats every cell as an independent observation. This greatly inflates the effective sample size and produces many false positives. Cells from the same individual are not independent observations; the correct approach is to aggregate to pseudobulk first and use biological replicates as the unit of analysis.
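What pseudobulk aggregation does can be shown with a toy example in plain Python: raw counts are summed over all cells that share a sample and cell type, so each sample contributes one profile per cell type. The data layout, names, and counts below are hypothetical; real pipelines operate on full count matrices.

```python
from collections import defaultdict

# Hypothetical per-cell records: (sample_id, cell_type, {gene: raw count})
cells = [
    ("S1", "T", {"GENE_A": 3, "GENE_B": 0}),
    ("S1", "T", {"GENE_A": 1, "GENE_B": 2}),
    ("S1", "B", {"GENE_A": 0, "GENE_B": 5}),
    ("S2", "T", {"GENE_A": 4, "GENE_B": 1}),
    ("S2", "T", {"GENE_A": 2, "GENE_B": 0}),
]

def pseudobulk(cells):
    """Sum raw counts per (sample_id, cell_type) and count contributing cells."""
    totals = defaultdict(lambda: defaultdict(int))
    n_cells = defaultdict(int)
    for sample_id, cell_type, counts in cells:
        key = (sample_id, cell_type)
        n_cells[key] += 1
        for gene, count in counts.items():
            totals[key][gene] += count
    return {key: dict(genes) for key, genes in totals.items()}, dict(n_cells)

profiles, n_cells = pseudobulk(cells)
for key in profiles:
    print(key, profiles[key], "cells:", n_cells[key])
```

The four T cells collapse into two observations, one per sample, so a downstream test sees the number of biological replicates rather than the number of cells.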

Code

4. Worked Examples

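# R (Seurat v5)
library(Seurat)

# Scenario 1: find markers for cluster 0 (Wilcoxon is the default test)
cl0_markers <- FindMarkers(pbmc, ident.1 = 0, only.pos = TRUE)

# Scenario 2: pseudobulk cross-condition comparison (the correct approach)
# Assumes metadata columns "condition", "sample_id", and "cell_type"
pseudo <- AggregateExpression(pbmc,
  group.by = c("condition", "sample_id", "cell_type"),
  return.seurat = TRUE)
# Compare conditions within one cell type; "B" is a placeholder label
pseudo$celltype.condition <- paste(pseudo$cell_type, pseudo$condition, sep = "_")
Idents(pseudo) <- "celltype.condition"
de <- FindMarkers(pseudo,
  ident.1 = "B_disease", ident.2 = "B_control",
  test.use = "DESeq2")  # requires the DESeq2 package

# Volcano plot, using the same thresholds as the simulator
library(EnhancedVolcano)
EnhancedVolcano(de, lab = rownames(de), x = "avg_log2FC", y = "p_val_adj",
  pCutoff = 0.05, FCcutoff = 0.5)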

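# Python (Scanpy, decoupler 1.x, PyDESeq2)
import scanpy as sc
import decoupler as dc
from pydeseq2.dds import DeseqDataSet
from pydeseq2.ds import DeseqStats

# Scenario 1: find markers for every cluster
sc.tl.rank_genes_groups(adata, groupby="leiden", method="wilcoxon")
sc.pl.rank_genes_groups(adata, n_genes=10)

# Scenario 2: pseudobulk with decoupler (adata.X must hold raw counts), then DESeq2 via PyDESeq2
pdata = dc.get_pseudobulk(adata, sample_col="sample_id",
  groups_col="cell_type", min_cells=10)
sub = pdata[pdata.obs["cell_type"] == "B"].copy()  # one cell type; "B" is a placeholder
dds = DeseqDataSet(adata=sub, design="~condition")  # PyDESeq2 >= 0.5; older versions use design_factors="condition"
dds.deseq2()
stats = DeseqStats(dds, contrast=["condition", "disease", "control"])
stats.summary()  # fills stats.results_df with log2FC and adjusted p-values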

📝 Self-Check

Why should you not test all individual cells directly in a cross-condition comparison?

A. It is too slow computationally

B. It cannot handle zeros

C. It treats multiple cells from the same individual as independent observations, inflating the sample size and causing false positives

D. Seurat does not support it