How to use this reference
For every tool, algorithm, and biological concept mentioned in the tutorial, this page collects academic sources for deeper reading. The citation tags have the following meanings:
- Paper: original papers, with DOI / PubMed
- Doc: official documentation, vignettes, tutorials
- Best Practice: systematic reviews and community recommendations
- Benchmark: method comparisons and independent benchmarks
- Database: marker gene and cell annotation databases
- Book: free online books and comprehensive textbooks
⭐ Best Practices Review Papers
Authoritative, textbook-level reviews of the entire scRNA-seq analysis workflow. Start with the two ★ BEST papers below to build an overall view of the design tradeoffs across the full pipeline.
- ★ BEST Best practices for single-cell analysis across modalities. Nature Reviews Genetics 24, 550–572 (2023).
- ★ BEST Current best practices in single-cell RNA-seq analysis: a tutorial. Mol Syst Biol 15, e8746 (2019).
- 📚 BOOK Orchestrating Single-Cell Analysis with Bioconductor (OSCA). Nat Methods 17, 137–145 (2020).
- ⭐ ATLAS The Human Cell Atlas. eLife 6:e27041 (2017).
🧰 Core Analysis Frameworks
The two mainstream software ecosystems for scRNA-seq workflows. Seurat (R) and Scanpy (Python) are largely equivalent in functionality, and most methods are implemented in both.
- PAPER Integrated analysis of multimodal single-cell data. Cell 184(13):3573–3587.e29 (2021). (Seurat v4)
- PAPER Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat Biotechnol 42:293–304 (2024). (Seurat v5)
- DOC Seurat official website and vignettes (Satija Lab).
- PAPER SCANPY: large-scale single-cell gene expression data analysis. Genome Biol 19:15 (2018).
- DOC Scanpy official documentation (scverse / Theis Lab).
🧪 Quality Control (QC)
Cell-level metrics (nFeature/nCount/percent.mt), gene-level filtering, plus advanced doublet detection and ambient RNA removal.
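As a minimal sketch of what the three cell-level metrics measure, the following computes them for one cell from a plain gene-to-count mapping. The function name `qc_metrics`, the toy counts, and the `MT-` prefix for human mitochondrial genes are assumptions made for illustration; real pipelines compute these on a sparse matrix.

```python
def qc_metrics(counts, mito_prefix="MT-"):
    """Per-cell QC metrics from a {gene: UMI count} mapping.

    Assumption: mitochondrial genes are identified by the "MT-" prefix
    (human gene symbols); other species use different prefixes.
    """
    n_count = sum(counts.values())                        # total UMIs (nCount)
    n_feature = sum(1 for c in counts.values() if c > 0)  # detected genes (nFeature)
    mito = sum(c for g, c in counts.items() if g.startswith(mito_prefix))
    percent_mt = 100.0 * mito / n_count if n_count else 0.0
    return {"nCount": n_count, "nFeature": n_feature, "percent.mt": percent_mt}


# Toy cell with made-up counts.
cell = {"CD3D": 4, "LYZ": 0, "MT-CO1": 3, "MT-ND1": 3}
metrics = qc_metrics(cell)
print(metrics)
```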
Doublet detection
- PAPER DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors. Cell Syst 8(4):329–337.e4 (2019).
- PAPER Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data. Cell Syst 8(4):281–291.e9 (2019).
- PAPER Doublet identification in single-cell sequencing data using scDblFinder. F1000Research 10:979 (2021).
- BENCH Benchmarking Computational Doublet-Detection Methods for Single-Cell RNA Sequencing Data. Cell Syst 12(2):176–194.e6 (2021).
Ambient RNA / Soup
- PAPER SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data. GigaScience 9(12):giaa151 (2020).
- PAPER Unsupervised removal of systematic background noise from droplet-based single-cell experiments using CellBender. Nat Methods 20:1323–1335 (2023).
Adaptive thresholds and QC tools
- PAPER Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33(8):1179–1186 (2017).
- ⭐ TIP Best practices chapter "Quality control".
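One common form of adaptive threshold flags cells that lie several median absolute deviations (MADs) above the median of a QC metric, rather than using a fixed cutoff. The sketch below illustrates the idea only; the function name `mad_upper_outliers`, the default of 3 MADs, and the toy values are assumptions, and it is not a reimplementation of any package's outlier function.

```python
from statistics import median


def mad_upper_outliers(values, nmads=3.0):
    """Flag values more than `nmads` scaled MADs above the median.

    Assumption: only the upper tail is of interest (e.g. percent.mt).
    """
    med = median(values)
    # 1.4826 rescales the MAD so that it is comparable to a standard
    # deviation when the data are normally distributed.
    mad = 1.4826 * median(abs(v - med) for v in values)
    threshold = med + nmads * mad
    return [v > threshold for v in values]


percent_mt = [2.0, 3.0, 2.5, 3.5, 3.0, 25.0]  # made-up values
flags = mad_upper_outliers(percent_mt)
print(flags)
```

Because the threshold is derived from the data, it adapts to samples whose baseline mitochondrial fraction differs, which a fixed cutoff cannot do.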
⚖️ Normalization
- PAPER · LogNormalize Comprehensive Integration of Single-Cell Data. Cell 177(7):1888–1902.e21 (2019). (Seurat v3 citation)
- PAPER · scran pooling-based deconvolution Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol 17:75 (2016).
- PAPER · SCTransform Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol 20:296 (2019).
- PAPER · SCT v2 Comparison and evaluation of statistical error models for scRNA-seq. Genome Biol 23:27 (2022).
- BENCH Comparison of transformations for single-cell RNA-seq data. Nat Methods 20:665–672 (2023).
✨ Highly Variable Gene Selection (Variable Features)
- DOC Seurat FindVariableFeatures() · three methods: vst / mvp / dispersion.
- DOC Scanpy sc.pp.highly_variable_genes() · offers cell_ranger / seurat / seurat_v3 strategies.
- PAPER · Brennecke (CV² method) Accounting for technical noise in single-cell RNA-seq experiments. Nat Methods 10:1093–1095 (2013).
- PAPER · scry / Pearson residual HVG Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model. Genome Biol 20:295 (2019).
📏 Scaling
- DOC Seurat ScaleData() · z-score scaling; vars.to.regress replaces the values with linear-regression residuals.
- DOC Scanpy sc.pp.scale() · optional clipping through max_value (off by default; tutorials commonly set max_value=10) to keep extreme values from dominating.
- PAPER · cell-cycle score regression Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352(6282):189–196 (2016). (source of Seurat CellCycleScoring)
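To make the z-score-plus-clipping step concrete, here is a minimal sketch for a single gene across cells. The function name `scale_gene`, the use of the sample standard deviation, and clipping of the upper tail only are assumptions for illustration; the libraries above operate on whole matrices and their exact conventions may differ.

```python
from statistics import mean, stdev


def scale_gene(values, max_value=None):
    """Z-score one gene across cells, optionally clipping large values.

    Assumptions: sample standard deviation (n - 1), upper-tail clipping only.
    """
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # constant gene carries no signal
    z = [(v - mu) / sd for v in values]
    if max_value is not None:
        z = [min(x, max_value) for x in z]
    return z


expr = [0, 0, 0, 0, 0, 0, 0, 0, 0, 100]  # one cell dominates this gene
unclipped = scale_gene(expr)
clipped = scale_gene(expr, max_value=2.0)
print(round(unclipped[-1], 3), clipped[-1])
```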
📉 Principal Component Analysis (PCA)
- PAPER · IRLBA-accelerated PCA Augmented implicitly restarted Lanczos bidiagonalization methods. SIAM J Sci Comput 27(1):19–42 (2005). (default method in Seurat RunPCA)
- DOC Seurat RunPCA() with ElbowPlot() and JackStraw() documentation.
- ⭐ TIP Best practices chapter "Dimensionality reduction" discusses the choice of nPCs in detail.
🧩 Clustering
- PAPER · Louvain Fast unfolding of communities in large networks. J Stat Mech 2008(10):P10008.
- PAPER · Leiden (recommended) From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep 9:5233 (2019).
- PAPER · SNN (PhenoGraph) Data-driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis. Cell 162(1):184–197 (2015). (SNN graph clustering in single-cell analysis originates from PhenoGraph)
- BENCH A systematic performance evaluation of clustering methods for single-cell RNA-seq data. F1000Research 7:1141 (2018).
🗺️ UMAP / t-SNE Visualization
- PAPER · UMAP UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426 (2018).
- PAPER · UMAP in scRNA-seq Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol 37:38–44 (2019).
- PAPER · t-SNE Visualizing Data using t-SNE. J Mach Learn Res 9:2579–2605 (2008).
- PAPER · caution against over-interpreting UMAP The specious art of single-cell genomics. PLoS Comput Biol 19(8):e1011288 (2023).
🏷️ Cell Annotation
- PAPER · SingleR Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat Immunol 20:163–172 (2019).
- PAPER · CellTypist Cross-tissue immune cell analysis reveals tissue-specific features in humans. Science 376(6594):eabl5197 (2022).
- DOC · Azimuth Azimuth web app (reference-based mapping using the methods in Seurat v4). Cite Hao et al. 2021 Cell.
- BENCH A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol 20:194 (2019).
- ⭐ TIP Best practices chapter "Annotation": workflow and tool selection.
🧾 Differential Expression (DE)
- PAPER · MAST hurdle model MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol 16:278 (2015).
- PAPER · DESeq2 Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15:550 (2014).
- PAPER · edgeR edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1):139–140 (2010).
- PAPER · recommended pseudobulk practice Confronting false discoveries in single-cell differential expression. Nat Commun 12:5692 (2021).
- PAPER · pseudoreplication warning A practical solution to pseudoreplication bias in single-cell studies. Nat Commun 12:738 (2021).
- PAPER · muscat (multi-sample DE) muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data. Nat Commun 11:6077 (2020).
🔗 Data Integration
- PAPER · Harmony Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods 16:1289–1296 (2019).
- PAPER · Seurat CCA / Anchors Comprehensive Integration of Single-Cell Data. Cell 177(7):1888–1902.e21 (2019).
- PAPER · scVI Deep generative modeling for single-cell transcriptomics. Nat Methods 15:1053–1058 (2018).
- PAPER · scanorama Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat Biotechnol 37:685–691 (2019).
- BENCH · integration method comparison (scIB / scIB-metrics) Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19:41–50 (2022).
- BENCH · batch-effect correction benchmark A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol 21:12 (2020).
💬 Cell-Cell Communication
- PAPER · CellChat Inference and analysis of cell-cell communication using CellChat. Nat Commun 12:1088 (2021).
- PAPER · CellChat cross-condition analysis (Protocol) CellChat for systematic analysis of cell–cell communication from single-cell transcriptomics. Nat Protoc 20:180–219 (2025).
- PAPER · CellPhoneDB CellPhoneDB: inferring cell–cell communication from combined expression of multi-subunit ligand–receptor complexes. Nat Protoc 15:1484–1506 (2020).
- PAPER · NicheNet NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods 17:159–162 (2020).
- PAPER · LIANA (consensus framework) Comparison of methods and resources for cell-cell communication inference from single-cell RNA-Seq data. Nat Commun 13:3224 (2022).
🛤️ Trajectory Analysis (Trajectory / Pseudotime)
- PAPER · Monocle3 The single-cell transcriptional landscape of mammalian organogenesis. Nature 566:496–502 (2019).
- PAPER · Slingshot Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics 19:477 (2018).
- PAPER · origin of RNA velocity RNA velocity of single cells. Nature 560:494–498 (2018).
- PAPER · scVelo (dynamical model) Generalizing RNA velocity to transient cell states through dynamical modeling. Nat Biotechnol 38:1408–1414 (2020).
- PAPER · CytoTRACE Single-cell transcriptional diversity is a hallmark of developmental potential. Science 367(6476):405–411 (2020).
- PAPER · PAGA PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol 20:59 (2019).
- BENCH · comparison of 45 trajectory methods A comparison of single-cell trajectory inference methods. Nat Biotechnol 37:547–554 (2019).
🗄️ Marker Gene Databases and Cell Atlases
The PBMC markers demonstrated in the tutorial's "Annotation" chapter (CD3D, IL7R, MS4A1, CD79A, GNLY, NKG7, KLRD1, CD14, LYZ, S100A8, FCGR3A, MS4A7, FCER1A, CST3, PPBP, PF4, etc.) can be traced back to their original literature through the databases below.
- DB · PanglaoDB PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data. Database (Oxford) baz046 (2019).
- DB · CellMarker 2.0 CellMarker 2.0: an updated database of manually curated cell markers in human/mouse and web tools based on scRNA-seq data. Nucleic Acids Res 51(D1):D870–D876 (2023).
- DB · Tabula Sapiens The Tabula Sapiens: A multiple-organ, single-cell transcriptomic atlas of humans. Science 376(6594):eabl4896 (2022).
- DB · Human Cell Atlas Human Cell Atlas Data Portal · cross-tissue human cell atlas (including projects such as Tabula Sapiens and the Lung Cell Atlas).
- DB · CZ CELLxGENE Discover CZ CELLxGENE · single-cell exploration and data integration platform from the Chan Zuckerberg Initiative.
- DOC · 10x Genomics PBMC marker genes 10x Genomics PBMC datasets · the standard training data used by the Seurat / Scanpy PBMC tutorials.
- DB · Azimuth References Azimuth · multi-organ reference sets from HuBMAP × Satija Lab (PBMC, Kidney, Lung, Pancreas, etc.) with one-click mapping.
📌 Tutorial Notes and Details
Below are points found while cross-checking the scRNA-seq tutorial HTML against this reference list: places that may be incomplete, easily misread, or worth expanding. The original tutorial files are not modified; clarifications are provided here for cross-reference.
Normalization: SCTransform is not necessarily the best choice
If the tutorial recommends SCTransform (Hafemeister & Satija 2019) by default as the "best" normalization, this needs correcting. Ahlmann-Eltze & Huber (2023, Nat Methods) systematically benchmarked 22 transformations for scRNA-seq data, comparing downstream cluster reproducibility, DE, and trajectory analysis, and concluded: (1) a simple shifted log, log(y/s + 1), combined with size factors (e.g. scran::computeSumFactors) performs as well as or better than more complex methods such as SCTransform / GLM-PCA / scVI on most tasks; (2) Pearson residuals (SCT v2) have an advantage for selecting highly variable genes, but the full output matrix may introduce spurious negative values; (3) there is no one-size-fits-all normalization. In practice: (a) during exploration, the shifted log from NormalizeData() is sufficient; (b) add size-factor regression only when sequencing depth differs by more than 10× across cells; (c) cross-validate cluster stability across methods.
Sources: Ahlmann-Eltze C, Huber W (2023) Comparison of transformations for single-cell RNA-seq data, Nat Methods 20:665–672. DOI: 10.1038/s41592-023-01814-1; Hafemeister C, Satija R (2019) SCTransform, Genome Biology 20:296.
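The shifted log, log(y/s + 1), can be sketched in a few lines. This is a minimal illustration, assuming simple library-size factors (each cell's total divided by the mean total) rather than scran's pooled deconvolution factors; the function name `shifted_log` and the toy counts are hypothetical.

```python
from math import log


def shifted_log(counts_by_cell):
    """Apply log(y / s + 1) per cell, where s is a library-size factor.

    Assumptions: size factor = cell total / mean total across cells, and
    every cell has a non-zero total (empty cells removed during QC).
    """
    totals = [sum(cell) for cell in counts_by_cell]
    mean_total = sum(totals) / len(totals)
    size_factors = [t / mean_total for t in totals]
    return [
        [log(y / s + 1) for y in cell]
        for cell, s in zip(counts_by_cell, size_factors)
    ]


# Two toy cells with the same composition but 10x different depth.
counts = [[1, 3], [10, 30]]
normalized = shifted_log(counts)
print(normalized)
```

Both toy cells end up with the same normalized values, which is the point of dividing by the size factor before taking the log.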
Clustering: the effect of k.param=20
Tutorials often use FindNeighbors(..., k.param=20) + FindClusters(resolution=0.5) as defaults without comment. This choice is not neutral and favors medium granularity: (1) too small a k (<10) amplifies noise and produces many micro-clusters; (2) too large a k (>50) swallows rare cell types (e.g. plasmacytoid DCs, Tregs); (3) on very small datasets, 20 neighbors is already a sizable fraction of all cells, and clusters become unstable; (4) k and resolution should be swept for each dataset scale, with stability evaluated by silhouette, ROGUE, or clustree. In practice: first use clustree::clustree() to inspect the resolution ladder and find a stable plateau; then use scran::clusterCells() or bluster::clusterRows() to compare cluster purity across different values of k.
Sources: Hao Y et al. (2024) Seurat v5, Nat Biotechnol 42:293–304. DOI: 10.1038/s41587-023-01767-y; Zappia L, Oshlack A (2018) clustree, GigaScience 7:giy083; Liu et al. (2020) ROGUE, Nat Commun 11:3155.
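Why a large k swallows rare populations can be seen on a toy example: once k exceeds the size of a rare group, most of each rare cell's neighbors necessarily come from other populations. The sketch below is an illustration on made-up 2D points, not a clustering method; `rare_neighbor_purity` and the data are hypothetical.

```python
from math import dist


def rare_neighbor_purity(points, labels, rare_label, k):
    """Mean fraction of each rare cell's k nearest neighbors that are also rare."""
    rare_idx = [i for i, lab in enumerate(labels) if lab == rare_label]
    purities = []
    for i in rare_idx:
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(points[i], points[j]),
        )
        neighbors = others[:k]
        purities.append(sum(labels[j] == rare_label for j in neighbors) / k)
    return sum(purities) / len(purities)


# Toy data (assumption): 36 common cells on a grid, 5 rare cells far away.
common = [(0.2 * i, 0.2 * j) for i in range(6) for j in range(6)]
rare = [(3.0, 3.0), (3.1, 3.0), (3.0, 3.1), (3.1, 3.1), (3.05, 3.05)]
points = common + rare
labels = ["common"] * len(common) + ["rare"] * len(rare)

purity_by_k = {k: rare_neighbor_purity(points, labels, "rare", k) for k in (3, 10, 20)}
print(purity_by_k)
```

With five rare cells, each has only four rare neighbors available, so purity drops from 1.0 at k=3 to 0.4 at k=10 and 0.2 at k=20 even though the group is well separated.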
Doublet: tool choice and benchmarks
If the tutorial only recommends DoubletFinder (McGinnis 2019), newer evidence should be added. Xi & Li (2021, Cell Systems) systematically benchmarked 9 doublet-detection tools on 16 datasets with ground-truth doublet annotations; in that study DoubletFinder had the best overall detection accuracy. scDblFinder (Germain 2021, F1000Research) was evaluated afterwards on the same benchmark datasets and was reported to rank at or near the top in accuracy, runtime, and stability across datasets. Further points: (1) DoubletFinder requires the expected doublet rate to be estimated manually; (2) Scrublet (Wolock 2019) is fast but has lower recall; (3) the doublets called by different tools overlap only partially, so taking the intersection of at least 2 tools is recommended; (4) no tool reliably detects homotypic doublets (two cells of the same type captured together), so upper-bound filtering on nCount_RNA and nFeature_RNA is still needed. In practice: scDblFinder is a reasonable first choice; for other data types such as 10x Multiome or Visium HD, use doublet detection designed for that modality (e.g. AMULET for ATAC).
Sources: Xi NM, Li JJ (2021) Benchmarking computational doublet-detection methods, Cell Systems 12:176–194. DOI: 10.1016/j.cels.2020.11.008; Germain PL et al. (2021) scDblFinder, F1000Research 10:979.
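The "at least 2 tools" rule amounts to a vote over the barcodes each caller flags. A minimal sketch, assuming each tool's output has already been reduced to a set of flagged barcodes; the function name `consensus_doublets` and the barcodes are made up.

```python
from collections import Counter


def consensus_doublets(calls_by_tool, min_votes=2):
    """Return barcodes flagged as doublets by at least `min_votes` tools."""
    votes = Counter(bc for calls in calls_by_tool.values() for bc in set(calls))
    return {bc for bc, n in votes.items() if n >= min_votes}


# Hypothetical per-tool calls (barcodes shortened for readability).
calls = {
    "scDblFinder": {"AAAC", "AAAG", "AATT"},
    "DoubletFinder": {"AAAG", "AATT", "ACGT"},
    "Scrublet": {"AATT"},
}
consensus = consensus_doublets(calls)
strict = consensus_doublets(calls, min_votes=3)
print(sorted(consensus), sorted(strict))
```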
Pseudotime: what it is and how it is misused
Tutorials often present trajectories from Monocle3 / Slingshot / PAGA / scVelo and imply an "order of cell development". This needs clarifying: (1) pseudotime is a reconstruction, not ground truth; it only reflects a topological ordering of expression similarity within the sample and need not correspond to a real time axis; (2) Saelens et al. (2019, Nat Biotechnol) systematically benchmarked 45 trajectory inference tools and found that different methods can infer very different topologies from the same data; (3) RNA velocity (La Manno 2018; Bergen 2020, scVelo) adds kinetics, but Bergen et al. (2021, Mol Syst Biol) point out that its assumptions (constant splicing and degradation rates) are violated in many lineages (e.g. erythroid maturation), which can yield reversed directions; (4) the choice of root cell alone determines the direction of pseudotime. In practice, when reporting a trajectory: (a) run at least 2 tools and confirm that the topology is consistent; (b) validate experimentally with lineage tracing or metabolic-labeling sequencing such as scNT-seq / sci-fate; (c) avoid over-interpreting "leaves" as terminally differentiated states.
Sources: Saelens W et al. (2019) A comparison of single-cell trajectory inference methods, Nat Biotechnol 37:547–554. DOI: 10.1038/s41587-019-0071-9; Bergen V et al. (2021) RNA velocity—current challenges, Mol Syst Biol 17:e10282.
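Point (4) is easy to demonstrate: on the same neighbor graph, pseudotime defined as graph distance from the root reverses when the root moves to the other end. This is a toy sketch using hop counts on a four-state chain; the function name `graph_pseudotime` and the graph are hypothetical and much simpler than what the tools above compute.

```python
from collections import deque


def graph_pseudotime(neighbors, root):
    """Breadth-first hop distance from `root`, used here as toy pseudotime."""
    distance = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance


# Toy chain of four cell states: A - B - C - D.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
from_a = graph_pseudotime(chain, "A")
from_d = graph_pseudotime(chain, "D")
print(from_a, from_d)
```

The data and graph are identical in both runs; only the analyst's choice of root decides which end is "early".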
Annotation: balancing automation and expert review
If the tutorial recommends only a single annotation tool, three families of strategies should be presented side by side: (1) Manual, marker-based: compare dot plots / violin plots against PanglaoDB, CellMarker 2.0 (Hu 2023, NAR), and the Azimuth references. This is interpretable and controllable, but subjective, and rare cell types are easily missed. (2) SingleR / scmap / Symphony (reference-based similarity): fast, but requires a bulk or scRNA-seq reference, and cell types absent from the reference are forced into a wrong label. (3) scANVI / CellTypist (Domínguez Conde 2022, Science) (probabilistic models): they output per-cell uncertainty, and pretrained models can be used as-is or fine-tuned; scANVI is a deep generative model that benefits from a GPU, whereas CellTypist is a logistic-regression classifier that runs on a CPU. The benchmarks of Tan & Cahan (2019, Cell Systems) and Abdelaal et al. (2019, Genome Biology) show that no single tool is best across datasets, which argues for a cross-method majority vote followed by marker review. In practice: (a) any automatic annotation should be checked against marker genes and reviewed by a domain expert; (b) reports should list the reference version, the fraction of unmapped cells, and the fraction of low-confidence calls.
Sources: Abdelaal T et al. (2019) A comparison of automatic cell identification methods for single-cell RNA sequencing data, Genome Biology 20:194. DOI: 10.1186/s13059-019-1795-z; Domínguez Conde C et al. (2022) Celltypist, Science 376:eabl5197.
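A cross-method majority vote can be sketched as follows, assuming each tool's prediction has already been mapped to a shared label vocabulary. The function name `consensus_label`, the "unassigned" fallback, and the example predictions are assumptions for illustration.

```python
from collections import Counter


def consensus_label(predictions, min_agreement=2):
    """Majority vote over per-tool labels; ties or weak agreement give 'unassigned'."""
    ranked = Counter(predictions).most_common()
    top_label, top_votes = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_votes
    if top_votes < min_agreement or tied:
        return "unassigned"
    return top_label


# Hypothetical predictions from three tools for two cells.
per_cell = {
    "cell_1": ["CD14 Mono", "CD14 Mono", "CD16 Mono"],
    "cell_2": ["B", "T", "NK"],
}
labels = {cell: consensus_label(preds) for cell, preds in per_cell.items()}
unassigned_fraction = sum(lab == "unassigned" for lab in labels.values()) / len(labels)
print(labels, unassigned_fraction)
```

The unassigned fraction is the kind of number the reporting guidance above asks for, and the unassigned cells are the natural starting point for manual marker review.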
Integration: method choice and over-correction
If the tutorial presents Harmony or Seurat CCA as an all-purpose default for integration, the tradeoffs among the main approaches need clarifying. The scIB benchmark of Luecken et al. (2022, Nat Methods) evaluated 16 integration methods on 13 tasks, scoring batch removal against biological conservation: (1) Harmony (Korsunsky 2019): fast soft clustering in PCA space, suited to medium-sized data with small compositional differences; the risk of over-correction is high. (2) scVI / scANVI (Lopez 2018; Xu 2021): VAE models with the best combined score for conserving biological variation while correcting batch, but they need a GPU, train slowly, and are sensitive to hyperparameters. (3) Seurat CCA / RPCA: anchor-based; CCA is more sensitive but prone to over-correction, and RPCA is the recommended compromise from v4 onward. (4) BBKNN: fast kNN-based correction, suited to atlas-scale data. In practice: (a) quantify results with scIB metrics (kBET, iLISI, ARI); (b) beware of designs where batch is confounded with biology (e.g. disease and control samples processed in different batches), because integration will erase the real effect; (c) reports must state the method, its version, and the batch covariate.
Sources: Luecken MD et al. (2022) Benchmarking atlas-level data integration in single-cell genomics, Nat Methods 19:41–50. DOI: 10.1038/s41592-021-01336-8; Korsunsky I et al. (2019) Harmony, Nat Methods 16:1289.
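Of the metrics named above, the adjusted Rand index (ARI) is simple enough to write out. The sketch below compares post-integration cluster labels with known cell-type labels, as a rough check of biological conservation; it is a plain implementation of the standard ARI formula, and the function name and toy labels are assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same cells."""
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: trivially identical
    sum_joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


cell_types = ["T", "T", "B", "B", "NK", "NK"]
clusters_kept = [0, 0, 1, 1, 2, 2]       # cell types still separate
clusters_collapsed = [0, 0, 0, 0, 0, 0]  # over-corrected into one cluster
ari_kept = adjusted_rand_index(cell_types, clusters_kept)
ari_collapsed = adjusted_rand_index(cell_types, clusters_collapsed)
print(ari_kept, ari_collapsed)
```

An ARI near 1 after integration indicates that cell-type structure survived; a value near 0 is what over-correction looks like.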
UMAP: mistaking geometry for biology
If the tutorial infers relatedness between cell types from the spacing of clusters on a UMAP, a strong warning is needed. Chari & Pachter (2023, PLOS Comput Biol), The specious art of single-cell genomics, argue systematically that: (1) mapping high-dimensional points to 2D with UMAP / t-SNE necessarily distorts global structure, and the "distance" between clusters bears almost no relation to the true high-dimensional distance; (2) the same data with different random seeds can yield UMAPs with different topology; (3) UMAP "branches" are often misread as developmental trajectories when they are projection artifacts; (4) the Pearson correlation between UMAP distance and true distance is often below 0.3. In practice: (a) use UMAP only to visualize local neighborhoods, and do not argue lineage relationships from inter-cluster distances; (b) run trajectory analysis in PCA or diffusion-map space; (c) reports should (i) disclose the random seed, (ii) also show PCA pair plots, and (iii) avoid statistical inference on UMAP coordinates. Kobak & Linderman (2021, Nat Biotechnol) show that PCA-initialized t-SNE/UMAP improves reproducibility, but this does not solve the global-distance problem.
Sources: Chari T, Pachter L (2023) The specious art of single-cell genomics, PLOS Comput Biol 19:e1011288. DOI: 10.1371/journal.pcbi.1011288; Kobak D, Linderman GC (2021) Initialization is critical for preserving global data structure in both t-SNE and UMAP, Nat Biotechnol 39:156.
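The distance-correlation check in point (4) can be sketched as the Pearson correlation between all pairwise distances before and after embedding. This is an illustration on made-up coordinates; `distance_preservation` is a hypothetical helper, and in practice the inputs would be PCA coordinates and UMAP coordinates for a subsample of cells.

```python
from itertools import combinations
from math import dist, sqrt


def pairwise_distances(points):
    """Euclidean distances for all unordered pairs, in a fixed order."""
    return [dist(p, q) for p, q in combinations(points, 2)]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def distance_preservation(original, embedding):
    """Correlation between pairwise distances before and after embedding."""
    return pearson(pairwise_distances(original), pairwise_distances(embedding))


original = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
faithful = [(0.0, 0.0), (2.0, 0.0), (20.0, 0.0)]   # uniformly rescaled
distorted = [(0.0, 0.0), (10.0, 0.0), (1.0, 0.0)]  # far and near points swapped
r_faithful = distance_preservation(original, faithful)
r_distorted = distance_preservation(original, distorted)
print(round(r_faithful, 3), round(r_distorted, 3))
```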
Doublet detection: the Xi & Li 2021 Cell Systems benchmark
When the QC chapter introduces doublet detection, it often lists tool names (Scrublet, DoubletFinder, scDblFinder) without giving selection criteria. Xi & Li 2021 Cell Systems ("Benchmarking Computational Doublet-Detection Methods for Single-Cell RNA Sequencing Data") systematically compared 9 doublet-detection tools on 16 real datasets plus simulated datasets: DoubletFinder ranks at the top on AUPRC, and scDblFinder, benchmarked later on the same datasets, also ranks near the top; Scrublet is fast but has weaker recall for inter-cell-type doublets; intra-cell-type doublets (two cells of the same type) are hard for every tool. Practical rules: (1) scDblFinder (Germain 2021 F1000Research) is recommended as the default because it combines cluster-based and random doublet simulation; (2) set the expected doublet rate from the values published by 10x (~0.4% per 1,000 cells loaded); (3) always compare the nFeature / nCount distributions before and after detection; (4) when several samples are processed together, run detection in a sample-aware way, since doublets arise within a single capture, to limit batch interference.
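Rule (2) is plain arithmetic. The sketch below turns the quoted figure (~0.4% per 1,000 cells loaded) into an expected rate and an expected number of doublets; the function name and the example cell numbers are hypothetical, and the rate should be replaced with the value for the chemistry actually used.

```python
def expected_doublets(cells_loaded, cells_recovered, rate_per_1000_loaded=0.004):
    """Expected doublet rate and count for one capture.

    Assumption: the rate grows linearly with cells loaded, at the
    ~0.4% per 1,000 cells quoted in the text.
    """
    rate = rate_per_1000_loaded * cells_loaded / 1000
    return rate, round(rate * cells_recovered)


# Hypothetical capture: 10,000 cells loaded, 6,000 recovered after QC.
rate, n_expected = expected_doublets(10_000, 6_000)
print(f"{rate:.1%} expected doublet rate, about {n_expected} doublets")
```

The resulting count is the kind of value passed as the expected number of doublets (nExp in DoubletFinder).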
Tran 2020 Genome Biology: batch correction can damage signal when no real batch effect exists
Tutorials often present "batch-correct first, then analyse" as the standard recipe in the integration chapter; a caution should be added. Tran, Ang, Chevrier, Zhang, Lee, Goh, Chen 2020 Genome Biology ("A benchmark of batch-effect correction methods for single-cell RNA sequencing data") compared 14 integration methods (Harmony, scVI, BBKNN, Seurat v3 CCA, MNN, Scanorama, Liger, etc.) across 5 scenarios and report: (1) when there is no real batch effect (same experiment, same platform), correction becomes over-correction, and some methods (e.g. BBKNN with default parameters) flatten distinct cell types into a single cluster; (2) Harmony and Seurat v3 are robust in most scenarios but sensitive to hyperparameters; (3) scVI has a clear advantage when batch effects are strong but tends to overfit on small batches. Practical rules: (1) first assess the strength of the batch effect with PCA / UMAP on the uncorrected data; (2) if the silhouette by batch is below 0.05 and cell types already separate into clusters, batch correction can be skipped; (3) the scIB metrics of Luecken 2022 Nat Methods report batch removal and biological conservation on two separate axes; (4) report whether marker-gene expression is suppressed after correction compared with before.
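The "silhouette by batch" in rule (2) is the ordinary silhouette width computed with batch labels in place of cluster labels. A minimal sketch on made-up 2D coordinates (in practice, PCA coordinates of the uncorrected data); the function name and toy points are assumptions, and the 0.05 cutoff is the one quoted above.

```python
from math import dist


def silhouette_by_batch(points, batches):
    """Mean silhouette width using batch labels as the grouping."""
    scores = []
    for i, p in enumerate(points):
        by_batch = {}
        for j, q in enumerate(points):
            if i != j:
                by_batch.setdefault(batches[j], []).append(dist(p, q))
        own = by_batch.pop(batches[i], None)
        if not own or not by_batch:
            scores.append(0.0)  # undefined for singleton or single-batch data
            continue
        a = sum(own) / len(own)                             # mean distance within own batch
        b = min(sum(d) / len(d) for d in by_batch.values())  # nearest other batch
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


mixed = silhouette_by_batch(
    [(0, 0), (0, 1), (1, 0), (1, 1)], ["a", "b", "b", "a"]
)
separated = silhouette_by_batch(
    [(0, 0), (0, 1), (10, 0), (10, 1)], ["a", "a", "b", "b"]
)
print(round(mixed, 3), round(separated, 3))
```

A value near or below zero means batches are interleaved and correction may be unnecessary; a value near 1 means cells group by batch.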
QC: DoubletFinder API names
If the tutorial still demonstrates doubletFinder_v3(), it should be updated: this is the old API of DoubletFinder versions before 2.0.4. Starting with DoubletFinder 2.0.4, released in 2023, the functions are named doubletFinder() and paramSweep() (without the _v3 suffix). New code should use the new names; whether the old names are still exported depends on the installed version, so check it with packageVersion("DoubletFinder"). The corresponding parameters (pN, pK, nExp) are unchanged.
Sources: GitHub: chris-mcginnis-ucsf/DoubletFinder README & release notes (v2.0.4, 2023).
Annotation: the common name for FCGR3A Mono
If the tutorial table lists "FCGR3A Mono", add a note that FCGR3A encodes CD16. The Seurat PBMC3K vignette labels this population FCGR3A+ Mono, whereas most textbooks and the Azimuth PBMC reference call it CD16+ monocyte (CD16 Mono in Azimuth) or non-classical monocyte. The three names (FCGR3A Mono / CD16+ Mono / non-classical Mono) refer to the same population and are used interchangeably in the literature; CD16+ Monocyte is preferable in reports because it matches the convention of the immunology community.
Sources: Hao Y et al. (2021) Integrated analysis of multimodal single-cell data, Cell 184:3573 (Azimuth PBMC reference); Seurat PBMC3K vignette.
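When labels from several sources are combined, the synonyms can be harmonized with a small lookup before reporting. A minimal sketch covering only the three names discussed in this note; the table and the function name `harmonize_label` are hypothetical and would need extending for other populations.

```python
CANONICAL = "CD16+ Monocyte"

# Synonyms discussed above, keyed in lowercase (illustrative, not exhaustive).
ALIASES = {
    "fcgr3a mono": CANONICAL,
    "cd16+ mono": CANONICAL,
    "non-classical mono": CANONICAL,
}


def harmonize_label(label):
    """Map known synonyms to one reporting label; leave other labels untouched."""
    return ALIASES.get(label.strip().lower(), label)


print(harmonize_label("FCGR3A Mono"), "|", harmonize_label("B"))
```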
Annotation: subdividing the DC markers
If the tutorial table uses FCER1A and CST3 to label DCs, refine as follows: (1) CST3 is co-expressed broadly across myeloid cells, monocytes and DCs alike, so its specificity is low; (2) FCER1A mainly marks cDC2 (conventional DC type 2); (3) common cDC1 markers are CLEC9A, XCR1, and BATF3; (4) common pDC (plasmacytoid DC) markers are LILRA4, IL3RA (CD123), and CLEC4C; (5) AS-DCs (the new subset described by Villani 2017, Science) are marked by AXL and SIGLEC6. For fine-grained annotation of DC subsets, combine these markers with the Azimuth PBMC reference or the CellTypist immune model.
Sources: Domínguez Conde C et al. (2022) Cross-tissue immune cell analysis reveals tissue-specific features in humans, Science 376:eabl5197. DOI: 10.1126/science.abl5197; Villani AC et al. (2017) Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors, Science 356:eaah4573.
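The marker sets above can be written down as a lookup and used for a first-pass check of which DC subset a cluster's top genes point to. This is a rough sketch: scoring by the fraction of markers present is an assumption chosen for simplicity, `score_dc_subtypes` is a hypothetical helper, and CST3 is deliberately left out because of its low specificity.

```python
# Marker sets taken from the note above; CST3 omitted (low specificity).
DC_MARKERS = {
    "cDC1": {"CLEC9A", "XCR1", "BATF3"},
    "cDC2": {"FCER1A"},
    "pDC": {"LILRA4", "IL3RA", "CLEC4C"},
    "AS-DC": {"AXL", "SIGLEC6"},
}


def score_dc_subtypes(cluster_genes):
    """Fraction of each subset's markers found among a cluster's top genes."""
    genes = set(cluster_genes)
    return {subset: len(markers & genes) / len(markers)
            for subset, markers in DC_MARKERS.items()}


# Hypothetical top genes for one cluster.
scores = score_dc_subtypes(["LILRA4", "CLEC4C", "CST3", "IL3RA"])
best = max(scores, key=scores.get)
print(best, scores)
```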
DE: current consensus on cross-condition comparison
If the tutorial only mentions FindMarkers(test.use="wilcox") for cross-condition comparison, it should be updated. Squair et al. 2021 Nat Commun ("Confronting false discoveries in single-cell differential expression") and Murphy et al. 2022 Nat Commun both show that: (1) aggregating cluster × sample into pseudobulk and then applying edgeR-LRT or DESeq2-LRT controls the FDR markedly better than running Wilcoxon / MAST directly on single cells, because the latter treat cells from the same sample as independent observations and inflate the type I error; (2) when the number of samples (donors / replicates) is too small (n < 4 per group), pseudobulk is also unreliable and calls for caution; (3) for rare cell types or high within-sample zero inflation, a MAST hurdle model with a random effect can be used as a complement. In practice: (a) DE across donors / conditions must use pseudobulk; (b) reports must state the number of samples, the pseudobulk strategy, and the test used.
Sources: Squair JW et al. (2021) Confronting false discoveries in single-cell differential expression, Nat Commun 12:5692. DOI: 10.1038/s41467-021-25960-2; Murphy AE et al. (2022), Nat Commun 13:7980. DOI: 10.1038/s41467-022-35519-4.
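The pseudobulk step itself is just a sum of raw counts over all cells that share a cluster and a sample. A minimal sketch on made-up cells; the function name `pseudobulk` and the input layout (one tuple per cell) are assumptions, and the resulting per-sample profiles would then go to edgeR or DESeq2.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum raw counts per (cluster, sample).

    `cells` is an iterable of (cluster, sample, {gene: count}) tuples.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for cluster, sample, counts in cells:
        for gene, count in counts.items():
            totals[(cluster, sample)][gene] += count
    return {key: dict(genes) for key, genes in totals.items()}


# Made-up cells from two donors.
cells = [
    ("CD14 Mono", "donor1", {"LYZ": 5, "S100A8": 2}),
    ("CD14 Mono", "donor1", {"LYZ": 3}),
    ("CD14 Mono", "donor2", {"LYZ": 4, "S100A8": 1}),
    ("B", "donor1", {"MS4A1": 2}),
]
bulk = pseudobulk(cells)
print(bulk)
```

After aggregation the unit of replication is the sample rather than the cell, which is what removes the pseudoreplication problem described above.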