scRNA-seq Reference Materials

Overview

How to use this material

For every tool, algorithm, and biological concept mentioned in the tutorial, this page collects academic sources for deeper reading. Citation tag meanings:

📄 Paper: Original papers with DOI / PubMed

📘 Doc: Official documentation, vignettes, tutorials

⭐ Best Practice: Systematic reviews and community recommendations

📊 Benchmark: Method comparisons and independent benchmarks

🗄️ Database: Marker gene and cell annotation databases

📚 Book: Free online books and comprehensive textbooks

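Contents

★ Best Practices reviews
★ Core analysis frameworks
Step 1 Quality control (QC)
Step 2 Normalization
Step 3 Highly variable genes
Step 4 Scaling
Step 5 PCA
Step 6 Clustering
Step 7 UMAP / t-SNE
Step 8 Cell annotation
Step 9 Differential expression
Adv. Data integration
Adv. Cell-cell communication
Adv. Trajectory analysis
DB Marker databases
! Tutorial notes and details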

Reviews

⭐ Best Practices Review Papers

Authoritative, textbook-level reviews of the entire scRNA-seq analysis workflow. Start with the first two papers to build a panoramic view of the design tradeoffs across the full pipeline.

★ BEST Heumos L, Schaar AC, Lance C, et al. Best practices for single-cell analysis across modalities. Nature Reviews Genetics 24, 550–572 (2023). DOI: 10.1038/s41576-023-00586-w · Free online book: sc-best-practices.org
★ BEST Luecken MD, Theis FJ. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol Syst Biol 15, e8746 (2019). DOI: 10.15252/msb.20188746 · Tutorial source code: theislab/single-cell-tutorial
📚 BOOK Amezquita RA, Lun ATL, Becht E, et al. Orchestrating Single-Cell Analysis with Bioconductor (OSCA). Nat Methods 17, 137–145 (2020). DOI: 10.1038/s41592-019-0654-x · Free online book: bioconductor.org/books/OSCA
⭐ ATLAS Regev A, Teichmann SA, Lander ES, et al. The Human Cell Atlas. eLife 6:e27041 (2017). DOI: 10.7554/eLife.27041 · humancellatlas.org

Core frameworks

🧰 Core Analysis Frameworks

The two mainstream software ecosystems for scRNA-seq workflows. Seurat (R) and Scanpy (Python) cover largely the same functionality, and most methods are implemented in both.

PAPER Hao Y, Hao S, Andersen-Nissen E, et al. Integrated analysis of multimodal single-cell data. Cell 184(13):3573–3587.e29 (2021). (Seurat v4) DOI: 10.1016/j.cell.2021.04.048
PAPER Hao Y, Stuart T, Kowalski MH, et al. Dictionary learning for integrative, multimodal and scalable single-cell analysis. Nat Biotechnol 42:293–304 (2024). (Seurat v5) DOI: 10.1038/s41587-023-01767-y
DOC Seurat official website and vignettes (Satija Lab). satijalab.org/seurat
PAPER Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol 19:15 (2018). DOI: 10.1186/s13059-017-1382-0
DOC Scanpy official documentation (scverse / Theis Lab). scanpy.readthedocs.io · GitHub scverse/scanpy

Step 1

🧪 Quality Control (QC)

Cell-level metrics (nFeature/nCount/percent.mt), gene-level filtering, plus advanced doublet detection and ambient RNA removal.

Doublet detection

PAPER McGinnis CS, Murrow LM, Gartner ZJ. DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors. Cell Syst 8(4):329–337.e4 (2019). DOI: 10.1016/j.cels.2019.03.003 · GitHub chris-mcginnis-ucsf/DoubletFinder
PAPER Wolock SL, Lopez R, Klein AM. Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data. Cell Syst 8(4):281–291.e9 (2019). DOI: 10.1016/j.cels.2018.11.005
PAPER Germain PL, Lun A, Garcia Meixide C, Macnair W, Robinson MD. Doublet identification in single-cell sequencing data using scDblFinder. F1000Research 10:979 (2021). DOI: 10.12688/f1000research.73600.2 · Bioconductor scDblFinder
BENCH Xi NM, Li JJ. Benchmarking Computational Doublet-Detection Methods for Single-Cell RNA Sequencing Data. Cell Syst 12(2):176–194.e6 (2021). DOI: 10.1016/j.cels.2020.11.008

Ambient RNA / Soup

PAPER Young MD, Behjati S. SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data. GigaScience 9(12):giaa151 (2020). DOI: 10.1093/gigascience/giaa151
PAPER Fleming SJ, Chaffin MD, Arduini A, et al. Unsupervised removal of systematic background noise from droplet-based single-cell experiments using CellBender. Nat Methods 20:1323–1335 (2023). DOI: 10.1038/s41592-023-01943-7 · GitHub broadinstitute/CellBender

Adaptive thresholds and QC tools

PAPER McCarthy DJ, Campbell KR, Lun ATL, Wills QF. Scater: pre-processing, quality control, normalization and visualization of single-cell RNA-seq data in R. Bioinformatics 33(8):1179–1186 (2017). DOI: 10.1093/bioinformatics/btw777 · Provides isOutlier() MAD-based adaptive thresholds
⭐ TIP Best practices chapter "Quality control": sc-best-practices.org / quality control
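
The MAD-based adaptive thresholds mentioned for isOutlier() can be illustrated in a few lines. The following is a minimal standard-library sketch of the idea, not scater's implementation: the function names, the 3-MAD cutoff, and the percent.mt values are assumptions chosen for the example.

```python
from statistics import median

def mad_bounds(values, nmads=3.0):
    """Return (lower, upper) bounds at nmads scaled MADs from the median."""
    med = median(values)
    # 1.4826 makes the MAD comparable to a standard deviation for normal data
    mad = 1.4826 * median(abs(v - med) for v in values)
    return med - nmads * mad, med + nmads * mad

def flag_outliers(values, nmads=3.0, direction="both"):
    """Flag values outside the MAD bounds on the requested side(s)."""
    lower, upper = mad_bounds(values, nmads)
    flags = []
    for v in values:
        too_low = v < lower and direction in ("both", "lower")
        too_high = v > upper and direction in ("both", "higher")
        flags.append(too_low or too_high)
    return flags

# Hypothetical percent.mt values for eight cells; the last one looks damaged
percent_mt = [2.1, 3.0, 2.7, 3.4, 2.9, 3.1, 2.5, 18.0]
mt_flags = flag_outliers(percent_mt, nmads=3.0, direction="higher")
```

Because the threshold is derived from the data, the same code adapts to samples with different baseline mitochondrial content, which is the motivation for preferring it over a fixed cutoff.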

Step 2

⚖️ Normalization

PAPER · LogNormalize Stuart T, Butler A, Hoffman P, et al. Comprehensive Integration of Single-Cell Data. Cell 177(7):1888–1902.e21 (2019). (Seurat v3 citation) DOI: 10.1016/j.cell.2019.05.031
PAPER · scran pooling-based deconvolution Lun ATL, Bach K, Marioni JC. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol 17:75 (2016). DOI: 10.1186/s13059-016-0947-7
PAPER · SCTransform Hafemeister C, Satija R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol 20:296 (2019). DOI: 10.1186/s13059-019-1874-1 · GitHub satijalab/sctransform
PAPER · SCT v2 Choudhary S, Satija R. Comparison and evaluation of statistical error models for scRNA-seq. Genome Biol 23:27 (2022). DOI: 10.1186/s13059-021-02584-9
BENCH Ahlmann-Eltze C, Huber W. Comparison of transformations for single-cell RNA-seq data. Nat Methods 20:665–672 (2023). DOI: 10.1038/s41592-023-01814-1

Step 3

✨ Highly Variable Gene Selection (Variable Features)

DOC Seurat FindVariableFeatures() · three methods: vst / mvp / dispersion. satijalab.org/seurat/reference/findvariablefeatures
DOC Scanpy sc.pp.highly_variable_genes() · offers the cell_ranger / seurat / seurat_v3 flavors. scanpy / highly_variable_genes
PAPER · Brennecke (CV² method) Brennecke P, Anders S, Kim JK, et al. Accounting for technical noise in single-cell RNA-seq experiments. Nat Methods 10:1093–1095 (2013). DOI: 10.1038/nmeth.2645
PAPER · scry / Pearson residual HVG Townes FW, Hicks SC, Aryee MJ, Irizarry RA. Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model. Genome Biol 20:295 (2019). DOI: 10.1186/s13059-019-1861-6

Step 4

📏 Scaling

DOC Seurat ScaleData() · z-scores, plus residuals from linear regression on vars.to.regress. satijalab.org/seurat/reference/scaledata
DOC Scanpy sc.pp.scale() · can clip at max_value so that extreme values do not dominate (max_value is None by default, so clipping applies only when it is set). scanpy / scale
PAPER · cell-cycle score regression Tirosh I, Izar B, Prakadan SM, et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352(6282):189–196 (2016). (source of Seurat CellCycleScoring) DOI: 10.1126/science.aad0501
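
To make the z-score and max_value entries above concrete, here is a minimal standard-library sketch of per-gene scaling with an optional cap. It is an illustration under stated assumptions rather than the Seurat or Scanpy code: scale_gene is a hypothetical helper, only the upper tail is capped, and the expression vector is invented.

```python
from statistics import mean, stdev

def scale_gene(values, max_value=None):
    """Z-score one gene across cells, optionally capping values at max_value."""
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        return [0.0] * len(values)  # constant gene: nothing to scale
    z = [(v - mu) / sd for v in values]
    if max_value is not None:
        z = [min(x, max_value) for x in z]  # cap only the upper tail
    return z

# Hypothetical gene: silent in 199 cells, strongly expressed in one
expression = [0.0] * 199 + [5.0]
unclipped = scale_gene(expression)
clipped = scale_gene(expression, max_value=10.0)
```

Without the cap, the single expressing cell gets a z-score above 14 and can dominate a principal component; with max_value=10.0 it is held at 10 while every other cell is unchanged.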

Step 5

📉 Principal Component Analysis (PCA)

PAPER · IRLBA-accelerated PCA Baglama J, Reichel L. Augmented implicitly restarted Lanczos bidiagonalization methods. SIAM J Sci Comput 27(1):19–42 (2005). (default method in Seurat RunPCA) DOI: 10.1137/04060593X · CRAN irlba
DOC Seurat RunPCA() with ElbowPlot() and JackStraw() documentation. satijalab.org/seurat/articles/pbmc3k_tutorial
⭐ TIP The Best practices chapter "Dimensionality reduction" discusses the choice of nPCs in detail. sc-best-practices / dimensionality reduction

Step 6

🧩 Clustering

PAPER · Louvain Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E. Fast unfolding of communities in large networks. J Stat Mech 2008(10):P10008. DOI: 10.1088/1742-5468/2008/10/P10008 · arXiv: 0803.0476
PAPER · Leiden (recommended) Traag VA, Waltman L, van Eck NJ. From Louvain to Leiden: guaranteeing well-connected communities. Sci Rep 9:5233 (2019). DOI: 10.1038/s41598-019-41695-z
PAPER · SNN (PhenoGraph) Levine JH, Simonds EF, Bendall SC, et al. Data-driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis. Cell 162(1):184–197 (2015). (the SNN graph clustering used in single-cell tools traces back to PhenoGraph) DOI: 10.1016/j.cell.2015.05.047
BENCH Duò A, Robinson MD, Soneson C. A systematic performance evaluation of clustering methods for single-cell RNA-seq data. F1000Research 7:1141 (2018). DOI: 10.12688/f1000research.15666.3

Step 7

🗺️ UMAP / t-SNE Visualization

PAPER · UMAP McInnes L, Healy J, Melville J. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv:1802.03426 (2018). arxiv.org/abs/1802.03426 · GitHub lmcinnes/umap
PAPER · UMAP in scRNA-seq Becht E, McInnes L, Healy J, et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol 37:38–44 (2019). DOI: 10.1038/nbt.4314
PAPER · t-SNE van der Maaten L, Hinton G. Visualizing Data using t-SNE. J Mach Learn Res 9:2579–2605 (2008). jmlr.org/papers/v9/vandermaaten08a.html
PAPER · caution against over-reading UMAP Chari T, Pachter L. The specious art of single-cell genomics. PLoS Comput Biol 19(8):e1011288 (2023). DOI: 10.1371/journal.pcbi.1011288 · Cautions against over-interpreting 2D embedding distances

Step 8

🏷️ Cell Annotation

PAPER · SingleR Aran D, Looney AP, Liu L, et al. Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. Nat Immunol 20:163–172 (2019). DOI: 10.1038/s41590-018-0276-y · Bioconductor SingleR
PAPER · CellTypist Domínguez Conde C, Xu C, Jarvis LB, et al. Cross-tissue immune cell analysis reveals tissue-specific features in humans. Science 376(6594):eabl5197 (2022). DOI: 10.1126/science.abl5197 · Website celltypist.org
DOC · Azimuth Azimuth web app (reference-based mapping using the methods from Seurat v4). Cite Hao et al. 2021, Cell. azimuth.hubmapconsortium.org · GitHub satijalab/azimuth
BENCH Abdelaal T, Michielsen L, Cats D, et al. A comparison of automatic cell identification methods for single-cell RNA sequencing data. Genome Biol 20:194 (2019). DOI: 10.1186/s13059-019-1795-z
⭐ TIP The Best practices chapter "Annotation" covers the workflow and tool selection. sc-best-practices / annotation

Step 9

🧾 Differential Expression Analysis

PAPER · MAST hurdle model Finak G, McDavid A, Yajima M, et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol 16:278 (2015). DOI: 10.1186/s13059-015-0844-5
PAPER · DESeq2 Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15:550 (2014). DOI: 10.1186/s13059-014-0550-8
PAPER · edgeR Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1):139–140 (2010). DOI: 10.1093/bioinformatics/btp616
PAPER · recommended pseudobulk practice Squair JW, Gautier M, Kathe C, et al. Confronting false discoveries in single-cell differential expression. Nat Commun 12:5692 (2021). DOI: 10.1038/s41467-021-25960-2
PAPER · pseudoreplication warning Zimmerman KD, Espeland MA, Langefeld CD. A practical solution to pseudoreplication bias in single-cell studies. Nat Commun 12:738 (2021). DOI: 10.1038/s41467-021-21038-1
PAPER · muscat (multi-sample DE) Crowell HL, Soneson C, Germain PL, et al. muscat detects subpopulation-specific state transitions from multi-sample multi-condition single-cell transcriptomics data. Nat Commun 11:6077 (2020). DOI: 10.1038/s41467-020-19894-4
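
The pseudobulk approach referenced above boils down to summing raw counts per sample and cell type before testing, so that the sample rather than the cell is the unit of replication. A minimal sketch follows, assuming a small cells × genes count matrix held as nested lists; the pseudobulk function name, donor labels, and counts are all hypothetical.

```python
from collections import defaultdict

def pseudobulk(counts, samples, cell_types):
    """Sum raw counts over all cells sharing a (sample, cell type) pair."""
    n_genes = len(counts[0])
    totals = defaultdict(lambda: [0] * n_genes)
    for cell, sample, cell_type in zip(counts, samples, cell_types):
        profile = totals[(sample, cell_type)]
        for gene_index, count in enumerate(cell):
            profile[gene_index] += count
    return dict(totals)

# Hypothetical data: five cells, two genes, two donors
counts = [[1, 0], [2, 1], [0, 4], [3, 3], [5, 0]]
samples = ["donor1", "donor1", "donor1", "donor2", "donor2"]
cell_types = ["T", "T", "B", "T", "T"]
bulk = pseudobulk(counts, samples, cell_types)
```

Each resulting profile can then be handed to a bulk method such as DESeq2 or edgeR, one cell type at a time, with donors as replicates.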

Advanced

🔗 Data Integration

PAPER · Harmony Korsunsky I, Millard N, Fan J, et al. Fast, sensitive and accurate integration of single-cell data with Harmony. Nat Methods 16:1289–1296 (2019). DOI: 10.1038/s41592-019-0619-0 · GitHub immunogenomics/harmony
PAPER · Seurat CCA / Anchors Stuart T, Butler A, Hoffman P, et al. Comprehensive Integration of Single-Cell Data. Cell 177(7):1888–1902.e21 (2019). DOI: 10.1016/j.cell.2019.05.031
PAPER · scVI Lopez R, Regier J, Cole MB, Jordan MI, Yosef N. Deep generative modeling for single-cell transcriptomics. Nat Methods 15:1053–1058 (2018). DOI: 10.1038/s41592-018-0229-2 · Framework scvi-tools.org
PAPER · Scanorama Hie B, Bryson B, Berger B. Efficient integration of heterogeneous single-cell transcriptomes using Scanorama. Nat Biotechnol 37:685–691 (2019). DOI: 10.1038/s41587-019-0113-3
BENCH · integration method comparison (scIB) Luecken MD, Büttner M, Chaichoompu K, et al. Benchmarking atlas-level data integration in single-cell genomics. Nat Methods 19:41–50 (2022). DOI: 10.1038/s41592-021-01336-8 · Evaluates methods with metrics including LISI and kBET
BENCH · batch-effect correction benchmark Tran HTN, Ang KS, Chevrier M, et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol 21:12 (2020). DOI: 10.1186/s13059-019-1850-9

Advanced

💬 Cell-Cell Communication

PAPER · CellChat Jin S, Guerrero-Juarez CF, Zhang L, et al. Inference and analysis of cell-cell communication using CellChat. Nat Commun 12:1088 (2021). DOI: 10.1038/s41467-021-21246-9 · GitHub jinworks/CellChat
PAPER · CellChat cross-condition analysis (Protocol) Jin S, Plikus MV, Nie Q. CellChat for systematic analysis of cell–cell communication from single-cell transcriptomics. Nat Protoc 20:180–219 (2025). DOI: 10.1038/s41596-024-01045-4
PAPER · CellPhoneDB Efremova M, Vento-Tormo M, Teichmann SA, Vento-Tormo R. CellPhoneDB: inferring cell–cell communication from combined expression of multi-subunit ligand–receptor complexes. Nat Protoc 15:1484–1506 (2020). DOI: 10.1038/s41596-020-0292-x · GitHub ventolab/CellphoneDB
PAPER · NicheNet Browaeys R, Saelens W, Saeys Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods 17:159–162 (2020). DOI: 10.1038/s41592-019-0667-5 · GitHub saeyslab/nichenetr
PAPER · LIANA (consensus framework) Dimitrov D, Türei D, Garrido-Rodriguez M, et al. Comparison of methods and resources for cell-cell communication inference from single-cell RNA-Seq data. Nat Commun 13:3224 (2022). DOI: 10.1038/s41467-022-30755-0 · Website saezlab.github.io/liana

Advanced

🛤️ Trajectory Analysis (Trajectory / Pseudotime)

PAPER · Monocle3 Cao J, Spielmann M, Qiu X, et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566:496–502 (2019). DOI: 10.1038/s41586-019-0969-x · Website cole-trapnell-lab.github.io/monocle3
PAPER · Slingshot Street K, Risso D, Fletcher RB, et al. Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. BMC Genomics 19:477 (2018). DOI: 10.1186/s12864-018-4772-0
PAPER · origin of RNA velocity La Manno G, Soldatov R, Zeisel A, et al. RNA velocity of single cells. Nature 560:494–498 (2018). DOI: 10.1038/s41586-018-0414-6
PAPER · scVelo (dynamical model) Bergen V, Lange M, Peidli S, Wolf FA, Theis FJ. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat Biotechnol 38:1408–1414 (2020). DOI: 10.1038/s41587-020-0591-3 · Website scvelo.readthedocs.io
PAPER · CytoTRACE Gulati GS, Sikandar SS, Wesche DJ, et al. Single-cell transcriptional diversity is a hallmark of developmental potential. Science 367(6476):405–411 (2020). DOI: 10.1126/science.aax0249 · Website cytotrace.stanford.edu
PAPER · PAGA Wolf FA, Hamey FK, Plass M, et al. PAGA: graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells. Genome Biol 20:59 (2019). DOI: 10.1186/s13059-019-1663-x
BENCH · comparison of 45 trajectory methods Saelens W, Cannoodt R, Todorov H, Saeys Y. A comparison of single-cell trajectory inference methods. Nat Biotechnol 37:547–554 (2019). DOI: 10.1038/s41587-019-0071-9

Databases

🗄️ Marker Gene Databases and Cell Atlases

The PBMC markers demonstrated in the tutorial's "Annotation" chapter (CD3D, IL7R, MS4A1, CD79A, GNLY, NKG7, KLRD1, CD14, LYZ, S100A8, FCGR3A, MS4A7, FCER1A, CST3, PPBP, PF4, etc.) can be traced back to their original literature in the databases below.

DB · PanglaoDB Franzén O, Gan LM, Björkegren JLM. PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data. Database (Oxford) baz046 (2019). DOI: 10.1093/database/baz046 · Site panglaodb.se · 6000+ marker–cell type pairs
DB · CellMarker 2.0 Hu C, Li T, Xu Y, et al. CellMarker 2.0: an updated database of manually curated cell markers in human/mouse and web tools based on scRNA-seq data. Nucleic Acids Res 51(D1):D870–D876 (2023). DOI: 10.1093/nar/gkac947 · Site bio-bigdata.hrbmu.edu.cn/CellMarker · 83K+ tissue-cell-marker entries
DB · Tabula Sapiens Tabula Sapiens Consortium. The Tabula Sapiens: A multiple-organ, single-cell transcriptomic atlas of humans. Science 376(6594):eabl4896 (2022). DOI: 10.1126/science.abl4896 · Site tabula-sapiens.sf.czbiohub.org · 24 tissues, 500K+ cells, 400+ cell types
DB · Human Cell Atlas Human Cell Atlas Data Portal · cross-tissue human cell atlas (including sub-projects such as Tabula Sapiens and the Lung Cell Atlas). data.humancellatlas.org
DB · CZ CELLxGENE Discover CZ CELLxGENE · single-cell exploration and data integration platform from the Chan Zuckerberg Initiative. cellxgene.cziscience.com
DOC · 10x Genomics PBMC marker genes 10x Genomics PBMC datasets · the standard training data used by the Seurat / Scanpy PBMC tutorials. 10xgenomics.com/datasets · Seurat PBMC3K: satijalab.org/seurat/articles/pbmc3k_tutorial
DB · Azimuth References Azimuth · multi-organ reference sets from HuBMAP × Satija Lab (PBMC, Kidney, Lung, Pancreas, etc.) for one-click mapping. azimuth.hubmapconsortium.org/references

📖

Lookup guide: All DOI links point to publisher pages. To download a PDF, look for an open-access version via PubMed, PMC, arXiv, or bioRxiv.

Tutorial notes

📌 Tutorial Notes and Details

Below are points found while cross-checking the scRNA-seq tutorial HTML against this reference list: places that may be incomplete, easily misread, or worth expanding. The original tutorial files are not modified; clarifications are provided here for cross-reference.

Notes

Normalization: SCTransform is not necessarily the best choice

If the tutorial recommends SCTransform (Hafemeister & Satija 2019) by default as the "best" normalization, that needs correcting. Ahlmann-Eltze & Huber (2023, Nat Methods) systematically benchmarked 22 scRNA-seq normalization methods on downstream cluster reproducibility, DE, and trajectory analysis, and concluded: (1) a simple shifted log, log(y/s + 1), with size factors (e.g. scran::computeSumFactors) performs as well as or better than more complex methods such as SCTransform / GLM-PCA / scVI on most tasks; (2) Pearson residuals (SCT v2) have an advantage for selecting highly variable genes, but the full output matrix may introduce spurious negative values; (3) there is no "one-size-fits-all" normalization. In practice: (a) in the exploratory phase, the shifted log from NormalizeData() is sufficient; (b) add size-factor regression only if sequencing depth differs by more than 10×; (c) cross-validate cluster stability across methods.

Sources: Ahlmann-Eltze C, Huber W (2023) Comparison of transformations for single-cell RNA-seq data, Nat Methods 20:665–672. DOI: 10.1038/s41592-023-01814-1; Hafemeister C, Satija R (2019) SCTransform, Genome Biology 20:296.
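
The shifted log transform discussed in this note, log(y/s + 1), is short enough to write out. The sketch below is a minimal illustration, assuming plain library-size factors (each cell's total divided by the mean total) rather than scran's pooling-based estimates; the function names and the count matrix are hypothetical, and cells with zero total counts are assumed to have been removed during QC.

```python
import math

def size_factors(counts):
    """Library-size factors: each cell's total count divided by the mean total."""
    totals = [sum(cell) for cell in counts]
    mean_total = sum(totals) / len(totals)
    return [total / mean_total for total in totals]

def shifted_log(counts):
    """Apply log(y / s + 1) to every count y in a cell with size factor s."""
    factors = size_factors(counts)
    return [[math.log1p(y / s) for y in cell] for cell, s in zip(counts, factors)]

# Hypothetical cells x genes matrix; cell 2 is cell 1 sequenced twice as deeply
counts = [[1, 0, 3], [2, 0, 6], [0, 5, 1]]
normalized = shifted_log(counts)
```

In this toy input the first two cells differ only in depth, so after the transform their profiles coincide, which is the behavior the size factor is meant to provide.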

Notes

Clustering: the effect of k.param=20

Tutorials often use FindNeighbors(..., k.param=20) + FindClusters(resolution=0.5) as defaults without comment. This choice is not neutral; it favors medium granularity: (1) too small a k (<10) amplifies noise and produces many micro-clusters; (2) too large a k (>50) swallows rare cell types (e.g. plasmacytoid DCs, Tregs); (3) on small datasets (<1000 cells), 20 neighbors can approach the size of an entire population, so clusters become unstable; (4) for datasets of different scale, sweep k and resolution and assess stability with silhouette, ROGUE, or clustree. In practice: first use clustree::clustree() to view the resolution ladder and find the "stable plateau"; then use scran::clusterCells() or bluster::clusterRows() to compare cluster purity across different values of k.

Sources: Hao Y et al. (2024) Seurat v5, Nat Biotechnol 42:293–304. DOI: 10.1038/s41587-023-01767-y; Zappia L, Oshlack A (2018) clustree, GigaScience 7:giy083; Liu et al. (2020) ROGUE, Nat Commun 11:3155.

Notes

Doublet: tool choice and benchmarks

If the tutorial recommends only DoubletFinder (McGinnis 2019), add more recent evidence. Xi & Li (2021, Cell Systems) systematically benchmarked 9 doublet-detection tools on 16 ground-truth datasets. Points to convey: (1) DoubletFinder had the best overall detection accuracy in that benchmark, but it requires a manually estimated expected doublet rate; (2) scDblFinder was not among the nine tools; its own paper (Germain 2021, F1000Research) evaluates it on the same benchmark datasets and reports leading accuracy with good speed and stability across datasets; (3) Scrublet (Wolock 2019) is fast but has lower recall; (4) doublets called by different tools overlap only partially, so using the intersection of ≥2 tools is recommended; (5) no tool reliably detects homotypic doublets (two cells of the same type), so upper limits on nCount_RNA and nFeature_RNA are still needed. In practice: scDblFinder is a reasonable first choice; note that multimodal data such as 10x Multiome and Visium HD require modality-specific doublet detection (e.g. AMULET for ATAC).

Sources: Xi NM, Li JJ (2021) Benchmarking computational doublet-detection methods, Cell Systems 12:176–194. DOI: 10.1016/j.cels.2020.11.008; Germain PL et al. (2021) scDblFinder, F1000Research 10:979.
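
The recommendation to keep only doublets called by at least two tools is a simple vote. A minimal sketch follows, assuming each tool's calls have already been exported as one boolean per cell in the same cell order; the consensus_doublets helper and the calls themselves are hypothetical.

```python
def consensus_doublets(calls_by_tool, min_votes=2):
    """Flag a cell as a doublet when at least min_votes tools call it."""
    per_tool = list(calls_by_tool.values())
    n_cells = len(per_tool[0])
    if any(len(calls) != n_cells for calls in per_tool):
        raise ValueError("every tool must report one call per cell")
    return [
        sum(calls[i] for calls in per_tool) >= min_votes
        for i in range(n_cells)
    ]

# Hypothetical calls for five cells, aligned by cell barcode
calls = {
    "scDblFinder":   [True,  False, True,  False, False],
    "DoubletFinder": [True,  False, False, True,  False],
    "Scrublet":      [False, False, True,  False, False],
}
consensus = consensus_doublets(calls)
```

Raising min_votes trades recall for precision; the homotypic doublets mentioned in the note are missed by every caller and still need the nCount/nFeature upper limits.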

Notes

Pseudotime: what it is and how it is misused

Tutorials often present Monocle3 / Slingshot / PAGA / scVelo trajectories in a way that implies a "developmental order of cells". Clarify: (1) pseudotime is a reconstruction, not ground truth; it reflects only a topological ordering by expression similarity within the sample and need not correspond to a real time axis; (2) Saelens et al. (2019, Nat Biotechnol) systematically benchmarked 45 trajectory inference tools and found that different methods can infer very different topologies from the same data; (3) RNA velocity (La Manno 2018; Bergen 2020 scVelo) adds kinetics, but Bergen et al. (2021, Mol Syst Biol) point out that its assumptions (constant splicing and degradation rates) are violated in many lineages (e.g. erythroid maturation), which can yield reversed directions; (4) the choice of root cell determines the direction of pseudotime. In practice, when reporting a trajectory: (a) run ≥2 tools and confirm that the topology agrees; (b) validate experimentally with lineage tracing or metabolic-labeling sequencing such as scNT-seq or sci-fate; (c) avoid over-interpreting "leaves" as terminally differentiated states.

Sources: Saelens W et al. (2019) A comparison of single-cell trajectory inference methods, Nat Biotechnol 37:547–554. DOI: 10.1038/s41587-019-0071-9; Bergen V et al. (2021) RNA velocity—current challenges, Mol Syst Biol 17:e10282.
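
The point that the root cell fixes the direction of pseudotime can be shown with a toy graph. The sketch below treats pseudotime as the number of graph steps from the root, which is a deliberate simplification and not how Monocle3, Slingshot, or PAGA compute it; the four state names and the chain graph are hypothetical.

```python
from collections import deque

def pseudotime_from_root(graph, root):
    """Breadth-first distance (in graph steps) from the chosen root."""
    distance = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance

# Hypothetical neighbor graph over four cell states along one lineage
graph = {
    "stem": ["progenitor"],
    "progenitor": ["stem", "precursor"],
    "precursor": ["progenitor", "mature"],
    "mature": ["precursor"],
}
forward = pseudotime_from_root(graph, "stem")
backward = pseudotime_from_root(graph, "mature")
```

The same graph gives exactly opposite orderings for the two roots, so the direction of a reported trajectory is an input supplied by the analyst and should be justified with prior knowledge or experimental evidence.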

Notes

Annotation: balancing automation and expert review

If the tutorial recommends a single annotation tool, present three families of strategies in balance: (1) Manual, marker-based: compare dot plots / violin plots against PanglaoDB, CellMarker 2.0 (Hu 2023, NAR), and Azimuth references. This is interpretable and controllable, but subjective, and rare cell types are easily missed. (2) SingleR / scmap / Symphony (reference-based): fast, but they require a bulk or scRNA-seq reference, and cell types absent from the reference are forced into a wrong label. (3) scANVI / CellTypist (Domínguez Conde 2022, Science) (probabilistic classifiers): they output per-cell probabilities that can serve as an uncertainty measure, and pretrained models can be applied directly or retrained on new data. scANVI is a deep generative model that trains fastest on a GPU; CellTypist is a regularized logistic-regression classifier and runs on CPU. The benchmarks of Tan & Cahan (2019, Cell Systems) and Abdelaal et al. (2019, Genome Biology) show that no single tool is best, and a cross-method majority vote is the most stable. In practice: (a) any automated annotation should be checked against marker genes and reviewed by a domain expert; (b) reports should list the reference version, the fraction of unmapped cells, and the fraction of low-confidence cells.

Sources: Abdelaal T et al. (2019) A comparison of automatic cell identification methods for single-cell RNA sequencing data, Genome Biology 20:194. DOI: 10.1186/s13059-019-1795-z; Domínguez Conde C et al. (2022) Celltypist, Science 376:eabl5197.
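
A cross-method majority vote of the kind described in this note can be sketched as follows. This is an illustration only: majority_label, the 0.5 agreement threshold, the "unresolved" placeholder, and the per-cell predictions are assumptions, and labels from different tools are assumed to have been harmonized to a shared vocabulary first.

```python
from collections import Counter

def majority_label(labels, min_fraction=0.5):
    """Return the most common label if more than min_fraction of tools agree."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes / len(labels) > min_fraction else "unresolved"

# Hypothetical per-cell predictions from three annotation tools
predictions = {
    "cell_1": ["CD14 Mono", "CD14 Mono", "CD14 Mono"],
    "cell_2": ["NK", "CD8 T", "NK"],
    "cell_3": ["cDC2", "pDC", "B"],
}
annotations = {cell: majority_label(labels) for cell, labels in predictions.items()}
unresolved_fraction = sum(a == "unresolved" for a in annotations.values()) / len(annotations)
```

Cells that end up "unresolved" are the ones to send to marker-gene review, and their fraction is a natural number to include in the report alongside the reference version.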

Notes

Integration: method choice and over-correction

If the tutorial treats Harmony or Seurat CCA as a "universal" integration default, clarify the tradeoffs among the mainstream options. Luecken et al. (2022, Nat Methods), the scIB benchmark, evaluated 16 integration methods on 13 tasks for batch removal vs. biological conservation: (1) Harmony (Korsunsky 2019): fast soft clustering in PCA space, suited to medium-scale data with small compositional differences; high risk of over-correction. (2) scVI / scANVI (Lopez 2018; Xu 2021): VAE models with the best combined score for preserving biological variation while correcting batch, but they need a GPU, train slowly, and are hyperparameter-sensitive. (3) Seurat CCA / RPCA: anchor-based; CCA is more sensitive but prone to over-correction, and RPCA is the recommended compromise from v4 onward. (4) BBKNN: fast kNN correction, suited to cell-atlas scale. In practice: (a) quantify with scIB metrics (kBET, iLISI, ARI); (b) watch for designs where batch differences coincide with biological differences (e.g. disease and control come from different batches), because integration will then erase real effects; (c) reports must state the method, version, and batch covariate.

Sources: Luecken MD et al. (2022) Benchmarking atlas-level data integration in single-cell genomics, Nat Methods 19:41–50. DOI: 10.1038/s41592-021-01336-8; Korsunsky I et al. (2019) Harmony, Nat Methods 16:1289.
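
The confounded batch × biology design warned about in this note can be detected before any integration is run. A minimal sketch, assuming per-cell (or per-sample) batch and condition labels are available as two aligned lists; the function names and labels are hypothetical.

```python
def conditions_per_batch(batches, conditions):
    """Map each batch to the set of conditions observed in it."""
    seen = {}
    for batch, condition in zip(batches, conditions):
        seen.setdefault(batch, set()).add(condition)
    return seen

def is_fully_confounded(batches, conditions):
    """True when no batch contains more than one condition."""
    return all(len(found) == 1 for found in conditions_per_batch(batches, conditions).values())

# Hypothetical designs: disease and control split by batch vs. mixed within batch
confounded = is_fully_confounded(
    ["b1", "b1", "b2", "b2"], ["disease", "disease", "control", "control"]
)
balanced = is_fully_confounded(
    ["b1", "b1", "b2", "b2"], ["disease", "control", "disease", "control"]
)
```

When the check returns True, batch and condition cannot be separated statistically, and any method that removes the batch effect will also remove part of the disease effect.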

Notes

UMAP: mistaking geometry for biology

If the tutorial infers "relatedness between cell types" from the distances between UMAP clusters, add a strong warning. Chari & Pachter (2023, PLOS Comput Biol), "The specious art of single-cell genomics", argue systematically that: (1) UMAP / t-SNE necessarily distort global structure when mapping high-dimensional points to 2D, so "distances" between clusters bear almost no relation to the true high-dimensional distances; (2) the same data with different random seeds can give topologically different UMAPs; (3) UMAP "branches" are often misread as developmental trajectories but are projection artifacts; (4) the Pearson correlation between UMAP distance and true distance is often <0.3. In practice: (a) use UMAP only to visualize local neighborhoods, and do not argue relatedness from inter-cluster distances; (b) run trajectory analysis in PCA or diffusion-map space; (c) reports should (i) disclose the random seed, (ii) also show PCA pair plots, and (iii) not perform statistical inference on the UMAP. Kobak & Linderman (2021, Nat Biotechnol) show that PCA-initialized t-SNE/UMAP improves reproducibility, but this still does not solve the global-distance problem.

Sources: Chari T, Pachter L (2023) The specious art of single-cell genomics, PLOS Comput Biol 19:e1011288. DOI: 10.1371/journal.pcbi.1011288; Kobak D, Linderman GC (2021) Initialization is critical for preserving global data structure in both t-SNE and UMAP, Nat Biotechnol 39:156.
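
The correlation between embedding distances and original distances mentioned in this note can be computed directly. The sketch below is a minimal standard-library illustration, assuming cluster centroids are available in both spaces; the function names, the centroid coordinates, and the 2D layout are hypothetical and chosen to show a distorted embedding.

```python
import math
from itertools import combinations

def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def distance_preservation(high_dim, embedding):
    """Correlation between pairwise distances before and after embedding."""
    return pearson(pairwise_distances(high_dim), pairwise_distances(embedding))

# Hypothetical cluster centroids in expression space: two tight pairs, far apart
centroids = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (11, 0, 0)]
# Hypothetical 2D layout that interleaves the two pairs
layout = [(0, 0), (5, 0), (1, 0), (6, 0)]
faithful = distance_preservation(centroids, centroids)
distorted = distance_preservation(centroids, layout)
```

A value near 1 means inter-cluster distances survived the embedding; in the interleaved layout the correlation is negative, so clusters drawn close together are in fact the distant ones.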

Doublet detection: the Xi & Li 2021 Cell Systems benchmark

When the QC chapter introduces doublet detection, it often lists tools (Scrublet, DoubletFinder, scDblFinder) without selection criteria. Xi & Li 2021 Cell Systems ("Benchmarking Computational Doublet-Detection Methods for Single-Cell RNA Sequencing Data") compares 9 tools on 16 real datasets plus synthetic ones: DoubletFinder leads on detection accuracy (AUPRC); Scrublet is fast but weaker on recall of inter-cell-type doublets; intra-cell-type doublets (same-type merges) are hard for all tools. scDblFinder was not part of that comparison; Germain 2021 F1000Research evaluates it on the same benchmark datasets and reports top-tier AUPRC. Rules: (1) scDblFinder is a reasonable default, since it combines cluster-based and random doublet simulation; (2) set the expected doublet rate from 10x's published values (~0.4% per 1,000 cells loaded); (3) always compare the nFeature / nCount distributions before and after removal; (4) run doublet detection per sample (capture), since doublets form only within a capture and pooling samples lets batch differences interfere.
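
The expected doublet rate in rule (2) translates into the expected doublet count that tools such as DoubletFinder take as nExp. A minimal sketch of the arithmetic, assuming the rate scales linearly with the number of cells loaded at the ~0.4% per 1,000 figure quoted above; the function name and the loading numbers are hypothetical, and the rate for a given chemistry should be taken from the 10x user guide.

```python
def expected_doublets(cells_loaded, cells_recovered, rate_per_1000_loaded=0.004):
    """Return (doublet rate, expected doublet count) under a linear loading model."""
    rate = rate_per_1000_loaded * cells_loaded / 1000
    return rate, round(rate * cells_recovered)

# Hypothetical run: 10,000 cells loaded, 6,000 barcodes kept after QC
rate, n_expected = expected_doublets(cells_loaded=10_000, cells_recovered=6_000)
```

For this example the rate is 4% and the expected count is 240 cells; homotypic doublets are included in that number even though callers cannot find them, so some tools adjust nExp downward for the estimated homotypic proportion.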

Tran 2020 Genome Biology: batch correction can damage signal when there is no real batch effect

When the integration chapter teaches "batch-correct first, then analyse" as standard practice, add a caution. Tran, Ang, Chevrier, Zhang, Lee, Goh, Chen 2020 Genome Biology ("A benchmark of batch-effect correction methods for single-cell RNA sequencing data") compares 14 methods (Harmony, scVI, BBKNN, Seurat v3 CCA, MNN, Scanorama, Liger, etc.) across 5 scenarios and finds: (1) over-correction when no real batch exists (same experiment, same platform), where some methods (e.g. BBKNN at default parameters) collapse distinct cell types into one cluster; (2) Harmony and Seurat v3 are robust across most scenarios but hyperparameter-sensitive; (3) scVI has a clear advantage under strong batch effects but tends to overfit on small batches. Rules: (1) first assess batch-effect strength with PCA / UMAP on the uncorrected data; (2) if silhouette-by-batch < 0.05 and cell types already separate cleanly, batch correction can be skipped; (3) the scIB metrics of Luecken 2022 Nat Methods report batch removal and biological conservation on separate axes; (4) report whether marker-gene expression is suppressed after correction compared with before.

Notes

QC: DoubletFinder API names

If the tutorial still demonstrates doubletFinder_v3(), update it: that is the pre-2.0.4 API. Since DoubletFinder 2.0.4 (released in 2023), the functions are named doubletFinder() and paramSweep(), without the _v3 suffix. Check whether your installed version still exports the old _v3 names before relying on them; new code should use the new API. The corresponding parameters (pN, pK, nExp) are unchanged.

Sources: GitHub: chris-mcginnis-ucsf/DoubletFinder README & release notes (v2.0.4, 2023).

Notes

Annotation: common names for FCGR3A Mono

If the tutorial table lists "FCGR3A Mono", note that FCGR3A is the gene encoding CD16. In most textbooks, the official Seurat PBMC tutorial, and the Azimuth PBMC reference, this population is usually called CD16+ Monocyte or non-classical monocyte. The three names (FCGR3A Mono / CD16+ Mono / non-classical Mono) refer to the same population and are used interchangeably in the literature; prefer CD16+ Monocyte in reports to align with immunology conventions.

Sources: Hao Y et al. (2021) Integrated analysis of multimodal single-cell data, Cell 184:3573 (Azimuth PBMC reference); Seurat PBMC3K vignette.
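
One way to apply this naming convention consistently is a small lookup that maps the synonyms onto a single reporting label. This is a sketch only: the `harmonize_label` helper and the exact spelling variants in the table are assumptions, so extend the table to match the labels your own tutorial or reference actually emits.

```python
# Synonyms for the same population, keyed in lower case.
# The spelling variants listed here are illustrative, not exhaustive.
MONOCYTE_SYNONYMS = {
    "fcgr3a mono": "CD16+ Monocyte",
    "fcgr3a+ mono": "CD16+ Monocyte",
    "cd16 mono": "CD16+ Monocyte",
    "cd16+ mono": "CD16+ Monocyte",
    "non-classical mono": "CD16+ Monocyte",
    "non-classical monocyte": "CD16+ Monocyte",
}


def harmonize_label(label):
    """Map a known synonym to the reporting label; pass others through."""
    key = " ".join(label.lower().split())  # normalise case and spacing
    return MONOCYTE_SYNONYMS.get(key, label)


for raw in ["FCGR3A Mono", "CD16+ Mono", "non-classical Mono", "CD14 Mono"]:
    print(raw, "->", harmonize_label(raw))
```

Unknown labels are returned unchanged, so the mapping can be applied to a whole annotation column without affecting other cell types.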

Notes

Annotation: subdividing DC markers

If the tutorial table uses FCER1A and CST3 to label DCs, refine: (1) CST3 is co-expressed broadly across myeloid cells, monocytes, and DCs, so its specificity is low; (2) FCER1A mainly marks cDC2 (conventional DC type 2); (3) common cDC1 markers are CLEC9A, XCR1, BATF3; (4) common pDC (plasmacytoid DC) markers are LILRA4, IL3RA (CD123), CLEC4C; (5) AS-DC (the new subset described by Villani 2017 Science) is marked by AXL and SIGLEC6. For fine-grained annotation of DC subsets, use the Azimuth PBMC reference or the CellTypist immune model.

Sources: Domínguez Conde C et al. (2022) Cross-tissue immune cell analysis reveals tissue-specific features in humans, Science 376:eabl5197. DOI: 10.1126/science.abl5197; Villani AC et al. (2017) Single-cell RNA-seq reveals new types of human blood dendritic cells, monocytes, and progenitors, Science 356:eaah4573.
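
The marker panels above can be written down as a lookup table and used for a quick sanity check on a cluster's marker list. The sketch below is not a substitute for Azimuth or CellTypist: the function names, the overlap-fraction score, and the 0.5 cutoff are illustrative assumptions.

```python
# Marker panels as listed in the note above. CST3 is deliberately left out
# because it is shared across myeloid cells.
DC_MARKERS = {
    "cDC1": {"CLEC9A", "XCR1", "BATF3"},
    "cDC2": {"FCER1A"},
    "pDC": {"LILRA4", "IL3RA", "CLEC4C"},
    "AS-DC": {"AXL", "SIGLEC6"},
}


def score_dc_subsets(cluster_markers):
    """Fraction of each subset's panel found among a cluster's markers."""
    found = {g.upper() for g in cluster_markers}
    return {name: len(panel & found) / len(panel)
            for name, panel in DC_MARKERS.items()}


def best_dc_subset(cluster_markers, min_fraction=0.5):
    """Best-matching subset, or None if no panel reaches min_fraction."""
    scores = score_dc_subsets(cluster_markers)
    name = max(scores, key=scores.get)
    return name if scores[name] >= min_fraction else None


print(best_dc_subset(["LILRA4", "CLEC4C", "CST3"]))  # two of three pDC markers
print(best_dc_subset(["CST3"]))                      # CST3 alone is uninformative
```

A cluster whose only DC-like marker is CST3 gets no subset call, which mirrors point (1): CST3 alone does not distinguish DCs from other myeloid cells.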

Notes

DE: current consensus on cross-condition comparison

If the tutorial only mentions FindMarkers(test.use="wilcox") for cross-condition DE, update: Squair et al. 2021 Nat Commun ("Confronting false discoveries in single-cell differential expression") and Murphy et al. 2022 Nat Commun both show that: (1) building pseudobulk profiles per cluster × sample and then testing with edgeR-LRT or DESeq2-LRT controls FDR far better than running Wilcoxon / MAST directly on single cells, which treats cells from the same sample as independent observations and inflates type I error; (2) with very few samples (donors / replicates; n < 4 per group), pseudobulk is also unreliable and results need caution; (3) for rare cell types or high within-sample zero-inflation, a MAST hurdle model with a random effect can be used as a complement. In practice: (a) DE across donors / conditions must use pseudobulk; (b) reports must state the sample count, the pseudobulk strategy, and the statistical test.

Sources: Squair JW et al. (2021) Confronting false discoveries in single-cell differential expression, Nat Commun 12:5692. DOI: 10.1038/s41467-021-25960-2; Murphy AE et al. (2022), Nat Commun 13:7980. DOI: 10.1038/s41467-022-35519-4.
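
The pseudobulk step in point (1) is a plain aggregation: sum raw counts over all cells that share a cluster and a sample. The sketch below shows only that step, plus the sample-count check from point (2). The helper names and toy counts are hypothetical, and the resulting matrix would still need to be passed to edgeR or DESeq2 for the actual test.

```python
from collections import Counter, defaultdict


def pseudobulk(cells):
    """Sum raw counts per (cluster, sample).

    cells: iterable of (cluster, sample, {gene: raw_count}) tuples.
    Returns {(cluster, sample): {gene: summed_count}}.
    """
    totals = defaultdict(Counter)
    for cluster, sample, counts in cells:
        totals[(cluster, sample)].update(counts)  # adds counts gene by gene
    return {key: dict(c) for key, c in totals.items()}


def underpowered_groups(sample_to_group, min_samples=4):
    """Groups with fewer than min_samples samples (n < 4 in the text)."""
    sizes = Counter(sample_to_group.values())
    return sorted(g for g, n in sizes.items() if n < min_samples)


# Toy data (hypothetical counts): three cells from two donors.
cells = [
    ("CD14 Mono", "donor1", {"LYZ": 5, "S100A8": 2}),
    ("CD14 Mono", "donor1", {"LYZ": 3}),
    ("CD14 Mono", "donor2", {"LYZ": 1, "S100A8": 4}),
]
pb = pseudobulk(cells)
print(pb[("CD14 Mono", "donor1")])

design = {"donor1": "ctrl", "donor2": "ctrl", "donor3": "stim"}
print(underpowered_groups(design))
```

Because each sample contributes one profile per cluster, the sample rather than the cell becomes the unit of replication, which is what addresses the inflated type I error described above.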