R for Bioinformatics · References

OVERVIEW

How to use this page

For every package, algorithm, and concept introduced in the tutorial we list the primary literature and authoritative documentation. Tag legend:

📄 Paper: peer-reviewed publication (DOI / PubMed)
📘 Doc: official package documentation, vignette, tutorial
⭐ Best Practice: authoritative review or community guideline
📊 Benchmark: independent method comparison
🗄️ Database: reference data resource
📚 Book: free / classic textbook
📑 Cheatsheet: quick-reference card from Posit

Page contents

★ Foundational books & reviews
Step 1: Setup & environment
Step 2: R language basics
Step 3: Data structures
Step 4: File I/O & paths
Step 5: tidyverse
Step 6: ggplot2 visualization
Step 7: Biostatistics
Step 8: Survival analysis
Step 9: Bioconductor core
Step 10: Bulk RNA-seq
Step 11: Functional enrichment
Step 12: Heatmap & volcano
Step 13: Variants & GWAS
Step 14: DNA methylation
Step 15: ChIP-seq & ATAC-seq
Step 16: Reproducibility
★ Community & help
! Tutorial notes & details

FOUNDATIONS

⭐ Foundational books & reviews

Start here for a solid base in R, statistics, and bioinformatics. The first three books are legally free to read online; Bioinformatics Data Skills is listed by ISBN only.

📚 BOOK Wickham H, Çetinkaya-Rundel M, Grolemund G. R for Data Science (2e). O'Reilly (2023). Free online: r4ds.hadley.nz. The definitive entry point to R + tidyverse.
📚 BOOK Wickham H. Advanced R (2e). CRC Press (2019). Free online: adv-r.hadley.nz. Covers internals, environments, OOP, performance.
📚 BOOK Holmes S, Huber W. Modern Statistics for Modern Biology. Cambridge University Press (2019). Free online: huber.embl.de/msmb. Biostatistics with R/Bioconductor.
📚 BOOK Buffalo V. Bioinformatics Data Skills. O'Reilly (2015). ISBN 978-1449367374. Unix, R, Python skills for genomics.
⭐ REVIEW Gentleman RC, Carey VJ, Bates DM, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biology 5, R80 (2004). DOI: 10.1186/gb-2004-5-10-r80. The original Bioconductor paper.
⭐ REVIEW Huber W, Carey VJ, Gentleman R, et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nature Methods 12, 115–121 (2015). DOI: 10.1038/nmeth.3252

STEP 1

⚙️ Setup & environment

R itself, RStudio (Posit) IDE, project conventions, and the BiocManager installer.

DOC R Project: download & install (CRAN). cran.r-project.org. Official binaries for Windows / macOS / Linux.
DOC Posit (formerly RStudio): RStudio Desktop free download. posit.co/download/rstudio-desktop
DOC R Installation and Administration manual. CRAN R-admin.html
DOC An Introduction to R (W.N. Venables, D.M. Smith and the R Core Team). CRAN R-intro.html. The official tutorial.
DOC BiocManager: installing & managing Bioconductor packages. bioconductor.org/install · CRAN page cran/BiocManager
DOC Rtools: Windows toolchain for compiling source packages. cran/bin/windows/Rtools
⭐ TIPS Bryan J. What They Forgot to Teach You About R. online book (2022). rstats.wtf. Projects, paths, .Rprofile, debugging.
PAPER Wilson G, Bryan J, Cranston K, et al. Good Enough Practices in Scientific Computing. PLOS Comput Biol 13(6):e1005510 (2017). DOI: 10.1371/journal.pcbi.1005510

STEP 2

🔤 R language basics

DOC The R Language Definition (R Core Team). CRAN R-lang.html. The formal language reference.
📚 BOOK Wickham H. Advanced R: chapters on vectors, subsetting, control flow, functions. Free online: adv-r.hadley.nz/vectors-chap
⭐ STYLE Wickham H. The tidyverse style guide. style.tidyverse.org. Naming, spacing, idioms.
⭐ STYLE Google. Google's R Style Guide (formerly internal). google.github.io/styleguide/Rguide
📑 CHEAT Base R cheatsheet (Mhairi McNeill). PDF via posit.co/resources/cheatsheets

STEP 3

🧱 Data structures

📚 BOOK Wickham H. Advanced R: Vectors, Lists, Subsetting, S3 / S4 / R6. Free online: adv-r.hadley.nz
DOC tibble package documentation. tibble.tidyverse.org
PAPER Chambers JM. Object-Oriented Programming, Functional Programming and R. Statistical Science 29(2):167–180 (2014). DOI: 10.1214/13-STS452. S3 / S4 design rationale.
DOC methods package: formal classes & methods (S4). stat.ethz.ch/R-manual/methods
DOC data.table introduction vignette. CRAN data.table intro

STEP 4

📂 File I/O & path management

DOC readr: fast and friendly file I/O (tidyverse). readr.tidyverse.org
DOC readxl, writexl: Excel I/O. readxl.tidyverse.org · writexl
DOC data.table::fread: a very fast CSV reader. r-datatable.com
DOC arrow: Apache Arrow / Parquet / Feather in R. arrow.apache.org/docs/r
DOC here: robust paths via project root. here.r-lib.org · blog post by Jenny Bryan jennybc/here_here
DOC fst: ultra-fast random access of large data frames. fstpackage.org
PAPER Fischer B, Smith ML, Pau G. rhdf5: HDF5 interface for R. Bioconductor (2010+). Vignette: bioconductor.org/packages/rhdf5

STEP 5

🔧 tidyverse

PAPER Wickham H, Averick M, Bryan J, et al. Welcome to the tidyverse. J Open Source Softw 4(43):1686 (2019). DOI: 10.21105/joss.01686
PAPER Wickham H. Tidy Data. J Stat Softw 59(10) (2014). DOI: 10.18637/jss.v059.i10
DOC dplyr: A Grammar of Data Manipulation. dplyr.tidyverse.org
DOC tidyr: pivoting & tidy data. tidyr.tidyverse.org
DOC purrr: functional programming over lists/vectors. purrr.tidyverse.org
DOC magrittr: pipe operator (%>%). magrittr.tidyverse.org · R 4.1+ native pipe (|>) reference CRAN pipeOp
📑 CHEAT dplyr / tidyr / purrr cheatsheets (Posit). posit.co/resources/cheatsheets

STEP 6

🎨 ggplot2 visualization

PAPER Wickham H. ggplot2: elegant graphics for data analysis. Springer-Verlag (2016, 3rd ed in progress). Free online (3e): ggplot2-book.org
PAPER Wilkinson L. The Grammar of Graphics (2nd ed). Springer (2005). ISBN 978-0387245447. The theoretical foundation behind ggplot2.
DOC ggplot2 reference site. ggplot2.tidyverse.org
DOC patchwork: compose multiple ggplots with the + and / operators. patchwork.data-imaginist.com
DOC ggrepel: automatic non-overlapping text labels. ggrepel.slowkow.com
DOC ggpubr, cowplot, ggsci, hrbrthemes: publication themes & helpers. ggpubr · cowplot · ggsci · hrbrthemes
⭐ COLOR Garnier S, Ross N, Rudis B, et al. viridis: color-blind-safe, perceptually uniform palettes. sjmgarnier.github.io/viridis
⭐ COLOR Brewer CA, Harrower M. ColorBrewer: palette design for cartography & data viz. colorbrewer2.org

STEP 7

📐 Biostatistics

📚 BOOK Holmes S, Huber W. Modern Statistics for Modern Biology. Free online: huber.embl.de/msmb
📚 BOOK Wood SN. Generalized Additive Models: An Introduction with R (2nd ed). CRC Press (2017).
PAPER Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. JRSS B 57(1):289–300 (1995). DOI: 10.1111/j.2517-6161.1995.tb02031.x. The FDR / BH paper.
PAPER Storey JD, Tibshirani R. Statistical significance for genome-wide studies. PNAS 100(16):9440–9445 (2003). DOI: 10.1073/pnas.1530509100. Origin of the q-value.
DOC stats: base R statistics, including t.test(), aov(), lm(), glm(), p.adjust(). stat.ethz.ch/R-manual/stats
DOC broom: tidy model output (lm, glm, cox, ...). broom.tidymodels.org
DOC qvalue: Storey q-values (Bioconductor). bioconductor.org/packages/qvalue

STEP 8

⏳ Survival analysis

PAPER Kaplan EL, Meier P. Nonparametric Estimation from Incomplete Observations. JASA 53(282):457–481 (1958). DOI: 10.1080/01621459.1958.10501452. The original Kaplan–Meier paper.
PAPER Cox DR. Regression Models and Life-Tables. JRSS B 34(2):187–202 (1972). DOI: 10.1111/j.2517-6161.1972.tb00899.x. The original Cox PH paper.
📚 BOOK Therneau TM, Grambsch PM. Modeling Survival Data: Extending the Cox Model. Springer (2000). ISBN 978-0-387-98784-2.
DOC survival R package (Therneau). CRAN survival
DOC survminer: KM plots, forest plots. rpkgs.datanovia.com/survminer
⭐ TUT Zabor EC. Survival Analysis in R (online tutorial). emilyzabor.com

STEP 9

🧬 Bioconductor core

PAPER Morgan M, Obenchain V, Hester J, Pagès H. SummarizedExperiment: a container for matrix-like assays. Bioconductor package (2016+). Vignette: bioconductor.org/packages/SummarizedExperiment
PAPER Lawrence M, Huber W, Pagès H, et al. Software for Computing and Annotating Genomic Ranges. PLOS Comput Biol 9(8):e1003118 (2013). DOI: 10.1371/journal.pcbi.1003118. The GenomicRanges paper.
PAPER Pagès H, Aboyoun P, Gentleman R, DebRoy S. Biostrings: Efficient manipulation of biological strings. Bioconductor package. bioconductor.org/packages/Biostrings
DOC AnnotationDbi & org.*.eg.db: gene ID mapping framework. AnnotationDbi · org.Hs.eg.db
DOC GenomicFeatures / TxDb.* & EnsDb.*: transcript databases. GenomicFeatures · ensembldb
DOC rtracklayer: import/export GTF / BED / bigWig. bioconductor.org/packages/rtracklayer
DOC AnnotationHub & ExperimentHub: cloud annotation/experiment data. AnnotationHub · ExperimentHub
📚 BOOK Lawrence M, Hahne F (eds). Bioconductor Course Materials. bioconductor.org/help/course-materials

STEP 10

🧪 Bulk RNA-seq

PAPER Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biology 15:550 (2014). DOI: 10.1186/s13059-014-0550-8 · vignette bioconductor.org/packages/DESeq2
PAPER Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1):139–140 (2010). DOI: 10.1093/bioinformatics/btp616
PAPER Ritchie ME, Phipson B, Wu D, et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res 43(7):e47 (2015). DOI: 10.1093/nar/gkv007
PAPER Law CW, Chen Y, Shi W, Smyth GK. voom: Precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15:R29 (2014). DOI: 10.1186/gb-2014-15-2-r29
PAPER Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research 4:1521 (2015). DOI: 10.12688/f1000research.7563.2. The tximport rationale.
PAPER Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods 14:417–419 (2017). DOI: 10.1038/nmeth.4197
📊 BENCH Schurch NJ, Schofield P, Gierliński M, et al. How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA 22(6):839–851 (2016). DOI: 10.1261/rna.053959.115
DOC RNA-seq workflow: gene-level exploratory analysis and differential expression (Bioconductor). bioconductor.org/rnaseqGene

STEP 11

🔬 Functional enrichment

PAPER Wu T, Hu E, Xu S, et al. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation 2(3):100141 (2021). DOI: 10.1016/j.xinn.2021.100141
PAPER Yu G, Wang LG, Han Y, He QY. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16(5):284–287 (2012). DOI: 10.1089/omi.2011.0118
PAPER Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS 102(43):15545–15550 (2005). DOI: 10.1073/pnas.0506580102. The original GSEA paper.
PAPER Korotkevich G, Sukhov V, Budin N, et al. Fast gene set enrichment analysis (fgsea). bioRxiv 060012 (2021). DOI: 10.1101/060012. The fgsea engine used by clusterProfiler.
🗄️ DB Liberzon A, Birger C, Thorvaldsdóttir H, et al. The Molecular Signatures Database (MSigDB) hallmark gene set collection. Cell Syst 1(6):417–425 (2015). DOI: 10.1016/j.cels.2015.12.004 · gsea-msigdb.org
🗄️ DB Ashburner M, Ball CA, Blake JA, et al. Gene Ontology: tool for the unification of biology. Nat Genet 25:25–29 (2000). DOI: 10.1038/75556 · geneontology.org
🗄️ DB Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res 28(1):27–30 (2000). kegg.jp
🗄️ DB Gillespie M, Jassal B, Stephan R, et al. The Reactome pathway knowledgebase. Nucleic Acids Res 50(D1):D687–D692 (2022). DOI: 10.1093/nar/gkab1028 · reactome.org
DOC msigdbr: MSigDB gene sets in tidy R format. igordot.github.io/msigdbr

STEP 12

🌋 Heatmap & volcano

PAPER Gu Z, Eils R, Schlesner M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32(18):2847–2849 (2016). DOI: 10.1093/bioinformatics/btw313
📚 BOOK Gu Z. ComplexHeatmap Complete Reference. Free online: jokergoo.github.io/ComplexHeatmap-reference
DOC pheatmap: pretty heatmaps with one call. CRAN pheatmap
DOC EnhancedVolcano: publication-ready volcano plots. bioconductor.org/packages/EnhancedVolcano
DOC circlize: circular visualization (chord diagrams, Circos plots). jokergoo.github.io/circlize_book
⭐ DESIGN Wilke CO. Fundamentals of Data Visualization. O'Reilly (2019). Free online: clauswilke.com/dataviz

STEP 13

🧩 Variants & GWAS

PAPER Danecek P, Auton A, Abecasis G, et al. The variant call format and VCFtools. Bioinformatics 27(15):2156–2158 (2011). DOI: 10.1093/bioinformatics/btr330. The VCF spec.
PAPER Knaus BJ, Grünwald NJ. vcfR: a package to manipulate and visualize variant call format data in R. Mol Ecol Resour 17(1):44–53 (2017). DOI: 10.1111/1755-0998.12549
DOC VariantAnnotation: Bioconductor VCF S4 framework. bioconductor.org/packages/VariantAnnotation
PAPER McLaren W, Gil L, Hunt SE, et al. The Ensembl Variant Effect Predictor. Genome Biology 17:122 (2016). DOI: 10.1186/s13059-016-0974-4
PAPER Purcell S, Neale B, Todd-Brown K, et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81(3):559–575 (2007). DOI: 10.1086/519795 · PLINK 2: cog-genomics.org/plink2
PAPER Zheng X, Levine D, Shen J, et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28(24):3326–3328 (2012). DOI: 10.1093/bioinformatics/bts606. The SNPRelate paper.
⭐ REVIEW Uffelmann E, Huang QQ, Munung NS, et al. Genome-wide association studies. Nat Rev Methods Primers 1:59 (2021). DOI: 10.1038/s43586-021-00056-9
⭐ TUT Marees AT, de Kluiver H, Stringer S, et al. A tutorial on conducting genome-wide association studies: Quality control and statistical analysis. Int J Methods Psychiatr Res 27:e1608 (2018). DOI: 10.1002/mpr.1608
DOC qqman: Manhattan & Q-Q plots in R. CRAN qqman
🗄️ DB Sollis E, Mosaku A, Abid A, et al. The NHGRI-EBI GWAS Catalog. Nucleic Acids Res 51(D1):D977–D985 (2023). ebi.ac.uk/gwas

STEP 14

🔵 DNA methylation

PAPER Aryee MJ, Jaffe AE, Corrada-Bravo H, et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30(10):1363–1369 (2014). DOI: 10.1093/bioinformatics/btu049
PAPER Tian Y, Morris TJ, Webster AP, et al. ChAMP: updated methylation analysis pipeline for Illumina BeadChips. Bioinformatics 33(24):3982–3984 (2017). DOI: 10.1093/bioinformatics/btx513
PAPER Peters TJ, Buckley MJ, Statham AL, et al. De novo identification of differentially methylated regions in the human genome. Epigenetics & Chromatin 8:6 (2015). DOI: 10.1186/1756-8935-8-6. The DMRcate package.
PAPER Du P, Zhang X, Huang CC, et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics 11:587 (2010). DOI: 10.1186/1471-2105-11-587. β vs M values.
PAPER Akalin A, Kormaksson M, Li S, et al. methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biology 13:R87 (2012). DOI: 10.1186/gb-2012-13-10-r87
PAPER Hansen KD, Langmead B, Irizarry RA. BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biology 13:R83 (2012). DOI: 10.1186/gb-2012-13-10-r83. The bsseq package.
PAPER Horvath S. DNA methylation age of human tissues and cell types. Genome Biology 14:R115 (2013). DOI: 10.1186/gb-2013-14-10-r115. The original epigenetic clock.
PAPER Pelegí-Sisó D, de Prado P, Ronkainen J, et al. methylclock: a Bioconductor package to estimate DNA methylation age. Bioinformatics 37(12):1759–1760 (2021). DOI: 10.1093/bioinformatics/btaa825
⭐ TUT Maksimovic J, Phipson B, Oshlack A. A cross-package Bioconductor workflow for analysing methylation array data. F1000Research 5:1281 (2016). DOI: 10.12688/f1000research.8839.3

STEP 15

📍 ChIP-seq & ATAC-seq

PAPER Yu G, Wang LG, He QY. ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Bioinformatics 31(14):2382–2383 (2015). DOI: 10.1093/bioinformatics/btv145
PAPER Stark R, Brown G. DiffBind: differential binding analysis of ChIP-seq peak data. Bioconductor / Cancer Research UK Cambridge Institute (2011). bioconductor.org/packages/DiffBind
PAPER Zhang Y, Liu T, Meyer CA, et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biology 9:R137 (2008). DOI: 10.1186/gb-2008-9-9-r137. MACS / MACS2.
PAPER Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10:1213–1218 (2013). DOI: 10.1038/nmeth.2688. The original ATAC-seq paper.
PAPER Schep AN, Wu B, Buenrostro JD, Greenleaf WJ. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat Methods 14:975–978 (2017). DOI: 10.1038/nmeth.4401
🗄️ DB Rauluseviciute I, Riudavets-Puig R, Blanc-Mathieu R, et al. JASPAR 2024: 20th anniversary of the open-access database of transcription factor binding profiles. Nucleic Acids Res 52(D1):D174–D182 (2024). DOI: 10.1093/nar/gkad1059
PAPER Stuart T, Srivastava A, Madad S, Lareau CA, Satija R. Single-cell chromatin state analysis with Signac. Nat Methods 18:1333–1341 (2021). DOI: 10.1038/s41592-021-01282-5
PAPER Granja JM, Corces MR, Pierce SE, et al. ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat Genet 53:403–411 (2021). DOI: 10.1038/s41588-021-00790-6
⭐ REVIEW Yan F, Powell DR, Curtis DJ, Wong NC. From reads to insight: a hitchhiker's guide to ATAC-seq data analysis. Genome Biology 21:22 (2020). DOI: 10.1186/s13059-020-1929-3
⭐ STD ENCODE Project Consortium. ENCODE ChIP-seq / ATAC-seq pipeline standards. encodeproject.org/atac-seq

STEP 16

📦 Reproducibility & workflow

DOC renv: reproducible per-project R environments. rstudio.github.io/renv
DOC RStudio Projects. support.posit.co RStudio Projects
📚 BOOK Xie Y, Allaire JJ, Grolemund G. R Markdown: The Definitive Guide. CRC Press (2018). Free online: bookdown.org/yihui/rmarkdown
DOC Quarto: next-generation literate programming (Posit). quarto.org
📚 BOOK Bryan J, the STAT 545 TAs. Happy Git and GitHub for the useR. Free online: happygitwithr.com
DOC targets: Make-like pipelines for R. docs.ropensci.org/targets
DOC testthat: unit testing framework. testthat.r-lib.org
DOC Rocker: Docker images for R / RStudio / Bioconductor. rocker-project.org
PAPER Sandve GK, Nekrutenko A, Taylor J, Hovig E. Ten Simple Rules for Reproducible Computational Research. PLOS Comput Biol 9(10):e1003285 (2013). DOI: 10.1371/journal.pcbi.1003285
PAPER Boettiger C. An introduction to Docker for reproducible research. SIGOPS Oper. Syst. Rev. 49(1):71–79 (2015). DOI: 10.1145/2723872.2723882

COMMUNITY

💬 Community & help

DOC Bioconductor support forum: package authors often answer questions directly, frequently within a day. support.bioconductor.org
DOC Stack Overflow: [r] tag. stackoverflow.com/questions/tagged/r
DOC Posit Community forum. forum.posit.co
DOC Bioconductor Slack workspace (community.bioconductor.org). community-bioc.slack.com
DOC R-Ladies global / chapters: peer-led R community. rladies.org
DOC Bioinformatics Stack Exchange. bioinformatics.stackexchange.com
DOC browseVignettes("<package>"): built-in package tutorials, accessible offline. Run inside R; opens HTML vignettes for any installed package.

💡

Citation guidance: when you publish work using these packages, please cite the primary references above in addition to the package itself. citation("packageName") in R prints the recommended BibTeX entry.
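A minimal base-R sketch of the citation lookup described above. The package name "stats" is only a placeholder for whichever package you actually used; nothing here is specific to this tutorial.

```r
# Look up the recommended citation for a package.
# "stats" is a placeholder; substitute the package you used.
cit <- citation("stats")

# Human-readable reference
print(cit, style = "text")

# BibTeX form, ready to paste into a .bib file
bib <- toBibtex(cit)
print(bib)

# Vignettes of an installed package can be browsed offline as well.
# Guarded because it opens a browser window.
if (interactive()) browseVignettes(package = "grid")
```

For Bioconductor packages the same call usually returns the method paper listed on this page, which is the reference journals expect.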

📖

Lookup guide: all DOI links point to publisher pages. To download a PDF, look for an open-access version via PubMed, PMC, arXiv, or bioRxiv.

TUTORIAL NOTES

📌 Tutorial notes & details

Notes on R / Bioconductor tool nuances, version dependencies, and context-dependent caveats; some are errata, most are supplementary notes.

Notes

Choosing between DESeq2, edgeR, and limma-voom

All three are mainstream bulk RNA-seq DE tools but rest on different assumptions: DESeq2 and edgeR fit a negative-binomial GLM, while limma-voom converts counts to log2-CPM, estimates mean-variance precision weights with voom, and then fits a weighted linear model. Schurch et al. 2016 (RNA), working from a 48-replicate yeast experiment, concluded that edgeR and DESeq2 give the best combination of true-positive and false-positive performance below 12 replicates per condition, and that controlling false positives matters more at higher replicate numbers. In small samples DESeq2 tends to be somewhat more conservative and edgeR somewhat more sensitive. With larger sample sizes limma-voom is typically much faster, and its results are generally concordant with the NB methods. For complex designs (blocking factors / random effects) the limma family is recommended; for fast iteration over many contrasts, edgeR's quasi-likelihood pipeline (glmQLFit) is a common choice. Do not feed TPM / FPKM into any of the three: they all expect raw, unnormalized counts.

Source: Schurch et al. 2016 (RNA) · Law et al. 2016 (F1000 RNA-seq workflow)
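The three pipelines compared above can be sketched side by side on simulated counts. This is an illustrative sketch, not a recommended analysis: the count matrix, sample names, and two-group design are invented, and each pipeline runs only if the corresponding Bioconductor package is installed.

```r
# Simulated raw integer counts: 200 genes x 8 samples (hypothetical data).
set.seed(1)
n_genes <- 200
group   <- factor(rep(c("control", "treated"), each = 4))
gene_mu <- 2^runif(n_genes, min = 4, max = 10)   # per-gene mean expression
counts  <- matrix(
  rnbinom(n_genes * 8, mu = gene_mu, size = 10), # NB noise, dispersion 0.1
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("s", 1:8))
)
storage.mode(counts) <- "integer"
design <- model.matrix(~ group)

if (requireNamespace("edgeR", quietly = TRUE) &&
    requireNamespace("limma", quietly = TRUE)) {
  # edgeR quasi-likelihood pipeline (NB-GLM)
  dge <- edgeR::DGEList(counts = counts, group = group)
  dge <- edgeR::calcNormFactors(dge)             # TMM normalization factors
  dge <- edgeR::estimateDisp(dge, design)
  fit_ql    <- edgeR::glmQLFit(dge, design)
  res_edger <- edgeR::topTags(edgeR::glmQLFTest(fit_ql, coef = 2), n = Inf)

  # limma-voom: log2-CPM plus precision weights, then a linear model
  v         <- limma::voom(dge, design)
  fit_lm    <- limma::eBayes(limma::lmFit(v, design))
  res_limma <- limma::topTable(fit_lm, coef = 2, number = Inf)
}

if (requireNamespace("DESeq2", quietly = TRUE)) {
  # DESeq2 (NB-GLM with median-of-ratios size factors)
  col_data <- data.frame(group = group, row.names = colnames(counts))
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = counts,
                                        colData   = col_data,
                                        design    = ~ group)
  dds <- DESeq2::DESeq(dds, quiet = TRUE)
  res_deseq2 <- DESeq2::results(dds)
}
```

All three start from the same raw integer matrix; only the normalization and the model differ, which is why TPM / FPKM input is inappropriate for each of them.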

Notes

Use median-of-ratios, not CPM / TPM, for differential expression

DESeq2's median-of-ratios (Anders & Huber 2010) and edgeR's TMM (Robinson & Oshlack 2010) estimate size factors / normalization factors to correct between-sample differences in library size and composition; they are used internally by the DE model and never alter the raw counts. TPM and FPKM normalize within a sample by gene length and sequencing depth (TPM additionally rescales each sample to sum to 10⁶; FPKM does not). They are suitable for visualization and for comparing genes within a sample, but they are not reliably comparable across samples and they violate the count assumption of DESeq2 / edgeR. A common mistake is rounding a TPM matrix to integers and feeding it into DESeq2, which breaks dispersion estimation. Keep raw integer counts for DE and use vst() or rlog() only for plotting and heatmaps.

Source: Anders & Huber 2010 (Genome Biol) · Robinson & Oshlack 2010 (TMM)
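To make the distinction concrete, here is a base-R sketch that computes median-of-ratios size factors and TPM by hand on a toy matrix. The counts and gene lengths are invented for illustration, and the code mirrors the published idea rather than DESeq2's actual implementation.

```r
# Toy counts: samples s2 and s3 are sequenced 2x and 3x deeper than s1,
# and gene4 is strongly up-regulated in s3 (hypothetical values).
counts <- matrix(
  c( 10,  20,  30,
    100, 200, 300,
     50, 100, 150,
      5,  10, 150,
     20,  40,  60),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:5), paste0("s", 1:3))
)

# Median-of-ratios: compare each sample with a per-gene geometric mean,
# then take the median ratio so a few DE genes do not dominate.
log_geo_mean <- rowMeans(log(counts))
usable <- is.finite(log_geo_mean)          # drop genes with any zero count
size_factors <- apply(counts, 2, function(col) {
  exp(median(log(col[usable]) - log_geo_mean[usable]))
})
print(size_factors / size_factors[1])      # relative sequencing depth

# Naive library-size scaling is pulled upward by the DE gene instead.
print(colSums(counts) / colSums(counts)[1])

# TPM: length-normalize within each sample, then rescale to one million.
gene_length_kb <- c(1, 2, 0.5, 4, 1)       # hypothetical gene lengths
rate <- counts / gene_length_kb            # row i divided by length i
tpm  <- t(t(rate) / colSums(rate)) * 1e6
print(colSums(tpm))
```

The size factors recover the 1 : 2 : 3 depth ratio despite the up-regulated gene, whereas TPM forces every sample to the same total, so a single highly expressed gene shifts the values of all the others.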

Notes

Pin the Bioconductor version and verify it with BiocManager::valid()

Bioconductor releases twice a year (April / October), and each release is tied to one R minor version (e.g. Bioc 3.19 ⇄ R 4.4). Mixing packages from different releases can cause namespace failures and load errors. After starting R, run BiocManager::valid() to verify that the installed packages come from the same Bioc release; if it returns anything other than TRUE, follow the printed instructions to update or downgrade the flagged packages. Install a specific release with BiocManager::install(version = "3.19"); for team reproducibility, pair this with renv or the official Bioconductor / BioContainers Docker images to pin the entire stack.

Source: Bioconductor Install Guide · Huber et al. 2015 (Nat Methods)
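The check above can be wrapped in a small helper. This is a sketch under stated assumptions: check_bioc_install() is a hypothetical name, not part of BiocManager, and BiocManager::valid() needs network access to compare against the repository.

```r
# Hypothetical helper: report the R / Bioconductor pairing and run valid().
check_bioc_install <- function() {
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    message("BiocManager is not installed; run install.packages(\"BiocManager\").")
    return(invisible(NA))
  }
  message("R ", as.character(getRversion()),
          " / Bioconductor ", as.character(BiocManager::version()))

  status <- BiocManager::valid()   # TRUE, or an object listing problem packages
  if (isTRUE(status)) {
    message("All installed packages match this Bioconductor release.")
  } else {
    print(status)                  # shows out-of-date / too-new packages
  }
  invisible(isTRUE(status))
}

# Run only in an interactive session, since it contacts the repositories.
if (interactive()) check_bioc_install()
```

Running this at the top of an analysis script gives an early warning before a mismatched package fails halfway through a pipeline.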

Notes

S4 DataFrame is not a base data.frame

Bioconductor's S4Vectors::DataFrame (capital D, F) is not base R's data.frame. It can hold arbitrary S4 objects (List, Rle, GRanges, etc.) as columns and supports a metadata() slot; the colData and rowData of SummarizedExperiment / SingleCellExperiment are DataFrames by default. A common mistake is coercing a DataFrame via as.data.frame() and writing it back, which drops the S4 metadata and loses the Rle / List column classes. Use df[df$type == "gene", ] for row subsetting (this preserves the DataFrame) and convert to a tibble / data.frame only when handing off to data.frame-based APIs such as ggplot2. Whether ggplot2 accepts S4 columns directly depends on its version, so check the release notes for the version you use.

Source: S4Vectors Bioc page · Lawrence et al. 2013 (PLoS Comp Bio, GenomicRanges)
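A short sketch of the behavior described above, assuming the S4Vectors package is installed. The column names and the metadata entry are made up for illustration.

```r
if (requireNamespace("S4Vectors", quietly = TRUE)) {
  suppressPackageStartupMessages(library(S4Vectors))

  # A DataFrame with a run-length-encoded column and table-level metadata
  annot <- DataFrame(
    type  = c("gene", "exon", "gene"),
    score = Rle(c(1, 1, 2))
  )
  metadata(annot) <- list(genome = "hg38")   # hypothetical metadata entry

  # Row subsetting keeps the DataFrame class and its S4 columns
  genes <- annot[annot$type == "gene", ]
  print(class(genes))
  print(class(genes$score))

  # Coercion yields a base data.frame; inspect what happened to the column
  plain <- as.data.frame(genes)
  print(class(plain))
  print(class(plain$score))
}
```

Keeping objects as DataFrame until the final plotting or export step avoids silently losing column classes in the middle of a workflow.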

Notes

Factor level order determines legend / color mapping order in ggplot2

When a variable is mapped to ggplot2's fill / color aesthetic, the legend and color assignment follow the column's factor levels, not the data's row order. Character columns are sorted lexically: "Control" before "Treatment" looks reasonable, but "10mg" sorts before "2mg". Set the order explicitly with forcats::fct_relevel() or factor(x, levels = c(...)); the named-vector form scale_fill_manual(values = c("Control" = "grey", ...)) avoids relying on level order for color assignment. Stacked bars follow the same level order; by default ggplot2 places the first level at the top of the stack so that the stack matches the legend.

Source: ggplot2 book · Colour scales · forcats docs
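The ordering pitfall is easy to reproduce in base R, and the fix carries straight into a plot. The dose labels and colors below are illustrative only, and the plotting part runs only if ggplot2 is installed.

```r
dose <- c("2mg", "10mg", "Control", "10mg", "2mg", "Control")

# Default: levels are sorted as strings, so "10mg" comes before "2mg"
default_levels <- levels(factor(dose))
print(default_levels)

# Explicit levels give the intended biological order
dose_f <- factor(dose, levels = c("Control", "2mg", "10mg"))
print(levels(dose_f))

# A named color vector binds colors to labels, independent of level order
dose_colors <- c(Control = "grey70", `2mg` = "#56B4E9", `10mg` = "#0072B2")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  plot_df <- data.frame(sample = rep(c("A", "B"), each = 3), dose = dose_f)
  p <- ggplot2::ggplot(plot_df, ggplot2::aes(x = sample, fill = dose)) +
    ggplot2::geom_bar() +
    ggplot2::scale_fill_manual(values = dose_colors)
  # print(p) draws the plot; legend order follows levels(dose_f)
}
```

Setting the levels once on the data frame keeps every downstream plot consistent, which is simpler than reordering inside each scale.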

Notes

Color-blind-friendly palettes: viridis / Okabe-Ito rather than rainbow / jet

About 8% of men and 0.5% of women have red-green color-vision deficiency, and the classic rainbow() / matplotlib jet palettes are not monotonic in lightness, so their ordering is lost when a figure is converted to greyscale. For continuous values (log2FC, expression, p-values) use scale_fill_viridis_c(), whose viridis / magma / cividis options are all color-blind safe and monotonic in luminance. For categorical variables with ≤ 8 levels, use the Okabe & Ito 8-color palette (black, orange, sky blue, bluish green, yellow, blue, vermillion, reddish purple), available in R ≥ 4.0 via palette.colors(palette = "Okabe-Ito"), which returns those eight colors plus a ninth gray. Test palettes with colorBlindness::cvdPlot() or pick from the colorblind-safe schemes on the ColorBrewer website.
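
The greyscale argument can be checked with base R alone. The sketch below is an illustration, not part of the cited sources: it uses hcl.colors(palette = "viridis"), base R's HCL approximation of viridis, as a stand-in for scale_fill_viridis_c(), and the helper functions luma() and is_monotonic() are hypothetical names introduced here.

```r
# First eight Okabe-Ito colors (the ninth entry in base R is gray).
okabe_ito <- unname(palette.colors(n = 8, palette = "Okabe-Ito"))

# Approximate greyscale value of each color (Rec. 601 luma weights).
luma <- function(cols) {
  rgb <- col2rgb(cols)                           # 3 x n matrix of 0-255 values
  as.numeric(c(0.299, 0.587, 0.114) %*% rgb)
}

# TRUE when the values only ever increase or only ever decrease.
is_monotonic <- function(x) all(diff(x) > 0) || all(diff(x) < 0)

is_monotonic(luma(hcl.colors(9, palette = "viridis")))  # ordering survives greyscale
is_monotonic(luma(rainbow(9)))                          # ordering does not survive
```

The same check can be pointed at any custom palette before it goes into a figure.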

Source: Okabe & Ito · Color Universal Design · viridis CRAN page

Notes

Subtle differences between tibble and data.frame

tibble fixes three classic data.frame quirks: (1) it never coerces strings to factors (base R 4.0+ also defaults stringsAsFactors to FALSE, but scripts run under R < 4.0, or code that sets stringsAsFactors = TRUE explicitly, are still affected); (2) single-column subsetting tbl[, 1] always returns a tibble, whereas base df[, 1] defaults to drop = TRUE and returns a vector, which breaks downstream nrow() or matrix operations; (3) tibble does not partial-match column names, so tbl$nam returns NULL with a warning, while base data.frame silently treats it as $name. Tibbles also do not keep row names. Many Bioconductor functions expect a base data.frame or DataFrame, so before feeding a tibble to SummarizedExperiment-style constructors, convert it with as.data.frame() and restore the row names, for example with rownames(df) <- ... or tibble::column_to_rownames().
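
Points (2) and (3), plus the hand-off to a row-named data.frame, can be sketched as follows. This assumes the tibble package is installed; the sample names and values are invented.

```r
library(tibble)

samples_df  <- data.frame(name = c("s1", "s2"), value = c(1.5, 2.5))
samples_tbl <- as_tibble(samples_df)

# (2) Single-column subsetting: base drops to a bare vector, tibble does not.
is.data.frame(samples_df[, 1])    # FALSE
is_tibble(samples_tbl[, 1])       # TRUE

# (3) Partial matching: base resolves $nam to $name, tibble returns NULL
# (with a warning, suppressed here).
samples_df$nam
suppressWarnings(samples_tbl$nam)

# Hand-off to Bioconductor-style constructors: plain data.frame with row names.
plain <- as.data.frame(samples_tbl)
rownames(plain) <- plain$name
```

Doing the conversion in one place, right before the constructor call, keeps the rest of the pipeline in tibble form.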

Source: tibble · invariants vignette · R4DS · Data import

Notes

dplyr's *_if / *_at / *_all variants are superseded by across()

Since dplyr 1.0 (June 2020), the scoped verbs mutate_if(), summarise_at(), mutate_all(), and friends are marked superseded (not deprecated, but no longer developed); a single across() covers all three variants: mutate(across(where(is.numeric), \(x) as.numeric(scale(x)))), summarise(across(c(weight, height), \(x) mean(x, na.rm = TRUE))). Pass extra arguments inside an anonymous function: the older across(..., mean, na.rm = TRUE) form, which forwards arguments through ..., is deprecated as of dplyr 1.1.0. Wrapping scale() in as.numeric() matters because scale() returns a one-column matrix. R4DS 2e and current tutorials use across(); old scripts still run but are worth migrating for readability and consistency. The lifecycle badge is visible on the help page via ?mutate_if.
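
A runnable sketch of the across() forms, assuming dplyr ≥ 1.0 and R ≥ 4.1 (for the \(x) lambda and the native pipe). The measurements are made up.

```r
library(dplyr)

# Hypothetical measurements with missing values.
measurements <- data.frame(
  id     = c("s1", "s2", "s3"),
  weight = c(60, NA, 80),
  height = c(170, 180, NA)
)

# Replaces summarise_at(): extra arguments go inside the lambda.
means <- measurements |>
  summarise(across(c(weight, height), \(x) mean(x, na.rm = TRUE)))

# Replaces mutate_if(): as.numeric() drops the matrix returned by scale().
scaled <- measurements |>
  mutate(across(where(is.numeric), \(x) as.numeric(scale(x))))
```

Here means holds one row with the column means, and scaled keeps id untouched because where(is.numeric) skips character columns.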

Source: dplyr · Column-wise operations · dplyr 1.0 release notes

Notes

Implicit and explicit modes of renv::snapshot()

renv::snapshot() defaults to type = "implicit": only packages that the project's code (R, Rmd, Qmd files) appears to use are recorded, as determined by the static scan in renv::dependencies(), which recognizes library(), require(), requireNamespace(), and pkg::fun() calls. The lockfile stays small, but packages that are only loaded dynamically, for example library(pkg, character.only = TRUE) with the name held in a variable, are invisible to the scan and can be silently absent at deploy time. For R-package development or Quarto books, prefer renv::snapshot(type = "explicit"), which records only the packages declared in DESCRIPTION, and keep its Depends / Imports up to date; for research projects that want to freeze the entire library, use type = "all". Run renv::status() before each commit to verify that the lockfile matches the installed library.
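
What the implicit scan can and cannot see may be explored on a throwaway script. The sketch below assumes renv is installed; the script contents and the package names inside it are arbitrary examples, and they do not need to be installed because the scan is static. The snapshot and status calls are left commented out since they act on the current project.

```r
# Write a small script to a temporary file (contents are illustrative only).
script <- tempfile(fileext = ".R")
writeLines(c(
  "library(dplyr)",                       # plain library() call
  "p <- ggplot2::ggplot()",               # namespace-qualified call
  "pkg <- \"survival\"",
  "library(pkg, character.only = TRUE)"   # name only known at run time
), script)

# Static scan used by the implicit snapshot mode.
deps  <- renv::dependencies(script, quiet = TRUE)
found <- unique(deps$Package)
found

# In a real project (these modify or inspect the project's renv.lock):
# renv::snapshot(type = "explicit")   # record only what DESCRIPTION declares
# renv::status()                      # compare lockfile and library
```

Packages that only show up through run-time names, like the last line of the script, are the ones worth declaring in DESCRIPTION and capturing with the explicit mode.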

Source: renv official vignette · ?renv::snapshot

Notes

sessionInfo() is not enough; use sessioninfo::session_info() instead

Base R's sessionInfo() lists attached packages and loaded namespaces with their versions, but it does not report where each package came from (CRAN / Bioconductor / GitHub), the commit SHA of a GitHub install, or the Bioconductor release a package belongs to, which is not enough for reproducibility tracking. sessioninfo::session_info() (also used by devtools / pkgdown) additionally reports the source of each package (e.g. Bioconductor 3.19 or Github (user/repo@sha)), a date for each package, and which packages are attached versus merely loaded, in a more readable layout. When submitting papers or responding to reviewers, attach the output of session_info() rather than sessionInfo(); for long-running projects, pair it with an renv lockfile (renv.lock, written by renv::snapshot()) for dual tracking.
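
A minimal sketch of capturing the richer report, assuming the sessioninfo package is installed. The temporary output file stands in for whatever supplementary file a project would actually use.

```r
info <- sessioninfo::session_info()

# The per-package table carries the source field that sessionInfo() lacks.
pkgs <- info$packages
provenance <- data.frame(
  package = pkgs$package,
  version = pkgs$loadedversion,
  source  = pkgs$source
)
head(provenance)

# Save the full printed report next to the analysis outputs
# (a temporary file is used here as a placeholder path).
report <- tempfile(fileext = ".txt")
writeLines(capture.output(print(info)), report)
```

Committing this report together with renv.lock gives both a human-readable record and a machine-restorable one.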

Source: sessioninfo package site · Sandve et al. 2013 (Ten Simple Rules for Reproducible Computational Research)
