基本概念
一、ChIP-seq vs ATAC-seq
| — | ChIP-seq | ATAC-seq |
|---|---|---|
| 量什麼 | 特定蛋白與 DNA 結合位點 | 開放染色質區域 |
| 原理 | 抗體 IP → 純化結合 DNA → 定序 | Tn5 切開放區 → adapter → 定序 |
| 細胞需求 | 高 | 低 |
| 輸出 | BAM → narrowPeak / broadPeak (MACS2) | BAM → narrowPeak (MACS2) |
| 代表用途 | TF target、組蛋白標記 | 開放染色質、推估 TF 活性 |
R 上的角色:R 不跑 alignment 與 peak calling(這些用 Bowtie2、BWA、MACS2 等命令列工具在 Linux 跑)。R 接手的是:peak 註解、motif、差異結合、視覺化──這也是本章重點。
R's role: R does not do alignment or peak calling (use Bowtie2/BWA/MACS2 on Linux). R takes over for peak annotation, motif analysis, differential binding, visualization — that's this chapter.
ChIPseeker
二、ChIPseeker:peak 註解標準工具
# BiocManager::install(c('ChIPseeker','TxDb.Hsapiens.UCSC.hg38.knownGene','org.Hs.eg.db'))
# library(ChIPseeker); library(TxDb.Hsapiens.UCSC.hg38.knownGene)
# library(clusterProfiler); library(org.Hs.eg.db)
# txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
# 1) 讀 narrowPeak / BED 檔
# peaks <- readPeakFile('peaks/sample.narrowPeak')
# peaks # GRanges
# 2) 註解 peak:每個 peak 對應到最近的基因 + genomic feature
# anno <- annotatePeak(peaks,
# TxDb = txdb,
# tssRegion = c(-3000, 3000), # promoter 定義
# annoDb = 'org.Hs.eg.db',
# level = 'gene')
# anno_df <- as.data.frame(anno)
# head(anno_df[, c('seqnames','start','end','annotation','geneId','SYMBOL','distanceToTSS')])
# 3) 視覺化 / Visualize
# plotAnnoPie(anno) # promoter / intron / intergenic / ... 圓餅
# plotAnnoBar(anno)
# vennpie(anno)
# upsetplot(anno)
# plotDistToTSS(anno) # peak 距 TSS 距離分布
# 4) 多樣本比較 (e.g. 多個 ChIP)
# files <- list(s1 = 'peaks/s1.narrowPeak', s2 = 'peaks/s2.narrowPeak')
# annoList <- lapply(files, annotatePeak, TxDb = txdb, tssRegion = c(-3000, 3000))
# plotAnnoBar(annoList)
# 5) Peak heatmap:所有 peak 在 TSS 附近 ±3kb 的訊號分布
# promoter <- getPromoters(TxDb = txdb, upstream = 3000, downstream = 3000)
# tagMatrix <- getTagMatrix(peaks, windows = promoter)
# tagHeatmap(tagMatrix, xlim = c(-3000, 3000), color = 'firebrick')
# plotAvgProf(tagMatrix, xlim = c(-3000, 3000))
# 6) GO/KEGG 富集 (拿 anno 中的 geneId)
# entrez_ids <- unique(as.data.frame(anno)$geneId)
# ego <- enrichGO(entrez_ids, OrgDb = org.Hs.eg.db, ont = 'BP', readable = TRUE)
# dotplot(ego)
cat('ChIPseeker demo — install Bioc packages above to run on real peaks.\n')
DiffBind
三、DiffBind:差異結合分析
「兩組樣本的 ChIP/ATAC peak intensity 是否顯著不同?」DiffBind 把所有樣本的 peak 合併、計算 read counts、走 DESeq2/edgeR 框架做差異分析。
"Are peak intensities significantly different between two groups?" DiffBind merges peaks across samples, counts reads, then runs DESeq2/edgeR for differential testing.
# BiocManager::install('DiffBind')
# library(DiffBind)
# 1) 準備 sample sheet (CSV,必填欄:SampleID, Condition, bamReads, Peaks, PeakCaller)
# samples <- read.csv('samples.csv')
# head(samples)
# # SampleID Tissue Condition Replicate bamReads Peaks PeakCaller
# # ctrl_1 HeLa control 1 bam/ctrl_1.bam peaks/ctrl_1.narrow narrow
# 2) 建立 dba 物件
# dba <- dba(sampleSheet = samples)
# dba # 摘要
# 3) 計算每個 peak 的 read count
# dba <- dba.count(dba, summits = 250) # 以 peak summit ±250bp 為窗
# 4) 設定比較
# dba <- dba.contrast(dba, categories = DBA_CONDITION, minMembers = 2)
# 5) 執行差異分析(內部用 DESeq2)
# dba <- dba.analyze(dba)
# dba.show(dba, bContrasts = TRUE)
# 6) 取結果 / Extract results
# res <- dba.report(dba, contrast = 1, th = 0.05, fold = 1)
# res # GRanges of differentially bound peaks
# as.data.frame(res)
# 7) 視覺化
# dba.plotMA(dba) # MA plot
# dba.plotVolcano(dba) # Volcano
# dba.plotHeatmap(dba, contrast = 1, correlations = FALSE)
# dba.plotPCA(dba, attributes = DBA_CONDITION)
cat('DiffBind demo — needs BAM + peak files + sample sheet to actually run.\n')
Motif
四、Motif 分析
| 工具 | 用途 |
|---|---|
motifmatchr | PWM 匹配(chromVAR 用) |
JASPAR2024 | JASPAR motif 資料庫 |
TFBSTools | PWM 操作工具 |
memes | MEME R wrapper |
monaLisa | 差異 motif activity |
universalmotif | motif 格式互轉 |
# BiocManager::install(c('motifmatchr','JASPAR2024','TFBSTools',
# 'BSgenome.Hsapiens.UCSC.hg38'))
# library(motifmatchr); library(JASPAR2024); library(TFBSTools)
# library(BSgenome.Hsapiens.UCSC.hg38)
# 1) 從 JASPAR 拿 motif PWMs
# opts <- list(species = 9606, collection = 'CORE') # 9606 = human
# pwms <- getMatrixSet(JASPAR2024, opts)
# length(pwms)
# 2) 對 peaks 做 motif 匹配
# bs <- BSgenome.Hsapiens.UCSC.hg38
# motif_matches <- matchMotifs(pwms, peaks, genome = bs)
# motif_matches # SummarizedExperiment of TRUE/FALSE matches
# 3) 計算每個 peak 中匹配到的 motif 數量 + 對 TF activity (chromVAR)
# library(chromVAR)
# dev <- computeDeviations(object = se_atac, # ATAC counts SE
# annotations = motif_matches)
# variability <- computeVariability(dev)
# head(variability[order(-variability$variability), ])
cat('Motif demo — install motifmatchr + JASPAR2024 + chromVAR for full workflow.\n')
De novo motif discovery:R 中沒有原生工具──實務上會把 peaks 轉 FASTA 餵給 MEME / HOMER(命令列),結果再用
memes 套件讀回 R 視覺化。
De novo motif discovery: there's no native R tool — typically pipe peak FASTAs to MEME/HOMER (CLI) then read results back with memes.
基因組瀏覽
五、在 R 中畫 IGV-like 基因組軌道
# BiocManager::install(c('Gviz','rtracklayer'))
# library(Gviz); library(rtracklayer)
# library(TxDb.Hsapiens.UCSC.hg38.knownGene)
# # 區域定義 / Define region
# chr <- 'chr17'; start <- 7670000; end <- 7700000
# # 基因軌道 / Gene track
# txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
# gene_track <- GeneRegionTrack(txdb, chromosome = chr, start = start, end = end,
# name = 'Genes', transcriptAnnotation = 'symbol')
# # bigWig 訊號軌道 (e.g. ChIP-seq, ATAC-seq)
# bw1 <- DataTrack(range = 'data/sample1.bw', chromosome = chr,
# start = start, end = end, name = 'ChIP', type = 'h')
# bw2 <- DataTrack(range = 'data/sample2.bw', chromosome = chr,
# start = start, end = end, name = 'ATAC', type = 'h')
# # Peak 軌道
# peaks <- import('peaks/sample.narrowPeak')
# peak_track <- AnnotationTrack(peaks, chromosome = chr,
# start = start, end = end, name = 'Peaks')
# # 座標軸 + 染色體圖示
# axis <- GenomeAxisTrack()
# ideo <- IdeogramTrack(genome = 'hg38', chromosome = chr)
# # 一次畫所有軌道
# plotTracks(list(ideo, axis, gene_track, bw1, bw2, peak_track),
# from = start, to = end)
# # 進階替代:karyoploteR、ggbio
cat('Gviz demo — needs bigWig / peak files. Replace paths to actually run.\n')
scATAC-seq
六、scATAC-seq 工具:Signac / ArchR
| 工具 | 特色 |
|---|---|
| Signac (Satija lab) | Seurat 風格 API |
| ArchR | 效能極佳,大資料 |
| chromVAR | 細胞 TF 活性 |
| cisTopic | 主題模型 cis-module |
# 詳見 scRNA-seq tutorial 對應章節(參考 sister 教程)
# 標準流程概念:
# 1) Cell Ranger ATAC → fragments.tsv.gz + filtered_peak_bc_matrix
# 2) Signac::CreateChromatinAssay → 整合進 Seurat
# 3) QC (TSS enrichment, nucleosome signal)
# 4) LSI dimensionality reduction → UMAP → clustering
# 5) GeneActivity matrix → 與 scRNA 整合
# 6) Motif analysis (chromVAR) → cell-type TF activity
cat('scATAC demo — see Signac vignette for full single-cell pipeline.\n')
📝 自我檢測
1. 你拿到 MACS2 的 narrowPeak 檔,想知道每個 peak 最近的基因,最方便的 R 工具?
1. From MACS2 narrowPeak, easiest R tool for nearest-gene annotation?
2. 想做「兩組 ChIP-seq peak intensity 差異」最常用的 R 套件?
2. Most common R package for "differential ChIP-seq peak intensity"?
3. 關於 ATAC-seq vs ChIP-seq 何者正確?
3. Which statement about ATAC-seq vs ChIP-seq is correct?