aligner 該選哪一個?
| 工具 | 適用 | 優點 |
|---|---|---|
| bwa-mem2 | DNA short reads | 準確、社群最廣 |
| HISAT2 / STAR | RNA-seq | 支援 splice |
| minimap2 | 長讀 / 多模式 | 多模式預設 |
| salmon / kallisto | RNA-seq 量化 | 速度極快、不產生 BAM |
從 FASTQ 到 sorted+indexed BAM
# 1. 建 index(一次性)/ Build index (one-off) bwa-mem2 index reference/genome.fa # 2. align + 排序 + 建索引(pipe 一條龍) # Align + sort + index (one pipe) bwa-mem2 mem -t 8 -R "@RG\tID:sampleA\tSM:sampleA\tPL:ILLUMINA" \ reference/genome.fa \ raw_data/sampleA_R1.fastq.gz raw_data/sampleA_R2.fastq.gz \ | samtools sort -@ 4 -o results/bam/sampleA.sorted.bam - samtools index results/bam/sampleA.sorted.bam
# 1. 建 HISAT2 index(含 splice 資訊)/ Build with splice sites hisat2-build reference/genome.fa reference/hisat2_index # 2. RNA align hisat2 -p 8 -x reference/hisat2_index \ -1 raw_data/sampleA_R1.fastq.gz -2 raw_data/sampleA_R2.fastq.gz \ --rna-strandness RF \ --summary-file logs/hisat2_sampleA.log \ | samtools sort -@ 4 -o results/bam/sampleA.sorted.bam - samtools index results/bam/sampleA.sorted.bam
# Nanopore long reads # Use map-ont preset; -ax outputs SAM minimap2 -t 8 -ax map-ont reference/genome.fa nanopore.fastq.gz \ | samtools sort -@ 4 -o results/bam/nanopore.sorted.bam - samtools index results/bam/nanopore.sorted.bam # PacBio HiFi 用 -ax map-hifi;splice-aware 用 -ax splice # Splice-aware long-read alignment uses -ax splice
關鍵:把 aligner | samtools sort 用 pipe 串起來——避免產出中間 SAM 檔(純文字會很大)。samtools sort 的 - 表示從 stdin 讀入。
Key trick: pipe aligner | samtools sort directly — avoids the huge intermediate SAM. The - in samtools sort - means read from stdin.
samtools — BAM 操作主力
SAM FLAG — 用位元位記錄 alignment 屬性
FLAG 是個整數,每個 bit 代表一個屬性。常見的:
FLAG is an integer where each bit encodes a property. Common ones:
1 (0x1)paired
2 (0x2)proper pair
4 (0x4)unmapped
8 (0x8)mate unmapped
16 (0x10)reverse strand
256 (0x100)secondary
1024 (0x400)duplicate
2048 (0x800)supplementary
FLAG 解碼太麻煩?用 Broad 的 Explain Flags 頁面,或在 terminal 中輸入 samtools flags 99。
Decoding FLAGs is annoying — use the Broad Explain Flags page or run samtools flags 99 in the terminal.
互動:BAM 處理練習
📝 自我檢測
1. RNA-seq 為什麼不能用 bwa-mem 而要用 HISAT2 / STAR?
1. Why use HISAT2 / STAR for RNA-seq instead of bwa-mem?
2. 為什麼要把 aligner 與 samtools sort 用 pipe 串起來?
2. Why pipe the aligner directly into samtools sort?
3. samtools view sorted.bam chr1:10000-20000 跑出錯誤「random access is not supported」。原因?
3. samtools view sorted.bam chr1:10000-20000 fails with "random access is not supported". Why?
4. samtools view -c -F 4 a.bam 計算的是?
4. What does samtools view -c -F 4 a.bam count?