Critical: BED chr1 100 200 (0-based, half-open) covers the same 100 bases as GTF chr1 101 200 (1-based, closed). Mixing the two coordinate systems is one of the most common sources of off-by-one bugs in bioinformatics, and interval tools such as bedtools treat BED input as 0-based!
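A minimal sketch of the conversion rule above, assuming intervals arrive as plain integers. The helper names (`bed_to_one_based`, `one_based_to_bed`, and the length functions) are hypothetical and not part of any library. The point is that only the start shifts by 1; the end is numerically identical in both systems, and the length formulas differ by the `+ 1` for closed intervals.

```python
# Hypothetical helpers (not from any library) for converting one interval
# between BED (0-based, half-open [start, end)) and the 1-based, closed
# [start, end] convention used by GTF/GFF, SAM POS and VCF POS.

def bed_to_one_based(start: int, end: int) -> tuple[int, int]:
    """BED [start, end) -> 1-based closed [start + 1, end]."""
    if start < 0 or end < start:
        raise ValueError(f"invalid BED interval: {start}-{end}")
    return start + 1, end


def one_based_to_bed(start: int, end: int) -> tuple[int, int]:
    """1-based closed [start, end] -> BED [start - 1, end)."""
    if start < 1 or end < start:
        raise ValueError(f"invalid 1-based interval: {start}-{end}")
    return start - 1, end


def bed_length(start: int, end: int) -> int:
    return end - start          # half-open: the end base is excluded


def one_based_length(start: int, end: int) -> int:
    return end - start + 1      # closed: both end bases are included


print(bed_to_one_based(100, 200))                          # (101, 200)
print(bed_length(100, 200), one_based_length(101, 200))    # 100 100
```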
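Common tools
sort -k1,1 -k2,2n a.bed
Standard BED sort (by chromosome, then numerically by start)
bedtools intersect -a a.bed -b b.bed
Intersection of two BED files
bedtools merge -i sorted.bed
Merge overlapping intervals (input must be sorted)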
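To make the three commands concrete, here is an illustrative pure-Python sketch of what they compute on BED-style `(chrom, start, end)` tuples. The function names are hypothetical, not a bedtools API; the sketch assumes `LC_ALL=C` byte ordering for chromosome names, keeps only the first three columns, and uses a naive nested loop for the intersection, whereas bedtools handles extra columns, strands and large files far more efficiently.

```python
# Illustrative pure-Python versions of the sort / merge / intersect commands
# (hypothetical helper names, not a bedtools API). Intervals are BED-style
# (chrom, start, end) tuples, 0-based half-open.

def sort_bed(intervals):
    """Like `LC_ALL=C sort -k1,1 -k2,2n`: chrom as text, then start as a number."""
    return sorted(intervals, key=lambda iv: (iv[0], iv[1]))


def merge_sorted(intervals):
    """Like `bedtools merge` defaults: join overlapping or book-ended intervals.

    Assumes the input is already sorted with sort_bed().
    """
    merged = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect(a, b):
    """Like default `bedtools intersect -a A -b B`: the shared part of each
    overlapping A/B pair. Naive O(len(a) * len(b)) loop."""
    out = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            start, end = max(start_a, start_b), min(end_a, end_b)
            if chrom_a == chrom_b and start < end:  # half-open: need start < end
                out.append((chrom_a, start, end))
    return out


a = [("chr1", 300, 400), ("chr1", 100, 200), ("chr1", 150, 250),
     ("chr1", 250, 260), ("chr2", 10, 20)]
b = [("chr1", 180, 320)]

a_sorted = sort_bed(a)
print(merge_sorted(a_sorted))
print(intersect(a_sorted, b))
```

Note that `chr1 250 260` merges into the preceding block because it is book-ended (its start equals the previous end), which matches the bedtools default of a maximum merge distance of 0.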
SAM / BAM (alignment results)
SAM is text, and BAM is its BGZF-compressed binary form (same information). Each alignment line has the tab-separated fields QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL, … (11 mandatory fields, followed by optional tags). POS is 1-based.
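A minimal sketch, using only the standard library, of reading one SAM line and turning its 1-based POS plus CIGAR into a BED-style interval. The read in the example is made up, and the helper names are hypothetical. It assumes the SAM specification's rule that the CIGAR operations M, D, N, = and X consume reference bases, while I, S, H and P do not; it does not handle unmapped reads (CIGAR `*`), which it rejects.

```python
import re

# The 11 mandatory SAM columns, in order.
SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")
INT_FIELDS = ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_FULL = re.compile(r"(?:\d+[MIDNSHP=X])+")
REF_CONSUMING = set("MDN=X")  # ops that advance along the reference

FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10


def parse_sam_line(line):
    """Split one tab-separated alignment line into a dict; optional tags stay strings."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError(f"expected at least 11 columns, got {len(cols)}")
    rec = dict(zip(SAM_FIELDS, cols[:11]))
    for key in INT_FIELDS:
        rec[key] = int(rec[key])
    rec["TAGS"] = cols[11:]
    return rec


def reference_span(pos, cigar):
    """Convert 1-based POS + CIGAR into a BED-style 0-based half-open (start, end)."""
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"unusable CIGAR: {cigar!r}")  # rejects '*'
    ref_len = sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)
    return pos - 1, pos - 1 + ref_len


line = "read1\t99\tchr1\t101\t60\t5M2I3M1D4M\t=\t301\t250\tACGTACGTACGTAC\tIIIIIIIIIIIIII\tNM:i:3"
rec = parse_sam_line(line)
print(rec["RNAME"], reference_span(rec["POS"], rec["CIGAR"]))  # chr1 (100, 113)
print(bool(rec["FLAG"] & FLAG_PAIRED),
      bool(rec["FLAG"] & FLAG_REVERSE),
      bool(rec["FLAG"] & FLAG_UNMAPPED))  # True False False
```

FLAG is a bit field, so each property is tested with a bitwise AND rather than by comparing whole values; 99 here is 0x1 + 0x2 + 0x20 + 0x40.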
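VCF (variant calls)
Meta-information lines start with ##, followed by a single header line starting with #CHROM. Each data line has 8 fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by a FORMAT column and one column per sample (e.g. GT:DP). POS is 1-based.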
##fileformat=VCFv4.2
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT S1
chr1 10500 . A G 99 PASS DP=42;AF=0.5 GT:DP 0/1:42
chr1 10800 . T C 35 LowQ DP=12 GT:DP 1/1:12
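A minimal standard-library sketch of parsing records like the two above, assuming real tab-delimited input (the example here is shown space-separated). The helper names are hypothetical. It keeps INFO and per-sample values as strings, treats INFO keys without `=` as flags, and converts POS to a BED interval covering the REF allele, which ties back to the 0-based vs 1-based rule.

```python
# Hypothetical VCF helpers (stdlib only); values are kept as strings.
FIXED = ("CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO")


def parse_info(info):
    """'DP=42;AF=0.5' -> {'DP': '42', 'AF': '0.5'}; keys without '=' are flags."""
    if info == ".":
        return {}
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def parse_vcf_line(line, sample_names):
    cols = line.rstrip("\n").split("\t")
    rec = dict(zip(FIXED, cols[:8]))
    rec["POS"] = int(rec["POS"])
    rec["INFO"] = parse_info(rec["INFO"])
    rec["samples"] = {}
    if len(cols) > 8:  # FORMAT column plus one column per sample
        fmt_keys = cols[8].split(":")
        for name, value in zip(sample_names, cols[9:]):
            rec["samples"][name] = dict(zip(fmt_keys, value.split(":")))
    return rec


def parse_vcf(lines):
    samples, records = [], []
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information
        if line.startswith("#CHROM"):
            samples = line.rstrip("\n").split("\t")[9:]
            continue
        if line.strip():
            records.append(parse_vcf_line(line, samples))
    return records


def vcf_to_bed(rec):
    """1-based POS -> BED (chrom, start, end) covering the REF allele."""
    start = rec["POS"] - 1
    return rec["CHROM"], start, start + len(rec["REF"])


example = [
    "##fileformat=VCFv4.2",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT", "S1"]),
    "\t".join(["chr1", "10500", ".", "A", "G", "99", "PASS", "DP=42;AF=0.5", "GT:DP", "0/1:42"]),
    "\t".join(["chr1", "10800", ".", "T", "C", "35", "LowQ", "DP=12", "GT:DP", "1/1:12"]),
]
records = parse_vcf(example)
for r in records:
    print(vcf_to_bed(r), r["FILTER"], r["samples"]["S1"]["GT"])
```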