從命令列到可重現分析流程 — 14 個章節循序漸進,每一個指令都對應到真實的生物資訊情境,搭配互動終端機、模擬練習與自我測驗。
From the command line to reproducible analysis pipelines — 14 progressive chapters where every command maps to a real bioinformatics scenario, paired with an interactive terminal, simulations and self-checks.
先學會在 Linux 中找路、看檔、改名、建立專案結構,建立真正能整理 NGS 資料的能力。
Learn to navigate directories, view and rename files, and build a project skeleton on Linux, the foundation for organising NGS data.
理解 Linux 在 NGS、HPC、可重現分析中的角色,以及課程學習地圖。
Understand Linux's role in NGS, HPC and reproducible analysis, and get an overview of the course roadmap.
pwd、cd、ls、絕對/相對路徑,從 0 開始用 NGS 專案結構練習。
pwd, cd, ls, absolute vs relative paths — practiced on a real NGS project tree.
cp、mv、rm、mkdir、chmod、ln — 安全管理 raw_data 與 reference。
cp, mv, rm, mkdir, chmod, ln — safely manage raw_data and reference files.
cat、less、head、tail、wc、pipe、redirect — 不開檔就能讀懂大型 FASTQ。
cat, less, head, tail, wc, pipes, redirects — read huge FASTQs without opening them.
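As a minimal sketch of this idea, assuming an uncompressed FASTQ with exactly four lines per record (the file name and the two reads are made up for illustration), `head` and `wc` can inspect a file without loading it into an editor:

```shell
# Build a tiny two-read FASTQ in a temporary directory (hypothetical sample data).
tmpdir=$(mktemp -d)
fastq="$tmpdir/sample_R1.fastq"
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > "$fastq"

# Peek at the first record (4 lines) instead of opening the whole file.
head -n 4 "$fastq"

# Grab just the first read ID.
first_id=$(head -n 1 "$fastq")

# Assumption: 4 lines per record (no wrapped sequences), so reads = lines / 4.
lines=$(wc -l < "$fastq")
reads=$((lines / 4))
echo "first read: $first_id, total reads: $reads"
```

For a gzipped file, the same pipeline would typically start from `gzip -dc file.fastq.gz | head -n 4`.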
把 Linux 變成生物資訊資料的瑞士刀:grep / cut / awk / sed 處理 GTF、BED、樣本表,並學會用 conda 管理可重現的軟體環境。
Turn Linux into a bioinformatics Swiss-army knife: grep / cut / awk / sed on GTF, BED, sample sheets — plus conda for reproducible environments.
grep、cut、sort、uniq、paste、join — 從 GTF 抽出 gene、做 QC summary 表。
grep, cut, sort, uniq, paste, join — extract genes from GTF, build QC summary tables.
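A small sketch of the gene-extraction idea, assuming a tab-separated GTF whose third column holds the feature type (the toy annotation below is invented for illustration):

```shell
# Write a toy GTF with three gene rows and one exon row (hypothetical data).
tmpdir=$(mktemp -d)
gtf="$tmpdir/toy.gtf"
{
  printf 'chr1\tdemo\tgene\t100\t500\t.\t+\t.\tgene_id "G1";\n'
  printf 'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "G1";\n'
  printf 'chr2\tdemo\tgene\t50\t900\t.\t-\t.\tgene_id "G2";\n'
  printf 'chr1\tdemo\tgene\t700\t950\t.\t+\t.\tgene_id "G3";\n'
} > "$gtf"

# Count rows whose feature column (column 3) is exactly "gene".
n_genes=$(cut -f 3 "$gtf" | grep -c -x 'gene')

# Summarise genes per chromosome: keep columns 1 and 3, filter, then count.
cut -f 1,3 "$gtf" | grep -w 'gene' | cut -f 1 | sort | uniq -c

# Same filter, restricted to chr1.
chr1_genes=$(cut -f 1,3 "$gtf" | grep -w 'gene' | cut -f 1 | grep -c -x 'chr1')
echo "genes: $n_genes (chr1: $chr1_genes)"
```

Cutting down to columns 1 and 3 before filtering keeps `grep` from matching the word "gene" inside the attribute column.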
欄位運算、條件判斷、批次取代 — 處理 GTF/BED/sample sheet 的進階武器。
Field arithmetic, conditionals, batch substitution — power tools for GTF/BED/sample sheets.
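The three techniques named here can be sketched on a toy BED file. The intervals, the 100 bp threshold and the file names are illustrative assumptions, not course data:

```shell
# Toy BED file: chrom, start, end, name (hypothetical intervals).
tmpdir=$(mktemp -d)
bed="$tmpdir/toy.bed"
printf 'chr1\t100\t250\tpeakA\nchr1\t400\t450\tpeakB\nchr2\t0\t1000\tpeakC\n' > "$bed"

# Field arithmetic + conditional: keep intervals of at least 100 bp
# and append the length (end - start) as a new column.
awk 'BEGIN{FS=OFS="\t"} ($3-$2) >= 100 {print $0, $3-$2}' "$bed"

# Count how many intervals pass the filter, and sum all interval lengths.
long_count=$(awk -F'\t' '($3-$2) >= 100' "$bed" | wc -l)
total=$(awk -F'\t' '{sum += $3-$2} END{print sum}' "$bed")

# Batch substitution: strip the "chr" prefix from every line.
sed 's/^chr//' "$bed" > "$tmpdir/toy.nochr.bed"
first_chrom=$(head -n 1 "$tmpdir/toy.nochr.bed" | cut -f 1)
echo "total bp: $total, first chromosome after sed: $first_chrom"
```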
wget、curl、rsync、gzip、tar、md5sum — 下載公開資料並驗證 FASTQ 完整性。
wget, curl, rsync, gzip, tar, md5sum — fetch public data and verify FASTQ integrity.
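A sketch of the integrity check, assuming GNU `md5sum` and `gzip` are available. In practice the checksum file usually comes from the data provider; here it is generated locally only so that the example is self-contained:

```shell
# Create and compress a tiny FASTQ (hypothetical file name and content).
tmpdir=$(mktemp -d)
fq="$tmpdir/sample_R1.fastq"
printf '@read1\nACGT\n+\nIIII\n' > "$fq"
gzip "$fq"                      # produces sample_R1.fastq.gz

# Record the checksum (stand-in for the provider's published .md5 file).
md5sum "$fq.gz" > "$tmpdir/checksums.md5"

# Verify: md5sum -c re-computes each checksum and compares it to the list.
if md5sum -c "$tmpdir/checksums.md5"; then
  status=OK
else
  status=FAILED
fi

# Independent check that the gzip stream itself is not truncated or corrupt.
if gzip -t "$fq.gz"; then gz_ok=yes; else gz_ok=no; fi
echo "md5: $status, gzip test: $gz_ok"
```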
conda / mamba、Bioconda、environment.yml — 每個專案一個環境,可重現性的起點。
conda / mamba, Bioconda, environment.yml — one env per project, the basis of reproducibility.
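One way a per-project environment file might look. The environment name `ngs-course` and the package selection are assumptions drawn from the tools this course mentions, and versions are deliberately left unpinned here:

```shell
# Write a minimal environment.yml into a scratch directory.
envdir=$(mktemp -d)
cat > "$envdir/environment.yml" <<'EOF'
name: ngs-course        # hypothetical environment name
channels:
  - conda-forge
  - bioconda
dependencies:
  - fastqc
  - multiqc
  - samtools
  - bedtools
EOF

cat "$envdir/environment.yml"

# With conda installed, the environment would then be created and activated with:
#   conda env create -f "$envdir/environment.yml"
#   conda activate ngs-course
```

Committing this file alongside the analysis scripts lets collaborators recreate the same toolset. Pinning versions makes the result more reproducible still.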
把 Linux 技能落到 NGS 真正的工作流:FASTQ/FASTA/GTF/BED/SAM/BAM/VCF、FastQC、MultiQC、samtools、bedtools。
Apply Linux to real NGS work: FASTQ/FASTA/GTF/BED/SAM/BAM/VCF, FastQC, MultiQC, samtools, bedtools.
FASTQ / FASTA / GTF / GFF / BED / SAM / BAM / VCF — 認識欄位、檢查、抽樣。
FASTQ / FASTA / GTF / GFF / BED / SAM / BAM / VCF — fields, sanity checks, sampling.
批次跑 FastQC、整合 MultiQC 報告、解讀常見品質指標。
Batch FastQC, aggregate via MultiQC, and read common quality metrics.
bwa / hisat2 / minimap2、samtools sort/index、flag 統計、BAM 整理。
bwa / hisat2 / minimap2, samtools sort/index, flag stats, BAM tidying.
從一次性指令成長為可重現 pipeline:Bash script → Snakemake/Nextflow → Slurm + Container。
Grow from one-off commands to reproducible pipelines: Bash script → Snakemake/Nextflow → Slurm + Container.
變數、loop、條件判斷、set -euo pipefail、log — 把指令寫成可重跑流程。
Variables, loops, conditionals, set -euo pipefail, logging — turn commands into rerunnable scripts.
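A minimal sketch that puts these pieces together. The script name `count_reads.sh`, the sample names and the log format are all hypothetical, and the read count assumes four lines per FASTQ record:

```shell
# Set up a scratch project with two tiny FASTQ files (hypothetical samples).
workdir=$(mktemp -d)
mkdir -p "$workdir/raw_data" "$workdir/logs"
for s in sampleA sampleB; do
  printf '@r1\nACGT\n+\nIIII\n' > "$workdir/raw_data/$s.fastq"
done

# Write the rerunnable script to disk.
cat > "$workdir/count_reads.sh" <<'EOF'
#!/usr/bin/env bash
# -e: stop on the first failing command; -u: unset variables are errors;
# -o pipefail: a pipeline fails if any stage fails.
set -euo pipefail

in_dir=${1:?usage: count_reads.sh IN_DIR LOG_FILE}
log=${2:?usage: count_reads.sh IN_DIR LOG_FILE}

for fq in "$in_dir"/*.fastq; do
  sample=$(basename "$fq" .fastq)
  # Conditional: refuse to continue on a missing or empty input.
  if [ ! -s "$fq" ]; then
    echo "ERROR: $fq is missing or empty" >&2
    exit 1
  fi
  reads=$(( $(wc -l < "$fq") / 4 ))
  # Log to the terminal and append to the log file.
  echo "$(date '+%F %T') sample=$sample reads=$reads" | tee -a "$log"
done
EOF

# Run it; the same command can be repeated on any input directory.
bash "$workdir/count_reads.sh" "$workdir/raw_data" "$workdir/logs/count_reads.log"
```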
rule / process / dependency — 把 Bash script 升級為標準 workflow,一鍵重跑。
rule / process / dependency: upgrade Bash scripts to standard workflows that rerun with a single command.
sbatch、資源規劃、Apptainer / Docker — 把流程部署到 cluster 與容器化環境。
sbatch, resource planning, Apptainer / Docker — deploy pipelines on clusters and containers.
每章包含:① 概念解說 ② Cheat sheet ③ 互動終端機(可輸入指令並驗證)④ 真實 NGS 場景練習 ⑤ 雙語自我測驗。建議按照章節順序學習,亦可作為查詢手冊使用。
Each chapter includes: ① concept explanation ② cheat sheet ③ interactive terminal (type commands and get checked) ④ real NGS exercises ⑤ bilingual self-check. Best followed in order, also usable as a reference.
只需理解「檔案/資料夾」與「純文字/表格檔案」的差別。生物背景者可從零開始。
You only need to know the difference between a file and a folder, and between a plain-text file and a table file. Learners with a biology background can start from zero.
從 raw_data 到 multiqc / aligned BAM / count matrix,完整可重現專案資料夾。
From raw_data to multiqc / aligned BAM / count matrix — a full reproducible project skeleton.
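A possible skeleton, sketched under assumptions: only `raw_data` and `reference` are names taken from the course text, and the other directory names (`qc/fastqc`, `qc/multiqc`, `aligned`, `counts`, `scripts`, `logs`) are illustrative choices.

```shell
# Create the project under a scratch location (hypothetical project name).
project="$(mktemp -d)/ngs_project"

# One directory per stage, from raw reads to the count matrix.
for d in raw_data reference qc/fastqc qc/multiqc aligned counts scripts logs; do
  mkdir -p "$project/$d"
done

# List the resulting tree.
find "$project" -type d | sort

n_dirs=$(find "$project" -mindepth 1 -type d | wc -l)
echo "created $n_dirs directories"
```

Keeping inputs (`raw_data`, `reference`) separate from derived outputs makes it easy to delete and regenerate results without touching the original data.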
每一章請務必親手在互動終端機輸入幾個指令,看到結果再進入下一章。
In each chapter, be sure to type a few commands into the interactive terminal yourself and check the output before moving on.