Linux for Bioinformatics — Interactive Tutorial

Learn to navigate directories, view and rename files, and build a project skeleton: the foundation for organising real NGS data on Linux.

🧭

Ch 1

Why Does Bioinformatics Need Linux?

Understand Linux's role in NGS, HPC and reproducible analysis, plus the course roadmap.

🗂️

Ch 2

Terminal, Shell and File System

pwd, cd, ls, absolute vs relative paths: practiced from scratch on a real NGS project tree.
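A minimal sketch of absolute versus relative paths, assuming a throwaway practice tree created under `mktemp -d`. The folder names `ngs_project`, `raw_data` and `results` are hypothetical examples, not necessarily the layout the chapter itself uses.

```shell
#!/usr/bin/env bash
# Hypothetical practice tree inside a throwaway directory (mkdir -p itself is covered in Ch 3)
base=$(mktemp -d)
mkdir -p "$base/ngs_project/raw_data" "$base/ngs_project/results"

cd "$base/ngs_project"   # absolute path: mktemp returns a path starting from /
pwd                      # print the current directory as an absolute path
ls                       # list its contents: raw_data and results

cd raw_data              # relative path: resolved from the current directory
pwd                      # now .../ngs_project/raw_data
cd ../results            # .. means "one level up", so this reaches the sibling folder
pwd                      # now .../ngs_project/results
```

The same folder can always be reached both ways; relative paths keep commands short, while absolute paths keep scripts working no matter where they are launched from.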

📁

Ch 3

File Operations and Permissions

cp, mv, rm, mkdir, chmod, ln: safely manage raw_data and reference files.
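A sketch of the safe-handling habits listed above, assuming hypothetical file names (`sampleA_R1.fastq`, `genome.fa`) in a throwaway directory: raw reads are made read-only, work happens on a copy, and the reference is linked rather than duplicated.

```shell
#!/usr/bin/env bash
set -euo pipefail
base=$(mktemp -d)   # throwaway workspace
cd "$base"
mkdir -p raw_data reference analysis

# Hypothetical stand-ins for real data files
printf '@r1\nACGT\n+\nIIII\n' > raw_data/sampleA_R1.fastq
printf '>chr1\nACGTACGT\n' > reference/genome.fa

chmod a-w raw_data/*.fastq                # remove write permission for everyone on raw reads
cp raw_data/sampleA_R1.fastq analysis/    # work on a copy, never on the original
mv analysis/sampleA_R1.fastq analysis/sampleA_R1.copy.fastq   # rename the copy
ln -s ../reference/genome.fa analysis/genome.fa   # symlink: one reference on disk, many users

ls -l raw_data analysis
```

Note that `chmod a-w` guards against accidents by normal users; it does not stop root.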

🌊

Ch 4

Text Files and Stream Processing

cat, less, head, tail, wc, pipes, redirects: read huge FASTQ files without loading them into an editor.
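A minimal sketch of reading a compressed FASTQ as a stream, assuming a tiny hypothetical file built inline. `gzip -dc` is used so the example behaves the same everywhere; on Linux, `zcat` is the common shorthand for it.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)   # throwaway workspace
# Hypothetical two-read FASTQ, gzipped the way sequencing facilities usually deliver it
printf '@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nGGCCTTAA\n+\nIIIIHHHH\n' | gzip > "$tmp/sample_R1.fastq.gz"

# Peek at the first record (4 lines); gzip -dc streams to stdout, nothing is unpacked to disk
gzip -dc "$tmp/sample_R1.fastq.gz" | head -n 4

# Every FASTQ record is exactly 4 lines, so reads = lines / 4
lines=$(gzip -dc "$tmp/sample_R1.fastq.gz" | wc -l)
reads=$(( lines / 4 ))
echo "reads: $reads"

# Redirect (>) saves a small subset, e.g. the first 1000 reads = 4000 lines, for quick tests
gzip -dc "$tmp/sample_R1.fastq.gz" | head -n 4000 > "$tmp/subset.fastq"
```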

STAGE 2 Intermediate: Text Processing and Environment Management

Turn Linux into a bioinformatics Swiss-army knife: grep, cut, awk and sed on GTF, BED and sample sheets, plus conda for reproducible software environments.

🔍

Ch 5

Searching, Filtering and Field Processing

grep, cut, sort, uniq, paste, join: extract genes from a GTF and build QC summary tables.
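A sketch of the GTF tasks named above on a hypothetical four-line excerpt; `GENE_A`, `GENE_B` and the coordinates are placeholders, not real annotation. The `$'\t...'` quoting is a Bash feature that turns `\t` into a real tab character for `grep`.

```shell
#!/usr/bin/env bash
set -euo pipefail
tmp=$(mktemp -d)   # throwaway workspace
gtf="$tmp/mini.gtf"
# Hypothetical GTF excerpt: 9 tab-separated columns, attributes in column 9
{
  printf 'chr1\tdemo\tgene\t1000\t5000\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n'
  printf 'chr1\tdemo\texon\t1000\t1200\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n'
  printf 'chr1\tdemo\texon\t4000\t5000\t.\t+\t.\tgene_id "G1"; gene_name "GENE_A";\n'
  printf 'chr1\tdemo\tgene\t8000\t9500\t.\t-\t.\tgene_id "G2"; gene_name "GENE_B";\n'
} > "$gtf"

# Feature types and their counts: column 3 -> sort -> uniq -c
cut -f 3 "$gtf" | sort | uniq -c

# Gene rows only, reduced to chrom, start, end, strand (columns 1, 4, 5, 7)
grep $'\tgene\t' "$gtf" | cut -f 1,4,5,7 > "$tmp/genes.tsv"
cat "$tmp/genes.tsv"

# Unique gene names pulled out of the attribute column
grep -o 'gene_name "[^"]*"' "$gtf" | sort -u
```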

⚙️

Ch 6

awk and sed

Field arithmetic, conditionals, batch substitution: power tools for GTF, BED and sample sheets.
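A sketch of awk field arithmetic and sed substitution on a hypothetical BED file. Because BED intervals are 0-based and half-open, `end - start` is the interval length in bp. Stripping `chr` is only a simplification of UCSC-to-Ensembl renaming; contigs such as `chrM` (`MT` in Ensembl) need their own rule.

```shell
#!/usr/bin/env bash
set -euo pipefail
tmp=$(mktemp -d)   # throwaway workspace
# Hypothetical BED intervals: chrom, start (0-based), end (exclusive), name
printf 'chr1\t100\t250\tpeak1\nchr2\t1000\t1040\tpeak2\nchrX\t500\t900\tpeak3\n' > "$tmp/peaks.bed"

# awk: length = end - start; keep rows with length >= 100 and append the length as column 5
awk 'BEGIN { OFS = "\t" } ($3 - $2) >= 100 { print $0, $3 - $2 }' "$tmp/peaks.bed" > "$tmp/long_peaks.bed"
cat "$tmp/long_peaks.bed"

# sed: batch-remove the "chr" prefix at the start of every line
sed 's/^chr//' "$tmp/peaks.bed" > "$tmp/peaks.nochr.bed"
cat "$tmp/peaks.nochr.bed"
```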

📦

Ch 7

Compression, Downloads and Data Integrity

wget, curl, rsync, gzip, tar, md5sum: fetch public data and verify FASTQ integrity.
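A minimal sketch of checksum verification, assuming hypothetical file names and a checksum list called `md5.txt`; real data providers ship their own list, and its name varies. `md5sum` here is the GNU coreutils tool found on Linux.

```shell
#!/usr/bin/env bash
set -euo pipefail
tmp=$(mktemp -d)   # throwaway workspace
cd "$tmp"
# Hypothetical "downloaded" files
printf '@r1\nACGT\n+\nIIII\n' | gzip > sampleA_R1.fastq.gz
printf '@r1\nTTGA\n+\nIIII\n' | gzip > sampleA_R2.fastq.gz

# Provider side: record one checksum per file (normally shipped alongside the data)
md5sum sampleA_R1.fastq.gz sampleA_R2.fastq.gz > md5.txt

# Your side, after download: re-hash and compare; each file is reported as OK or FAILED
md5sum -c md5.txt

# gzip -t additionally tests that each compressed stream is intact
gzip -t sampleA_R1.fastq.gz sampleA_R2.fastq.gz && echo "gzip streams OK"
```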

🐍

Ch 8

conda and Environment Management

conda / mamba, Bioconda, environment.yml: one environment per project, the starting point for reproducibility.

STAGE 3 Applied Bioinformatics: Formats and Tools in Practice

Apply Linux skills to real NGS workflows: FASTQ/FASTA/GTF/BED/SAM/BAM/VCF, FastQC, MultiQC, samtools, bedtools.

🧬

Ch 9

Bioinformatics File Formats

FASTQ / FASTA / GTF / GFF / BED / SAM / BAM / VCF: fields, sanity checks, sampling.
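A sketch of a FASTQ structural sanity check in awk. `check_fastq` is a hypothetical helper, not a course-provided command, and it checks only the four-line record layout, not whether the bases or quality characters are valid.

```shell
#!/usr/bin/env bash
# check_fastq (hypothetical helper): header starts with @, line 3 starts with +,
# quality length equals sequence length, and the file is a whole number of 4-line records.
check_fastq() {
  awk '
    NR % 4 == 1 && substr($0, 1, 1) != "@" { print "line " NR ": header does not start with @"; bad = 1 }
    NR % 4 == 2 { seqlen = length($0) }
    NR % 4 == 3 && substr($0, 1, 1) != "+" { print "line " NR ": separator does not start with +"; bad = 1 }
    NR % 4 == 0 && length($0) != seqlen { print "line " NR ": quality length differs from sequence length"; bad = 1 }
    END {
      if (NR % 4 != 0) { print "truncated: " NR " lines is not a multiple of 4"; bad = 1 }
      if (!bad) print "OK: " NR / 4 " reads"
      exit bad   # non-zero exit status when any problem was found
    }' "$1"
}

tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n' > "$tmp/good.fq"
check_fastq "$tmp/good.fq"   # prints: OK: 2 reads
```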

🧪

Ch 10

NGS QC in Practice

Batch-run FastQC, aggregate the reports with MultiQC, and interpret common quality metrics.

🎯

Ch 11

Mapping and SAM/BAM Processing

bwa / hisat2 / minimap2, samtools sort/index, flag statistics, BAM housekeeping.
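Flag statistics rest on the fact that the SAM FLAG column is a bit field. The hypothetical `decode_flag` function below lists which bits are set, using the bit meanings from the SAM specification; on a real system, `samtools flags` performs the same decoding.

```shell
#!/usr/bin/env bash
# decode_flag (hypothetical helper): print the names of the SAM FLAG bits set in $1.
# Array index i corresponds to bit value 2^i (0x1, 0x2, 0x4, ... 0x800).
decode_flag() {
  local flag=$1
  local -a names=(paired proper_pair unmapped mate_unmapped reverse mate_reverse
                  read1 read2 secondary qc_fail duplicate supplementary)
  local i
  local -a out=()
  for i in "${!names[@]}"; do
    if (( flag & (1 << i) )); then
      out+=("${names[$i]}")
    fi
  done
  echo "${out[*]}"
}

decode_flag 99   # 99 = 1 + 2 + 32 + 64 -> paired proper_pair mate_reverse read1
decode_flag 4    # unmapped
```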

STAGE 4 Advanced: Automation, Workflows and HPC

Grow from one-off commands into reproducible pipelines: Bash script → Snakemake/Nextflow → Slurm + containers.

📜

Ch 12

Bash Scripting and Batch Processing

Variables, loops, conditionals, set -euo pipefail, logging: turn commands into rerunnable scripts.
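A minimal sketch of the Chapter 12 pattern: strict mode, a loop over samples, a file-existence check and a timestamped log. The sample names and paths are hypothetical, and the "processing" step here only counts reads.

```shell
#!/usr/bin/env bash
# Strict mode: stop on errors (-e), on unset variables (-u), and on failures inside pipes (pipefail)
set -euo pipefail

workdir=$(mktemp -d)        # throwaway workspace for this demo
log="$workdir/run.log"
mkdir -p "$workdir/raw_data"
# Hypothetical inputs: two samples exist on disk, a third is listed but missing
for s in sampleA sampleB; do
  printf '@r1\nACGT\n+\nIIII\n' > "$workdir/raw_data/${s}_R1.fastq"
done
samples=(sampleA sampleB sampleC)

for s in "${samples[@]}"; do
  fq="$workdir/raw_data/${s}_R1.fastq"
  if [[ ! -f "$fq" ]]; then
    echo "$(date '+%F %T') WARN  $s: $fq not found, skipping" >> "$log"
    continue
  fi
  n=$(( $(wc -l < "$fq") / 4 ))   # 4 lines per FASTQ record
  echo "$(date '+%F %T') INFO  $s: $n reads" >> "$log"
done

cat "$log"
```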

🔧

Ch 13

Workflow Manager

rule / process / dependency: upgrade Bash scripts to standard workflows and rerun everything with a single command.

🚀

Ch 14

HPC, Slurm and Containers

sbatch, resource planning, Apptainer / Docker: deploy pipelines on clusters and in containers.
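A sketch of a Slurm job script, assuming a hypothetical job name and resource values. The `#SBATCH` lines are plain comments to Bash, so the file also runs locally; `sbatch` reads them as resource requests when the script is submitted. The FastQC and Apptainer lines stay commented because they need those tools installed, and `fastqc.sif` is a hypothetical image name.

```shell
#!/usr/bin/env bash
#SBATCH --job-name=fastqc_demo
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=01:00:00
#SBATCH --output=fastqc_demo_%j.log
set -euo pipefail

# Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is set; fall back to 1 outside Slurm
threads=${SLURM_CPUS_PER_TASK:-1}
echo "running with $threads thread(s)"

# Real step (needs FastQC installed):
# fastqc --threads "$threads" -o qc/ raw_data/*.fastq.gz
# Containerised variant (needs Apptainer and the hypothetical fastqc.sif image):
# apptainer exec fastqc.sif fastqc --threads "$threads" -o qc/ raw_data/*.fastq.gz
```

On the cluster, the script would be submitted with `sbatch` followed by its file name; matching `threads` to the requested CPUs keeps the job from using more cores than it was allocated.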

EXTRA Learning Resources

Each chapter includes: ① concept explanation ② cheat sheet ③ interactive terminal (type commands and have them checked) ④ real NGS exercises ⑤ bilingual self-check quiz. The chapters are best followed in order, but they also work as a reference manual.

🧰

Prerequisites

No programming background required

You only need to know the difference between a file and a folder, and what a plain-text or table file is. A biology background is enough to start from zero.

🎯

Learning Outcomes

Build a mini pipeline on your own

From raw_data to a MultiQC report, aligned BAMs and a count matrix: a complete, reproducible project folder.
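A minimal sketch of such a project folder, built in one command with Bash brace expansion. The folder names are a hypothetical layout chosen for illustration, not one the course mandates.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Hypothetical layout; adapt the folder names to your own lab's conventions
project="$(mktemp -d)/my_rnaseq"
# Nested braces expand to raw_data, reference, qc/fastqc, qc/multiqc, aligned, ...
mkdir -p "$project"/{raw_data,reference,qc/{fastqc,multiqc},aligned,counts,scripts,envs,logs}
ls "$project"
```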

📚

How to Use This Tutorial

Chapter by chapter, hands-on

In every chapter, type a few commands into the interactive terminal yourself and check the output before moving on.


STAGE 1 Basics: Command Line and File System