CHAPTER 8 / 14

conda, Bioconda, and Environment Management

One environment per project, with environment.yml as the starting point for reproducibility. This chapter covers how to install, switch, and share versions of your bioinformatics tools.

Why install tools inside an "environment"?

Common bioinformatics meltdowns:

  • Project A uses samtools 1.9 and project B uses samtools 1.20, and the two give different results.
  • A fresh BWA install pulls in the wrong zlib version and breaks STAR in an older environment.
  • A year after the paper is written you try to rerun the analysis, find the tools have moved on to later versions, and get noticeably different results.

The fix: isolate each project in its own conda environment and record that environment in environment.yml. Even five years later, a single conda env create -f environment.yml can rebuild it, as long as the file pins the tool versions.
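
The "one environment per project" rule can be sketched as a small shell helper. This is a minimal sketch, not part of conda itself: the function names env_name_from_yml and ensure_env are hypothetical, and it assumes the file has a top-level "name:" line and that conda is on PATH.

```shell
# Read the environment name from the top-level "name:" line of the file.
env_name_from_yml() {
  sed -n 's/^name:[[:space:]]*//p' "$1" | head -n 1
}

# Create the environment described by the file unless one with that name exists.
# (Hypothetical helper; assumes conda is installed and on PATH.)
ensure_env() {
  yml="${1:-environment.yml}"
  env_name="$(env_name_from_yml "$yml")"
  if conda env list | awk '{print $1}' | grep -qxF "$env_name"; then
    echo "environment '$env_name' already exists"
  else
    conda env create -f "$yml"
  fi
}
```

Running ensure_env environment.yml at the top of a project's setup script makes the rebuild step repeatable: the first run creates the environment, later runs do nothing.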

Installing Miniconda / Mamba

Miniconda (a minimal conda)

Skip the full Anaconda distribution: it bundles many packages you will not need, which makes for a heavy install. Miniconda contains only conda and Python; install everything else as needed.

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
source ~/.bashrc

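Mamba (a faster conda)

conda's classic dependency solver is slow. mamba reimplements it in C++ and is often around 10× faster, which has made it a standard part of modern NGS setups. (Since conda 23.10 the same libmamba solver is conda's default, so the gap is smaller on recent installs.)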

conda install -n base -c conda-forge mamba -y
# From here on, mamba is used almost exactly like conda

Channels: Bioconda is the home of bioinformatics tools

conda downloads packages from "channels". The three you will meet most often:

Channel       Provides
conda-forge   High-quality general-purpose packages
bioconda      Bioinformatics tools
defaults      Anaconda's official channel (removal recommended)

# Bioconda's recommended setup: strict priority with conda-forge on top.
# Each --add prepends, so the channel added last ends up with the highest priority.
conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict
conda config --show channels
# channels:
#   - conda-forge
#   - bioconda
#   - defaults
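
Because the order of the --add calls is easy to get backwards, it can help to check which channel actually ended up on top. The helper below is a minimal sketch; top_channel is a hypothetical name, and it assumes the output format shown above (a "channels:" line followed by "  - name" entries).

```shell
# Print the highest-priority channel from `conda config --show channels` output
# read on stdin. The first "- name" entry in the list is the top channel.
top_channel() {
  awk '/^[[:space:]]*-[[:space:]]/ { print $2; exit }'
}
```

Typical use is conda config --show channels | top_channel, which should print conda-forge after the setup above.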

Standard workflow for an RNA-seq environment

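conda create -n rnaseq python=3.11         # create a new environment
conda activate rnaseq                      # switch to it
conda deactivate                           # leave the current environment
conda env list                             # list all environments
conda list                                 # list packages in the active environment
conda install -c bioconda samtools=1.20    # install a specific version
conda env remove -n old_env                # delete an environment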

Build the standard RNA-seq toolset in one command

mamba create -n rnaseq -c conda-forge -c bioconda \
  python=3.11 \
  fastqc multiqc \
  fastp trim-galore \
  star hisat2 salmon kallisto \
  samtools bcftools bedtools \
  subread \
  -y

conda activate rnaseq
conda env export > environment.yml

An environment.yml for this environment looks like this (abridged):

name: rnaseq
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.11
  - fastqc=0.12.1
  - multiqc=1.21
  - fastp=0.23.4
  - star=2.7.11b
  - salmon=1.10.2
  - samtools=1.20
  - bedtools=2.31.1
  - subread=2.0.6
  - pip
  # pip-only packages go in a nested "- pip:" list here
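
A plain conda env export records every transitive dependency plus a machine-specific "prefix:" line, which is more than most people want to share. The sketch below shows one possible way to export two files instead. The function names and the file name environment.lock.yml are hypothetical; --from-history and --no-builds are existing conda env export options.

```shell
# Drop the machine-specific "prefix:" line that `conda env export` appends.
strip_prefix() {
  grep -v '^prefix:'
}

# Export the active environment twice (assumes an env is active and conda is on PATH).
export_env() {
  # Portable spec: only the packages that were explicitly requested.
  # Versions appear only where they were pinned at install time.
  conda env export --from-history | strip_prefix > environment.yml
  # Full record: every installed package with its version, build strings omitted.
  # environment.lock.yml is a hypothetical file name chosen for this example.
  conda env export --no-builds | strip_prefix > environment.lock.yml
}
```

The short file is easier to read and more likely to solve on another platform; the long one documents exactly what was installed when the analysis ran.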
💡 Best practice: commit environment.yml together with scripts/ to git, and provide it alongside the analysis data when the paper is published. This is the standard way to meet FAIR and reproducibility expectations.

The 5 pitfalls beginners hit most often

🌳 conda troubleshooting flow

  • Installed into base: always create a new env; keep base for mamba only.
  • Solver hangs or runs forever: switch to mamba, or set channel_priority to strict.
  • Tool installed but not found: confirm you ran conda activate; check the path with which samtools.
  • Can't move an env across machines: don't copy the envs directory; rebuild from environment.yml.
  • No conda on the HPC: install Miniconda into $HOME/miniconda3, then run conda init bash so that ~/.bashrc loads it.
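
The "installed but not found" case can be checked mechanically. The sketch below relies on CONDA_PREFIX, which conda activate sets to the active environment's directory; the function name tool_in_active_env is hypothetical.

```shell
# Succeed only if the tool found on PATH lives inside the active conda env.
# CONDA_PREFIX is set by `conda activate`; it is unset when no env is active.
tool_in_active_env() {
  tool="$1"
  tool_path="$(command -v "$tool")" || { echo "$tool: not on PATH"; return 1; }
  case "$tool_path" in
    "${CONDA_PREFIX:-/nonexistent}"/*)
      echo "$tool: $tool_path"
      return 0 ;;
    *)
      echo "$tool: $tool_path is outside the active env"
      return 1 ;;
  esac
}
```

For example, tool_in_active_env samtools fails when the samtools on PATH is a system copy or when you forgot to activate the environment it was installed into.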


📝 Self-check

1. Why is environment.yml so important for reproducibility?

A. Because the file is small
B. Because git can track it
C. Anyone can rebuild the environment on any machine with a single conda env create -f environment.yml
D. It replaces the methods section of a paper

2. What is the relationship between mamba and conda?

A. mamba replaces Python
B. mamba is a faster drop-in alternative to conda with the same usage
C. mamba only installs R packages
D. mamba doesn't support Bioconda

3. What is the recommended standard order for conda channels?

A. conda-forge → bioconda → defaults, with strict priority
B. defaults → bioconda → conda-forge
C. Order doesn't matter
D. Use defaults only

4. After installing samtools, which samtools finds nothing. What is the most likely cause?

A. samtools has been deprecated
B. You need to reboot
C. You forgot to run conda activate <env>
D. The system doesn't support samtools