STEP 1 / 16

Environment Installation and Setup

From zero: R, RStudio, package management, Windows paths, working directories, and save locations, with every "stuck on the basics" detail spelled out in one place.

1. Why R? And why pair it with RStudio?

R is among the most widely used open-source languages for statistics and bioinformatics. Bioconductor (R's bioinformatics extension ecosystem) hosts over 2,300 packages purpose-built for biological data, a breadth no other language currently matches. From gene expression and single-cell analysis to ChIP-seq and epigenetics, virtually every mainstream analysis tool has an R implementation.

R vs. RStudio: R is the engine; RStudio is the dashboard. R alone can run every computation, but without RStudio's editor, file browser, plot preview, and project management you will be far less productive. Beginners should install both: R first, then RStudio.

💡
R vs. Python? For bioinformatics this isn't either/or: most labs use both. R's strengths are statistics, visualization, and Bioconductor; Python's are general-purpose programming, deep learning, and the scanpy ecosystem. Being able to switch between the two is an advantage, and this tutorial occasionally shows Python equivalents in the code tabs.

2. Install R (the core engine)

Download the installer for your operating system from CRAN:

  • Windows: https://cran.r-project.org/bin/windows/base/ → "Download R-x.x.x for Windows" → double-click the .exe to install.
  • macOS: https://cran.r-project.org/bin/macosx/ → pick the build for your chip (Intel x86_64 or Apple Silicon arm64).
  • Linux (Ubuntu/Debian): sudo apt install r-base r-base-dev, or add the CRAN repo to get the latest version.
🪟

Windows install path

The default location is C:\Program Files\R\R-4.x.x\. There is no need to add R to PATH by hand: RStudio finds the installation automatically, and a manual PATH entry can complicate managing multiple R versions later.

🍎

macOS permission prompt

On first launch you may see "developer cannot be verified". Go to System Settings → Privacy & Security and click "Open Anyway". On Apple Silicon, be sure to install the arm64 build; it is significantly faster than running the Intel build under emulation.

⚠️
Version choice: use the latest release series (e.g. 4.4.x). Each Bioconductor release is tied to a specific R x.y series (e.g. Bioc 3.20 ↔ R 4.4), so an R that is too old cannot install the newest bioinformatics packages.
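To see which R you are actually running, a minimal check along these lines can help. The 4.4.0 threshold is only an illustrative assumption taken from the example pairing above; substitute whatever your target Bioconductor release requires.

```r
# Report the running R version and compare it against a minimum.
# `min_r` is an illustrative threshold, not an official requirement.
min_r <- "4.4.0"

current <- getRversion()              # a 'numeric_version' object
cat("Running:", R.version.string, "\n")

if (current >= min_r) {
  cat("R is at least", min_r, "\n")
} else {
  cat("R is older than", min_r,
      "- consider upgrading before installing Bioconductor packages\n")
}

# The x.y series is what Bioconductor releases are matched against.
r_series <- paste(R.version$major, sub("\\..*$", "", R.version$minor), sep = ".")
cat("R series:", r_series, "\n")
```

Once BiocManager is installed, BiocManager::version() reports the Bioconductor release paired with this R series.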

3. Install RStudio (the IDE)

Download RStudio Desktop (free) from Posit, the company behind RStudio:
https://posit.co/download/rstudio-desktop/

The page auto-detects your operating system and recommends the matching build. On Windows you will find RStudio in the Start menu after installation. The first launch asks which R to use; the auto-detected one is usually fine.

The four panes of the RStudio interface

📝 Source (top-left)

The code editor, where you write .R scripts or .Rmd reports. Ctrl/Cmd + Enter sends the current line to the Console for execution.

💻 Console (bottom-left)

The interactive window of the R engine. Whatever you type runs immediately. A > prompt means R is waiting for input; a + prompt means the previous expression is incomplete (press Esc to cancel).

🌐 Environment / History (top-right)

Shows all variables and data currently in memory. The Environment tab is invaluable for debugging: you can inspect a variable's type, dimensions, and contents at any time.
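Everything the Environment tab shows can also be queried from the Console. A small sketch with made-up objects (the names `counts` and `samples` are purely illustrative):

```r
# Two illustrative objects, as they might appear in the Environment tab
counts  <- matrix(1:6, nrow = 2,
                  dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))
samples <- data.frame(id = c("s1", "s2", "s3"),
                      group = factor(c("ctrl", "ctrl", "treat")))

ls()            # names of objects in the current environment
class(counts)   # type of the object
dim(counts)     # dimensions: rows, columns
str(samples)    # compact view of structure and contents
```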

📁 Files / Plots / Packages / Help (bottom-right)

File browser, plot preview, package list, and help documentation. Type ?function_name in the Console to open the matching documentation in the Help tab.
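Besides ?function_name, base R offers a few other ways to find documentation. The sketch below uses only base functions; the help calls are wrapped in interactive() so the snippet also runs cleanly in a non-interactive session.

```r
# Find functions whose names start with "read." among attached packages
read_funs <- apropos("^read\\.")
print(head(read_funs))

# Inspect a function's arguments without opening the help page
args(read.csv)

if (interactive()) {
  ?read.csv                  # same as help("read.csv")
  help.search("csv")         # full-text search; shorthand: ??csv
  example(mean)              # run the examples from a help page
}
```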

4. The working directory: the starting point for every path

The working directory is the folder R treats as "current". When you write read.csv("data.csv"), R looks for data.csv in the working directory; when you write ggsave("plot.png"), the figure is saved there too. Almost every "file not found" error comes down to the working directory.

Basic commands

# Check the current working directory
getwd()
#> [1] "C:/Users/Charlene/Documents"

# Set a new working directory
setwd("E:/Charlene/Bioinformatics_Tutorials/R")

# List files in the current directory
list.files()
list.files(pattern = "\\.csv$")   # only .csv files

# Create subdirectories (showWarnings = FALSE stays silent if the folder already exists)
dir.create("results", showWarnings = FALSE)
dir.create("results/figures", recursive = TRUE)
🚨
Windows path trap: R strings use \ as the escape character, so a path copied straight from File Explorer, such as E:\Charlene\R, fails when pasted into R. Rewrite it in one of these forms:
  • Forward slashes: "E:/Charlene/R" (recommended, works on every platform)
  • Double backslashes: "E:\\Charlene\\R" (ugly, but matches the Windows convention)
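If you would rather paste a path straight from File Explorer, raw strings (available since R 4.0.0) keep backslashes literal, and a one-line substitution turns them into forward slashes. A small sketch using the example path from above:

```r
# Raw string: backslashes are taken literally (requires R >= 4.0.0)
pasted <- r"(E:\Charlene\R)"

# Convert to forward slashes, which R accepts on every platform
portable <- gsub("\\", "/", pasted, fixed = TRUE)
print(portable)

# The escaped form and the raw form are the same string
identical(pasted, "E:\\Charlene\\R")

# file.path() assembles components with "/" for you
file.path("E:", "Charlene", "R")
```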

Absolute vs. relative paths

🗺️ Absolute path

The full path, starting from the disk root:

E:/Charlene/Bioinformatics_Tutorials/R/data/counts.csv

Pros: unambiguous; does not depend on the working directory.
Cons: breaks on another machine or under another user account, so it cannot be shared with collaborators.

📍 Relative path

A path that starts from the working directory:

data/counts.csv

Pros: portable; zip the whole project folder and it still runs.
Cons: requires the working directory to be set correctly first.

Best practice: use here::here("data", "counts.csv") (see the I/O chapter).
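How a relative path resolves depends entirely on the working directory. The sketch below builds a throwaway project in a temporary folder (the folder and file names are illustrative) and shows the same relative path succeeding or failing depending on where R is standing.

```r
# Build a throwaway project: <tmp>/demo_project/data/counts.csv
project <- file.path(tempdir(), "demo_project")
dir.create(file.path(project, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(gene = c("A", "B"), count = c(10, 20)),
          file.path(project, "data", "counts.csv"), row.names = FALSE)

old_wd <- setwd(project)          # setwd() returns the previous directory
found_inside <- file.exists("data/counts.csv")   # resolved against `project`
counts <- read.csv("data/counts.csv")

setwd(tempdir())                  # move one level up
found_outside <- file.exists("data/counts.csv")  # same string, different result

setwd(old_wd)                     # always restore the original directory
print(c(inside = found_inside, outside = found_outside))
```

The path string never changed; only the working directory did, which is why opening the project consistently matters more than the path itself.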

5. Recommended project folder structure

Every analysis project should live in its own folder, organized as follows. This layout is common best practice in both academia and industry:

my_rnaseq_project/
├── my_rnaseq_project.Rproj    # RStudio project file (double-click to open)
├── README.md                  # project description
├── data/                      # raw data (read-only)
│   ├── raw/                   # original files, never modified
│   └── processed/             # cleaned intermediate files
├── R/                         # R scripts
│   ├── 01_load_data.R
│   ├── 02_qc.R
│   ├── 03_dge.R
│   └── 04_enrichment.R
├── results/                   # analysis outputs
│   ├── tables/                # tables: .csv .xlsx
│   └── figures/               # figures: .png .pdf .svg
├── reports/                   # RMarkdown / Quarto reports
│   └── final_report.Rmd
└── renv.lock                  # package lockfile (see the Reproducibility chapter)
💡
Three iron rules:
  1. data/ is read-only: scripts read from it but never overwrite the raw files, so the original data cannot be contaminated.
  2. Every output must be reproducible: every file in results/ should be produced by a script in R/, so deleting it and rerunning restores it.
  3. Never hard-code absolute paths: use RStudio Projects plus the here package, and the project runs on any machine.
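Rule 2 can be checked mechanically: delete an output, rerun the script, and confirm the file comes back identical. A minimal sketch in a temporary folder; `run_analysis()` is a hypothetical stand-in for a script in R/, and the file names are illustrative.

```r
# Throwaway project layout under a temp folder
root <- file.path(tempdir(), "repro_demo")
dir.create(file.path(root, "data", "raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "results", "tables"), recursive = TRUE, showWarnings = FALSE)

raw_file <- file.path(root, "data", "raw", "counts.csv")
out_file <- file.path(root, "results", "tables", "gene_means.csv")
write.csv(data.frame(gene = c("A", "A", "B"), count = c(10, 14, 7)),
          raw_file, row.names = FALSE)

# Hypothetical analysis step: reads from data/raw, writes only to results/
run_analysis <- function() {
  counts <- read.csv(raw_file)
  means  <- aggregate(count ~ gene, data = counts, FUN = mean)
  write.csv(means, out_file, row.names = FALSE)
  invisible(out_file)
}

run_analysis()
first <- tools::md5sum(out_file)

unlink(out_file)            # delete the output ...
run_analysis()              # ... and regenerate it from the raw data
second <- tools::md5sum(out_file)

unname(first) == unname(second)
```

Matching checksums only hold for deterministic steps; anything involving random numbers would also need a fixed seed.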

6. RStudio Projects: automatic working-directory management

Once you create an RStudio Project, double-clicking its .Rproj file opens RStudio with the working directory already set to that folder. There is no need to call setwd() by hand, and scripts can use relative paths directly.

Open a new project

Menu: File → New Project... → New Directory → New Project.

Name and location

Set Directory name to my_rnaseq_project; under Create project as subdirectory of, click Browse and select E:/Charlene/Bioinformatics_Tutorials/R/.

(Optional) Enable renv

Tick Use renv with this project to enable package version locking (see the Reproducibility chapter). Beginners can skip this for now.

Create subfolders

In the bottom-right Files tab, click New Folder to create data/, R/, results/, and so on, or run this in the Console:

# Create the full project structure in one go
folders <- c("data/raw", "data/processed", "R",
             "results/tables", "results/figures", "reports")
for (f in folders) dir.create(f, recursive = TRUE, showWarnings = FALSE)
list.files(recursive = TRUE, include.dirs = TRUE)
⚠️
Avoid these bad habits:
  • Scattering scripts and data across the Desktop or Downloads folder.
  • Hard-coding a path such as setwd("C:/Users/me/Desktop/proj"): it does not exist on a collaborator's machine.
  • Putting every file in one flat folder: three months from now you will not be able to tell which file is final_v2_REAL_FINAL.R.
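Instead of a hard-coded setwd(), a script can simply verify that it is being run from a project root and stop with a clear message otherwise. A sketch, assuming the project root is marked by a .Rproj file; `assert_project_root()` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: stop early if `dir` does not look like a project root
assert_project_root <- function(dir = getwd()) {
  rproj <- list.files(dir, pattern = "\\.Rproj$")
  if (length(rproj) == 0) {
    stop("No .Rproj file in ", dir,
         ". Open the project via its .Rproj file instead of calling setwd().")
  }
  invisible(TRUE)
}

# Demonstration in a temp folder
demo <- file.path(tempdir(), "guard_demo")
dir.create(demo, showWarnings = FALSE)

not_a_project <- inherits(try(assert_project_root(demo), silent = TRUE), "try-error")
file.create(file.path(demo, "guard_demo.Rproj"))
is_a_project <- assert_project_root(demo)

print(c(before = not_a_project, after = is_a_project))
```

Calling such a guard at the top of a script fails fast with an actionable message rather than a confusing "cannot open file" error further down.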

7. Installing and loading packages

R's functionality is extended through packages. For bioinformatics you need to know three sources:

  1. CRAN: the official R package repository; install with install.packages().
  2. Bioconductor: the extension repository built for bioinformatics; install with BiocManager::install().
  3. GitHub: development versions or packages not on CRAN; install with remotes::install_github().
# --- CRAN ---
# Install one package
install.packages("tidyverse")

# Install several packages at once
install.packages(c("ggplot2", "dplyr", "data.table"))

# Load a package (must be repeated in every new R session)
library(tidyverse)

# Call a function without loading its package, using ::
dplyr::filter(mtcars, mpg > 25)

# List installed packages
installed.packages()[, "Package"]

# Update all packages
update.packages(ask = FALSE)

# --- Bioconductor ---
# First time only: install BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

# Install Bioconductor packages
BiocManager::install(c("DESeq2", "limma", "clusterProfiler"))

# Check that the Bioconductor version matches your R version
BiocManager::version()
BiocManager::valid()

# Browse vignettes (tutorials written by the package authors)
library(DESeq2)
browseVignettes("DESeq2")

# --- GitHub ---
# First time only: install remotes
install.packages("remotes")

# Install a package from GitHub (user/repo format)
remotes::install_github("satijalab/seurat")

# Pin a specific commit, branch, or tag
remotes::install_github("satijalab/seurat@v5.0.0")

# For developers: install from a local folder (used in package development)
remotes::install_local("E:/dev/myPackage")
📦 Where packages are stored (library paths)

Knowing where packages are stored helps resolve many odd problems, such as package conflicts and permission errors.

# Show all library search paths
.libPaths()

# Typical Windows per-user path:
#   C:/Users/<you>/AppData/Local/R/win-library/4.4

# System-wide path (requires administrator rights):
#   C:/Program Files/R/R-4.4.x/library

# Project-local library (the approach renv takes; renv normally sets this up for you)
.libPaths(c("./renv/library/R-4.4/x86_64-w64-mingw32", .libPaths()))
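When two copies of a package are installed in different libraries, it helps to see which one R will actually load. These base-R calls report the location and version; `stats` is used here only because it ships with every R installation.

```r
# Libraries are searched in this order; the first match wins
print(.libPaths())

# Where does a given package live, and which version is it?
pkg_dir <- find.package("stats")
pkg_ver <- packageVersion("stats")
cat("stats", format(pkg_ver), "is installed in", dirname(pkg_dir), "\n")

# system.file() returns "" for a package that is not installed
system.file(package = "thisPackageDoesNotExist123")
```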
🚨
Common causes of install failures (Windows):
  • Missing Rtools: some packages must be compiled from source. Download and install the Rtools release matching your R version from https://cran.r-project.org/bin/windows/Rtools/.
  • Chinese characters or spaces in the path: an ASCII-only user folder or working directory is strongly recommended.
  • Insufficient permissions: installing into "Program Files" requires administrator rights. Leave the system path alone and install into the user-level library instead.
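Two of these causes can be checked from the Console. The sketch below flags paths containing spaces or non-ASCII characters and looks for `make` on the PATH as a rough indicator that build tools are available. `path_is_risky()` is a hypothetical helper, and finding `make` is only a heuristic, not proof that Rtools is set up correctly.

```r
# Hypothetical helper: TRUE if a path contains a space or any non-ASCII byte
path_is_risky <- function(path) {
  bytes <- as.integer(charToRaw(enc2utf8(path)))
  grepl(" ", path, fixed = TRUE) || any(bytes > 127L)
}

path_is_risky("C:/Users/charlene/R")          # plain ASCII, no spaces
path_is_risky("C:/Program Files/R")           # contains a space
path_is_risky("C:/Users/\u4e2d\u6587/R")      # contains Chinese characters

# Check the library paths R is actually using
vapply(.libPaths(), path_is_risky, logical(1))

# Rough check for a compiler toolchain: is `make` on the PATH?
has_make <- nzchar(Sys.which("make"))
cat("make found:", has_make, "\n")
```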

8. Your first runnable script

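Paste the following code into the RStudio Source editor and press Ctrl/Cmd + Shift + S to run the whole script. Sourcing this way does not echo the value of bare expressions such as 2 + 3 * 4; to see them, use Ctrl/Cmd + Shift + Enter (source with echo) or run the script line by line with Ctrl/Cmd + Enter.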

# ===== Your first R script =====
# 1. Calculate
2 + 3 * 4
sqrt(144)

# 2. Variables
gene_count <- 20000
sample_size <- 100
cat("Total measurements:", gene_count * sample_size, "\n")

# 3. Vectors
expression <- c(5.2, 8.1, 3.4, 9.7, 2.1)
mean(expression)
sd(expression)

# 4. Built-in data set + plot
plot(iris$Sepal.Length, iris$Sepal.Width,
     col = iris$Species, pch = 19,
     xlab = "Sepal Length", ylab = "Sepal Width",
     main = "Iris")
legend("topright", legend = levels(iris$Species),
       col = 1:3, pch = 19)
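A few of these results are worth checking by hand. Note that cat() prints the product in scientific notation; the sketch below shows one way to get a plain number with thousands separators using base format().

```r
# Operator precedence: multiplication before addition
2 + 3 * 4                      # 14
sqrt(144)                      # 12

gene_count  <- 20000
sample_size <- 100
total <- gene_count * sample_size

cat("Total measurements:", total, "\n")          # shows 2e+06
# Plain formatting with thousands separators
total_pretty <- format(total, big.mark = ",", scientific = FALSE)
cat("Total measurements:", total_pretty, "\n")   # shows 2,000,000

expression <- c(5.2, 8.1, 3.4, 9.7, 2.1)
mean(expression)               # 5.7
```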
⌨️
Shortcuts worth memorizing:
  • Ctrl/Cmd + Enter: run the current line (the most used)
  • Ctrl/Cmd + Shift + S: source the entire script
  • Ctrl/Cmd + Shift + M: insert a pipe, |> or %>%
  • Alt + - (Option + - on macOS): insert the assignment operator <-
  • Ctrl/Cmd + Shift + R: insert a section header (foldable)
  • Ctrl + L: clear the Console

9. Ways to save your work

Once the analysis has run, you need to save the results. "Saving" in R commonly falls into four kinds:

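What to save | Function | Extension | Notes
Tabular data (human-readable) | write.csv() / readr::write_csv() | .csv .tsv | Opens in Excel or any other tool; larger files, no type information.
A single R object (fastest) | saveRDS() / readRDS() | .rds | Preserves all R attributes and is readable across machines; stores only one object.
Multiple objects / whole workspace | save() / load() | .RData .rda | Can store several objects; load() silently overwrites variables with the same name, so use with care.
Figures | ggsave() / png() + dev.off() | .png .pdf .svg | ggsave() is ggplot2's convenience function; dpi and size can be specified.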
# --- Tables ---
# Save as CSV (my_results is a placeholder for your results data frame)
write.csv(my_results, "results/tables/dge_results.csv", row.names = FALSE)

# tidyverse style (faster; never writes row names)
readr::write_csv(my_results, "results/tables/dge_results.csv")

# Date-stamped filename (avoids overwriting earlier runs)
fname <- paste0("results/tables/dge_", Sys.Date(), ".csv")
readr::write_csv(my_results, fname)
# fname is, for example, "results/tables/dge_2026-05-07.csv"

# --- R objects ---
# Save (deseq_object is a placeholder for any R object)
saveRDS(deseq_object, "results/dds.rds")

# Read back (the result must be assigned to a variable)
dds <- readRDS("results/dds.rds")

# RDS keeps S4 objects, factor level order, attributes, and so on.
# For analyses that run for hours, always saveRDS() the intermediate results!

# --- Figures ---
# ggplot object
p <- ggplot2::ggplot(iris, ggplot2::aes(Sepal.Length, Sepal.Width, color = Species)) +
     ggplot2::geom_point()

ggplot2::ggsave("results/figures/iris_scatter.png", p,
                 width = 6, height = 4, dpi = 300)

# Publication PDF (vector graphics, scales without loss of quality)
ggplot2::ggsave("results/figures/iris_scatter.pdf", p, width = 6, height = 4)

# Base R plot (not ggplot)
png("results/figures/base_plot.png", width = 800, height = 600, res = 120)
plot(iris$Sepal.Length, iris$Sepal.Width)
dev.off()  # required: the file is only written once the device is closed
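The difference between CSV and RDS is easiest to see in a round trip. In this sketch (temporary files, made-up data), a factor with a custom level order survives saveRDS()/readRDS() intact but comes back from CSV as plain text.

```r
df <- data.frame(
  sample = c("s1", "s2", "s3"),
  group  = factor(c("treated", "control", "treated"),
                  levels = c("control", "treated"))
)

csv_file <- tempfile(fileext = ".csv")
rds_file <- tempfile(fileext = ".rds")

write.csv(df, csv_file, row.names = FALSE)
saveRDS(df, rds_file)

from_csv <- read.csv(csv_file)   # types are re-guessed from text
from_rds <- readRDS(rds_file)    # object is restored exactly

class(from_csv$group)            # "character" (R >= 4.0 default)
class(from_rds$group)            # "factor"
identical(df, from_rds)          # TRUE
```

This is why CSV suits results you hand to collaborators, while RDS suits intermediate objects you plan to load back into R.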
⚠️
About the "save workspace" dialog: when you close RStudio it asks "Save workspace image to .RData?" Always choose No, and in Tools → Global Options → General set "Save workspace to .RData on exit" to Never. The reason: scripts are the source of truth, and every R session should start from a clean environment and rerun from scratch, which avoids the "leftover variables from last time changed today's results" disaster.

10. Decision tree: which function should I use?

🤔 Task → recommended function

Q1. Want to know the current working directory? getwd()
Q2. Want to change the working directory? setwd("E:/path") or open an RStudio Project
Q3. Want to see what's in a folder? list.files("data/")
Q4. Want to create a subfolder? dir.create("results/figures", recursive = TRUE)
Q5. Want to save a table for collaborators? write.csv() or writexl::write_xlsx()
Q6. Want to save intermediate results and read them back later? saveRDS() / readRDS()
Q7. Want to save a ggplot figure? ggsave()

📝 Self-check

1. On Windows, which of these path strings fails in R?

A. "E:/Charlene/data.csv"
B. "E:\\Charlene\\data.csv"
C. "E:\Charlene\data.csv"
D. file.path("E:", "Charlene", "data.csv")

2. You open RStudio by double-clicking a .Rproj file. Which statement is correct?

A. You still need setwd() at the top of every script before relative paths work.
B. The working directory is set to the folder containing the .Rproj file automatically; relative paths just work.
C. You must run RStudio as administrator to read files.
D. RStudio automatically installs every package used in the project.

3. Which command correctly installs DESeq2 from Bioconductor?

A. install.packages("DESeq2")
B. remotes::install_github("DESeq2")
C. BiocManager::install("DESeq2")
D. library(DESeq2) downloads it automatically

4. On exit, RStudio asks "Save workspace image to .RData?" What is the best choice?

A. No: scripts are the source of truth; always start from a clean environment and rerun.
B. Yes: that way you can pick up the analysis where you left off next time.
C. Cancel the dialog and keep working.
D. It depends: save for small data sets, skip for large ones.