1. Grammar of Graphics
ggplot2 decomposes any plot into seven layers:
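| Layer | Example function | Role |
|---|---|---|
| data | ggplot(df, ...) | The data (must be a data.frame / tibble) |
| aesthetics | aes(x, y, color) | Maps data columns to visual properties |
| geom | geom_point() | Geometric shapes (points / lines / bars / boxes ...) |
| stat | stat_smooth() | Statistical transformations (regression lines, distributions) |
| scale | scale_color_manual() | Adjusts colors / axis ranges |
| facet | facet_wrap(~ Species) | Faceting (small multiples) |
| theme | theme_classic() | Appearance of non-data elements |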
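As a minimal sketch of how the seven rows combine, the following builds one plot from the built-in iris data that touches every layer once. The specific colors and the choice of a linear smoother are illustrative assumptions, not recommendations.

```r
library(ggplot2)

# One plot that uses each of the seven layers from the table once
p_layers <- ggplot(iris,                                    # data
                   aes(x = Sepal.Length, y = Sepal.Width,   # aesthetics
                       color = Species)) +
  geom_point() +                                            # geom
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) + # stat
  scale_color_manual(values = c(setosa = "#1f4e8c",         # scale
                                versicolor = "#d4a03c",
                                virginica = "#c0392b")) +
  facet_wrap(~ Species) +                                   # facet
  theme_classic()                                           # theme

p_layers
```

Removing any single line after the `ggplot()` call still yields a valid plot, which is the practical payoff of the layered grammar.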
2. Your first ggplot
library(ggplot2)

# Scatter plot
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

# Add color mapping
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 2)

# Multiple layers
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 2, alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Sepal dimensions across iris species",
       subtitle = "n = 150 flowers, 50 per species",
       x = "Sepal length (cm)",
       y = "Sepal width (cm)",
       caption = "Data: Anderson 1935") +
  theme_classic(base_size = 13)
3. Inside vs. outside aes(): the biggest beginner trap
Inside aes(): maps a data column to a visual attribute (a legend appears).
Outside aes(): sets a fixed constant value (no legend).
# ❌ Wrong: a string inside aes() is treated as data
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = "blue"))             # a legend entry named "blue" appears!

# ✅ Fixed color: put it outside aes()
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(color = "blue", size = 2)        # actually blue, no legend

# ✅ Color by Species: put it inside aes()
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = Species), size = 2)
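One way to see the difference without rendering anything is to inspect the values ggplot2 computes for each layer. The sketch below uses `layer_data()` from ggplot2; the object names `p_mapped` and `p_fixed` are chosen here for illustration only.

```r
library(ggplot2)

base <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width))

p_mapped <- base + geom_point(aes(color = "blue"))  # "blue" is treated as data
p_fixed  <- base + geom_point(color = "blue")       # "blue" is used as a color

# layer_data() returns the per-point values the layer is drawn with
unique(layer_data(p_mapped)$colour)  # a default palette color, not "blue"
unique(layer_data(p_fixed)$colour)   # "blue"
```

In the mapped case the string is a one-level categorical variable, so the color scale assigns it the first color of its default palette and builds a legend for it.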
4. The geoms most used in bioinformatics
geom_point
Scatter plots, volcano plots, PCA.
geom_line / geom_path
Time series, ROC curves, survival curves.
geom_bar / geom_col
Bars of counts (geom_bar) / bars of already-known values (geom_col).
geom_boxplot
Comparing distributions across groups.
geom_violin
Density-style boxplot; shows the shape of the distribution.
geom_histogram
Distribution of a single continuous variable.
geom_density
Smoothed density curve.
geom_tile
Heatmaps / matrix visualization.
geom_text / geom_label
Annotating points; use ggrepel to avoid overlap automatically.
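The geom_bar / geom_col distinction is the one that most often trips people up, so here is a small sketch on the built-in iris data. The helper data frame `species_means` is introduced here purely for illustration.

```r
library(ggplot2)

# geom_bar(): pass raw rows; it counts them per x value
p_bar <- ggplot(iris, aes(x = Species)) +
  geom_bar()

# geom_col(): pass one precomputed value per x value
species_means <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
p_col <- ggplot(species_means, aes(x = Species, y = Sepal.Length)) +
  geom_col()

layer_data(p_bar)$count  # 50 50 50
```

If the data already holds the bar heights (means, totals, percentages), geom_col() is the direct choice; geom_bar() is for letting ggplot2 do the counting.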
# Boxplot + jittered points
ggplot(iris, aes(Species, Sepal.Length, fill = Species)) +
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +   # hide outliers; jitter already shows every point
  geom_jitter(width = 0.15, alpha = 0.6) +
  theme_classic()

# Violin with a boxplot overlaid
ggplot(iris, aes(Species, Sepal.Length, fill = Species)) +
  geom_violin(alpha = 0.5) +
  geom_boxplot(width = 0.15, fill = "white") +
  theme_classic()
# Volcano plot: the standard visualization for differential gene expression (DGE)
# `dge` is a data frame with columns log2FC, padj, and symbol
library(ggplot2); library(ggrepel)

# Treat genes with NA padj as not significant
dge$sig <- with(dge, !is.na(padj) & padj < 0.05 & abs(log2FC) > 1)

ggplot(dge, aes(log2FC, -log10(padj))) +
  geom_point(aes(color = sig), alpha = 0.7) +
  # Name the values so the colors do not depend on level order
  scale_color_manual(values = c("FALSE" = "grey60", "TRUE" = "firebrick")) +
  geom_vline(xintercept = c(-1, 1), linetype = "dashed") +
  geom_hline(yintercept = -log10(0.05), linetype = "dashed") +
  geom_text_repel(data = subset(dge, sig), aes(label = symbol),
                  max.overlaps = 15) +
  labs(x = "log2 fold change", y = "-log10 adjusted p") +
  theme_classic()
# Histogram + density overlay
# after_stat(density) replaces the deprecated ..density.. notation
ggplot(iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(aes(y = after_stat(density)), bins = 20, alpha = 0.5,
                 position = "identity") +
  geom_density(aes(color = Species), linewidth = 1, fill = NA) +
  theme_minimal()

# Histograms: the number of bins strongly affects the picture, so try several
ggplot(iris, aes(Sepal.Length)) + geom_histogram(bins = 10)
ggplot(iris, aes(Sepal.Length)) + geom_histogram(binwidth = 0.2)
# Simple heatmap: pivot to long format first
# `mat` is a numeric matrix with genes as row names and samples as columns
library(tidyr); library(dplyr)

mat_long <- mat |>
  as.data.frame() |>
  tibble::rownames_to_column("gene") |>
  pivot_longer(-gene, names_to = "sample", values_to = "expr")

ggplot(mat_long, aes(sample, gene, fill = expr)) +
  geom_tile() +
  scale_fill_viridis_c() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

# For advanced heatmaps see Step 12; ComplexHeatmap is better suited to them
5. Faceting: small multiples
library(ggplot2)

# facet_wrap: facet by one variable, wrapping rows automatically
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point() +
  facet_wrap(~ Species)

# facet_wrap: control the number of rows
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point() +
  facet_wrap(~ Species, nrow = 1)

# facet_grid: rows × columns from two variables
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am  <- factor(mtcars$am, labels = c("auto", "manual"))
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  facet_grid(am ~ cyl)

# Free scales
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point() +
  facet_wrap(~ Species, scales = "free")   # each panel gets its own axes
6. Themes and polishing
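| theme | Style |
|---|---|
| theme_grey() | Default, gray background |
| theme_classic() | Common in publications; x/y axis lines only |
| theme_minimal() | Minimal; keeps only faint grid lines |
| theme_bw() | White background, black border |
| theme_void() | No axes or grid lines at all |
| ggthemes / hrbrthemes / ggpubr | Third-party theme packages |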
# Common syntax for tweaking a theme
p <- ggplot(iris, aes(Species, Sepal.Length, fill = Species)) +
  geom_boxplot()

p + theme_classic(base_size = 14) +          # base font size for the whole plot
  theme(
    legend.position    = "top",
    legend.title       = element_blank(),
    axis.title.x       = element_blank(),
    axis.text.x        = element_text(angle = 45, hjust = 1, face = "italic"),
    panel.grid.major.y = element_line(color = "grey90"),
    plot.title         = element_text(face = "bold", hjust = 0.5)
  ) +
  labs(title = "Sepal length by species", y = "Sepal length (cm)")

# Custom colors
p + scale_fill_manual(values = c("setosa" = "#1f4e8c", "versicolor" = "#d4a03c", "virginica" = "#c0392b"))
p + scale_fill_brewer(palette = "Set2")
p + scale_fill_viridis_d(option = "C")       # use _c for continuous variables, _d for discrete
- Avoid red/green pairings: about 8% of men have some form of red-green color vision deficiency and cannot tell them apart.
- Discrete data: scale_*_brewer(palette = "Set2") or viridis discrete (scale_*_viridis_d()).
- Continuous data: viridis (scale_*_viridis_c()) or a ColorBrewer sequential palette ("Blues", "Reds").
- Directional data (e.g. logFC): use a diverging palette (RdBu, PRGn) with white at zero.
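The last guideline, pinning white to zero, can be sketched with scale_fill_gradient2(), whose `midpoint` argument sets the data value that receives the middle color. The data frame `toy` and its log2FC values are made up for illustration, and the two hex codes are blue and red shades assumed here rather than taken from the text.

```r
library(ggplot2)

# Hypothetical toy data: 3 genes x 2 samples with made-up log2FC values
toy <- data.frame(
  gene   = rep(c("geneA", "geneB", "geneC"), times = 2),
  sample = rep(c("s1", "s2"), each = 3),
  log2FC = c(-2, 0, 1.5, -0.5, 0, 3)
)

p_div <- ggplot(toy, aes(sample, gene, fill = log2FC)) +
  geom_tile() +
  # Diverging blue-white-red scale; midpoint = 0 pins white to log2FC = 0
  scale_fill_gradient2(low = "#2166ac", mid = "white", high = "#b2182b",
                       midpoint = 0) +
  theme_minimal()

p_div
```

Without an explicit midpoint, a sequential or auto-ranged scale would place its middle color at the center of the observed range (0.5 for this toy data), which misrepresents up- versus down-regulation.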
7. ggsave: exporting publication-quality figures
p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(size = 2) +
  theme_classic()

# Make sure the output directory exists before saving
dir.create("results/figures", recursive = TRUE, showWarnings = FALSE)

# PNG (for PowerPoint / previews)
ggsave("results/figures/scatter.png", plot = p,
       width = 6, height = 4, dpi = 300, units = "in")

# PDF (for papers / editing in Illustrator; vector, stays sharp when enlarged)
ggsave("results/figures/scatter.pdf", plot = p, width = 6, height = 4)

# SVG (web, vector, small files); requires the svglite package
ggsave("results/figures/scatter.svg", plot = p, width = 6, height = 4)

# TIFF + LZW compression (required by many journals)
ggsave("results/figures/scatter.tiff", plot = p,
       width = 7, height = 5, dpi = 600, compression = "lzw")

# Save several sizes at once (thumbnail + large versions)
for (w in c(3, 6, 12)) {
  ggsave(sprintf("results/figures/scatter_w%d.png", w),
         plot = p, width = w, height = w * 0.7, dpi = 300)
}
8. patchwork: composing multiple plots with + and /
library(ggplot2); library(patchwork)

p1 <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) + geom_point()
p2 <- ggplot(iris, aes(Petal.Length, Petal.Width, color = Species)) + geom_point()
p3 <- ggplot(iris, aes(Species, Sepal.Length, fill = Species)) + geom_boxplot()

# Side by side
p1 + p2

# Stacked
p1 / p2

# Complex layout
(p1 + p2) / p3

# Add a title and A/B/C panel tags
((p1 + p2) / p3) +
  plot_annotation(title = "Iris dataset", tag_levels = "A")

# Collect legends into one
(p1 + p2) + plot_layout(guides = "collect") &
  theme(legend.position = "bottom")

# ggsave can save the combined plot directly
ggsave("results/figures/combined.png",
       plot = (p1 + p2) / p3, width = 9, height = 6, dpi = 300)
📝 Self-check
1. What is the difference between geom_point(aes(color = "blue")) and geom_point(color = "blue")?
2. Which function labels the significant genes on a volcano plot (log2FC vs. -log10 padj) with their gene names while automatically avoiding overlap?
3. Which scale is the best choice for coloring continuous log2 fold-change with a red-blue diverging palette centered at 0?