STEP 6 / 16

ggplot2 Visualization

From the grammar of graphics to publication-level tweaks: mastering ggplot2 covers roughly 90% of everyday R visualization.

1. Grammar of Graphics

ggplot2 decomposes any plot into seven layers:

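| Layer | Example function | Role |
| --- | --- | --- |
| data | ggplot(df, ...) | The data (must be a data.frame / tibble) |
| aesthetics | aes(x, y, color) | Maps data columns to visual attributes |
| geom | geom_point() | Geometric shape (points / lines / bars / boxes ...) |
| stat | stat_smooth() | Statistical transformation (regression line, distribution) |
| scale | scale_color_manual() | Adjusts colors / axis ranges |
| facet | facet_wrap(~ Species) | Faceting (small multiples) |
| theme | theme_classic() | Appearance of non-data elements |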
💡
Reading rule: when you see ggplot code, read it top-down, layer by layer: which data is used, which columns map to x/y/color, what shape is drawn, which statistics are added, which palette is applied, whether the plot is faceted, and finally which theme is set.
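
The reading rule is easier to apply with one plot that touches all seven layers. The block below is a minimal sketch using the built-in iris data; the object name p_layers is only illustrative, and the hex colors are arbitrary choices rather than a recommended palette.

```r
library(ggplot2)

# One plot that uses every layer, annotated in top-down reading order.
p_layers <- ggplot(iris,                                     # data
                   aes(x = Sepal.Length, y = Sepal.Width,    # aesthetics
                       color = Species)) +
  geom_point(alpha = 0.7) +                                  # geom
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +  # stat
  scale_color_manual(values = c(setosa     = "#1f4e8c",      # scale
                                versicolor = "#d4a03c",
                                virginica  = "#c0392b")) +
  facet_wrap(~ Species) +                                    # facet
  theme_classic()                                            # theme

p_layers
```

Removing any line after the first still leaves a valid plot, which is a quick way to see what each layer contributes.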

2. Your first ggplot

library(ggplot2)

# Scatter plot
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()

# Add a color mapping
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 2)

# Stack multiple layers
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 2, alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title    = "Sepal dimensions across iris species",
       subtitle = "n = 150 flowers, 50 per species",
       x        = "Sepal length (cm)",
       y        = "Sepal width (cm)",
       caption  = "Data: Anderson 1935") +
  theme_classic(base_size = 13)

3. Inside vs. outside aes(): the biggest beginner trap

Inside aes(): maps a column to a visual attribute (a legend appears).
Outside aes(): sets a fixed constant value (no legend).

# ❌ Wrong: a string inside aes() is treated as data
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = "blue"))     # legend entry named "blue"; the points are not actually blue

# ✅ To fix the color: put it outside aes()
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(color = "blue", size = 2)   # really blue, no legend

# ✅ To color by Species: put it inside aes()
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(aes(color = Species), size = 2)

4. The geoms used most often in bioinformatics

geom_point

Scatter plots, volcano plots, PCA.

geom_line / geom_path

Time series, ROC curves, survival curves.

geom_bar / geom_col

Bars for counts (geom_bar) or for precomputed values (geom_col).

geom_boxplot

Comparing distributions across groups.

geom_violin

A density-style boxplot that shows the shape of the distribution.

geom_histogram

Distribution of a single continuous variable.

geom_density

Smoothed density curve.

geom_tile

Heatmaps / matrix visualization.

geom_text / geom_label

Annotating points; ggrepel avoids overlapping labels automatically.
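
The difference between geom_bar and geom_col is easy to mix up, so here is a small sketch with the built-in iris data. The names p_bar, p_col, and mean_len are illustrative only.

```r
library(ggplot2)

# geom_bar(): one row per observation; bar height is the number of rows.
p_bar <- ggplot(iris, aes(Species)) +
  geom_bar()

# geom_col(): one row per bar; bar height is a value you computed yourself.
mean_len <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
p_col <- ggplot(mean_len, aes(Species, Sepal.Length)) +
  geom_col()

p_bar
p_col
```

As a rule of thumb under these assumptions: raw observations go to geom_bar, an already summarized table goes to geom_col.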

# Boxplot + jittered points
ggplot(iris, aes(Species, Sepal.Length, fill = Species)) +
  geom_boxplot(alpha = 0.5, outlier.shape = NA) +
  geom_jitter(width = 0.15, alpha = 0.6) +
  theme_classic()

# Violin with a boxplot overlaid
ggplot(iris, aes(Species, Sepal.Length, fill = Species)) +
  geom_violin(alpha = 0.5) +
  geom_boxplot(width = 0.15, fill = "white") +
  theme_classic()

# Volcano plot: the standard visualization for differential gene expression (DGE)
# Assumes `dge` is a data frame with columns log2FC, padj, and symbol.
library(ggplot2); library(ggrepel)

# Treat genes with a missing padj as not significant
dge$sig <- with(dge, !is.na(padj) & padj < 0.05 & abs(log2FC) > 1)

ggplot(dge, aes(log2FC, -log10(padj))) +
  geom_point(aes(color = sig), alpha = 0.7) +
  scale_color_manual(values = c("FALSE" = "grey60", "TRUE" = "firebrick")) +
  geom_vline(xintercept = c(-1, 1), linetype = "dashed") +
  geom_hline(yintercept = -log10(0.05), linetype = "dashed") +
  geom_text_repel(data = subset(dge, sig),
                  aes(label = symbol), max.overlaps = 15) +
  labs(x = "log2 fold change", y = "-log10 adjusted p") +
  theme_classic()
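
The volcano code needs a `dge` table that this page does not define. To try it without real results, a stand-in can be simulated. This is only a sketch: the values are random numbers, not real expression data, and the column names (symbol, log2FC, pvalue, padj) are assumptions chosen to match the plotting code.

```r
# Simulated differential-expression table (NOT real data), for testing the plot only.
set.seed(1)
n_genes <- 500
dge <- data.frame(
  symbol = sprintf("gene%03d", seq_len(n_genes)),  # made-up gene names
  log2FC = rnorm(n_genes, mean = 0, sd = 1.5),     # random fold changes
  pvalue = runif(n_genes)^3                        # skewed toward small p-values
)
dge$padj <- p.adjust(dge$pvalue, method = "BH")    # Benjamini-Hochberg adjustment
dge$sig  <- with(dge, padj < 0.05 & abs(log2FC) > 1)

table(dge$sig)   # how many genes pass both thresholds
```

With a real analysis, `dge` would instead come from your differential-expression tool, with its columns renamed to match.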

# Histogram with a density overlay
ggplot(iris, aes(Sepal.Length, fill = Species)) +
  # after_stat(density) replaces the deprecated ..density.. notation (ggplot2 >= 3.4.0)
  geom_histogram(aes(y = after_stat(density)), bins = 20,
                 alpha = 0.5, position = "identity") +
  geom_density(aes(color = Species), linewidth = 1, fill = NA) +
  theme_minimal()

# Histogram: the number of bins strongly affects the look, so try several
ggplot(iris, aes(Sepal.Length)) + geom_histogram(bins = 10)
ggplot(iris, aes(Sepal.Length)) + geom_histogram(binwidth = 0.2)

# Simple heatmap: pivot to long format first
# Assumes `mat` is a numeric matrix with genes as row names and samples as column names.
library(tidyr)
mat_long <- mat |> as.data.frame() |>
  tibble::rownames_to_column("gene") |>
  pivot_longer(-gene, names_to = "sample", values_to = "expr")

ggplot(mat_long, aes(sample, gene, fill = expr)) +
  geom_tile() +
  scale_fill_viridis_c() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

# For advanced heatmaps see Step 12, which uses ComplexHeatmap
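
The heatmap code expects a matrix `mat` that is not defined on this page. Below is a hedged stand-in: a small random matrix (not real expression data) with made-up gene and sample names, plus a base-R way to reach the same long format for comparison with the pivot_longer() step.

```r
# Simulated expression matrix (NOT real data): 8 genes x 6 samples.
set.seed(1)
mat <- matrix(rnorm(8 * 6), nrow = 8,
              dimnames = list(paste0("gene", 1:8),
                              paste0("sample", 1:6)))

# Base-R route to the long table: one row per gene/sample pair.
mat_long <- as.data.frame(as.table(mat), responseName = "expr")
names(mat_long)[1:2] <- c("gene", "sample")

head(mat_long)
```

Either route gives the three columns (gene, sample, expr) that geom_tile() needs.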

5. Faceting: small multiples

library(ggplot2)

# facet_wrap: facet by one variable, wrapping into rows automatically
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point() +
  facet_wrap(~ Species)

# facet_wrap with a fixed number of rows
ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  geom_point() +
  facet_wrap(~ Species, nrow = 1)

# facet_grid: rows × columns from two variables
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am  <- factor(mtcars$am, labels = c("auto","manual"))
ggplot(mtcars, aes(wt, mpg)) + geom_point() +
  facet_grid(am ~ cyl)

# Free scales
ggplot(iris, aes(Sepal.Length, Sepal.Width)) + geom_point() +
  facet_wrap(~ Species, scales = "free")     # each facet gets its own axes

6. Themes and fine-tuning

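| Theme | Style |
| --- | --- |
| theme_grey() | Default, grey background |
| theme_classic() | Common in publications; x/y axis lines only |
| theme_minimal() | Minimal; only faint gridlines |
| theme_bw() | White background, black border |
| theme_void() | No axes or gridlines at all |
| ggthemes / hrbrthemes / ggpubr | Third-party theme packages |

# Common theme adjustments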
p <- ggplot(iris, aes(Species, Sepal.Length, fill = Species)) +
       geom_boxplot()

p + theme_classic(base_size = 14) +     # base font size for the whole plot
    theme(
      legend.position   = "top",
      legend.title      = element_blank(),
      axis.title.x      = element_blank(),
      axis.text.x       = element_text(angle = 45, hjust = 1, face = "italic"),
      panel.grid.major.y = element_line(color = "grey90"),
      plot.title        = element_text(face = "bold", hjust = 0.5)
    ) +
    labs(title = "Sepal length by species", y = "Sepal length (cm)")

# Custom colors
p + scale_fill_manual(values = c("setosa"="#1f4e8c","versicolor"="#d4a03c","virginica"="#c0392b"))
p + scale_fill_brewer(palette = "Set2")
p + scale_fill_viridis_d(option = "C")    # use _c for continuous variables, _d for discrete
🎨
Color rules:
  • Avoid relying on red vs. green: about 8% of men (and roughly 0.5% of women) have red-green color vision deficiency and cannot tell them apart.
  • Discrete data: scale_*_brewer(palette="Set2") or viridis discrete.
  • Continuous data: viridis (scale_*_viridis_c()) or a ColorBrewer sequential palette ("Blues", "Reds").
  • Directional data (e.g. logFC): use a diverging palette (RdBu, PRGn) with white at zero.
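
Pinning white at zero takes one extra step that the rules above do not spell out: the color limits have to be symmetric, or the midpoint has to be set explicitly. The sketch below does both with scale_fill_gradient2(). The data are simulated (not real fold changes), and the names toy, lim, and p_div as well as the two hex colors are illustrative assumptions.

```r
library(ggplot2)

# Simulated log fold-changes (NOT real data), deliberately shifted away from 0.
set.seed(1)
toy <- expand.grid(gene = paste0("gene", 1:6), sample = paste0("s", 1:4))
toy$logFC <- rnorm(nrow(toy), mean = 0.5, sd = 1.5)

# Symmetric limits: equal |logFC| gets equal color intensity on both sides.
lim <- max(abs(toy$logFC))

p_div <- ggplot(toy, aes(sample, gene, fill = logFC)) +
  geom_tile() +
  scale_fill_gradient2(low = "#2166ac", mid = "white", high = "#b2182b",
                       midpoint = 0, limits = c(-lim, lim)) +
  theme_minimal()

p_div
```

Without the symmetric limits, a dataset with mostly positive values would stretch the red side and compress the blue side, which makes the two directions hard to compare.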

7. ggsave: exporting publication-quality figures

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
       geom_point(size = 2) +
       theme_classic()

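# Create the output folder if it does not exist yet
dir.create("results/figures", recursive = TRUE, showWarnings = FALSE)

# PNG (for PowerPoint / previews)
ggsave("results/figures/scatter.png",  plot = p,
       width = 6, height = 4, dpi = 300, units = "in")

# PDF (for papers / editing in Illustrator; vector, scales without blurring)
ggsave("results/figures/scatter.pdf",  plot = p, width = 6, height = 4)

# SVG (for the web; vector, small files); requires the svglite package
ggsave("results/figures/scatter.svg",  plot = p, width = 6, height = 4)

# TIFF + LZW compression (required by many journals)
ggsave("results/figures/scatter.tiff", plot = p,
       width = 7, height = 5, dpi = 600, compression = "lzw")

# Save several sizes at once (thumbnail + large version)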
for (w in c(3, 6, 12)) {
  ggsave(sprintf("results/figures/scatter_w%d.png", w),
         plot = p, width = w, height = w * 0.7, dpi = 300)
}
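
Journal guidelines often give figure widths in millimetres rather than inches, and ggsave() accepts units = "mm" directly. The sketch below assumes an 85 mm single-column width purely as an example value (check your target journal's guide) and writes to a temporary folder so it can be run anywhere; out_file is an illustrative name.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) +
  geom_point(size = 1) +
  theme_classic(base_size = 8)   # smaller base font suits a narrow figure

# Assumed target: an 85 mm wide single-column figure (example value only).
out_file <- file.path(tempdir(), "scatter_single_column.png")
ggsave(out_file, plot = p, width = 85, height = 60, units = "mm", dpi = 300)

file.exists(out_file)
```

Setting the final physical size at save time, and choosing base_size to match, tends to give more predictable text sizes than resizing the image afterwards.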

8. patchwork: composing multi-panel figures with + and /

library(ggplot2); library(patchwork)
p1 <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color = Species)) + geom_point()
p2 <- ggplot(iris, aes(Petal.Length, Petal.Width, color = Species)) + geom_point()
p3 <- ggplot(iris, aes(Species, Sepal.Length, fill = Species)) + geom_boxplot()

# Side by side
p1 + p2

# Stacked
p1 / p2

# Complex layout
(p1 + p2) / p3

# Add a title and A/B/C panel tags
((p1 + p2) / p3) +
  plot_annotation(title = "Iris dataset", tag_levels = "A")

# Collect the legends into one
(p1 + p2) + plot_layout(guides = "collect") &
  theme(legend.position = "bottom")

# Save the combined figure directly with ggsave
ggsave("results/figures/combined.png",
       plot = (p1 + p2) / p3, width = 9, height = 6, dpi = 300)

📝 Self-check

1. What is the difference between geom_point(aes(color = "blue")) and geom_point(color = "blue")?

A. They are identical
B. The first is a syntax error
C. The first treats "blue" as data and shows a legend entry called "blue"; the second sets the actual color blue
D. The first is faster

2. On a volcano plot (log2FC vs. -log10 padj), you want to label the significant genes with their names and avoid overlapping labels automatically. Which should you use?

A. geom_text()
B. annotate("text", ...)
C. ggrepel::geom_text_repel()
D. geom_label()

3. For continuous log2 fold-change values, you want a red-blue diverging palette centered on 0. Which is the best choice?

A. scale_color_viridis_c()
B. scale_color_brewer("Blues")
C. scale_color_manual(values = c("red","blue"))
D. scale_color_distiller(palette = "RdBu", limits = c(-3, 3))