STEP 4 / 9

Scaling

Z-score normalize each gene so that every gene gets an equal "voice" in PCA.

What does Scaling do?

After Normalization, expression ranges still differ enormously between genes: a highly expressed structural-protein gene can be hundreds of times higher than a lowly expressed transcription factor. PCA looks for the directions of largest variance, so feeding this data in directly would let the high-expression genes dominate the principal components.

The Scaling fix: Z-score each gene, i.e. subtract its mean and divide by its standard deviation. Every gene then has mean 0 and variance 1, so all genes compete on the same scale regardless of their original expression level.
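To make the "domination" concrete, here is a small Python sketch on toy numbers (illustrative, not real data). It measures what fraction of the total variance each of two genes contributes before and after per-gene Z-scoring. Because PCA follows variance, the gene with the larger share drives the leading components. The helper names `zscore` and `variance_share` are hypothetical, and the sketch assumes the sample standard deviation (n − 1 denominator).

```python
import statistics

# Toy expression values for two genes across 6 cells (illustrative only).
# A highly expressed structural gene with a large absolute spread...
structural = [300.0, 320.0, 280.0, 350.0, 260.0, 310.0]
# ...and a lowly expressed transcription factor with a small absolute spread.
tf = [1.0, 3.0, 0.5, 2.5, 0.0, 3.0]


def zscore(values):
    """Z-score one gene across cells: subtract its mean, divide by its sample SD."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return [(x - mu) / sigma for x in values]


def variance_share(genes):
    """Fraction of the total variance contributed by each gene."""
    variances = [statistics.variance(g) for g in genes]
    total = sum(variances)
    return [v / total for v in variances]


before = variance_share([structural, tf])
after = variance_share([zscore(structural), zscore(tf)])

print("variance share before scaling:", [round(s, 4) for s in before])
print("variance share after scaling: ", [round(s, 4) for s in after])
```

Before scaling the structural gene accounts for well over 99% of the variance. After scaling each gene contributes exactly half.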

📐 Formula

z = (x − μ) / σ

x = expression value of a given gene in a given cell
μ = mean of that gene across all cells
σ = standard deviation of that gene across all cells
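A worked example of the formula for a single gene measured in three cells (toy values). This sketch assumes the sample standard deviation (n − 1 denominator). Using the population SD instead would give a slightly different σ.

```python
import statistics

# One gene measured in three cells (illustrative values).
gene = [2.0, 4.0, 6.0]

mu = statistics.mean(gene)      # μ = (2 + 4 + 6) / 3 = 4
sigma = statistics.stdev(gene)  # σ = sqrt(((2-4)² + (4-4)² + (6-4)²) / (3 - 1)) = sqrt(8 / 2) = 2

# z = (x − μ) / σ for every cell
z = [(x - mu) / sigma for x in gene]
print(z)  # [-1.0, 0.0, 1.0]
```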

⚠️ Normalization vs. Scaling

Normalization works "across genes" within each cell: it corrects for that cell's sequencing depth.
Scaling works "across cells" within each gene: it puts every gene on a comparable scale.
The two are complementary, not interchangeable.
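The direction of each operation is easiest to see on a tiny cells × genes matrix. This Python sketch assumes a log1p(count / cell total × 10,000) per-cell normalization, the common log-normalize recipe, as the example of depth correction. It then Z-scores each gene column. The function names are hypothetical, and setting constant genes to 0 is a design choice of this sketch.

```python
import math
import statistics

# Rows = cells, columns = genes (toy raw counts, illustrative only).
counts = [
    [10, 0, 90],   # cell A: 100 reads in total
    [20, 0, 180],  # cell B: same composition as A, sequenced twice as deep
    [5, 5, 90],    # cell C
]


def normalize_cell(row, scale_factor=10_000):
    """Per cell (across that cell's genes): divide by the cell total, rescale, log1p."""
    total = sum(row)
    return [math.log1p(c / total * scale_factor) for c in row]


def scale_genes(matrix):
    """Per gene (across cells): z-score each column; constant genes become 0."""
    n_genes = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_genes)]
    scaled_cols = []
    for col in columns:
        mu = statistics.mean(col)
        sigma = statistics.stdev(col)
        scaled_cols.append([(x - mu) / sigma if sigma > 0 else 0.0 for x in col])
    # Transpose back to cells x genes.
    return [list(row) for row in zip(*scaled_cols)]


normalized = [normalize_cell(row) for row in counts]  # removes the depth difference
scaled = scale_genes(normalized)                      # equalizes the gene scales
```

After normalization, cells A and B become identical because only their depth differed. After scaling, every gene column has mean 0. The third gene is 90% of every cell, so it is constant and ends up at 0.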

[Figure: Before and after Scaling]

Regressing out technical variables

During Scaling you can optionally "regress out" unwanted sources of variation, such as mitochondrial percentage (MT%), cell-cycle scores, or sequencing batch.
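Conceptually, regressing out a variable means fitting each gene against that variable and keeping only the residuals, which are then scaled. The Python sketch below does this for one gene and one covariate with ordinary least squares. It is an illustration of the idea under that simple linear-model assumption, not a reproduction of what `ScaleData` or `sc.pp.regress_out` do internally. `regress_out` and `zscore` here are hypothetical helpers, and the data are toy values.

```python
import statistics


def regress_out(gene, covariate):
    """Fit gene ~ a + b * covariate by ordinary least squares; return the residuals."""
    mx = statistics.mean(covariate)
    my = statistics.mean(gene)
    sxx = sum((c - mx) ** 2 for c in covariate)
    sxy = sum((c - mx) * (g - my) for c, g in zip(covariate, gene))
    b = sxy / sxx        # slope
    a = my - b * mx      # intercept
    return [g - (a + b * c) for g, c in zip(gene, covariate)]


def zscore(values):
    """Z-score across cells using the sample SD."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [(x - mu) / sigma for x in values]


# Toy data: a gene whose expression partly tracks MT% (illustrative only).
percent_mt = [2.0, 4.0, 6.0, 8.0, 10.0]
gene = [1.0, 2.5, 2.0, 3.5, 4.0]

residuals = regress_out(gene, percent_mt)  # MT%-driven trend removed
scaled = zscore(residuals)                 # regress first, then scale
```

The residuals are uncorrelated with `percent_mt` by construction, so any signal that genuinely co-varies with MT% is removed along with the technical effect.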

⚠️
Use with caution. Over-regression can remove real biological signal: if you are studying the cell cycle, for example, do not regress out the cell-cycle scores. Only regress out variables you have confirmed to be purely technical noise.

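Implementation example

# ==== R (Seurat) ====
library(Seurat)
# Option A: basic Scaling (by default only the variable features are scaled)
pbmc <- ScaleData(pbmc)
# Option B: also regress out MT% and cell-cycle scores
# (needs percent.mt, e.g. from PercentageFeatureSet(), and S.Score/G2M.Score from CellCycleScoring())
pbmc <- ScaleData(pbmc, vars.to.regress = c("percent.mt", "S.Score", "G2M.Score"))
# Note: if you use SCTransform, this step has already been done for you

# ==== Python (Scanpy) ====
import scanpy as sc
# Option A: basic Scaling; max_value=10 caps the scaled values at 10
sc.pp.scale(adata, max_value=10)
# Option B (instead of A, on the unscaled data): regress out MT%, then scale
# (pct_counts_mt comes from sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"]))
sc.pp.regress_out(adata, ["pct_counts_mt"])
sc.pp.scale(adata, max_value=10)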
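The `max_value=10` argument limits how extreme a scaled value can get. The Python sketch below mimics that idea for one gene: Z-score, then truncate anything above the cap. It clips only the upper tail. That is an assumption of this sketch, so check your library version's documentation for whether negative values are clipped too. `scale_with_clip` is a hypothetical helper, and the data are toy values.

```python
import statistics


def scale_with_clip(values, max_value=10.0):
    """Z-score one gene (sample SD), then truncate any z-score above max_value."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [min((x - mu) / sigma, max_value) for x in values]


# Toy gene: silent in 199 cells, strongly expressed in one rare cell (illustrative only).
gene = [0.0] * 199 + [5.0]
z = scale_with_clip(gene)
```

Here the rare cell's unclipped z-score is about 14.07. With the sample SD, a single outlier among n cells can reach at most (n − 1)/√n, so the cap only matters once a dataset has more than about 100 cells.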
💡
Note for SCTransform users: SCTransform's output already covers Normalization + Feature Selection + Scaling, so you can go straight to PCA.