The Role of PCA
Even after restricting to ~2000 highly variable genes (HVGs), the data is still 2000-dimensional: too high for clustering and visualization (the "curse of dimensionality"). PCA finds the directions of maximum variance (principal components, PCs); the first N PCs capture most of the true biological signal, while the later PCs are mostly noise.
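As an illustration (not part of the original lesson), here is a minimal NumPy/scikit-learn sketch of this idea: we simulate a cells-by-genes matrix whose true signal lives in a 10-dimensional latent space, then check that the first 10 PCs soak up almost all of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Simulate 300 cells x 2000 HVGs: a 10-dimensional latent signal
# plus independent Gaussian noise on every entry.
latent = rng.normal(size=(300, 10))
loadings = rng.normal(size=(10, 2000))
X = latent @ loadings + 0.5 * rng.normal(size=(300, 2000))

pca = PCA(n_components=50)
X_pca = pca.fit_transform(X)   # cells x 50 instead of cells x 2000

# The first 10 PCs should capture the bulk of the variance;
# the remaining PCs contribute very little (mostly noise).
print(pca.explained_variance_ratio_[:10].sum())
```

Real pipelines (Seurat, scanpy) wrap exactly this computation; the synthetic matrix here just makes the signal/noise split explicit.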
Dimensionality reduction
Compress from ~2000 to 10–50 dimensions.
Denoising
Discarding low-variance PCs effectively filters out noise.
Speedup
Dramatically speeds up downstream clustering and UMAP computation.
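The denoising point can be made concrete with a small sketch (synthetic data and scikit-learn, both my additions, not from the lesson): reconstructing a noisy matrix from only its top PCs lands closer to the true underlying signal than the raw noisy matrix does.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)

# Rank-5 "true" signal, then add full-rank Gaussian noise.
signal = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 1000))
noisy = signal + rng.normal(size=(200, 1000))

# Keep only the top 5 PCs and project back to gene space.
pca = PCA(n_components=5)
denoised = pca.inverse_transform(pca.fit_transform(noisy))

# Mean squared error against the true signal: the low-rank
# reconstruction should beat the raw noisy data.
err_raw = np.mean((noisy - signal) ** 2)
err_den = np.mean((denoised - signal) ** 2)
print(err_raw, err_den)
```

This is why dropping low-variance PCs is not just a speed trick: the discarded directions carry mostly noise.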
Elbow Plot
The elbow plot shows the proportion of variance explained by each PC. Look for the "elbow": the point where the curve starts to flatten. Drag the slider to set your choice.
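In code, "finding the elbow" can be automated with simple heuristics. The sketch below (synthetic data; the 1/10-of-the-first-PC threshold is my own illustrative rule, not a standard from the lesson) builds a matrix with a rank-8 signal and picks the first PC whose variance share falls off the cliff.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)

# Rank-8 signal buried in noise: the elbow should sit near PC 8.
X = rng.normal(size=(300, 8)) @ rng.normal(size=(8, 500)) \
    + 0.3 * rng.normal(size=(300, 500))

ratios = PCA(n_components=30).fit(X).explained_variance_ratio_

# Toy elbow heuristic: the first PC whose variance share drops
# below one tenth of the first PC's share.  Real analyses usually
# eyeball the plot (or use JackStraw) rather than a hard cutoff.
elbow = int(np.argmax(ratios < ratios[0] / 10))
print(elbow)
```

In practice, err on the side of keeping a few extra PCs; downstream results are usually robust to including some noisy components but can miss structure if real signal PCs are dropped.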
🌳 How do you decide nPCs?
Worked Examples
pbmc <- RunPCA(pbmc, features = VariableFeatures(pbmc))
DimPlot(pbmc, reduction = "pca")
ElbowPlot(pbmc, ndims = 50)

# JackStraw (optional, slower)
pbmc <- JackStraw(pbmc, num.replicate = 100)
pbmc <- ScoreJackStraw(pbmc, dims = 1:30)
JackStrawPlot(pbmc, dims = 1:30)
import scanpy as sc

sc.tl.pca(adata, n_comps=50)
sc.pl.pca(adata)
sc.pl.pca_variance_ratio(adata, n_pcs=50)
📝 Self-Check
Which dimension does PCA primarily compress?