1. The Visium HD bin concept
Visium HD tiles the capture area with a continuous lawn of 2 µm × 2 µm barcoded oligonucleotide squares. Each square is about 750× smaller in area than a classic 55 µm Visium spot. A single 2 µm square captures very few transcripts, so Space Ranger also aggregates the squares and outputs several bin sizes:
- 2 µm: the raw bins. Very sparse; mainly used as building blocks when re-aggregating, e.g. with BANKSY or segmentation-based cell binning.
- 8 µm: 10x's recommended default analysis unit. Not truly single-cell, but fine enough to localize cell types defined by scRNA-seq.
- 16 µm: more counts per bin; good for first establishing coarse tissue regions.
Seurat v5 (one assay per bin size) and spatialdata-io (one table per bin size) can load several bin sizes into one object, so you can switch resolution between analysis steps.
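Conceptually, the 2 → 8 → 16 µm relationship is block summation. An 8 µm bin covers 4 × 4 two-micron squares, and a 16 µm bin covers 8 × 8. The "about 750×" figure matches comparing a 55 µm-wide square with a 2 µm square (55² / 2² ≈ 756); a circular spot of 55 µm diameter has a somewhat smaller area. The following is a minimal numpy sketch on a toy count grid. It assumes the squares sit on a regular grid. `aggregate_bins` is a hypothetical helper, and Space Ranger's real binning also handles tissue filtering and barcode bookkeeping, which this sketch omits.

```python
import numpy as np

def aggregate_bins(counts_2um: np.ndarray, factor: int) -> np.ndarray:
    """Sum a 2-D grid of 2 µm counts (one gene) into non-overlapping factor x factor blocks.

    factor=4 -> 8 µm bins, factor=8 -> 16 µm bins. Rows/columns that do not
    fill a whole block are dropped (a simplifying assumption for this sketch).
    """
    rows, cols = counts_2um.shape
    r, c = rows - rows % factor, cols - cols % factor
    trimmed = counts_2um[:r, :c]
    # element [a, b, d, e] of the 4-D view is trimmed[a*factor + b, d*factor + e]
    return trimmed.reshape(r // factor, factor, c // factor, factor).sum(axis=(1, 3))

# Footprint ratio of a 55 µm-wide square to a 2 µm square: 3025 / 4 = 756.25
area_ratio = 55**2 / 2**2

rng = np.random.default_rng(0)
grid_2um = rng.poisson(0.1, size=(64, 64))  # toy counts of one gene on 2 µm squares
bins_8um = aggregate_bins(grid_2um, 4)       # 16 x 16 grid of 8 µm bins
bins_16um = aggregate_bins(grid_2um, 8)      # 8 x 8 grid of 16 µm bins
print(area_ratio, bins_8um.shape, bins_16um.shape)  # → 756.25 (16, 16) (8, 8)
```

Total counts are preserved at every bin size. Only the spatial resolution changes.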
2. Key steps in the HD workflow
Sketching
An HD dataset often contains 1–2 million bins, which is too much to cluster comfortably in memory. Use Seurat's SketchData to subsample a representative ~50 k bins (e.g. by leverage score), cluster the sketch, then project the results back onto the full dataset.
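The sketch-then-project idea can be illustrated with exact leverage scores. A row's leverage is its squared norm in the top-k left singular vectors. Rows are sampled with probability proportional to that leverage, so rare but distinctive bins are favored over redundant ones. The labels are then carried back to every bin by a k-nearest-neighbor majority vote. This is a conceptual numpy sketch, not Seurat's implementation. Seurat approximates leverage scores on variable features and projects with its own machinery, and the function names here are hypothetical.

```python
import numpy as np

def leverage_scores(x: np.ndarray, rank: int) -> np.ndarray:
    """Leverage of each row of x w.r.t. its top-`rank` left singular vectors."""
    xc = x - x.mean(axis=0)                      # center each gene
    u, _, _ = np.linalg.svd(xc, full_matrices=False)
    return (u[:, :rank] ** 2).sum(axis=1)        # squared row norms; they sum to `rank`

def leverage_sketch(x: np.ndarray, n_keep: int, rank: int = 10, seed: int = 0) -> np.ndarray:
    """Sample n_keep row indices without replacement, proportional to leverage."""
    rng = np.random.default_rng(seed)
    scores = leverage_scores(x, rank)
    idx = rng.choice(x.shape[0], size=n_keep, replace=False, p=scores / scores.sum())
    return np.sort(idx)

def project_labels(full, sketch, sketch_labels, k: int = 5) -> np.ndarray:
    """Assign each full-data row the majority label of its k nearest sketched rows."""
    d2 = ((full[:, None, :] - sketch[None, :, :]) ** 2).sum(axis=2)  # squared Euclidean
    nn = np.argsort(d2, axis=1)[:, :k]
    votes = sketch_labels[nn]                                         # (n_full, k)
    return np.array([np.bincount(v).argmax() for v in votes])

# Toy "bins x genes" matrix with three well-separated groups
rng = np.random.default_rng(1)
centers = rng.normal(0, 5, size=(3, 20))
truth = rng.integers(0, 3, size=1500)
x = centers[truth] + rng.normal(0, 1, size=(1500, 20))

idx = leverage_sketch(x, n_keep=150)
# Pretend that clustering the sketch recovered the true groups, then project them back
projected = project_labels(x, x[idx], truth[idx])
print(len(idx), (projected == truth).mean())
```

In Seurat the same three stages map to SketchData, clustering on the "sketch" assay, and ProjectData.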
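BANKSY clustering
Bins at 2 and 8 µm are noisy. BANKSY augments each bin's own expression with a summary of its spatial neighbors' expression (a neighborhood mean plus an azimuthal Gabor filter term). This smooths bin-level noise and gives much more spatially coherent clusters. In Seurat v5 it is available through SeuratWrappers (RunBanksy) rather than Seurat core.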
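The neighborhood-augmentation step can be sketched as concatenating each bin's expression with the mean expression of its k nearest spatial neighbors. Clustering (e.g. PCA + Leiden) would then run on the augmented matrix. This simplified version makes several assumptions. It uses unweighted neighbor means and omits BANKSY's azimuthal Gabor filter term. Its sqrt(1 - lam) / sqrt(lam) weights are a stand-in for BANKSY's lambda parameter. `banksy_like_features` is a hypothetical name, not the Banksy package API.

```python
import numpy as np

def banksy_like_features(expr: np.ndarray, coords: np.ndarray,
                         k: int = 8, lam: float = 0.2) -> np.ndarray:
    """Concatenate own expression with the mean expression of the k nearest spatial neighbors.

    Simplified: uniform neighbor weights, no Gabor-filter term (unlike real BANKSY).
    lam in [0, 1] trades own signal (lam=0) against neighborhood signal.
    """
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)                  # a bin is not its own neighbor
    nn = np.argsort(d2, axis=1)[:, :k]            # indices of k nearest bins
    neigh_mean = expr[nn].mean(axis=1)            # (n_bins, n_genes)
    return np.hstack([np.sqrt(1 - lam) * expr, np.sqrt(lam) * neigh_mean])

# Toy 20 x 20 grid of bins: left half expresses ~1 count/gene, right half ~3
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:20, 0:20]
coords = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)
domain = (coords[:, 0] >= 10).astype(int)
expr = rng.poisson(1 + 2 * domain[:, None], size=(400, 5)).astype(float)

feats = banksy_like_features(expr, coords, k=8, lam=0.2)
neigh = feats[:, 5:] / np.sqrt(0.2)               # recover the neighbor-mean block
print(feats.shape,
      expr[domain == 0].var(axis=0).mean(),       # raw within-domain variance
      neigh[domain == 0].var(axis=0).mean())      # smoothed within-domain variance
```

Within each domain the neighbor-mean block varies far less than the raw counts. That reduced variance is what makes clusters on the augmented features less speckled.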
Cell binning
Run Cellpose on the H&E image to segment cells. Then sum the 2 µm bins that fall inside each segmented cell to get a "true" single-cell AnnData.
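The aggregation step itself is a group-by-sum over segmentation labels. The following is a minimal numpy sketch. It assumes each 2 µm bin center has already been mapped to integer (row, col) pixel positions in the segmentation mask; in practice this comes from Space Ranger's spatial coordinates and scale factors. It also assumes the mask follows the Cellpose convention of 0 = background and 1..N = cell IDs. `bins_to_cells` is a hypothetical helper.

```python
import numpy as np

def bins_to_cells(bin_counts: np.ndarray, bin_rc: np.ndarray, label_mask: np.ndarray):
    """Sum 2 µm bin counts per segmented cell.

    bin_counts: (n_bins, n_genes) counts
    bin_rc:     (n_bins, 2) integer (row, col) of each bin center in label_mask
    label_mask: 2-D int array, 0 = background, 1..N = cell IDs
    Returns (cell_ids, cell_counts), one row per cell that received >= 1 bin.
    """
    labels = label_mask[bin_rc[:, 0], bin_rc[:, 1]]     # cell ID under each bin
    keep = labels > 0                                   # drop background bins
    cell_ids, inverse = np.unique(labels[keep], return_inverse=True)
    cell_counts = np.zeros((cell_ids.size, bin_counts.shape[1]), dtype=bin_counts.dtype)
    np.add.at(cell_counts, inverse, bin_counts[keep])   # unbuffered per-cell sum
    return cell_ids, cell_counts

# Toy mask with two cells and five bins (the last bin lands on background)
mask = np.zeros((6, 6), dtype=int)
mask[0:3, 0:3] = 1
mask[3:6, 3:6] = 2
bin_rc = np.array([[0, 0], [1, 2], [4, 4], [5, 3], [0, 5]])
bin_counts = np.array([[1, 0], [2, 1], [0, 3], [1, 1], [5, 5]])

ids, counts = bins_to_cells(bin_counts, bin_rc, mask)
print(ids, counts.tolist())  # → [1 2] [[3, 1], [1, 4]]
```

The resulting cells × genes matrix, with `ids` as observation names, can be wrapped in an AnnData for standard single-cell analysis.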
Cross-sample integration
Per the JEFworks 2025 tutorial, sketching + Harmony is currently the most practical recipe for integrating multiple HD samples.
3. H&E + expression as two modalities
Visium, Visium HD and Xenium all ship with an H&E or DAPI image, but classic pipelines treat the image only as a visual alignment tool. In 2024–2025 a wave of vision foundation models appeared that train H&E morphology and gene expression in a shared latent space:
- HEST-1k (NeurIPS 2024): an open dataset of 1,229 ST samples, each paired with a whole-slide image (WSI); the core benchmark for training and evaluating vision-omics models.
- Vision-omics foundation model (Nat Methods 2025): jointly pretrained on H&E patches and spot expression; can predict expression from H&E, or use expression to refine tissue typing.
- Thor (Nat Comm 2025): lifts ST to cell-level resolution by integrating morphological features for downstream analysis.
- SpaGCN (introduced in Step 6): an earlier method that feeds H&E color into a GNN, combining it with spot coordinates to build the spot graph.
Typical applications:
- Predict spatial expression from H&E for sections that were never profiled by ST (saves cost)
- Reinterpret ST clusters through morphology-defined tissue classes (pathology cross-check)
- Discover hidden subpopulations that look the same morphologically but differ in expression
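A common way to test the "predict expression from H&E" idea is a linear probe. The image encoder is frozen, a ridge regression is fit from per-spot patch embeddings to spot expression, and Pearson correlation per gene is scored on held-out spots. The following sketch uses synthetic data. The embeddings stand in for a hypothetical encoder's output, and the functions are illustrative helpers, not part of any foundation-model API.

```python
import numpy as np

def fit_ridge(x: np.ndarray, y: np.ndarray, alpha: float = 1.0):
    """Closed-form ridge regression y ≈ x @ w + b (intercept not penalized)."""
    x_mean, y_mean = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - x_mean, y - y_mean
    w = np.linalg.solve(xc.T @ xc + alpha * np.eye(x.shape[1]), xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def pearson_per_gene(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Pearson correlation between true and predicted values, one per gene (column)."""
    yt = y_true - y_true.mean(axis=0)
    yp = y_pred - y_pred.mean(axis=0)
    return (yt * yp).sum(axis=0) / np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))

# Synthetic stand-ins: 600 spots, 32-dim patch embeddings, 10 genes
rng = np.random.default_rng(0)
emb = rng.normal(size=(600, 32))                      # hypothetical H&E patch embeddings
w_true = rng.normal(size=(32, 10))
expr = emb @ w_true + rng.normal(size=(600, 10))      # expression = linear signal + noise

train, test = slice(0, 400), slice(400, 600)
w, b = fit_ridge(emb[train], expr[train], alpha=1.0)
r = pearson_per_gene(expr[test], emb[test] @ w + b)
print(r.round(3))
```

With real data the embeddings would come from a pretrained H&E encoder. The correlations would be far lower and would vary strongly by gene.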
Interactive figure: bin size vs. signal-to-noise. Background = simulated ground-truth anatomy; squares = whether the gene is detected in each bin. 2 µm is too sparse, while 16 µm is too coarse.
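The sparse side of this trade-off has a simple form if each 2 µm square detects the gene independently with probability p. A bin made of n squares then detects it with probability 1 − (1 − p)^n. The sketch below compares that formula with a simulation. The value p = 0.05 and the independence assumption are illustrative only, and the "too coarse" side (blurred boundaries) is not modeled.

```python
import numpy as np

def detection_rate(p_2um: float, factor: int) -> float:
    """P(gene detected in a bin of factor x factor independent 2 µm squares)."""
    return 1.0 - (1.0 - p_2um) ** (factor * factor)

def simulate_detection(p_2um: float, factor: int, size: int = 256, seed: int = 0) -> float:
    """Fraction of factor x factor bins with at least one detecting 2 µm square."""
    rng = np.random.default_rng(seed)
    hits = rng.random((size, size)) < p_2um            # detection per 2 µm square
    s = size - size % factor
    blocks = hits[:s, :s].reshape(s // factor, factor, s // factor, factor).any(axis=(1, 3))
    return float(blocks.mean())

for factor, label in [(1, "2 µm"), (4, "8 µm"), (8, "16 µm")]:
    # analytic ≈ 0.05, 0.56, 0.96 for 2, 8, 16 µm
    print(label, round(detection_rate(0.05, factor), 3), round(simulate_detection(0.05, factor), 3))
```

Going from 2 µm to 8 µm raises detection about elevenfold in this toy setting. Each 16 µm bin, however, averages over 64 squares and can straddle tissue boundaries.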
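Implementation
library(Seurat)
# Load Visium HD at several bin sizes; data.dir is the Space Ranger output folder that contains binned_outputs/
hd <- Load10X_Spatial("hd_out/", bin.size = c(8, 16))

# Normalize the full 8 µm assay, then sketch 50 k bins by leverage score
DefaultAssay(hd) <- "Spatial.008um"
hd <- NormalizeData(hd) |> FindVariableFeatures() |> ScaleData()
hd <- SketchData(hd, ncells = 50000, method = "LeverageScore", sketched.assay = "sketch")

# PCA + clustering + UMAP on the sketch only (return.model = TRUE is needed to project the UMAP)
DefaultAssay(hd) <- "sketch"
hd <- FindVariableFeatures(hd) |> ScaleData() |> RunPCA(reduction.name = "pca.sketch")
hd <- FindNeighbors(hd, reduction = "pca.sketch", dims = 1:30) |>
  FindClusters(resolution = 0.5, cluster.name = "seurat_cluster.sketched") |>
  RunUMAP(reduction = "pca.sketch", reduction.name = "umap.sketch", return.model = TRUE, dims = 1:30)

# Project PCA, UMAP and cluster labels back onto all 8 µm bins
hd <- ProjectData(hd, assay = "Spatial.008um", sketched.assay = "sketch",
                  sketched.reduction = "pca.sketch", full.reduction = "full.pca.sketch",
                  umap.model = "umap.sketch", dims = 1:30,
                  refdata = list(seurat_cluster.projected = "seurat_cluster.sketched"))
DefaultAssay(hd) <- "Spatial.008um"
Idents(hd) <- "seurat_cluster.projected"
SpatialDimPlot(hd, label = TRUE)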
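# Load Visium HD with spatialdata-io (path = Space Ranger output folder)
import spatialdata_io as sd_io
sdata = sd_io.visium_hd("hd_out/", bin_size=[8, 16])

# HEST-1k (hosted on HuggingFace) for vision-omics foundation models;
# assumes the samples have already been downloaded to ./hest_data
from hest import iter_hest
st_obj = next(iter_hest("hest_data", id_list=["TENX95"]))  # one sample

# Example: train / infer expression from an H&E patch.
# extract_patch and vision_omics_model are hypothetical placeholders, not library APIs.
import torch
patch = extract_patch(st_obj.wsi, x=2400, y=1700, size=224)
expr = vision_omics_model(torch.tensor(patch))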
📝 Self-check
1. What bin size does 10x Genomics recommend as the default for Visium HD?
2. What is the most practical strategy for HD datasets too large to fit in memory?
3. What role does HEST-1k play?