Step 15: Visium HD & Histology — Spatial Transcriptomics Tutorial

Visium HD

1. Visium HD bin concepts

Visium HD tiles the capture area with a continuous grid of 2 µm × 2 µm oligonucleotide squares. Each square covers 4 µm², versus roughly 2,400 µm² for a classic 55 µm-diameter Visium spot, so it is about 600× smaller in area (about 750× if the spot is treated as a 55 µm square). A single 2 µm bin captures very few transcripts, so Space Ranger outputs the data at several bin sizes:

2 µm: the raw bin. Very sparse; mainly used as input for re-aggregation, e.g. with BANKSY or segmentation-based cell binning.
8 µm: 10x's recommended default unit of analysis. Not truly single-cell, but fine enough to localize cell types defined by scRNA-seq.
16 µm: fuller signal per bin; a good starting point for outlining regional structure.

Both Seurat v5 and spatialdata-io can load several bin sizes at once (as separate assays in Seurat, or as separate elements of a SpatialData object), so you can switch resolution between analysis steps.
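
The relationship between bin sizes is simple arithmetic, and a short sketch may make it concrete. The Python below is illustrative only: `aggregate_bins` and the toy count dictionary are hypothetical, not part of Space Ranger or any library. It sums 2 µm bin counts into larger square bins and compares the 2 µm bin area with a 55 µm-diameter spot.

```python
import math
from collections import defaultdict

def aggregate_bins(counts_2um, bin_um):
    """Sum counts from 2 µm bins into square bins of side bin_um.

    counts_2um maps (col, row) indices on the 2 µm grid to a UMI count.
    """
    if bin_um % 2 != 0:
        raise ValueError("bin_um must be a multiple of 2")
    factor = bin_um // 2  # number of 2 µm bins per side of the larger bin
    out = defaultdict(int)
    for (col, row), n in counts_2um.items():
        out[(col // factor, row // factor)] += n
    return dict(out)

# Toy data (hypothetical): a 16 x 16 patch of 2 µm bins, one UMI in every 8th column.
toy = {(c, r): 1 for c in range(16) for r in range(16) if c % 8 == 0}

bins_8um = aggregate_bins(toy, 8)    # non-empty bins on a 4 x 4 grid
bins_16um = aggregate_bins(toy, 16)  # non-empty bins on a 2 x 2 grid

# How many 2 µm bins are pooled into each unit of analysis
bins_per_8um = (8 // 2) ** 2     # 16
bins_per_16um = (16 // 2) ** 2   # 64

# Area of a classic spot (treated as a circle) relative to one 2 µm bin
spot_area = math.pi * (55 / 2) ** 2
area_ratio = spot_area / (2 * 2)
```

Total counts are preserved by aggregation; only the spatial resolution changes.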

HD workflow

2. Key steps in the HD workflow

Sketching

An HD dataset often contains 1–2 million bins, too many to analyze comfortably in memory. Seurat's SketchData subsamples about 50,000 bins; you cluster that sketch and then project the results back onto the full dataset.
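
The sketch-then-project idea can be illustrated without Seurat. The following is a minimal sketch under loud simplifications: it samples bins uniformly at random (SketchData can use leverage scores instead), clusters the sample with a tiny two-cluster k-means, and projects by nearest centroid, which is not what ProjectData does internally. All function names and the toy data are hypothetical.

```python
import random

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))

def kmeans2(points, n_iter=10):
    """Tiny 2-cluster k-means with a deterministic initialisation."""
    # Start from the two points with the smallest and largest coordinate sums.
    centroids = [min(points, key=sum), max(points, key=sum)]
    for _ in range(n_iter):
        groups = [[], []]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else cen
            for g, cen in zip(groups, centroids)
        ]
    return centroids

rng = random.Random(0)
# Toy "expression space" (hypothetical): two well-separated populations of 1,000 bins.
full = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
full += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(1000)]

sketch = rng.sample(full, 50)                    # 1. subsample (the "sketch")
centroids = kmeans2(sketch)                      # 2. cluster only the sketch
labels = [nearest(p, centroids) for p in full]   # 3. project labels onto all bins
```

Only 50 points are ever clustered, yet every one of the 2,000 bins receives a label, which is the memory saving that sketching is after.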

BANKSY clustering

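Individual 2 µm and 8 µm bins are noisy. BANKSY augments each bin's expression with the average expression of its spatial neighbors, which makes clusters considerably more stable. With Seurat v5 it is available through the RunBanksy wrapper in SeuratWrappers (which relies on the Banksy package) rather than in Seurat itself.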
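
To illustrate the neighborhood-averaging idea, here is a simplified sketch. It is not the BANKSY implementation: it uses the four adjacent grid bins with equal weights (BANKSY uses a weighted nearest-neighbor kernel and additional gradient features), the square-root scaling is an assumption modeled on the BANKSY formulation, and `banksy_like_features` and `lam` are hypothetical names. Each bin's own value and its neighborhood mean are kept side by side, with a mixing parameter controlling how much weight the neighborhood receives.

```python
import math

def neighbor_mean(grid, col, row):
    """Mean value of the up/down/left/right neighbors that exist in grid."""
    vals = [grid[(col + dc, row + dr)]
            for dc, dr in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if (col + dc, row + dr) in grid]
    return sum(vals) / len(vals) if vals else 0.0

def banksy_like_features(grid, lam=0.2):
    """Return {bin: (scaled own value, scaled neighborhood mean)}.

    lam = 0 ignores neighbors; larger lam gives the neighborhood more weight.
    """
    return {
        (c, r): (math.sqrt(1 - lam) * v,
                 math.sqrt(lam) * neighbor_mean(grid, c, r))
        for (c, r), v in grid.items()
    }

# Toy 3 x 3 grid (hypothetical): one bin with a high count, surrounded by zeros.
grid = {(c, r): 0.0 for c in range(3) for r in range(3)}
grid[(1, 1)] = 9.0

features = banksy_like_features(grid, lam=0.5)
```

In the toy grid the isolated high bin has a neighborhood mean of zero, so the second feature flags it as inconsistent with its surroundings; clustering on both features is what damps single-bin noise.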

Cell binning

Run Cellpose on the H&E image to segment cells, then sum the 2 µm bins that fall inside each segmented cell. The result is an AnnData object with one row per segmented cell, which is far closer to true single-cell resolution than any fixed square bin.
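
The aggregation step itself is a grouped sum. The sketch below assumes a segmentation mask that has already been resampled onto the 2 µm bin grid (0 = background, positive integers = cell IDs). `sum_bins_per_cell`, the mask, and the counts are hypothetical toy inputs; a real pipeline would work with sparse matrices and write the result into an AnnData object.

```python
from collections import defaultdict

def sum_bins_per_cell(mask, bin_counts):
    """Sum per-bin gene counts over the bins belonging to each cell.

    mask[row][col] is the cell ID covering that 2 µm bin (0 = background).
    bin_counts maps (row, col) to a {gene: count} dict for non-empty bins.
    """
    cells = defaultdict(lambda: defaultdict(int))
    for (row, col), genes in bin_counts.items():
        cell_id = mask[row][col]
        if cell_id == 0:
            continue  # bin lies outside every segmented cell
        for gene, n in genes.items():
            cells[cell_id][gene] += n
    return {cid: dict(g) for cid, g in cells.items()}

# Toy segmentation mask on the 2 µm grid: two cells and some background.
mask = [
    [1, 1, 0, 2],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
]
# Toy counts for the non-empty bins.
bin_counts = {
    (0, 0): {"EPCAM": 1},
    (1, 1): {"EPCAM": 2, "KRT8": 1},
    (0, 2): {"PTPRC": 1},   # background bin, dropped
    (2, 3): {"PTPRC": 3},
}

cell_counts = sum_bins_per_cell(mask, bin_counts)
# cell_counts == {1: {"EPCAM": 3, "KRT8": 1}, 2: {"PTPRC": 3}}
```

Bins outside every cell are discarded here; whether to discard or reassign them is a design choice that depends on how tight the segmentation is.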

Cross-sample integration

Per the JEFworks 2025 tutorial, sketching followed by Harmony is currently the most practical recipe for integrating multiple HD samples.

Histology integration

3. H&E + expression: two modalities

Visium, Visium HD, and Xenium all come with an H&E or DAPI image, but classic pipelines use that image only for visual alignment. In 2024–2025 a wave of vision foundation models appeared that embed H&E morphology and gene expression in a shared latent space:

HEST-1k (NeurIPS 2024): an open dataset of 1,229 paired ST samples and whole-slide images (WSIs); a core benchmark for training and evaluating vision-omics models.
Vision-omics foundation model (Nat Methods 2025): jointly pretrained on H&E patches and spot expression; can predict expression from H&E, or use expression to refine tissue classification.
Thor (Nat Commun 2025): lifts ST to cell level by integrating morphological features for downstream analysis.
SpaGCN (introduced in Step 6): an earlier approach that uses H&E color directly, folding it into the graph on which its graph convolutional network operates.

Typical applications:

Predict spatial expression from H&E for sections that were not profiled with ST (saves cost).
Interpret ST clusters in terms of morphology-defined tissue classes (a cross-check against pathology).
Find hidden subpopulations that look alike morphologically but differ in expression.

Interactive simulation

Interactive: bin size vs. signal-to-noise

Interactive figure: the background color is the simulated ground-truth anatomy, and each square shows whether the gene was detected in that bin at the chosen bin size. Note that 2 µm bins are too sparse, while 16 µm bins are too coarse.
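
A back-of-the-envelope model shows why detection improves with bin size. Assuming, purely for illustration, that a gene's counts in each 2 µm bin are independent Poisson draws with the same mean, a larger bin pools (side/2)² of them, so the chance of seeing at least one count is 1 − exp(−mean × number of pooled bins). The function name and the example rate are hypothetical.

```python
import math

def detection_probability(rate_per_2um_bin, bin_um):
    """P(at least one count) in a square bin of side bin_um under a Poisson model."""
    n_pooled = (bin_um / 2) ** 2   # number of 2 µm bins pooled into one larger bin
    return 1.0 - math.exp(-rate_per_2um_bin * n_pooled)

rate = 0.01  # assumed mean count of one gene per 2 µm bin
probs = {size: detection_probability(rate, size) for size in (2, 8, 16)}

for size, p in probs.items():
    print(f"{size:>2} µm bin: P(detected) = {p:.3f}")
```

With this assumed rate the detection probability rises from about 0.01 at 2 µm to about 0.15 at 8 µm and 0.47 at 16 µm. The cost of the larger bin is that it is more likely to straddle two anatomical regions, which is the trade-off the simulation visualizes.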

Code

Implementation

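# --- R (Seurat v5) ---
library(Seurat)

# Load Visium HD at several bin sizes. data.dir is the Space Ranger output
# folder that contains binned_outputs/ ("hd_out/" is a placeholder path).
hd <- Load10X_Spatial(data.dir = "hd_out/", bin.size = c(8, 16))

# Sketching: sample 50k bins from the 8 µm assay
DefaultAssay(hd) <- "Spatial.008um"
hd <- NormalizeData(hd) |> FindVariableFeatures()
hd <- SketchData(hd, ncells = 50000, method = "LeverageScore", sketched.assay = "sketch")

# PCA, clustering, and UMAP on the sketch only
DefaultAssay(hd) <- "sketch"
hd <- FindVariableFeatures(hd) |> ScaleData() |> RunPCA(reduction.name = "pca.sketch")
hd <- FindNeighbors(hd, reduction = "pca.sketch", dims = 1:30) |>
  FindClusters(resolution = 0.5, cluster.name = "cluster.sketch") |>
  RunUMAP(reduction = "pca.sketch", dims = 1:30)

# Project the PCA and the cluster labels back onto all 8 µm bins
hd <- ProjectData(hd, sketched.assay = "sketch", assay = "Spatial.008um",
                  sketched.reduction = "pca.sketch", full.reduction = "full.pca",
                  dims = 1:30, refdata = list(cluster.full = "cluster.sketch"))
DefaultAssay(hd) <- "Spatial.008um"
Idents(hd) <- "cluster.full"
SpatialDimPlot(hd, label = TRUE)

# --- Python ---
# Load the same HD output with spatialdata-io
import spatialdata_io as sd_io
sdata = sd_io.visium_hd("hd_out/", bin_size=[8, 16])

# HEST dataset (Hugging Face), used for vision-omics foundation models.
# Assumes the data were already downloaded to ./hest_data (placeholder path).
from hest import iter_hest
st_obj = next(iter(iter_hest("hest_data", id_list=["TENX95"])))  # one sample

# Example: train on / predict expression from an H&E patch.
# extract_patch and vision_omics_model are placeholders, not library functions.
import torch
patch = extract_patch(st_obj.wsi, x=2400, y=1700, size=224)
expr  = vision_omics_model(torch.tensor(patch))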

📝 Self-check

1. What is 10x Genomics' recommended default bin size for Visium HD?

A. 2 µm

B. 8 µm

C. 16 µm

D. 55 µm

2. What is the most practical strategy for HD data that does not fit comfortably in memory?

A. Sketching: subsample 50k bins, run the analysis, then project back

B. Load everything at once

C. Refuse to analyze

D. Randomly delete 90% of bins

3. What is the role of HEST-1k?

A. A new segmentation tool

B. A BANKSY alternative

C. An open dataset of 1,229 paired ST + WSI samples for training vision-omics foundation models

D. A new sequencing platform