🐕

【シングルセル解析】Seurat v5のLayerとは

2023/11/08に公開

bioinformatics

seurat

scrnaseq

tech

【はじめに】

Seurat v5は超巨大なデータをメモリにロードすることなくディスクに置いたままアクセスできるようになったことや、Integrationが1行でできるようになったり様々な更新が行われている。Seuratオブジェクトの構造でv5から新たに実装されたLayerについて紹介する。

23年11月現在、CRANからSeuratをインストールするとversionが5がインストールされるので今後はLayerを使うのがスタンダードになってくると思われる。

install.packages("Seurat")
library(Seurat)

packageVersion(pkg = "Seurat")

デモデータ

PBMCの2つの実験条件（IFNγの刺激有無）のシングルセルデータをデモに使用する。

デモデータの用意

devtools::install_github('satijalab/seurat-data')
library(SeuratData)
InstallData("ifnb")

初めて使う場合はダウンロードが始まる。

ダウンロード画面

デモデータの読み込み

sc <- LoadData(ds = "ifnb")
sc <- ScaleData(sc)

14053遺伝子 × 13999細胞のデータ。2 layers presentsと表示されている。

【Layerとは】

Seuratは1つのオブジェクトで複数のassay（RNAやSCT、Integrated、ADTなど）を同時に保持することができる。これはSeuratObject@assaysで管理されている。
各assay（例えばSeuratObject@assays$RNA）には発現マトリクス（counts）や正規化した発現マトリクス（data）、zスコア化した発現マトリクス（scale.data）、細胞バーコード、遺伝子ベクトルなどの情報がある。v5からはこの内のcounts/data/scale.dataをlayersという階層で管理する。（SeuratObject@assays$RNA@layers$countsやSeuratObject@assays$RNA@layers$data）

デモデータのデータ構造

Layerからのデータ取得

次のコマンドはいづれもRNA assayから発現マトリクスを取り出すコマンドである。GetAssayData()機能ではslot=引数を使うと警告が出るようになり、代わりにlayer=引数を使うように促される。LayerData()機能が新たに実装された。

sc@assays$RNA$counts

sc[["RNA"]]$counts

GetAssayData(object = sc, assay = "RNA", layer = "counts")

LayerData(object = sc, assay = "RNA", layer = "counts")

Layerの発現マトリクスを上書き

次のコマンドはいづれもRNA assayの発現マトリクスをnew.dataで上書きするコマンドである。LayerData()を使った上書きもできる。

sc[["RNA"]]$counts <- new.data

LayerData(sc, assay = "RNA", layer = "counts") <- new.data

sc <- SetAssayData(object = sc, layer = "counts", new.data = new.data)

【Seuratオブジェクトのmerge】

2つ以上のSeuratオブジェクトをmergeした場合のデータ構造がv4とは異なっているので注意が必要だ。

デモとして2つのSeuratオブジェクトをmergeしてみる。

# デモ用にSeuratオブジェクトを2つに分ける
sc_list <- SplitObject(object = sc, split.by = "stim")

# 2つのSeuratオブジェクトをmerge
sc_merge <- merge(sc_list[[1]], sc_list[[2]])

layersに2つのcounts、2つのdataが現れるようになる。

mergeしたSeuratオブジェクトのデータ構造

counts.〇、data.〇のように「.〇」の部分で元のデータが区別されるが、merge時の名前はProject名が使用される。Project名が同じ場合は連番になるようだ。

sc_list <- SplitObject(object = sc, split.by = "stim")
Project(sc_list[[1]]) <- "Ctr"
Project(sc_list[[2]]) <- "Treat"
sc_merge <- merge(sc_list[[1]], sc_list[[2]])

もし、1つの発現マトリクスに統合したければ、JoinLayers()機能を使用する。

sc_merge <- JoinLayers(object = sc_merge)

デフォルトではcountsもdataもそれぞれ1つずつに統合されるが、layers=countsやlayers=dataのように統合したいlayerだけを指定することもできる。

【Layerのsplit】

1つの発現マトリクスを実験条件ごとに分けるにはsplit()機能を使用する。f=引数でどの細胞がどの実験群かを示すベクトルを渡す。ここではmeta.dataスロットのstim列を指定した。

sc2 <- sc_merge # データ比較用にsc2にコピーして進める
sc2@assays$RNA <- split(x = sc2@assays$RNA, f = sc2@meta.data$stim)

デフォルトではcountsもdataもsplitされるが、layers="counts"やlayers="data"と入れるとcountsだけ、dataだけがsplitされる。

ちなみにmerge後のSeuratオブジェクトをJoinLayrs()せずに更なるsplitをしようとするとエラーになる。

Layer名

split()を使うと、「counts.群名」、「data.群名」のような名前が付く。

split後のデータ構造

layer名はオブジェクト名を打つか、Layers()機能を使って確認することもできる。

layer名の確認

splitした後にLayerData()やGetAssayData()のようにlayer=引数がある場合は正確に指定する必要がある。

layer="counts"とするとsplitされた発現マトリクスのうち、1つ目が返される。

LayerData(object = sc2, assay = "RNA", layer = "counts")

次のように正確に記載する必要がある。

LayerData(object = sc2, assay = "RNA", layer = "counts.STIM")

【Layerを使ったIntegration】

v4でIntegrationするには、異なる実験条件のSeuratオブジェクトをそれぞれ異なるオブジェクトとして用意する必要があった。上述の通り、v5からは1つのSeuratオブジェクトで複数の発現マトリクスをlayersで分けて管理することができるようになり、IntegrationもIntegrateLayers()の1コマンドで行えるようになった。

Integration方法も5種類用意されている。

https://satijalab.org/seurat/articles/seurat5_integration

Integration前の処理

基本的な流れはv4の時と変わらない。PCAまで進めておく。

sc2 <- NormalizeData(sc2)
sc2 <- FindVariableFeatures(sc2)
sc2 <- ScaleData(sc2)
sc2 <- RunPCA(sc2)

Integration前のPCAデータで次元削減を行った場合を見ておくと、実験条件による差が出ているのがわかる。

sc2 <- RunUMAP(sc2, dims = 1:30)
DimPlot(sc2, group.by = "stim")

Integration - CCA -

method=CCAIntegrationと指定する。

sc2 <- IntegrateLayers(
  object = sc2, method = CCAIntegration,
  orig.reduction = "pca", new.reduction = "integrated.cca",
  verbose = FALSE
)

1分20秒ほどで終了。

sc2 <- RunUMAP(sc2, dims = 1:30,
               reduction="integrated.cca", reduction.name = "umap_cca")
DimPlot(sc2, group.by = "stim", reduction = "umap_cca")

Integration - RPCA -

method=RPCAIntegrationと指定する。

sc2 <- IntegrateLayers(
  object = sc2, method = RPCAIntegration,
  orig.reduction = "pca", new.reduction = "integrated.rpca",
  verbose = FALSE
)

3秒で終了。

sc2 <- RunUMAP(sc2, dims = 1:30,
               reduction="integrated.rpca", reduction.name = "umap_rpca")
DimPlot(sc2, group.by = "stim", reduction = "umap_rpca")

Integration - Harmony -

method=HarmonyIntegrationと指定する。harmonyパッケージが無い場合はインストールするか聞いてくるので言われるとおりにインストールする。

sc2 <- IntegrateLayers(
  object = sc2, method = HarmonyIntegration,
  orig.reduction = "pca", new.reduction = "harmony",
  verbose = FALSE
)

1分程度で終了。

sc2 <- RunUMAP(sc2, dims = 1:30,
               reduction="harmony", reduction.name = "umap_harmony")
DimPlot(sc2, group.by = "stim", reduction = "umap_harmony")

Integration - FastMNN -

SeuratWrappersパッケージが必要となる。さらに初めてFastMNNIntegrationを使うときはbatchelorをインストールするように聞いてくるので言われるままインストールする。

remotes::install_github('satijalab/seurat-wrappers')
library(SeuratWrappers)

Vignetted通りにmethod = FastMNNIntegrationを行うもエラーが出た。

sc2 <- IntegrateLayers(
  object = sc2, method = FastMNNIntegration,
  orig.reduction = "pca", new.reduction = "integrated.mnn",
  verbose = FALSE
)

対処法はわからず。。。

Integration - scVI -

method=scVIIntegrationを使うには、別途python環境を用意する必要がある。

https://docs.scvi-tools.org/en/stable/installation.html

統合処理後は発現マトリクスをsplitしたままにする必要はないので、JoinLayers()をしておくとのこと。

https://satijalab.org/seurat/articles/seurat5_integration

【SCTransformの場合のIntegration】

デモ用にsplitしたSeuratオブジェクトを用意する。

sc_list <- SplitObject(object = sc, split.by = "stim")
sc_merge <- merge(sc_list[[1]], sc_list[[2]])
sc_merge <- JoinLayers(object = sc_merge)
sc3 <- sc_merge
sc3@assays$RNA <- split(x = sc3@assays$RNA, f = sc3@meta.data$stim)

splitされた状態でSCTransform()を実行する。

# SCTransformで前処理
sc3 <- SCTransform(sc3)

どうやらSCTransformはversion2がデフォルトになったようだ。layerがsplitしている場合は、実験群の数だけSCTransformが行われる。ここではcounts.CTRL、counts.STIMに対して実行される。

SCTransform()の実行画面

SCT assayにはlayersは無いみたい。countsやdataは1つだけである。（SCTModel.listという場所に実験群ごとのSCTModelが保存されている。）

SCTransform後のデータ構造

SCTransformでintegration無しの場合の次元削減を見ておく。

sc3 <- RunPCA(sc3)
sc3 <- RunUMAP(sc3, dims = 1:30)
DimPlot(sc3, group.by = "stim")

Integration無しでの次元削減

※ Integrationの際には`normalization.method = "SCT"`を付ける

CCA

sc3 <- IntegrateLayers(
  object = sc3, method = CCAIntegration, 
  normalization.method = "SCT",
  orig.reduction = "pca", 
  new.reduction = "integrated.cca",
  verbose = FALSE
)
sc3 <- RunUMAP(sc3, dims = 1:30,
               reduction="integrated.cca", reduction.name = "umap_cca")
DimPlot(sc3, group.by = "stim", reduction = "umap_cca")

RPCA

sc3 <- IntegrateLayers(
  object = sc3, method = RPCAIntegration,
  normalization.method = "SCT",
  orig.reduction = "pca", new.reduction = "integrated.rpca",
  verbose = FALSE
)
sc3 <- RunUMAP(sc3, dims = 1:30,
               reduction="integrated.rpca", reduction.name = "umap_rpca")
DimPlot(sc3, group.by = "stim", reduction = "umap_rpca")

Harmony

sc3 <- IntegrateLayers(
  object = sc3, method = HarmonyIntegration,
  normalization.method = "SCT",
  orig.reduction = "pca", new.reduction = "harmony",
  verbose = FALSE
)
sc3 <- RunUMAP(sc3, dims = 1:30,
               reduction="harmony", reduction.name = "umap_harmony")
DimPlot(sc3, group.by = "stim", reduction = "umap_harmony")

SCT assayでJoinLayers()

DefaultAssay()がSCTの状態で、JoinLayers()を行っても次のエラーが出る。上述の通り、SCT assayにはlayersが無いためである。

UseMethod(generic = "JoinLayers", object = object) でエラー:
'JoinLayers' をクラス "c('SCTAssay', 'Assay', 'KeyMixin')" のオブジェクトに適用できるようなメソッドがありません

その場合はDefaultAssayをRNAに変えるか、assay="RNA"オプションをつけるとよい。

sc3 <- JoinLayers(sc3, assay = "RNA")