Brings Seurat to the tidyverse!

website: stemangiola.github.io/tidyseurat/

Please also have a look at

tidySingleCellExperiment for tidy single-cell RNA sequencing analysis
tidySummarizedExperiment for tidy bulk RNA sequencing analysis
tidybulk for tidy bulk RNA-seq analysis
nanny for tidy high-level data analysis and manipulation
tidygate for adding custom gate information to your tibble
tidyHeatmap for heatmaps produced with tidy principles

visual cue

Introduction

tidyseurat provides a bridge between the Seurat single-cell package [@butler2018integrating; @stuart2019comprehensive] and the tidyverse [@wickham2019welcome]. It creates an invisible layer that enables viewing the Seurat object as a tidyverse tibble, and provides Seurat-compatible dplyr, tidyr, ggplot and plotly functions.

Functions/utilities available

Seurat-compatible Functions	Description
`all`	After all `tidyseurat` is a Seurat object, just better

tidyverse Packages	Description
`dplyr`	All `dplyr` APIs like for any tibble
`tidyr`	All `tidyr` APIs like for any tibble
`ggplot2`	`ggplot` like for any tibble
`plotly`	`plot_ly` like for any tibble

Utilities	Description
`tidy`	Add `tidyseurat` invisible layer over a Seurat object
`as_tibble`	Convert cell-wise information to a `tbl_df`
`join_features`	Add feature-wise information, returns a `tbl_df`

Installation

From CRAN

install.packages("tidyseurat")

From Github (development)

devtools::install_github("stemangiola/tidyseurat")

library(dplyr)
library(tidyr)
library(purrr)
library(magrittr)
library(ggplot2)
library(Seurat)
library(tidyseurat)

Create `tidyseurat`, the best of both worlds!

This is a seurat object but it is evaluated as tibble. So it is fully compatible both with Seurat and tidyverse APIs.

pbmc_small = SeuratObject::pbmc_small

It looks like a tibble

pbmc_small

## # A Seurat-tibble abstraction: 80 × 15
## # [90mFeatures=230 | Cells=80 | Active assay=RNA | Assays=RNA[0m
##    .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
##    <chr> <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
##  1 ATGC… SeuratPro…         70           47 0               A             g2    
##  2 CATG… SeuratPro…         85           52 0               A             g1    
##  3 GAAC… SeuratPro…         87           50 1               B             g2    
##  4 TGAC… SeuratPro…        127           56 0               A             g2    
##  5 AGTC… SeuratPro…        173           53 0               A             g2    
##  6 TCTG… SeuratPro…         70           48 0               A             g1    
##  7 TGGT… SeuratPro…         64           36 0               A             g1    
##  8 GCAG… SeuratPro…         72           45 0               A             g1    
##  9 GATA… SeuratPro…         52           36 0               A             g1    
## 10 AATG… SeuratPro…        100           41 0               A             g1    
## # … with 70 more rows, and 8 more variables: RNA_snn_res.1 <fct>, PC_1 <dbl>,
## #   PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>

But it is a Seurat object after all

pbmc_small@assays

## $RNA
## Assay data with 230 features for 80 cells
## Top 10 variable features:
##  PPBP, IGLL5, VDAC3, CD1C, AKR1C3, PF4, MYL9, GNLY, TREML1, CA2

Preliminary plots

Set colours and theme for plots.

# Use colourblind-friendly colours
friendly_cols <- c("#88CCEE", "#CC6677", "#DDCC77", "#117733", "#332288", "#AA4499", "#44AA99", "#999933", "#882255", "#661100", "#6699CC")

# Set theme
my_theme <-
  list(
    scale_fill_manual(values = friendly_cols),
    scale_color_manual(values = friendly_cols),
    theme_bw() +
      theme(
        panel.border = element_blank(),
        axis.line = element_line(),
        panel.grid.major = element_line(size = 0.2),
        panel.grid.minor = element_line(size = 0.1),
        text = element_text(size = 12),
        legend.position = "bottom",
        aspect.ratio = 1,
        strip.background = element_blank(),
        axis.title.x = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10)),
        axis.title.y = element_text(margin = margin(t = 10, r = 10, b = 10, l = 10))
      )
  )

We can treat pbmc_small effectively as a normal tibble for plotting.

Here we plot number of features per cell.

pbmc_small %>%
  tidyseurat::ggplot(aes(nFeature_RNA, fill = groups)) +
  geom_histogram() +
  my_theme

plot of chunk plot1

Here we plot total features per cell.

pbmc_small %>%
  tidyseurat::ggplot(aes(groups, nCount_RNA, fill = groups)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.1) +
  my_theme

plot of chunk plot2

Here we plot abundance of two features for each group.

pbmc_small %>%
  join_features(features = c("HLA-DRA", "LYZ")) %>%
  ggplot(aes(groups, .abundance_RNA + 1, fill = groups)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(aes(size = nCount_RNA), alpha = 0.5, width = 0.2) +
  scale_y_log10() +
  my_theme

plot of chunk unnamed-chunk-12

Preprocess the dataset

Also you can treat the object as Seurat object and proceed with data processing.

pbmc_small_pca <-
  pbmc_small %>%
  SCTransform(verbose = FALSE) %>%
  FindVariableFeatures(verbose = FALSE) %>%
  RunPCA(verbose = FALSE)

pbmc_small_pca

## # A Seurat-tibble abstraction: 80 × 17
## # [90mFeatures=220 | Cells=80 | Active assay=SCT | Assays=RNA, SCT[0m
##    .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
##    <chr> <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
##  1 ATGC… SeuratPro…         70           47 0               A             g2    
##  2 CATG… SeuratPro…         85           52 0               A             g1    
##  3 GAAC… SeuratPro…         87           50 1               B             g2    
##  4 TGAC… SeuratPro…        127           56 0               A             g2    
##  5 AGTC… SeuratPro…        173           53 0               A             g2    
##  6 TCTG… SeuratPro…         70           48 0               A             g1    
##  7 TGGT… SeuratPro…         64           36 0               A             g1    
##  8 GCAG… SeuratPro…         72           45 0               A             g1    
##  9 GATA… SeuratPro…         52           36 0               A             g1    
## 10 AATG… SeuratPro…        100           41 0               A             g1    
## # … with 70 more rows, and 10 more variables: RNA_snn_res.1 <fct>,
## #   nCount_SCT <dbl>, nFeature_SCT <int>, PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>,
## #   PC_4 <dbl>, PC_5 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>

If a tool is not included in the tidyseurat collection, we can use as_tibble to permanently convert tidyseurat into tibble.

pbmc_small_pca %>%
  as_tibble() %>%
  select(contains("PC"), everything()) %>%
  GGally::ggpairs(columns = 1:5, ggplot2::aes(colour = groups)) +
  my_theme

plot of chunk pc_plot

Identify clusters

We proceed with cluster identification with Seurat.

pbmc_small_cluster <-
  pbmc_small_pca %>%
  FindNeighbors(verbose = FALSE) %>%
  FindClusters(method = "igraph", verbose = FALSE)

pbmc_small_cluster

## # A Seurat-tibble abstraction: 80 × 19
## # [90mFeatures=220 | Cells=80 | Active assay=SCT | Assays=RNA, SCT[0m
##    .cell orig.ident nCount_RNA nFeature_RNA RNA_snn_res.0.8 letter.idents groups
##    <chr> <fct>           <dbl>        <int> <fct>           <fct>         <chr> 
##  1 ATGC… SeuratPro…         70           47 0               A             g2    
##  2 CATG… SeuratPro…         85           52 0               A             g1    
##  3 GAAC… SeuratPro…         87           50 1               B             g2    
##  4 TGAC… SeuratPro…        127           56 0               A             g2    
##  5 AGTC… SeuratPro…        173           53 0               A             g2    
##  6 TCTG… SeuratPro…         70           48 0               A             g1    
##  7 TGGT… SeuratPro…         64           36 0               A             g1    
##  8 GCAG… SeuratPro…         72           45 0               A             g1    
##  9 GATA… SeuratPro…         52           36 0               A             g1    
## 10 AATG… SeuratPro…        100           41 0               A             g1    
## # … with 70 more rows, and 12 more variables: RNA_snn_res.1 <fct>,
## #   nCount_SCT <dbl>, nFeature_SCT <int>, SCT_snn_res.0.8 <fct>,
## #   seurat_clusters <fct>, PC_1 <dbl>, PC_2 <dbl>, PC_3 <dbl>, PC_4 <dbl>,
## #   PC_5 <dbl>, tSNE_1 <dbl>, tSNE_2 <dbl>

Now we can interrogate the object as if it was a regular tibble data frame.

pbmc_small_cluster %>%
  tidyseurat::count(groups, seurat_clusters)

## # A tibble: 2 × 3
##   groups seurat_clusters     n
##   <chr>  <fct>           <int>
## 1 g1     0                  44
## 2 g2     0                  36

We can identify cluster markers using Seurat.

# Identify top 10 markers per cluster
markers <-
  pbmc_small_cluster %>%
  FindAllMarkers(only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25) %>%
  group_by(cluster) %>%
  top_n(10, avg_log2FC)

# Plot heatmap
pbmc_small_cluster %>%
  DoHeatmap(
    features = markers$gene,
    group.colors = friendly_cols
  )

Reduce dimensions

We can calculate the first 3 UMAP dimensions using the Seurat framework.

pbmc_small_UMAP <-
  pbmc_small_cluster %>%
  RunUMAP(reduction = "pca", dims = 1:15, n.components = 3L)

And we can plot them using 3D plot using plotly.

pbmc_small_UMAP %>%
  plot_ly(
    x = ~`UMAP_1`,
    y = ~`UMAP_2`,
    z = ~`UMAP_3`,
    color = ~seurat_clusters,
    colors = friendly_cols[1:4]
  )

screenshot plotly

Cell type prediction

We can infer cell type identities using SingleR [@aran2019reference] and manipulate the output using tidyverse.

# Get cell type reference data
blueprint <- celldex::BlueprintEncodeData()

# Infer cell identities
cell_type_df <-
  GetAssayData(pbmc_small_UMAP, slot = 'counts', assay = "SCT") %>%
  log1p() %>%
  Matrix::Matrix(sparse = TRUE) %>%
  SingleR::SingleR(
    ref = blueprint,
    labels = blueprint$label.main,
    method = "single"
  ) %>%
  as.data.frame() %>%
  as_tibble(rownames = "cell") %>%
  select(cell, first.labels)

# Join UMAP and cell type info
pbmc_small_cell_type <-
  pbmc_small_UMAP %>%
  left_join(cell_type_df, by = "cell")

# Reorder columns
pbmc_small_cell_type %>%
  tidyseurat::select(cell, first.labels, everything())

We can easily summarise the results. For example, we can see how cell type classification overlaps with cluster classification.

pbmc_small_cell_type %>%
  count(seurat_clusters, first.labels)

We can easily reshape the data for building information-rich faceted plots.

pbmc_small_cell_type %>%

  # Reshape and add classifier column
  pivot_longer(
    cols = c(seurat_clusters, first.labels),
    names_to = "classifier", values_to = "label"
  ) %>%

  # UMAP plots for cell type and cluster
  ggplot(aes(UMAP_1, UMAP_2, color = label)) +
  geom_point() +
  facet_wrap(~classifier) +
  my_theme

We can easily plot gene correlation per cell category, adding multi-layer annotations.

pbmc_small_cell_type %>%

  # Add some mitochondrial abundance values
  mutate(mitochondrial = rnorm(n())) %>%

  # Plot correlation
  join_features(features = c("CST3", "LYZ"), shape = "wide") %>%
  ggplot(aes(CST3 + 1, LYZ + 1, color = groups, size = mitochondrial)) +
  geom_point() +
  facet_wrap(~first.labels, scales = "free") +
  scale_x_log10() +
  scale_y_log10() +
  my_theme

Nested analyses

A powerful tool we can use with tidyseurat is nest. We can easily perform independent analyses on subsets of the dataset. First we classify cell types in lymphoid and myeloid; then, nest based on the new classification

pbmc_small_nested <-
  pbmc_small_cell_type %>%
  filter(first.labels != "Erythrocytes") %>%
  mutate(cell_class = if_else(`first.labels` %in% c("Macrophages", "Monocytes"), "myeloid", "lymphoid")) %>%
  nest(data = -cell_class)

pbmc_small_nested

Now we can independently for the lymphoid and myeloid subsets (i) find variable features, (ii) reduce dimensions, and (iii) cluster using both tidyverse and SingleCellExperiment seamlessly.

pbmc_small_nested_reanalysed <-
  pbmc_small_nested %>%
  mutate(data = map(
    data, ~ .x %>%
      FindVariableFeatures(verbose = FALSE) %>%
      RunPCA(npcs = 10, verbose = FALSE) %>%
      FindNeighbors(verbose = FALSE) %>%
      FindClusters(method = "igraph", verbose = FALSE) %>%
      RunUMAP(reduction = "pca", dims = 1:10, n.components = 3L, verbose = FALSE)
  ))

pbmc_small_nested_reanalysed

Now we can unnest and plot the new classification.

pbmc_small_nested_reanalysed %>%

  # Convert to tibble otherwise Seurat drops reduced dimensions when unifying data sets.
  mutate(data = map(data, ~ .x %>% as_tibble())) %>%
  unnest(data) %>%

  # Define unique clusters
  unite("cluster", c(cell_class, seurat_clusters), remove = FALSE) %>%

  # Plotting
  ggplot(aes(UMAP_1, UMAP_2, color = cluster)) +
  geom_point() +
  facet_wrap(~cell_class) +
  my_theme

Session Info

sessionInfo()

## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
## 
## Matrix products: default
## BLAS:   /stornext/System/data/apps/R/R-4.1.0/lib64/R/lib/libRblas.so
## LAPACK: /stornext/Home/data/allstaff/m/mangiola.s/.conda/envs/scib-pipeline-R4/lib/libmkl_intel_lp64.so.2
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] tidyseurat_0.5.3   ttservice_0.1.2    sp_1.4-7           SeuratObject_4.1.0
##  [5] Seurat_4.1.1       ggplot2_3.3.6.9000 magrittr_2.0.3     purrr_0.3.4       
##  [9] tidyr_1.2.0        dplyr_1.0.9        knitr_1.39        
## 
## loaded via a namespace (and not attached):
##   [1] Rtsne_0.16            colorspace_2.0-3      deldir_1.0-6         
##   [4] ellipsis_0.3.2        ggridges_0.5.3        markdown_1.1         
##   [7] spatstat.data_2.2-0   rstudioapi_0.13       farver_2.1.0         
##  [10] leiden_0.3.10         listenv_0.8.0         rstan_2.26.6         
##  [13] ggrepel_0.9.1         fansi_1.0.3           codetools_0.2-18     
##  [16] splines_4.1.0         polyclip_1.10-0       jsonlite_1.8.0       
##  [19] ica_1.0-2             cluster_2.1.2         png_0.1-7            
##  [22] rgeos_0.5-9           uwot_0.1.11           spatstat.sparse_2.1-1
##  [25] sctransform_0.3.3     shiny_1.7.1           compiler_4.1.0       
##  [28] httr_1.4.2            assertthat_0.2.1      Matrix_1.3-4         
##  [31] fastmap_1.1.0         lazyeval_0.2.2        cli_3.3.0            
##  [34] later_1.3.0           htmltools_0.5.2       prettyunits_1.1.1    
##  [37] tools_4.1.0           igraph_1.3.1          gtable_0.3.0         
##  [40] glue_1.6.2            reshape2_1.4.4        RANN_2.6.1           
##  [43] V8_4.1.0              Rcpp_1.0.8.3          scattermore_0.8      
##  [46] vctrs_0.4.1           nlme_3.1-157          progressr_0.10.0     
##  [49] lmtest_0.9-40         spatstat.random_2.2-0 xfun_0.30            
##  [52] stringr_1.4.0         globals_0.14.0        ps_1.7.0             
##  [55] mime_0.12             miniUI_0.1.1.1        lifecycle_1.0.1      
##  [58] irlba_2.3.5           goftest_1.2-3         future_1.25.0        
##  [61] MASS_7.3-54           zoo_1.8-10            scales_1.2.0         
##  [64] spatstat.core_2.4-2   spatstat.utils_2.3-0  promises_1.2.0.1     
##  [67] parallel_4.1.0        inline_0.3.19         RColorBrewer_1.1-3   
##  [70] curl_4.3.2            reticulate_1.24       pbapply_1.5-0        
##  [73] gridExtra_2.3         loo_2.5.1             StanHeaders_2.26.6   
##  [76] rpart_4.1-15          reshape_0.8.9         stringi_1.7.6        
##  [79] highr_0.9             pkgbuild_1.3.1        rlang_1.0.2          
##  [82] pkgconfig_2.0.3       matrixStats_0.62.0    evaluate_0.15        
##  [85] lattice_0.20-45       tensor_1.5            ROCR_1.0-11          
##  [88] labeling_0.4.2        patchwork_1.1.1       htmlwidgets_1.5.4    
##  [91] cowplot_1.1.1         processx_3.5.3        tidyselect_1.1.2     
##  [94] GGally_2.1.2          parallelly_1.31.1     RcppAnnoy_0.0.19     
##  [97] plyr_1.8.7            R6_2.5.1              generics_0.1.2       
## [100] DBI_1.1.2             mgcv_1.8-40           pillar_1.7.0         
## [103] withr_2.5.0           fitdistrplus_1.1-8    abind_1.4-5          
## [106] survival_3.3-1        tibble_3.1.7          future.apply_1.9.0   
## [109] crayon_1.5.1          KernSmooth_2.23-20    utf8_1.2.2           
## [112] spatstat.geom_2.4-0   plotly_4.10.0         grid_4.1.0           
## [115] data.table_1.14.2     callr_3.7.0           digest_0.6.29        
## [118] xtable_1.8-4          httpuv_1.6.5          RcppParallel_5.1.5   
## [121] stats4_4.1.0          munsell_0.5.0         viridisLite_0.4.0