ROMIC standardizes the formatting of genomic data to open up general visualization approaches that can be used for exploratory data analysis (EDA).

Package Setup

To install romic run the following code in R:

install.packages("remotes")

remotes::install_github(
  "calico/romic",
  dependencies = TRUE
  )

See romic’s pkgdown site for organized documentation.

Concept

Romic structures high-dimensional ’omic datasets using a flexible format that can easily be modified using tidyverse-like verbs and visualized using ggplot. These operations can be dynamically applied using romic’s shiny applications and modules to support exploratory data analysis and summarize results.

Data Model

’Omic datasets are constructed by measuring a common set of features (e.g., transcripts or metabolites) across a set of samples. The same data can then be represented in several different formats.
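To make the idea of alternative formats concrete, here is a minimal base-R sketch using a made-up dataset of three features and two samples. The table and column names (`feature_id`, `sample_id`, `abundance`, `pathway`, `condition`) are illustrative assumptions, not names that romic requires, and no romic functions are called.

```r
# Toy dataset: 3 features measured in 2 samples (all values are made up).
features <- data.frame(
  feature_id = c("f1", "f2", "f3"),
  pathway = c("glycolysis", "glycolysis", "TCA"),
  stringsAsFactors = FALSE
)
samples <- data.frame(
  sample_id = c("s1", "s2"),
  condition = c("control", "treated"),
  stringsAsFactors = FALSE
)
# expand.grid varies the first column fastest: f1-s1, f2-s1, f3-s1, f1-s2, ...
measurements <- expand.grid(
  feature_id = features$feature_id,
  sample_id = samples$sample_id,
  stringsAsFactors = FALSE
)
measurements$abundance <- c(1.2, 0.4, 3.1, 0.9, 0.7, 2.8)

# Wide format: a feature-by-sample matrix; attributes live elsewhere.
wide <- matrix(
  measurements$abundance,
  nrow = nrow(features),
  dimnames = list(features$feature_id, samples$sample_id)
)

# Three-table format: features, samples, and measurements kept separately
# and linked by their keys.
triple <- list(
  features = features,
  samples = samples,
  measurements = measurements
)

# Tall single-table format: one row per feature-sample pair, with feature
# and sample attributes repeated on every row.
tidy <- merge(
  merge(measurements, features, by = "feature_id"),
  samples,
  by = "sample_id"
)
```

The single tall table is convenient for plotting and grouped summaries, while the three-table form avoids repeating attributes on every row.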

Romic Functions

Romic harnesses the tidy and the triple omic representations through the tidy_omic and triple_omic S3 classes. These formats each have their own pros and cons, and one is generally better than the other depending on the task. Taking advantage of this fact, tidy and triple omic objects can readily be interconverted by tracking a dataset’s design.

The design reflects the schema of a triple_omic object and, as a result, how it can be naturally rearranged to and from a tidy_omic. It is stored as a simple list.

Since the tidy_omic and triple_omic representations can readily be interconverted, many functions can accept either a tidy_omic or a triple_omic as input, converting between the formats as needed and returning the same type of object as the input if desired. This T* Omic abstraction is referred to through the tomic S3 class.
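The role a design plays in interconversion can be sketched in base R. Everything below is hypothetical: the `design` list, its field names, and the two `*_sketch` helpers are written for illustration and are not romic's internal structure or API. The point is only that knowing which columns describe features, samples, and measurements is enough to split a tall table into three tables and join them back.

```r
# One tall table: feature and sample attributes repeat on every row.
tidy <- data.frame(
  feature_id = rep(c("f1", "f2", "f3"), times = 2),
  sample_id = rep(c("s1", "s2"), each = 3),
  abundance = c(1.2, 0.4, 3.1, 0.9, 0.7, 2.8),
  pathway = rep(c("glycolysis", "glycolysis", "TCA"), times = 2),
  condition = rep(c("control", "treated"), each = 3),
  stringsAsFactors = FALSE
)

# Hypothetical design: records which columns belong to which table.
# (Illustrative only; not the structure romic itself stores.)
design <- list(
  feature_pk = "feature_id",
  sample_pk = "sample_id",
  feature_vars = "pathway",
  sample_vars = "condition",
  measurement_vars = "abundance"
)

# Split a tall table into features, samples, and measurements.
tidy_to_triple_sketch <- function(tidy, design) {
  feature_cols <- c(design$feature_pk, design$feature_vars)
  sample_cols <- c(design$sample_pk, design$sample_vars)
  measurement_cols <- c(
    design$feature_pk, design$sample_pk, design$measurement_vars
  )
  list(
    features = unique(tidy[, feature_cols, drop = FALSE]),
    samples = unique(tidy[, sample_cols, drop = FALSE]),
    measurements = tidy[, measurement_cols, drop = FALSE]
  )
}

# Join the three tables back into one tall table using the primary keys.
triple_to_tidy_sketch <- function(triple, design) {
  out <- merge(triple$measurements, triple$features, by = design$feature_pk)
  merge(out, triple$samples, by = design$sample_pk)
}

triple <- tidy_to_triple_sketch(tidy, design)
round_trip <- triple_to_tidy_sketch(triple, design)
```

Up to row and column order, `round_trip` holds the same rows as `tidy`, which is why a function can convert to whichever format suits the task and hand back the input's original type.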

Modifying Tomics

Tidy and triple omic objects’ core data are tables that can be directly manipulated and updated using conventional means (as long as the design is kept up to date). However, romic also includes methods that simplify working with these formats and applying some common manipulations of high-dimensional data. Tidy and triple omics’ core data are “tall data”, so romic takes advantage of the tidyverse suite of packages for working with tall tabular data. Two common operations for manipulating tidy data are filtering and mutating results.

filter_tomic filters any table in a triple_omic to a range of values, to values of interest, or based on a quosure. Mutates are more varied and include centering measurements (center_tomic), ordering features or samples as factors (sort_tomic), and adding lower-dimensional sample embeddings (add_pca_loadings).
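As a rough illustration of a filter followed by a mutate, the sketch below uses `brauer_2008_triple`, an example dataset that ships with romic. The argument names and the category values are taken from my reading of the package documentation and should be treated as assumptions; they may differ between versions, so check `?filter_tomic` and `?center_tomic` before relying on them.

```r
library(romic)

# brauer_2008_triple is an example triple_omic dataset bundled with romic.
# Argument names below are assumed from the package documentation and may
# differ between versions.

# Keep only features whose BP (biological process) annotation is one of
# two categories of interest; the filter is applied to the features table.
filtered <- filter_tomic(
  brauer_2008_triple,
  filter_type = "category",
  filter_table = "features",
  filter_variable = "BP",
  filter_value = c("biological process unknown", "vacuolar acidification")
)

# Center the remaining measurements (see ?center_tomic for exactly how
# the centering is grouped).
centered <- center_tomic(filtered)
```

Because both calls take a tomic and return one, they can be chained with a pipe in the usual tidyverse style.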

Visualizations

Romic provides several methods that can both give a high-level summary of a dataset and interrogate specific features.

The univariate and bivariate plots are simple, but they can do a lot when combined with tomic’s flexible data manipulation and Shiny interactivity.
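Because the underlying tables are ordinary tall data frames, the same kinds of plots can be sketched directly with ggplot2. The example below is a stand-in written with plain ggplot2 and base R on simulated data; it does not call romic's own plotting functions, and all column names are illustrative assumptions.

```r
library(ggplot2)

set.seed(1)

# Simulated tall measurements: 50 features across 4 samples.
feature_ids <- paste0("f", 1:50)
sample_ids <- paste0("s", 1:4)
measurements <- data.frame(
  feature_id = rep(feature_ids, times = 4),
  sample_id = rep(sample_ids, each = 50),
  abundance = rnorm(200)
)
samples <- data.frame(
  sample_id = sample_ids,
  condition = rep(c("control", "treated"), each = 2)
)

# Univariate plot: distribution of one measurement, colored by a sample
# attribute joined in from the samples table.
tall <- merge(measurements, samples, by = "sample_id")
p_univariate <- ggplot(tall, aes(x = abundance, fill = condition)) +
  geom_histogram(bins = 30)

# Bivariate plot: first embed samples in a low-dimensional space with PCA,
# then plot the two leading components against each other.
wide <- matrix(
  measurements$abundance,
  nrow = length(feature_ids),
  dimnames = list(feature_ids, sample_ids)
)
pca <- prcomp(t(wide), center = TRUE)
samples$PC1 <- pca$x[samples$sample_id, "PC1"]
samples$PC2 <- pca$x[samples$sample_id, "PC2"]

p_bivariate <- ggplot(samples, aes(x = PC1, y = PC2, color = condition)) +
  geom_point(size = 3)
```

Printing `p_univariate` or `p_bivariate` renders the plot. Filtering or re-centering the tables first changes what the same two plot types reveal.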

Interactive Analysis with Shiny

Taking advantage of Romic’s flexible representation and manipulation of high-dimensional datasets, romic bundles a number of R Shiny Modules which can be composed into powerful Shiny Apps.

The two main apps are: