The strength of RaMS is its simple data format. Table-like data structures are common in most programming languages, and they can always be converted to the nigh-universal matrix format. The goal of this vignette is to illustrate this strength by exporting MS data to several formats that can be used outside of R.
As with all rectangular data, RaMS objects can be easily exported to CSV files with base R functions. This works best with a few chromatograms at a time, because the millions of data points in most MS files can overwhelm common file readers such as spreadsheet programs (Excel, for instance, caps a worksheet at 1,048,576 rows).
library(RaMS)
# Locate an MS file
single_file <- system.file("extdata", "LB12HL_AB.mzML.gz", package = "RaMS")
# Grab the MS data
msdata <- grabMSdata(single_file, grab_what = "everything")
##
## Reading file LB12HL_AB.mzML.gz... 0.06 secs
## Reading MS1 data...0.04 secs
## Reading MS2 data...0 secs
## Reading BPC...0.03 secs
## Reading TIC...0.04 secs
## Reading file metadata...0 secs
## Total time: 0.19 secs
# Write out MS1 data to .csv file
write.csv(x = msdata$MS1, file = "MS1_data.csv")
# Clean up afterward
file.remove("MS1_data.csv")
## [1] TRUE
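Because the CSV is plain text, it can be read back without R. The sketch below uses only Python's standard-library csv module. It assumes the default write.csv() layout, which is an unnamed leading column of row names followed by rt, mz, int, and filename. A few illustrative rows stand in for the real MS1_data.csv. The helper names read_ms1_csv and extract_eic and the 5 ppm tolerance are hypothetical choices for illustration, not part of RaMS.

```python
import csv
import io

# Illustrative rows in the default write.csv() layout: an unnamed row-name
# column, then rt, mz, int, and filename. In practice, open "MS1_data.csv".
example_csv = '''"","rt","mz","int","filename"
"1",4.00085,80.05009,12057.776,"DDApos_2.mzML.gz"
"2",4.00085,80.26269,8178.767,"DDApos_2.mzML.gz"
"3",4.00085,80.94841,19075.213,"DDApos_2.mzML.gz"
'''


def read_ms1_csv(handle):
    """Parse an MS1 CSV into dicts with numeric rt/mz/int (hypothetical helper)."""
    rows = []
    for record in csv.DictReader(handle):
        # The unnamed row-name column (key "") is simply ignored.
        rows.append({
            "rt": float(record["rt"]),
            "mz": float(record["mz"]),
            "int": float(record["int"]),
            "filename": record["filename"],
        })
    return rows


def extract_eic(rows, mz, ppm=5.0):
    """Keep points whose m/z lies within +/- ppm of the target (hypothetical helper)."""
    tolerance = mz * ppm / 1e6
    return [r for r in rows if abs(r["mz"] - mz) <= tolerance]


ms1 = read_ms1_csv(io.StringIO(example_csv))
eic = extract_eic(ms1, mz=80.05009, ppm=5)
print(len(ms1), len(eic))  # prints: 3 1
```

To read the real export, pass open("MS1_data.csv", newline="") instead of the StringIO object. This must happen before the file.remove() call cleans the file up.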
Excel workbooks are a common format because of their intuitive GUI and widespread adoption. They can also hold more information than CSV files, because a single workbook can contain multiple “sheets”. This makes them well suited to storing both MS1 and MS2 data in one place. This vignette uses the openxlsx package, although several alternatives offer similar functionality.
library(openxlsx)
# Locate an MS2 file
MS2_file <- system.file("extdata", "DDApos_2.mzML.gz", package = "RaMS")
# Grab the MS1 and MS2 data
msdata <- grabMSdata(MS2_file, grab_what = c("MS1", "MS2"))
##
## Reading file DDApos_2.mzML.gz... 0.55 secs
## Reading MS1 data...0.19 secs
## Reading MS2 data...0.48 secs
## Total time: 1.21 secs
# Write out MS data to Excel file
# openxlsx writes each object in a list to a unique sheet
# Produces one sheet for MS1 and one for MS2
write.xlsx(msdata, file = "MS2_data.xlsx")
# Clean up afterward
file.remove("MS2_data.xlsx")
## [1] TRUE
For more robust data processing and storage, or to work with larger-than-memory data sets, SQL databases are an excellent choice. This vignette demonstrates the SQLite engine provided by the RSQLite package. Several other database engines offer similar functionality through the same DBI interface.
library(DBI)
# Create the SQLite database and connect to it
MSdb <- dbConnect(RSQLite::SQLite(), "MSdata.sqlite")
# Export MS1 and MS2 data to sqlite tables
dbWriteTable(MSdb, "MS1", msdata$MS1)
dbWriteTable(MSdb, "MS2", msdata$MS2)
dbListTables(MSdb)
## [1] "MS1" "MS2"
# Perform a simple query to ensure data was exported correctly
dbGetQuery(MSdb, 'SELECT * FROM MS1 LIMIT 3')
## rt mz int filename
## 1 4.00085 80.05009 12057.776 DDApos_2.mzML.gz
## 2 4.00085 80.26269 8178.767 DDApos_2.mzML.gz
## 3 4.00085 80.94841 19075.213 DDApos_2.mzML.gz
# Perform EIC extraction in SQL rather than in R
EIC_query <- 'SELECT * FROM MS1 WHERE mz BETWEEN :lower_bound AND :upper_bound'
query_params <- list(lower_bound = 118.086, upper_bound = 118.087)
EIC <- dbGetQuery(MSdb, EIC_query, params = query_params)
# Disconnect after export
dbDisconnect(MSdb)
# Clean up afterward
unlink("MSdata.sqlite")
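Because the database is an ordinary SQLite file, any language with an SQLite driver can query it. This sketch uses Python's built-in sqlite3 module. An in-memory database, seeded with the three MS1 rows from the query output above, stands in for MSdata.sqlite. The named-parameter query mirrors the :lower_bound/:upper_bound query run from R. The ppm_window() helper, the index name idx_ms1_mz, and the 5 ppm tolerance are illustrative assumptions.

```python
import sqlite3


def ppm_window(mz, ppm):
    """Return (lower, upper) m/z bounds for a +/- ppm tolerance (hypothetical helper)."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


# In-memory stand-in for "MSdata.sqlite" (assumption); in practice use
# sqlite3.connect("MSdata.sqlite") on the file written from R.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE MS1 (rt REAL, mz REAL, "int" REAL, filename TEXT)')
con.executemany(
    "INSERT INTO MS1 VALUES (?, ?, ?, ?)",
    [
        (4.00085, 80.05009, 12057.776, "DDApos_2.mzML.gz"),
        (4.00085, 80.26269, 8178.767, "DDApos_2.mzML.gz"),
        (4.00085, 80.94841, 19075.213, "DDApos_2.mzML.gz"),
    ],
)
# Optional index on mz so BETWEEN range queries can avoid a full table scan.
con.execute("CREATE INDEX IF NOT EXISTS idx_ms1_mz ON MS1 (mz)")

# Same named-parameter EIC query as the R example, with ppm-derived bounds.
lower, upper = ppm_window(80.26269, ppm=5)
eic = con.execute(
    "SELECT * FROM MS1 WHERE mz BETWEEN :lower_bound AND :upper_bound",
    {"lower_bound": lower, "upper_bound": upper},
).fetchall()
con.close()
print(eic)
```

The index is optional. On large tables, it lets SQLite satisfy the BETWEEN range with an index search rather than scanning every row. The same CREATE INDEX statement could also be sent from R with DBI::dbExecute() before querying.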
reticulate
R and Python are commonly used together, and the reticulate package makes this even easier by embedding a Python interpreter within R. RStudio, in which this vignette was written, supports both R and Python code chunks, as shown below.
# Locate a couple of MS files
data_dir <- system.file("extdata", package = "RaMS")
file_paths <- list.files(data_dir, pattern = "HL.*mzML", full.names = TRUE)
msdata <- grabMSdata(files = file_paths, grab_what = "BPC")$BPC
##
## Total time: 0.25 secs
# Not run to pass R CMD check on GitHub
# Make sure python, matplotlib, and seaborn are installed
import seaborn as sns
import matplotlib.pyplot as plt
sns.relplot(data=r.msdata, kind="line", x="rt", y="int", hue="filename")
plt.show()