2.E: Reactome & rbioapi

Moosa Rezwani

2022-08-06


0.1 Introduction

Directly quoting from Reactome:

REACTOME is an open-source, open access, manually curated and peer-reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic and clinical research, genome analysis, modeling, systems biology and education. Founded in 2003, the Reactome project is led by Lincoln Stein of OICR, Peter D’Eustachio of NYULMC, Henning Hermjakob of EMBL-EBI, and Guanming Wu of OHSU.

(source: https://reactome.org/what-is-reactome)

Reactome provides two RESTful API services: Reactome content services and Reactome analysis services. In rbioapi, the naming scheme is that any function belonging to the analysis services starts with rba_reactome_analysis*. Other rba_reactome_* functions, without the ‘analysis’ infix, correspond to the content services API.

Before reading further, it is a good idea to read Reactome’s Data Model page.


0.2 Reactome analysis services

This section mostly revolves around the rba_reactome_analysis() function, so naturally we will start with that. As explained in the function’s manual, you have considerable freedom in providing the main input for this function: you can supply an R object (a data frame, matrix, or simple vector), a URL, or a local file path. Note that the type of analysis is decided based on whether your input is 1-dimensional or 2-dimensional. This is explained in detail in the manual of rba_reactome_analysis(); see it for more information.
rba_reactome_analysis() is the API equivalent of Reactome’s analyse gene list tool. You can see that the function’s arguments correspond to what you would choose in the webpage’s wizard.

## 1 We create a simple vector with our genes
genes <- c("p53", "BRCA1", "cdk2", "Q99835", "CDC42", "CDK1", "KIF23", "PLK1", "RAC2", "RACGAP1", "RHOA", "RHOB", "MSL1", "PHF21A", "INSR", "JADE2", "P2RX7", "CCDC101", "PPM1B", "ANAPC16", "CDH8", "HSPA1L", "CUL2", "ZNF302", "CUX1", "CYTH2", "SEC22C", "EIF4E3", "ROBO2", "CXXC1", "LINC01314", "ATP5F1")

## 2 We call reactome analysis with the default parameters
analyzed <- rba_reactome_analysis(input = genes,
                                  projection = TRUE,
                                  p_value = 0.01)

## 3 As always, we use str() to inspect the results
str(analyzed, 1)
#> List of 8
#>  $ summary            :List of 7
#>  $ expression         :List of 1
#>  $ identifiersNotFound: int 1
#>  $ pathwaysFound      : int 73
#>  $ pathways           :'data.frame': 73 obs. of  19 variables:
#>  $ resourceSummary    :'data.frame': 3 obs. of  3 variables:
#>  $ speciesSummary     :'data.frame': 1 obs. of  5 variables:
#>  $ warnings           : chr "Missing header. Using a default one."

## 4 Note that in the summary element: (analyzed$summary)
### 4.a because we supplied a simple vector, the analysis type was: over-representation
### 4.b You need the token for other rba_reactome_analysis_* functions

## 5 Analysis results are in the pathways data frame:
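One way to peek at that data frame without assuming its exact layout is a small helper that keeps only the columns that are actually present. This is a minimal sketch: peek_pathways() is a hypothetical helper, not part of rbioapi, and the column names stId, name, entities.pValue and entities.fdr are assumptions about the flattened result. Check names(analyzed$pathways) first if they do not match.

```r
# Hypothetical helper (not part of rbioapi): show the first rows of the
# pathways data frame, keeping only the requested columns that exist.
peek_pathways <- function(analyzed,
                          cols = c("stId", "name",
                                   "entities.pValue", "entities.fdr"),
                          n = 5) {
  pw <- analyzed$pathways
  keep <- intersect(cols, names(pw))  # drop requested columns that are absent
  head(pw[, keep, drop = FALSE], n)
}

# Use it on the result of rba_reactome_analysis() when that object exists
if (exists("analyzed")) {
  print(names(analyzed$pathways))  # list the real column names first
  print(peek_pathways(analyzed))
}
```

Because the helper intersects the requested columns with the real ones, a wrong guess about a column name yields fewer columns rather than an error.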

As mentioned, some of rba_reactome_analysis()’s arguments correspond to the wizard of the analyse gene list tool; other arguments correspond to the contents of the “Filter your results” tab on the results page.
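The example above used a 1-dimensional input. For contrast, here is a minimal sketch of a 2-dimensional input. The data frame, its column names, and its values are made up for illustration, and the layout (identifiers in the first column, numeric values in the remaining ones) is an assumption; the manual of rba_reactome_analysis() remains the authority on how to organize the input.

```r
# Made-up expression values, for illustration only
expr <- data.frame(
  gene     = c("CDK1", "PLK1", "RHOA", "INSR"),
  sample_1 = c(2.1, 1.8, -0.4, 0.9),
  sample_2 = c(1.7, 2.3, -0.1, 1.2)
)

# A plain vector has no dim(); this data frame has two dimensions
is_two_dimensional <- length(dim(expr)) == 2 && ncol(expr) > 1

# The call needs network access, so it is guarded here
if (interactive() && requireNamespace("rbioapi", quietly = TRUE)) {
  analyzed_expr <- rbioapi::rba_reactome_analysis(input = expr)
  # The summary element reports which analysis type was chosen
  str(analyzed_expr$summary, 1)
}
```

Comparing the summary element of this result with analyzed$summary from the vector example shows how the input's shape changed the analysis type.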

Having the analysis’s token, you can retrieve the analysis results in many formats using rba_reactome_analysis_pdf() and rba_reactome_analysis_download():

# download a full pdf report
rba_reactome_analysis_pdf(token = analyzed$summary$token,
                          species = 9606)
# download the result in compressed json.gz format
rba_reactome_analysis_download(token = analyzed$summary$token,
                               request = "results",
                               save_to = "reactome_results.json")

Your token is only guaranteed to be stored for 7 days. After that, you can upload the JSON file that you downloaded with rba_reactome_analysis_download() and obtain a new token for it:

re_uploaded <- rba_reactome_analysis_import(input = "reactome_results.json")
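Since the token may stop working after 7 days, one option is to record locally when it was issued and decide from that whether to re-import the saved JSON file. The following is only a sketch: save_token() and token_is_fresh() are hypothetical helpers, not part of rbioapi, and the token string is a placeholder.

```r
# Hypothetical helpers (not part of rbioapi) for tracking a token's age

# Store the token together with the time it was saved
save_token <- function(token, path) {
  saveRDS(list(token = token, created = Sys.time()), path)
  invisible(path)
}

# TRUE if a saved token exists and is younger than max_days
token_is_fresh <- function(path, max_days = 7, now = Sys.time()) {
  if (!file.exists(path)) return(FALSE)
  rec <- readRDS(path)
  age_days <- as.numeric(difftime(now, rec$created, units = "days"))
  age_days < max_days
}

# Example with a placeholder token written to a temporary file
token_file <- tempfile(fileext = ".rds")
save_token("placeholder-token", token_file)

# Check the token right away, then pretend eight days have passed
fresh_now   <- token_is_fresh(token_file)
fresh_later <- token_is_fresh(token_file, now = Sys.time() + 8 * 24 * 3600)
```

When token_is_fresh() returns FALSE, that is the point at which you would call rba_reactome_analysis_import() on the saved JSON file, as shown above.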

Please Note: Other services supported by rbioapi also provide Over-representation analysis tools. Please see the vignette article Do with rbioapi: Over-Representation (Enrichment) Analysis in R (link to the documentation site) for an in-depth review.

0.2.1 See also in Functions’ manuals

Some rbioapi Reactome analysis functions were not covered in this vignette; be sure to check their manuals:


0.3 Reactome contents services

rbioapi functions that correspond to Reactome content services are those starting with rba_reactome_* but without the “_analysis” infix. These functions cover what you can do with objects in the Reactome knowledge-base. In simpler terms, most of them, but not all, correspond to what you can find in the Reactome Pathway Browser and search results (e.g. a pathway, a reaction, a physical entity, etc.).

0.3.1 Retrieve any object from Reactome knowledge-base

Using rba_reactome_query(), you can retrieve any object from the Reactome knowledge-base. In simpler terms, an object is roughly anything to which Reactome has assigned an ID. This can range from a person’s entry to proteins, reactions, pathways, species, and many more. You can explore Reactome’s data schema to learn about Reactome knowledge-base objects and their organization. Here are some examples. Note that you are not limited to one ID per query: you can supply a vector of IDs. The only limitation is that when you supply more than one ID, you cannot set enhanced = TRUE.

## 1 query a pathway Entry
pathway <- rba_reactome_query(ids = "R-HSA-109581", enhanced = TRUE)
## 2 As always we use str() to inspect the output's structure
str(pathway, 2)
#> List of 27
#>  $ dbId               : int 109581
#>  $ displayName        : chr "Apoptosis"
#>  $ stId               : chr "R-HSA-109581"
#>  $ stIdVersion        : chr "R-HSA-109581.3"
#>  $ created            :List of 5
#>   ..$ dbId       : int 109608
#>   ..$ displayName: chr "Alnemri, E, Hengartner, Michael, Tschopp, Jürg, Tsujimoto, Yoshihide, Hardwick, JM, 2004-01-16"
#>   ..$ dateTime   : chr "2004-01-16 21:01:51"
#>   ..$ className  : chr "InstanceEdit"
#>   ..$ schemaClass: chr "InstanceEdit"
#>  $ modified           :List of 6
#>   ..$ dbId       : int 10874965
#>   ..$ displayName: chr "Weiser, Joel, 2022-05-21"
#>   ..$ dateTime   : chr "2022-05-21 00:52:22"
#>   ..$ note       : chr "Inserted by org.reactome.orthoinference"
#>   ..$ className  : chr "InstanceEdit"
#>   ..$ schemaClass: chr "InstanceEdit"
#>  $ isInDisease        : logi FALSE
#>  $ isInferred         : logi FALSE
#>  $ name               :List of 1
#>   ..$ : chr "Apoptosis"
#>  $ releaseDate        : chr "2004-09-20"
#>  $ releaseStatus      : chr "UPDATED"
#>  $ speciesName        : chr "Homo sapiens"
#>  $ authored           :List of 1
#>   ..$ : int 109608
#>  $ crossReference     :List of 1
#>   ..$ :List of 7
#>  $ edited             :List of 1
#>   ..$ :List of 5
#>  $ figure             :List of 1
#>   ..$ :List of 5
#>  $ goBiologicalProcess:List of 9
#>   ..$ dbId        : int 2273
#>   ..$ displayName : chr "apoptotic process"
#>   ..$ accession   : chr "0006915"
#>   ..$ databaseName: chr "GO"
#>   ..$ definition  : chr "A programmed cell death process which begins when a cell receives an internal (e.g. DNA damage) or external sig"| __truncated__
#>   ..$ name        : chr "apoptotic process"
#>   ..$ url         : chr "https://www.ebi.ac.uk/QuickGO/term/GO:0006915"
#>   ..$ className   : chr "GO_BiologicalProcess"
#>   ..$ schemaClass : chr "GO_BiologicalProcess"
#>  $ literatureReference:List of 7
#>   ..$ :List of 11
#>   ..$ :List of 11
#>   ..$ :List of 11
#>   ..$ :List of 11
#>   ..$ :List of 11
#>   ..$ :List of 11
#>   ..$ :List of 11
#>  $ orthologousEvent   :List of 14
#>   ..$ :List of 16
#>   ..$ :List of 16
#>   ..$ :List of 16
#>   ..$ :List of 15
#>   ..$ :List of 15
#>   ..$ :List of 16
#>   ..$ :List of 15
#>   ..$ :List of 15
#>   ..$ :List of 15
#>   ..$ :List of 15
#>   ..$ :List of 16
#>   ..$ :List of 15
#>   ..$ :List of 16
#>   ..$ :List of 16
#>  $ reviewed           :List of 1
#>   ..$ :List of 5
#>  $ species            :List of 1
#>   ..$ :List of 8
#>  $ summation          :List of 1
#>   ..$ :List of 5
#>  $ hasDiagram         : logi TRUE
#>  $ hasEHLD            : logi TRUE
#>  $ hasEvent           :List of 4
#>   ..$ :List of 16
#>   ..$ :List of 18
#>   ..$ :List of 16
#>   ..$ :List of 16
#>  $ className          : chr "Pathway"
#>  $ schemaClass        : chr "Pathway"



## 3 You can compare it with the webpage of the R-HSA-109581 entry:
# https://reactome.org/content/detail/R-HSA-109581

## 1 query a protein Entry
protein <- rba_reactome_query(ids = 66247, enhanced = TRUE)
## 2 As always we use str() to inspect the output's structure
str(protein, 1)
#> List of 27
#>  $ dbId               : int 66247
#>  $ displayName        : chr "UniProt:P25942-1 CD40"
#>  $ modified           :List of 6
#>  $ databaseName       : chr "UniProt"
#>  $ identifier         : chr "P25942"
#>  $ name               :List of 1
#>  $ otherIdentifier    :List of 109
#>  $ url                : chr "http://purl.uniprot.org/uniprot/P25942-1"
#>  $ crossReference     :List of 30
#>  $ referenceDatabase  :List of 8
#>  $ physicalEntity     :List of 1
#>  $ checksum           : chr "BC8776EC2C4A5680"
#>  $ comment            :List of 1
#>  $ description        :List of 1
#>  $ geneName           :List of 2
#>  $ isSequenceChanged  : logi FALSE
#>  $ keyword            :List of 16
#>  $ secondaryIdentifier:List of 8
#>  $ sequenceLength     : int 277
#>  $ species            : int 48887
#>  $ chain              :List of 2
#>  $ referenceGene      :List of 11
#>  $ referenceTranscript:List of 4
#>  $ variantIdentifier  : chr "P25942-1"
#>  $ isoformParent      :List of 1
#>  $ className          : chr "ReferenceIsoform"
#>  $ schemaClass        : chr "ReferenceIsoform"



## 3 You can compare it with the webpage of R-HSA-202939 entry:
# https://reactome.org/content/detail/R-HSA-202939
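The note about querying several IDs at once can be sketched as follows. query_reactome() is a hypothetical wrapper, not part of rbioapi; it only enforces the restriction described earlier (no enhanced = TRUE with more than one ID) before any request is sent. Passing a stable ID and a database ID together in one vector is an assumption here, so check the manual of rba_reactome_query() if the call fails.

```r
# Hypothetical wrapper (not part of rbioapi): refuse enhanced = TRUE when
# more than one ID is supplied, before any request is sent.
query_reactome <- function(ids, enhanced = FALSE) {
  if (enhanced && length(ids) > 1) {
    stop("enhanced = TRUE is only available for a single ID")
  }
  rbioapi::rba_reactome_query(ids = ids, enhanced = enhanced)
}

# The two IDs used in the examples above, queried together.
# The call needs network access, so it is guarded here.
if (interactive() && requireNamespace("rbioapi", quietly = TRUE)) {
  both <- query_reactome(ids = c("R-HSA-109581", "66247"))
  str(both, 1)
}
```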

0.3.2 Find Cross-Reference IDs in Reactome

As you can see, in the second example usage of rba_reactome_query() we used Reactome’s dbId 66247 to query the CD40 protein. How did we obtain that in the first place? You can use rba_reactome_xref() to map any cross-reference (external) ID to Reactome IDs.

## 1 We supply an HGNC symbol to find the corresponding database ID in Reactome
xref_protein <- rba_reactome_xref("CD40")
## 2 As always use str() to inspect the output's structure
str(xref_protein, 1)
#> List of 19
#>  $ dbId               : int 66247
#>  $ displayName        : chr "UniProt:P25942-1 CD40"
#>  $ databaseName       : chr "UniProt"
#>  $ identifier         : chr "P25942"
#>  $ name               :List of 1
#>  $ otherIdentifier    :List of 1
#>  $ url                : chr "http://purl.uniprot.org/uniprot/P25942-1"
#>  $ checksum           : chr "BC8776EC2C4A5680"
#>  $ comment            :List of 1
#>  $ description        :List of 1
#>  $ geneName           :List of 1
#>  $ isSequenceChanged  : logi FALSE
#>  $ keyword            :List of 1
#>  $ secondaryIdentifier:List of 1
#>  $ sequenceLength     : int 277
#>  $ chain              :List of 1
#>  $ variantIdentifier  : chr "P25942-1"
#>  $ className          : chr "ReferenceIsoform"
#>  $ schemaClass        : chr "ReferenceIsoform"
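Putting the two steps together, the dbId returned by rba_reactome_xref() can be fed straight into rba_reactome_query(). The sketch below assumes the cross-reference result is a single entry with a dbId element, as in the output above. xref_to_entry() is a hypothetical helper, and the two rbioapi functions are passed as arguments only so that the chaining logic can be tried without network access.

```r
# Hypothetical helper (not part of rbioapi): resolve an external identifier
# to a Reactome dbId, then fetch the full entry for that dbId.
xref_to_entry <- function(xref_id,
                          xref_fun = rbioapi::rba_reactome_xref,
                          query_fun = rbioapi::rba_reactome_query) {
  xref <- xref_fun(xref_id)
  # Assumes a single matching entry that carries a dbId element
  query_fun(ids = xref$dbId, enhanced = TRUE)
}

# The call needs network access, so it is guarded here
if (interactive() && requireNamespace("rbioapi", quietly = TRUE)) {
  cd40 <- xref_to_entry("CD40")
  print(cd40$displayName)
}
```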

0.3.3 Map Cross-Reference IDs to Reactome

While we are on the topic of cross-references, here is another useful resource. Using rba_reactome_mapping(), you can find the Reactome pathways or reactions that include your external ID:

## 1 Again, consider CD40 protein:
xref_mapping <- rba_reactome_mapping(id = "CD40",
                                    resource = "hgnc",
                                    map_to = "pathways")
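Since the same external ID can be mapped to either pathways or reactions, both mappings can be collected in one go. This is a sketch under assumptions: map_everywhere() is a hypothetical helper, "reactions" is assumed to be the accepted value for the reactions mapping, and the mapper is passed as an argument so the loop can be checked without network access.

```r
# Hypothetical helper (not part of rbioapi): run the mapping once per
# target and return the results as a named list.
map_everywhere <- function(id, resource,
                           targets = c("pathways", "reactions"),
                           mapper = rbioapi::rba_reactome_mapping) {
  out <- lapply(targets, function(target) {
    mapper(id = id, resource = resource, map_to = target)
  })
  stats::setNames(out, targets)
}

# The call needs network access, so it is guarded here
if (interactive() && requireNamespace("rbioapi", quietly = TRUE)) {
  cd40_maps <- map_everywhere(id = "CD40", resource = "hgnc")
  str(cd40_maps, 1)
}
```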

0.4 See also in Functions’ manuals

There are still more rbioapi Reactome content functions that were not covered in this vignette. Here is a brief overview; see the functions’ manuals for detailed guides and examples.

0.4.1 Retrieve Reactome Database information

0.4.2 General Mapping/Querying

0.4.3 Things you can do with Entities

0.4.4 Things you can do with Events

0.4.5 Pathways

0.4.6 Interactors

0.4.7 People

0.4.8 Export diagrams and events


0.5 How to Cite?

To cite Reactome (Please see https://reactome.org/cite):

To cite rbioapi: (Free access link to the article)


2 Session info

#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=C                          
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] rbioapi_0.7.7
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.29     R6_2.5.1          jsonlite_1.8.0    magrittr_2.0.3   
#>  [5] evaluate_0.15     httr_1.4.3        stringi_1.7.8     cachem_1.0.6     
#>  [9] rlang_1.0.4       cli_3.3.0         curl_4.3.2        rstudioapi_0.13  
#> [13] jquerylib_0.1.4   DT_0.23           bslib_0.4.0       rmarkdown_2.14   
#> [17] tools_4.2.1       stringr_1.4.0     htmlwidgets_1.5.4 crosstalk_1.2.0  
#> [21] xfun_0.31         yaml_2.3.5        fastmap_1.1.0     compiler_4.2.1   
#> [25] htmltools_0.5.3   knitr_1.39        sass_0.4.2