Currently fully supports Enrichr, JASPAR, miEAA, PANTHER, Reactome, STRING, and UniProt!
The goal of rbioapi is to provide a user-friendly and consistent interface to biological databases and services, one that insulates the user from the technicalities of web service APIs and offers a unified, easy-to-use interface to biological and medical web services.
With rbioapi, you do not need technical knowledge of web service APIs, nor do you need to learn a new package for every biological service or database. This is an ongoing project; new databases and services will be added periodically. Feel free to suggest any databases or services you often use.
rbioapi is dedicated to biological and medical databases and web services. Currently, rbioapi supports and covers every API resource in the following services (in alphabetical order):
On CRAN (Stable) version: (https://cran.r-project.org/package=rbioapi)
Only on GitHub (Developmental) version: (https://github.com/moosa-r/rbioapi/):
Each service has a dedicated vignette article. In this article, however, I will write about the general framework of rbioapi. Make sure to check the vignette article of each service to learn more about how to use it.
Note that rbioapi is an ongoing project. New databases and services will be implemented periodically in order to gradually make the package as comprehensive as possible. Do you see yourself often using a certain database or service? Feel free to suggest it by creating an issue on our GitHub repository. I will appreciate any suggestions.
You can install the stable release version of rbioapi from CRAN with:
install.packages("rbioapi")
However, the CRAN version is released at most once every 1-2 months. You can install the most recent (development) version from GitHub with:
install.packages("remotes")
remotes::install_github("moosa-r/rbioapi")
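In scripts that are shared with collaborators, it can be convenient to install a package only when it is missing. The following is a minimal sketch using base R only; `ensure_installed()` is a hypothetical helper name chosen for illustration and is not part of rbioapi.

```r
# Hypothetical helper (not part of rbioapi): install a CRAN package
# only when it is not already available in the current library paths.
ensure_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package can be loaded after the attempt.
  requireNamespace(pkg, quietly = TRUE)
}

# Usage (uncomment to run; it may download the package from CRAN):
# ensure_installed("rbioapi")
```

The helper returns TRUE when the package is available, so a script can stop early with a clear message if installation failed.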
The structure of functions should be consistent across all databases and services. This means that using functions to communicate with different web services should not require new learning or training.
Using the package should entail simple plugging of values into a function’s arguments and running the code.
No function should explicitly require the user to run another function beforehand.
The functions’ organization, names, and arguments should be as faithful as possible to the original API resources.
The package should be scalable and easy to collaborate on. To this end, exported functions should have a template-based structure, and the internal functions should have a hierarchical organization in which contributors need only a subset of them.
Beginner users should be able to conveniently use rbioapi and be completely insulated from any technicalities.
The functions’ documentation should be extensive and self-contained, include working examples, and be integrated with the corresponding web services, for example by providing citation information or links to guides. In addition to the main vignette article, each supported database or web service should have its own vignette article.
To keep the namespace organized, functions have been named with the following pattern:
rba_[service_name]_[resource_name]
For example, rba_string_version() will call STRING’s version resource.
rba_string_version()
#> Retrieving the STRING database version and address used by rbioapi.
#> $string_version
#> [1] "11.5"
#>
#> $stable_address
#> [1] "https://version-11-5.string-db.org"
Thus, as of this version, every rbioapi function follows one of the following naming schemes:
There are three exceptions: rba_options(), rba_connection_test(), and rba_pages(); these are helper functions. More on that later.
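To make the naming pattern concrete, the sketch below splits a few function names mentioned in this article into their service and resource parts. `parse_rba_name()` is a hypothetical helper written for illustration only; it is not part of rbioapi, and it simply assumes that the service name contains no underscore.

```r
# Function names taken from this article; the last three are the
# helper functions that do not follow the service/resource pattern.
fun_names <- c("rba_string_version",
               "rba_reactome_species",
               "rba_uniprot_taxonomy_name",
               "rba_uniprot_proteins_crossref",
               "rba_options",
               "rba_connection_test",
               "rba_pages")
helpers <- c("rba_options", "rba_connection_test", "rba_pages")

# Hypothetical parser (not part of rbioapi): split a name into the
# service and resource parts of rba_[service_name]_[resource_name].
parse_rba_name <- function(x, helpers) {
  pattern <- "^rba_([^_]+)_(.+)$"
  is_helper <- x %in% helpers
  data.frame(
    fun = x,
    service = ifelse(is_helper, NA_character_, sub(pattern, "\\1", x)),
    resource = ifelse(is_helper, NA_character_, sub(pattern, "\\2", x)),
    stringsAsFactors = FALSE
  )
}

parsed <- parse_rba_name(fun_names, helpers)
parsed
```

Helper functions are reported with missing service and resource values, which mirrors the fact that they are the exceptions to the pattern.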
To provide more control, multiple options have been implemented. See the manual of the rba_options() function for a full description of the available options. In short, some options govern rbioapi’s connection with servers (e.g. timeout, retry) and others modify your experience with rbioapi (e.g. verbose, diagnostics, save_file). There are two ways to change any option. You can also get a table of the available rbioapi options and their current values by calling rba_options() without any argument:
rba_options()
#>   rbioapi_option current_value            allowed_value
#> 1    diagnostics         FALSE     Logical (TRUE/FALSE)
#> 2       dir_name       rbioapi                Character
#> 3       progress         FALSE     Logical (TRUE/FALSE)
#> 4      retry_max             1   Numeric (0 or greater)
#> 5     retry_wait            10   Numeric (0 or greater)
#> 6      save_file         FALSE     Logical (TRUE/FALSE)
#> 7     skip_error          TRUE     Logical (TRUE/FALSE)
#> 8        timeout           600 Numeric (0.1 or greater)
#> 9        verbose          TRUE     Logical (TRUE/FALSE)
Now, let us consider the ways in which we can alter the settings:
Changing an option globally means that for the rest of your R session, any rbioapi function will respect the changed option. To do this, use rba_options(). Each argument in this function corresponds to a certain option; thus, by running this function with your desired new values, you can globally alter that rbioapi option. For example:
rba_options(save_file = TRUE)
## From now on, the raw file of server's response will be saved to your working directory.
rba_options(verbose = FALSE)
## From now on, the package will be quiet.
You can pass additional arguments to any rbioapi function using the “ellipsis” (the familiar … or dot dot dot!). This means that you can call any function with additional arguments, where each is an ‘option = value’ pair. This way, any changes in options will be confined to that particular function call. For example:
## Save the server's raw response file:
x <- rba_reactome_species(only_main = TRUE, save_file = "reactome_species.json")
## Also, in the case of connection failure, retry up to 10 times:
x <- rba_reactome_species(only_main = TRUE,
                          save_file = "reactome_species.json", retry_max = 10)
## Run this code in your own R session to see the difference.
## Show the internal diagnostics details:
x <- rba_uniprot_proteins_crossref(db_id = "CD40", db_name = "HGNC", diagnostics = TRUE)
## The next function you call will still use the default rbioapi options:
x <- rba_uniprot_proteins_crossref(db_id = "CD40", db_name = "HGNC")
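The difference between a global change and a per-call change can be illustrated with a small, self-contained sketch. This is not rbioapi's actual implementation; `demo_options()` and `demo_request()` are hypothetical functions that only mimic the behavior described above, namely that defaults are stored in one place and that values passed through `...` override them for a single call.

```r
# Hypothetical sketch (not rbioapi's internals): package-wide defaults
# live in an environment, and a call may override them through `...`.
.demo_opts <- new.env()
.demo_opts$verbose <- TRUE
.demo_opts$retry_max <- 1

# Global change: affects every later call in the session.
demo_options <- function(...) {
  changes <- list(...)
  for (nm in names(changes)) assign(nm, changes[[nm]], envir = .demo_opts)
  invisible(as.list(.demo_opts))
}

# Per-call change: `...` is merged over the defaults for this call only.
demo_request <- function(resource, ...) {
  opts <- utils::modifyList(as.list(.demo_opts), list(...))
  if (isTRUE(opts$verbose)) message("Requesting ", resource)
  opts
}

one_off <- demo_request("species", retry_max = 10)  # override for this call
later <- demo_request("species")                    # defaults are untouched
c(one_off = one_off$retry_max, later = later$retry_max)
```

In this sketch the override is never written back to the stored defaults, which is why the second call still sees the original value.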
The second exception in the functions’ naming scheme is rba_connection_test(). Run this simple function to check your connection with the supported services/databases. If you encounter errors when using rbioapi, kindly run this function to make sure that your internet connection and the servers are fine.
rba_connection_test(print_output = TRUE)
#> Checking Your connection to the Databases currently supported by rbioapi:
#> --->>> Internet :
#> +++ Connected to the Internet.
#> --->>> Enrichr :
#> +++ The server is responding.
#> --->>> Ensembl :
#> +++ The server is responding.
#> --->>> JASPAR :
#> +++ The server is responding.
#> --->>> miEAA :
#> +++ The server is responding.
#> --->>> PANTHER :
#> +++ The server is responding.
#> --->>> Reactome Content Service :
#> +++ The server is responding.
#> --->>> Reactome Analysis Service :
#> +++ The server is responding.
#> --->>> STRING :
#> +++ The server is responding.
#> --->>> UniProt :
#> +++ The server is responding.
Some API resources return paginated responses. This is particularly common in API resources that return potentially very large responses. For these cases, rbioapi functions have arguments such as “page_number” (with a default value of 1) and, if the API resource allows, “page_size”. To save time, you may use rba_pages(). This function will iterate over the pages you have specified.
Take rba_uniprot_taxonomy_name as an example. This function allows you to search taxonomic nodes in UniProt. The response can potentially have a huge size, so UniProt returns a paginated response. For example, if we search for nodes that contain “adenovirus”, there is a large number of hits:
adeno <- rba_uniprot_taxonomy_name(name = "adenovirus",
                                   search_type = "contain",
                                   page_number = 1)
str(adeno, max.level = 2)
#> List of 2
#> $ taxonomies:'data.frame': 200 obs. of 8 variables:
#> ..$ taxonomyId : int [1:200] 10509 10510 10511 10512 10513 10514 10515 10519 10521 10522 ...
#> ..$ mnemonic : chr [1:200] "9ADEN" "ADEB3" "ADEB7" "9ADEN" ...
#> ..$ scientificName: chr [1:200] "Mastadenovirus" "Bovine adenovirus B serotype 3" "Bovine adenovirus 7" "Canine adenovirus 1" ...
#> ..$ rank : chr [1:200] "genus" "no rank" "no rank" "no rank" ...
#> ..$ superregnum : chr [1:200] "V" "V" "V" "V" ...
#> ..$ hidden : logi [1:200] FALSE TRUE TRUE TRUE TRUE TRUE ...
#> ..$ commonName : chr [1:200] NA "BAdV-3" "BAdV-7" NA ...
#> ..$ synonym : chr [1:200] NA "Mastadenovirus bos3" NA NA ...
#> $ pageInfo :List of 3
#> ..$ resultsPerPage: int 200
#> ..$ currentPage : int 1
#> ..$ totalRecords : int 963
As you can see, the server has returned the first page of the response. To retrieve the other pages, you should make separate calls and change the “page_number” argument within each call, or simply use rba_pages() as demonstrated below:
adeno_pages = rba_pages(quote(rba_uniprot_taxonomy_name(name = "adenovirus",
                                                        search_type = "contain",
                                                        page_number = "pages:1:3")))
## You can inspect the structure of the response:
str(adeno_pages, max.level = 2)
#> List of 3
#> $ page_1:List of 2
#> ..$ taxonomies:'data.frame': 200 obs. of 8 variables:
#> ..$ pageInfo :List of 3
#> $ page_2:List of 2
#> ..$ taxonomies:'data.frame': 200 obs. of 8 variables:
#> ..$ pageInfo :List of 3
#> $ page_3:List of 2
#> ..$ taxonomies:'data.frame': 200 obs. of 8 variables:
#> ..$ pageInfo :List of 3
As you can see, what we did was:
Wrap the function call in quote() and enter that as the input for rba_pages().
Replace the argument we want to iterate over with a string in this format: “pages:start:end”. For example, we supplied page_number = “pages:1:3” to get the responses of pages 1 to 3.
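Once the pages have been retrieved, two follow-up steps are often useful: working out how many pages exist, and stacking the per-page tables into one. The sketch below shows one possible way to do this with base R. The helper names (`n_pages()`, `pages_arg()`, `combine_pages()`, `make_page()`) are hypothetical and not part of rbioapi, and `mock_pages` is a tiny stand-in that only imitates the shape of the `adeno_pages` output shown above, so that the example runs without a server.

```r
# Number of pages needed, computed from a pageInfo element.
# With the values shown above, ceiling(963 / 200) gives 5 pages.
n_pages <- function(page_info) {
  ceiling(page_info$totalRecords / page_info$resultsPerPage)
}

# Build a string in the "pages:start:end" format.
pages_arg <- function(start, end) {
  sprintf("pages:%d:%d", as.integer(start), as.integer(end))
}

# Stack the per-page data frames into a single data frame.
# This assumes that every page has the same columns.
combine_pages <- function(pages) {
  out <- do.call(rbind, lapply(pages, function(p) p$taxonomies))
  rownames(out) <- NULL
  out
}

# Mock pages with the same structure as the response shown above.
make_page <- function(ids, page, per_page, total) {
  list(
    taxonomies = data.frame(taxonomyId = ids,
                            scientificName = paste("taxon", ids),
                            stringsAsFactors = FALSE),
    pageInfo = list(resultsPerPage = per_page,
                    currentPage = page,
                    totalRecords = total)
  )
}
mock_pages <- list(page_1 = make_page(1:3, 1L, 3L, 5L),
                   page_2 = make_page(4:5, 2L, 3L, 5L))

total_pages <- n_pages(mock_pages$page_1$pageInfo)  # 2 pages for the mock data
page_string <- pages_arg(1, total_pages)            # "pages:1:2"
all_taxa <- combine_pages(mock_pages)               # one data frame, 5 rows
```

With a real response, the same helpers could be applied to the first page's pageInfo to decide which range to request, keeping in mind that requesting many pages sends many requests to the server.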
rbioapi is an interface between you and other databases and services. Thus, if you have used rbioapi in published research, kindly, in addition to citing rbioapi, make sure to fully and properly cite the databases/services you have used. Suggested citations have been added to the functions’ manuals, under the “references” section; nevertheless, it is the user’s responsibility to check for proper citations and to properly cite the databases/services that they have used.
How to cite rbioapi: (Free access link to the article)
Moosa Rezwani, Ali Akbar Pourfathollah, Farshid Noorbakhsh, rbioapi: user-friendly R interface to biologic web services’ API, Bioinformatics, Volume 38, Issue 10, 15 May 2022, Pages 2952–2953, https://doi.org/10.1093/bioinformatics/btac172
How to cite Enrichr. (See on Enrichr website)
How to cite JASPAR. (See on JASPAR website)
How to cite miEAA. (See on miEAA website)
How to cite PANTHER. (See on PANTHER website)
How to cite Reactome. (See on Reactome website)
How to cite STRING. (See on STRING website)
How to cite UniProt. (See on UniProt website)
When using rbioapi, remember that you are querying data from web services, so please be considerate. Never flood a server with requests; if you need to download unreasonably large volumes of data, directly downloading the databases supplied by those services may be a better alternative. If you find yourself being rate-limited by a server (HTTP 429 Too Many Requests response status code), know that you are sending more requests than the server interprets as normal behavior, so please seek other methods or use Sys.sleep() between your requests.
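As a minimal sketch of spacing out requests with Sys.sleep(), the helper below calls a function once per input and pauses between consecutive calls. `throttled_lapply()` and `fake_request()` are hypothetical names used for illustration; they are not part of rbioapi, and the appropriate waiting time depends on the service you are querying.

```r
# Hypothetical helper (not part of rbioapi): call `fun` on each input and
# pause for `wait` seconds between consecutive calls.
throttled_lapply <- function(inputs, fun, wait = 1, ...) {
  results <- vector("list", length(inputs))
  for (i in seq_along(inputs)) {
    if (i > 1) Sys.sleep(wait)  # no pause is needed before the first call
    results[[i]] <- fun(inputs[[i]], ...)
  }
  names(results) <- names(inputs)
  results
}

# Stand-in for a web request so that the sketch runs offline.
fake_request <- function(id) paste0("response_for_", id)

res <- throttled_lapply(c("P1", "P2", "P3"), fake_request, wait = 0.1)
```

In real use, `fake_request` would be replaced by a function that wraps the rbioapi call you need, for example one that passes each identifier to the db_id argument of rba_uniprot_proteins_crossref() as in the earlier example.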
Each supported service has a dedicated vignette article. Make sure to check those too.
We are also adding vignette articles focusing on tasks and workflows:
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] rbioapi_0.7.7
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.29 R6_2.5.1 jsonlite_1.8.0 magrittr_2.0.3
#> [5] evaluate_0.15 httr_1.4.3 stringi_1.7.8 cachem_1.0.6
#> [9] rlang_1.0.4 cli_3.3.0 curl_4.3.2 rstudioapi_0.13
#> [13] jquerylib_0.1.4 bslib_0.4.0 rmarkdown_2.14 tools_4.2.1
#> [17] stringr_1.4.0 xfun_0.31 yaml_2.3.5 fastmap_1.1.0
#> [21] compiler_4.2.1 htmltools_0.5.3 knitr_1.39 sass_0.4.2