Directly quoting from Fornes O, Castro-Mondragon JA, Khan A, et al:
JASPAR (https://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) for TFs across multiple species in six taxonomic groups. In this 8th release of JASPAR, the CORE collection has been expanded with 245 new PFMs (169 for vertebrates, 42 for plants, 17 for nematodes, 10 for insects, and 7 for fungi), and 156 PFMs were updated (125 for vertebrates, 28 for plants and 3 for insects). These new profiles represent an 18% expansion compared to the previous release.
source:
Fornes O, Castro-Mondragon JA, Khan A, et al. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 2019; doi: 10.1093/nar/gkz1001
JASPAR is a database of transcription factor binding matrices with annotations and metadata. These entities are organized in a hierarchical fashion that we will explore next.
In addition to the latest JASPAR database release (2020), other active releases are also available. Most of the rbioapi JASPAR functions have a release argument that allows you to use other database releases.
## Call the function without any arguments to get a list of releases
releases <- rba_jaspar_releases()
## Supply a release number for details:
release_7_info <- rba_jaspar_releases(7)
Within a release, matrix profiles are organized into collections. You can use rba_jaspar_collections() to get a list of available collections, or read the “JASPAR Collections” section of the documentation page on the JASPAR website for a thorough review.
## To get a list of available collections in release 2020:
rba_jaspar_collections(release = 2020)
#> name url
#> 1 CORE https://jaspar.genereg.net/api/v1/collections/CORE/
#> 2 UNVALIDATED https://jaspar.genereg.net/api/v1/collections/UNVALIDATED/
## You can list information on all matrices available in a collection:
mat_in_core_2020 <- rba_jaspar_collections_matrices(collection = "CORE")
Within each collection, the matrix profiles are organized based on main taxonomic groups:
## To get a list of taxonomic groups in release 2020:
rba_jaspar_taxons(release = 2020)
#> name url
#> 1 plants https://jaspar.genereg.net/api/v1/taxon/plants/
#> 2 vertebrates https://jaspar.genereg.net/api/v1/taxon/vertebrates/
#> 3 insects https://jaspar.genereg.net/api/v1/taxon/insects/
#> 4 urochordates https://jaspar.genereg.net/api/v1/taxon/urochordates/
#> 5 nematodes https://jaspar.genereg.net/api/v1/taxon/nematodes/
#> 6 fungi https://jaspar.genereg.net/api/v1/taxon/fungi/
#> 7 trematodes https://jaspar.genereg.net/api/v1/taxon/trematodes/
#> 8 dictyostelium https://jaspar.genereg.net/api/v1/taxon/dictyostelium/
#> 9 cnidaria https://jaspar.genereg.net/api/v1/taxon/cnidaria/
#> 10 oomycota https://jaspar.genereg.net/api/v1/taxon/oomycota/
## You can list information on all matrices available in a taxonomic group:
mat_in_insects <- rba_jaspar_taxons_matrices(tax_group = "insects")
Going further down the data organization hierarchy, each taxonomic group consists of species:
## To get a list of species in release 2020:
species <- rba_jaspar_species(release = 2020)
head(species)
#> tax_id species
#> 1 5037 Ajellomyces capsulatus
#> 2 4151 Antirrhinum majus
#> 3 81972 Arabidopsis lyrata subsp. lyrata
#> 4 3702 Arabidopsis thaliana
#> 5 9913 Bos taurus
#> 6 6238 Caenorhabditis briggsae
#> url
#> 1 https://jaspar.genereg.net/api/v1/species/5037/
#> 2 https://jaspar.genereg.net/api/v1/species/4151/
#> 3 https://jaspar.genereg.net/api/v1/species/81972/
#> 4 https://jaspar.genereg.net/api/v1/species/3702/
#> 5 https://jaspar.genereg.net/api/v1/species/9913/
#> 6 https://jaspar.genereg.net/api/v1/species/6238/
#> matrix_url
#> 1 https://jaspar.genereg.net/api/v1/species/5037/
#> 2 https://jaspar.genereg.net/api/v1/species/4151/
#> 3 https://jaspar.genereg.net/api/v1/species/81972/
#> 4 https://jaspar.genereg.net/api/v1/species/3702/
#> 5 https://jaspar.genereg.net/api/v1/species/9913/
#> 6 https://jaspar.genereg.net/api/v1/species/6238/
## You can list information on all matrices available for a species:
mat_in_human <- rba_jaspar_species_matrices(tax_id = 9606)
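If you know a species by name rather than by NCBI taxonomy ID, a small lookup against the data frame returned by rba_jaspar_species can supply the tax_id. The sketch below assumes that data frame has the tax_id and species columns shown in the output above; the helper name lookup_tax_id is hypothetical, and the toy data frame simply reuses a few rows printed above so the code runs offline.

```r
# Toy stand-in for the data frame returned by rba_jaspar_species(),
# reusing rows printed above (only the tax_id and species columns).
species_toy <- data.frame(
  tax_id = c(5037L, 4151L, 3702L, 9913L),
  species = c("Ajellomyces capsulatus", "Antirrhinum majus",
              "Arabidopsis thaliana", "Bos taurus"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the tax_id(s) whose species name matches
# exactly; returns an empty integer vector when there is no match.
lookup_tax_id <- function(species_df, name) {
  species_df$tax_id[species_df$species == name]
}

arabidopsis_id <- lookup_tax_id(species_toy, "Arabidopsis thaliana")
arabidopsis_id
#> [1] 3702

# With network access, the ID could then be passed on, e.g.:
# mat_in_arabidopsis <- rba_jaspar_species_matrices(tax_id = arabidopsis_id)
```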
Retrieving a list of every matrix available in a given category is not the only option. You can also build a search query using rba_jaspar_matrix_search. Note that this is a search function, so you are not required to fill in every argument; you may use any combination of arguments you see fit to build your query. You can even call the function without any arguments to get a list of all the matrix profiles. For instance:
## Get a list of all the available matrix profiles:
all_matrices <- rba_jaspar_matrix_search()
## Search FOX:
FOX_matrices <- rba_jaspar_matrix_search(term = "FOX")
## Transcription factors named FOXP3
FOXP3_matrices <- rba_jaspar_matrix_search(term = "FOXP3")
## Transcription factors of Zipper-Type Class
zipper_matrices <- rba_jaspar_matrix_search(tf_class = "Zipper-Type")
## Transcription factors of Zipper-Type Class in PBM collection
zipper_pbm_matrices <- rba_jaspar_matrix_search(tf_class = "Zipper-Type",
                                                collection = "PBM")
Since JASPAR release 2010, the matrix profiles are versioned. A matrix profile identifier follows a “base_id.version” naming scheme; for example, “MA0600.2” corresponds to the second version of the matrix with base ID MA0600. You can use rba_jaspar_matrix_versions to get a list of matrix profiles with a given base ID. Also note that some functions, generally those used to list available matrices, have an argument called only_last_version.
## Get matrix profile versions associated with a base ID
MA0600_versions <- rba_jaspar_matrix_versions("MA0600")
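Because identifiers follow the base_id.version scheme, version-aware filtering is easy to express in plain R. The sketch below splits IDs such as "MA0600.2" into base ID and version and keeps only the highest version per base ID, which is roughly the idea behind the only_last_version argument. It is not rbioapi's implementation; the helper names split_versioned_id and keep_last_version are hypothetical, and the input IDs are example values for illustration.

```r
# Hypothetical helper: split "base_id.version" identifiers into parts.
split_versioned_id <- function(ids) {
  parts <- strsplit(ids, ".", fixed = TRUE)
  data.frame(
    id = ids,
    base_id = vapply(parts, `[`, character(1), 1),
    version = as.integer(vapply(parts, `[`, character(1), 2)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: keep only the highest version of each base ID,
# preserving the original order of the input.
keep_last_version <- function(ids) {
  parsed <- split_versioned_id(ids)
  # For every row, compute the maximum version within its base ID group
  max_version <- ave(parsed$version, parsed$base_id, FUN = max)
  parsed$id[parsed$version == max_version]
}

ids <- c("MA0600.1", "MA0600.2", "MA0039.3", "MA0039.4")
split_versioned_id("MA0600.2")
keep_last_version(ids)
#> [1] "MA0600.2" "MA0039.4"
```

The same parsing works for TFFM identifiers such as "TFFM0056.3", since they use the same scheme.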
Now that you have listed or searched for matrix profiles, you can use rba_jaspar_matrix to retrieve them. There are two ways in which you can use this function: retrieving the matrix profile as an R object, or saving it as a file. To retrieve it as an R object, only fill in the matrix_id argument of rba_jaspar_matrix:
pfm_matrix <- rba_jaspar_matrix(matrix_id = "MA0600.2")
## You can find the matrix in the pfm element along with
## other elements which correspond to annotations and details
str(pfm_matrix)
#> List of 24
#> $ collection : chr "CORE"
#> $ remap_tf_name: chr "RFX2"
#> $ sites_url : NULL
#> $ source : chr "23332764"
#> $ versions_url : chr "https://jaspar.genereg.net/api/v1/matrix/MA0600/versions"
#> $ matrix_id : chr "MA0600.2"
#> $ medline : chr "8754849"
#> $ tffm :List of 7
#> ..$ log_p_1st_order: num -6275
#> ..$ experiment_name: chr "CistromeDB_58298"
#> ..$ tffm_id : chr "TFFM0576.1"
#> ..$ base_id : chr "TFFM0576"
#> ..$ version : int 1
#> ..$ tffm_url : chr "https://jaspar.genereg.net/api/v1/tffm/TFFM0576.1/"
#> ..$ log_p_detailed : num -6660
#> $ uniprot_ids : chr "P48378"
#> $ pazar_tf_ids : list()
#> $ sequence_logo: chr "https://jaspar.genereg.net/static/logos/svg/MA0600.2.svg"
#> $ name : chr "RFX2"
#> $ tfe_id : list()
#> $ tax_group : chr "vertebrates"
#> $ pubmed_ids : chr "8754849"
#> $ pazar_tf_id : list()
#> $ species :'data.frame': 1 obs. of 2 variables:
#> ..$ name : chr "Homo sapiens"
#> ..$ tax_id: int 9606
#> $ class : chr "Fork head/winged helix factors"
#> $ type : chr "HT-SELEX"
#> $ tfe_ids : list()
#> $ base_id : chr "MA0600"
#> $ family : chr "RFX-related factors"
#> $ version : int 2
#> $ pfm : num [1:4, 1:16] 1381 5653 4042 2336 270 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:4] "A" "C" "G" "T"
#> .. ..$ : NULL
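The pfm element is a plain numeric matrix with rows A, C, G, T and one column per position, so standard matrix arithmetic applies to it. As a minimal sketch, the code below converts a PFM into a position probability matrix (each column divided by its total count, with an optional pseudocount) and derives a consensus sequence by taking the most frequent base at each position. The toy counts are made up for illustration; with real data you would pass pfm_matrix$pfm instead. The helper names are hypothetical, not rbioapi functions.

```r
# Toy PFM with made-up counts: rows are bases, columns are positions,
# mirroring the layout of pfm_matrix$pfm shown above.
toy_pfm <- matrix(
  c(8, 0, 1, 1,
    0, 9, 1, 0,
    1, 0, 0, 9,
    2, 2, 4, 2),
  nrow = 4,
  dimnames = list(c("A", "C", "G", "T"), NULL)
)

# Hypothetical helper: PFM -> position probability matrix.
# A pseudocount is added to every cell before normalizing each column.
pfm_to_ppm <- function(pfm, pseudocount = 0) {
  counts <- pfm + pseudocount
  sweep(counts, 2, colSums(counts), "/")
}

# Hypothetical helper: most frequent base at each position.
# which.max() picks the first maximum, so ties resolve in A, C, G, T order.
pfm_consensus <- function(pfm) {
  paste(rownames(pfm)[apply(pfm, 2, which.max)], collapse = "")
}

ppm <- pfm_to_ppm(toy_pfm)
colSums(ppm)
#> [1] 1 1 1 1
pfm_consensus(toy_pfm)
#> [1] "ACTG"
```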
JASPAR provides position frequency matrices (PFMs) formatted as Raw PFM, JASPAR, TRANSFAC, YAML, and MEME. You can download a matrix profile as a file in any of these formats. To do that, use the file_format and save_to arguments of rba_jaspar_matrix. There are two notes here:
1. In this case, the function saves your matrix as a file and returns the unparsed content of the file as a character string.
2. The save_to argument of this function, and in fact of any rbioapi function, can be used in several ways:
   2.1. save_to = NA: rbioapi automatically generates a file path under your working directory, saves the file to that path, and informs you with a message.
   2.2. save_to = a file name without a path: rbioapi saves the file with your supplied name in your working directory.
   2.3. save_to = a directory path (without a file name): rbioapi saves the file with a proper name in that directory.
   2.4. save_to = a file path (i.e., ending with .extension): rbioapi saves the file exactly to this path. Make sure that the file extension of the path matches your requested file format. If it does not, rbioapi still saves the file with the extension supplied in the path but issues a warning.
In any of the aforementioned cases, the file path can be absolute or relative.
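To make the four save_to cases concrete, here is a hypothetical resolver that mirrors the rules listed above. It only illustrates the documented behavior and is not rbioapi's internal code: the function name resolve_save_to, the default file name argument, and the rule used to tell a directory from a file (whether the last path component has an extension) are all assumptions.

```r
# Hypothetical illustration of the save_to rules described above;
# this is NOT rbioapi's actual implementation.
resolve_save_to <- function(save_to, default_name, file_format) {
  # Case 2.1: NA -> auto-generated path under the working directory
  if (is.na(save_to)) {
    return(file.path(getwd(), paste0(default_name, ".", file_format)))
  }
  ext <- tools::file_ext(save_to)
  # Case 2.3 (assumed rule): no extension -> treat as a directory
  if (ext == "") {
    return(file.path(save_to, paste0(default_name, ".", file_format)))
  }
  # Extension mismatch: keep the supplied path but warn
  if (tolower(ext) != tolower(file_format)) {
    warning("Extension '", ext, "' does not match file format '",
            file_format, "'.")
  }
  # Case 2.2: a bare file name -> place it in the working directory
  if (dirname(save_to) == ".") {
    return(file.path(getwd(), save_to))
  }
  # Case 2.4: a full file path -> used as-is
  save_to
}

resolve_save_to("c:/rbioapi", "MA0600.2", "meme")
#> [1] "c:/rbioapi/MA0600.2.meme"
resolve_save_to("c:/rbioapi/my_matrix.meme", "MA0600.2", "meme")
#> [1] "c:/rbioapi/my_matrix.meme"
```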
## Different ways in which you can save the matrix file:
meme_matrix1 <- rba_jaspar_matrix(matrix_id = "MA0600.2",
                                  file_format = "meme")
meme_matrix2 <- rba_jaspar_matrix(matrix_id = "MA0600.2",
                                  file_format = "meme",
                                  save_to = "my_matrix.meme")
meme_matrix3 <- rba_jaspar_matrix(matrix_id = "MA0600.2",
                                  file_format = "meme",
                                  save_to = "c:/rbioapi")
meme_matrix4 <- rba_jaspar_matrix(matrix_id = "MA0600.2",
                                  file_format = "meme",
                                  save_to = "c:/rbioapi/my_matrix.meme")
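Since the returned value is the file content as a character string, you may want to turn it back into a numeric matrix. The sketch below handles the JASPAR text layout (a header line starting with ">" followed by one row per base, such as "A [ 8 0 1 ]"). It assumes you requested file_format = "jaspar" rather than "meme"; the helper name parse_jaspar_text is hypothetical, and the toy string contains made-up counts. Real files may differ in whitespace, which the regular expressions try to tolerate.

```r
# Hypothetical helper: parse a single JASPAR-format matrix (as text)
# into a numeric matrix with one row per base.
parse_jaspar_text <- function(txt) {
  lines <- trimws(strsplit(txt, "\n", fixed = TRUE)[[1]])
  # Keep only the base rows, e.g. "A  [ 8 0 1 ]"; this drops the ">" header
  rows <- lines[grepl("^[ACGT]\\s*\\[", lines, perl = TRUE)]
  bases <- substr(rows, 1, 1)
  # Remove the base letter and the brackets, then split on whitespace
  values <- lapply(rows, function(r) {
    inner <- gsub("^[ACGT]\\s*\\[|\\]\\s*$", "", r, perl = TRUE)
    as.numeric(strsplit(trimws(inner), "\\s+", perl = TRUE)[[1]])
  })
  mat <- do.call(rbind, values)
  rownames(mat) <- bases
  mat
}

toy_txt <- paste(
  ">TOY0001.1 ToyTF",
  "A  [     8      0      1 ]",
  "C  [     0      9      1 ]",
  "G  [     1      0      0 ]",
  "T  [     1      1      8 ]",
  sep = "\n"
)
parse_jaspar_text(toy_txt)
```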
If available, you can retrieve information on the binding sites associated with a matrix profile. The information includes a data frame of the genomic coordinates of the binding sites, URLs to FASTA and BED files, and other annotations.
## Get binding sites of a matrix profile:
binding_sites <- rba_jaspar_sites(matrix_id = "MA0600.2")
JASPAR also stores and assigns identifiers to TF flexible models (TFFMs). Just like PFMs (position frequency matrices), you can search TFFMs or retrieve their information and annotations using a TFFM identifier. TFFM IDs are versioned, meaning that they follow the base_id.version format.
## Search TFFMs. This is a search function. Thus, what was presented
## in the `Search Matrix Profiles` section also applies here:
## Get a list of all the available TFFMs:
all_tffms <- rba_jaspar_tffm_search()
## Search FOX:
FOX_tffms <- rba_jaspar_tffm_search(term = "FOX")
## Transcription factors named FOXP3
FOXP3_tffms <- rba_jaspar_tffm_search(term = "FOXP3")
## Transcription factors of insects taxonomic group
insects_tffms <- rba_jaspar_tffm_search(tax_group = "insects")
## Now that you have a TFFM ID, you can retrieve it
TFFM0056 <- rba_jaspar_tffm("TFFM0056.3")
str(TFFM0056)
#> List of 10
#> $ matrix_id : chr "MA0039.4"
#> $ matrix_url : chr "https://jaspar.genereg.net/api/v1/matrix/MA0039.4/"
#> $ tffm_id : chr "TFFM0056.3"
#> $ matrix_base_id : chr "MA0039"
#> $ base_id : chr "TFFM0056"
#> $ experiment_name: chr "CistromeDB_33718"
#> $ version : int 3
#> $ matrix_version : int 4
#> $ detailed :List of 5
#> ..$ log_p : num -6854
#> ..$ xml : chr "https://jaspar.genereg.net/static/TFFM/TFFM0056.3/TFFM_detailed_trained.xml"
#> ..$ dense_logo : chr "https://jaspar.genereg.net/static/TFFM/TFFM0056.3/TFFM_detailed_trained_dense_logo.svg"
#> ..$ hits : chr "https://jaspar.genereg.net/static/TFFM/TFFM0056.3/TFFM_detailed_trained.hits.svg"
#> ..$ summary_logo: chr "https://jaspar.genereg.net/static/TFFM/TFFM0056.3/TFFM_detailed_trained_summary_logo.svg"
#> $ first_order :List of 5
#> ..$ log_p : num -7420
#> ..$ xml : chr "https://jaspar.genereg.net/static/TFFM/TFFM0056.3/TFFM_first_order_trained.xml"
#> ..$ dense_logo : chr "https://jaspar.genereg.net/static/TFFM/TFFM0056.3/TFFM_first_order_trained_dense_logo.svg"
#> ..$ hits : chr "https://jaspar.genereg.net/static/TFFM/TFFM0056.3/TFFM_first_order_trained.hits.svg"
#> ..$ summary_logo: chr "https://jaspar.genereg.net/static/TFFM/TFFM0056.3/TFFM_first_order_trained_summary_logo.svg"
To cite JASPAR (Please see https://jaspar.genereg.net/faq/):
To cite rbioapi:
#> R version 4.2.1 (2022-06-23 ucrt)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=C
#> [2] LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] rbioapi_0.7.7
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.29 R6_2.5.1 jsonlite_1.8.0 magrittr_2.0.3
#> [5] evaluate_0.15 httr_1.4.3 stringi_1.7.8 cachem_1.0.6
#> [9] rlang_1.0.4 cli_3.3.0 curl_4.3.2 rstudioapi_0.13
#> [13] jquerylib_0.1.4 DT_0.23 bslib_0.4.0 rmarkdown_2.14
#> [17] tools_4.2.1 stringr_1.4.0 htmlwidgets_1.5.4 crosstalk_1.2.0
#> [21] xfun_0.31 yaml_2.3.5 fastmap_1.1.0 compiler_4.2.1
#> [25] htmltools_0.5.3 knitr_1.39 sass_0.4.2