Features

deepdep package exports the following functions:

get_available_packages,
get_description,
get_dependencies,
get_downloads,
deepdep,
plot_dependencies.

Those functions rely on each other and are ordered from the lowest to the highest level. We’ll describe what they exactly do and how on examples.

get_available_packages

This function lists, as the name indicates, available packages. The default behaviour is listing all CRAN packages.

t <- get_available_packages()
head(t, 20)
#>            A3         aaSEA      AATtools           aba        ABACUS    abbreviate 
#>          "A3"       "aaSEA"    "AATtools"         "aba"      "ABACUS"  "abbreviate" 
#>        abbyyR           abc      abc.data       ABC.RAP        abcADM   ABCanalysis 
#>      "abbyyR"         "abc"    "abc.data"     "ABC.RAP"      "abcADM" "ABCanalysis" 
#>      abcdeFBA      ABCoptim         ABCp2         abcrf       abcrlda      abctools 
#>    "abcdeFBA"    "ABCoptim"       "ABCp2"       "abcrf"     "abcrlda"    "abctools" 
#>           abd         abdiv 
#>         "abd"       "abdiv"

However, if you want to check if package is present in a little wider range – on CRAN or Bioconductor repositories, you simply need to set argument bioc = TRUE. In this case function is simply wrapper around BiocManager::available() and to use it you need to have BiocManager package (available via CRAN) installed.

t <- get_available_packages(bioc = TRUE)
#> 'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for
#> details
#> 
#> replacement repositories:
#>     CRAN: https://cran.rstudio.com/
head(t, 20)
#>  [1] "A3"            "a4"            "a4Base"        "a4Classif"     "a4Core"       
#>  [6] "a4Preproc"     "a4Reporting"   "aaSEA"         "AATtools"      "aba"          
#> [11] "ABACUS"        "ABAData"       "ABAEnrichment" "ABarray"       "abbreviate"   
#> [16] "abbyyR"        "abc"           "abc.data"      "ABC.RAP"       "abcADM"

Another possibility is checking what packages are installed – you do it by adding local = TRUE parameter.

t <- get_available_packages(local = TRUE)
head(t, 20)
#>  [1] "abind"         "ada"           "adabag"        "ade4"          "AmyloGram"    
#>  [6] "AmyloGraph"    "ape"           "askpass"       "assertthat"    "auprc"        
#> [11] "backports"     "base64enc"     "base64url"     "BBmisc"        "bench"        
#> [16] "BH"            "binman"        "Biobase"       "BiocFileCache" "BiocGenerics"

Result of this function is cached (for more details, see Caching section of this vignette).

get_description

When you know, that given package is available, you may want to obtain DESCRIPTION of this package, at least the most essential parts of it, especially dependencies. You can do it by calling:

get_description("DALEXtra")
#> DALEXtra: Extension for 'DALEX' Package
#> Maintainer: Szymon Maksymiuk <sz.maksymiuk@gmail.com> 
#> Description: 
#>  Provides wrapper of various machine learning models.
#> In applied machine learning, there
#> is a strong belief that we need to strike a balance
#> between interpretability and accuracy.
#> However, in field of the interpretable machine learning,
#> there are more and more new ideas for explaining black-box models,
#> that are implemented in 'R'.
#> 'DALEXtra' creates 'DALEX' Biecek (2018) <arXiv:1806.08915> explainer for many type of models
#> including those created using 'python' 'scikit-learn' and 'keras' libraries, and 'java' 'h2o' library.
#> Important part of the package is Champion-Challenger analysis and innovative approach
#> to model performance across subsets of test data presented in Funnel Plot.
#> Third branch of 'DALEXtra' package is aspect importance analysis
#> that provides instance-level explanations for the groups of explanatory variables. 
#> Depends: R DALEX 
#> Imports: reticulate ggplot2 
#> LinkingTo: 
#> Suggests: auditor ingredients gbm ggrepel h2o iml lime localModel mlr mlr3 randomForest recipes rmarkdown rpart xgboost testthat tidymodels 
#> Enhances: 
#> Scrap date: 2021-05-09 09:08:34

Again, you can pass bioc = TRUE if you want to check for this package in Bioconductor repository. Notice that if package is not found there, it will be searched for on CRAN. The reason behind this type of behaviour is the fact that packages present on Bioconductor are updated more often than on CRAN and not all of them are present here. Option local = TRUE for only installed packages is also possible. If a package is not available in a given source, the function will return NULL value:

get_description("a4")
#> NULL
get_description("a4", bioc = TRUE)
#> a4: Automated Affymetrix Array Analysis Umbrella Package
#> Maintainer: Laure Cougnaud <laure.cougnaud@openanalytics.eu> 
#> Description: 
#>  Umbrella package is available for the entire Automated Affymetrix Array Analysis suite of package. 
#> Depends: a4Base a4Preproc a4Classif a4Core a4Reporting 
#> Imports: 
#> LinkingTo: 
#> Suggests: MLP nlcv ALL Cairo Rgraphviz GOstats 
#> Enhances: 
#> Scrap date:

Result of this function is also cached (for more details, see Caching section of this vignette).

get_downloads

This package allows you obtaining information on how many times specified package was downloaded. However, it works only with CRAN packages.

get_downloads("ggplot2")
#>                     x
#> last_day        85630
#> last_week      600713
#> last_month    2849787
#> last_quarter  9340444
#> last_half    15747252
#> grand_total  70015383

Results of this function is not cached.

get_dependencies

After parsing description file, you can now create a data.frame which will describe dependencies between given package and others. You do it by using this function:

get_dependencies("ggplot2")
#>       name   version    type last_day last_week last_month last_quarter last_half grand_total
#> 1   digest      <NA> Imports    30915    197135     992112      2956521   5077241    43549524
#> 2     glue      <NA> Imports    62423    302536    1523995      4340958   7557204    42758650
#> 3   gtable  >= 0.1.1 Imports    21663    129947     651320      2037624   3427430    26841705
#> 4  isoband      <NA> Imports    24034    152648     756437      2411172   4424548    16086469
#> 5     MASS      <NA> Imports     3971     28816     140938       428300    804317     9416911
#> 6     mgcv      <NA> Imports     7109     22872     123990       447953    719669     6933554
#> 7    rlang >= 0.4.10 Imports    68029    441128    2020967      6509641  12043421    67191569
#> 8   scales  >= 0.5.0 Imports    24462    149038     737337      2315333   3974440    34077837
#> 9   tibble      <NA> Imports    46437    293190    1392810      4472978   8274610    51687533
#> 10   withr  >= 2.0.0 Imports    35740    227038    1098520      3269447   5905686    30766402

As with two previously described functions - get_available_packages and get_description, here you can also use bioc = TRUE or local = TRUE and again, in case the package is not available, the result will be NULL. Here you have another options to set.

The first one is parameter downloads – should number of downloads of packages be included? It uses get_downloads and works only with CRAN packages.

Another, more important parameter is dependency_type. You can specify how detailed should be list of dependencies. Default value is "strong", which is a shorthand for c("Depends", "Imports", "LinkingTo"), but you can chose any combination of those and additionally "Suggests", "Enhances".

get_dependencies("ggplot2", downloads = FALSE, dependency_type = c("Imports", "Suggests", "Enhances"))
#>             name       version     type
#> 1         digest          <NA>  Imports
#> 2           glue          <NA>  Imports
#> 3         gtable      >= 0.1.1  Imports
#> 4        isoband          <NA>  Imports
#> 5           MASS          <NA>  Imports
#> 6           mgcv          <NA>  Imports
#> 7          rlang     >= 0.4.10  Imports
#> 8         scales      >= 0.5.0  Imports
#> 9         tibble          <NA>  Imports
#> 10         withr      >= 2.0.0  Imports
#> 11          covr          <NA> Suggests
#> 12          ragg          <NA> Suggests
#> 13         dplyr          <NA> Suggests
#> 14 ggplot2movies          <NA> Suggests
#> 15        hexbin          <NA> Suggests
#> 16         Hmisc          <NA> Suggests
#> 17        interp          <NA> Suggests
#> 18         knitr          <NA> Suggests
#> 19       lattice          <NA> Suggests
#> 20       mapproj          <NA> Suggests
#> 21          maps          <NA> Suggests
#> 22      maptools          <NA> Suggests
#> 23      multcomp          <NA> Suggests
#> 24       munsell          <NA> Suggests
#> 25          nlme          <NA> Suggests
#> 26       profvis          <NA> Suggests
#> 27      quantreg          <NA> Suggests
#> 28  RColorBrewer          <NA> Suggests
#> 29         rgeos          <NA> Suggests
#> 30     rmarkdown          <NA> Suggests
#> 31         rpart          <NA> Suggests
#> 32            sf      >= 0.7-3 Suggests
#> 33       svglite >= 1.2.0.9001 Suggests
#> 34      testthat      >= 2.1.0 Suggests
#> 35        vdiffr      >= 1.0.0 Suggests
#> 36          xml2          <NA> Suggests
#> 37            sp          <NA> Enhances

Result of this function is not cached (at least yet).

deepdep

The main function of the package – it is simply wrapper around get_dependencies, that allows you getting not only dependencies, but also dependencies of the dependencies iteratively! (Now you know, why we called it deepdep).

Parameters are the same as in get_dependencies, but additionally you can specify depth parameter, which describes how many iterations it function should perform. If depth equals 1, it’s simply the same as calling get_dependencies.

deepdep("ggplot2", depth = 2)
#>     origin         name   version    type origin_level dest_level
#> 1  ggplot2       digest      <NA> Imports            0          1
#> 2  ggplot2         glue      <NA> Imports            0          1
#> 3  ggplot2       gtable  >= 0.1.1 Imports            0          1
#> 4  ggplot2      isoband      <NA> Imports            0          1
#> 5  ggplot2         MASS      <NA> Imports            0          1
#> 6  ggplot2         mgcv      <NA> Imports            0          1
#> 7  ggplot2        rlang >= 0.4.10 Imports            0          1
#> 8  ggplot2       scales  >= 0.5.0 Imports            0          1
#> 9  ggplot2       tibble      <NA> Imports            0          1
#> 10 ggplot2        withr  >= 2.0.0 Imports            0          1
#> 11    mgcv         nlme >= 3.1-64 Depends            1          2
#> 12    mgcv       Matrix      <NA> Imports            1          2
#> 13  scales       farver  >= 2.0.3 Imports            1          2
#> 14  scales     labeling      <NA> Imports            1          2
#> 15  scales    lifecycle      <NA> Imports            1          2
#> 16  scales      munsell    >= 0.5 Imports            1          2
#> 17  scales           R6      <NA> Imports            1          2
#> 18  scales RColorBrewer      <NA> Imports            1          2
#> 19  scales  viridisLite      <NA> Imports            1          2
#> 20  tibble     ellipsis  >= 0.3.2 Imports            1          2
#> 21  tibble        fansi  >= 0.4.0 Imports            1          2
#> 22  tibble    lifecycle  >= 1.0.0 Imports            1          2
#> 23  tibble     magrittr      <NA> Imports            1          2
#> 24  tibble       pillar  >= 1.6.2 Imports            1          2
#> 25  tibble    pkgconfig      <NA> Imports            1          2
#> 26  tibble        rlang >=\n0.4.3 Imports            1          1
#> 27  tibble        vctrs  >= 0.3.8 Imports            1          2

plot_dependencies

As famous quote says,

A picture is worth more than a thousand words.

That’s why we have plot_dependencies function. It allows visualizing easily what are dependencies of specified package.

The function is generic, and currently supports two types of object – you can pass a deepdep object, result of the calling the deepdep function or just type name of the package. With the latter option you can also pass arguments to get_dependencies as additional parameters.

dd <- deepdep("tibble", 2)
plot_dependencies(dd)

plot of chunk plot_dependencies1

plot_dependencies("DT", depth = 2, dependency_type = c("Imports", "Depends", "Suggests"))

plot of chunk plot_dependencies1

In each of the plots you can see one package name in the centre and two circles of packages gathered around them. These are dependencies of the first and second level.

Default plot type is circular, as you can see on the examples presented above. However, you can set plot_type parameter to tree.

plot_dependencies(dd, type = "tree")

plot of chunk plot_dependencies2

Not all dependencies are plotted. To increase readability, dependencies on the same level are hidden, but you can change this behaviour

plot_dependencies(dd, same_level = TRUE)

plot of chunk plot_dependencies3

You can also make use of numbers of downloads you obtained. There is an option to add labels to only certain percentage of most downloaded packages among those that are about to be plotted. This is meant to increase readability of the plot.

plot_dependencies("tidyverse", type = "circular", label_percentage = 0.2, depth = 3)

plot of chunk plot_dependencies4

Finally, returned object is a ggplot object, so you can easily manipulate them with syntax known from ggplot2 package. We also use ggraph enhancement for plotting graphs.

plot_dependencies(dd) +
  ggplot2::scale_fill_manual(values = c("#462CF8", "#F23A90", "#AF1023")) +
  ggraph::scale_edge_color_manual(values = "black")
#> Scale for 'fill' is already present. Adding another scale for 'fill', which will replace the
#> existing scale.

plot of chunk plot_dependencies5

deepdep package

Introduction

Use case