Many studies include data from assays which have not been integrated into the DataSpace. Some of these are available as “Non-Integrated Datasets,” which can be downloaded from the app as a zip file. DataSpaceR
provides an interface for accessing non-integrated data from studies where it is available.
Methods on the DataSpace Study object allow you to see what non-integrated data may be available before downloading it. We will be using HVTN 505 as an example.
library(DataSpaceR)
con <- connectDS()
vtn505 <- con$getStudy("vtn505")
vtn505
#> <DataSpaceStudy>
#> Study: vtn505
#> URL: https://dataspace.cavd.org/CAVD/vtn505
#> Available datasets:
#> - Binding Ab multiplex assay
#> - Demographics
#> - Intracellular Cytokine Staining
#> - Neutralizing antibody
#> Available non-integrated datasets:
#> - ADCP
#> - Demographics (Supplemental)
#> - Fc Array
The print method on the study object will list available non-integrated datasets. The availableDatasets
property shows some more info about available datasets, with the integrated
field indicating whether the data is integrated. The value for n
will be NA
for non-integrated data until the dataset has been loaded.
name | label | n | integrated |
---|---|---|---|
BAMA | Binding Ab multiplex assay | 10260 | TRUE |
Demographics | Demographics | 2504 | TRUE |
ICS | Intracellular Cytokine Staining | 22684 | TRUE |
NAb | Neutralizing antibody | 628 | TRUE |
ADCP | ADCP | NA | FALSE |
DEM SUPP | Demographics (Supplemental) | NA | FALSE |
Fc Array | Fc Array | NA | FALSE |
Non-Integrated datasets can be loaded with getDataset
like integrated data. This will unzip the non-integrated data to a temp directory and load it into the environment.
adcp <- vtn505$getDataset("ADCP")
dim(adcp)
#> [1] 378 11
colnames(adcp)
#> [1] "study_prot" "participant_id" "study_day"
#> [4] "lab_code" "specimen_type" "antigen"
#> [7] "percent_cv" "avg_phagocytosis_score" "positivity_threshold"
#> [10] "response" "assay_identifier"
You can also view the file format info using getDatasetDescription
. For non-integrated data, this will open a pdf into your computer’s default pdf viewer.
Non-integrated data is downloaded to a temp directory by default. There are a couple of ways to override this if desired. One is to specify outputDir
when calling getDataset
or getDatasetDescription
.
If you will be accessing the data at another time and don’t want to have to re-download it, you can change the default directory for the whole study object with setDataDir
.
vtn505$dataDir
#> [1] "/tmp/RtmpoDO8Tc"
vtn505$setDataDir(".")
vtn505$dataDir
#> [1] "/home/jmtaylor/Projects/DataSpaceR/vignettes"
If the dataset already exists in the specified dataDir
or outputDir
, it will be not be downloaded. This can be overridden with reload=TRUE
, which forces a re-download.
sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.5 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#>
#> locale:
#> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
#> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
#> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
#> [7] LC_PAPER=en_US.utf8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] data.table_1.14.2 DataSpaceR_0.7.5 knitr_1.37
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.8 digest_0.6.29 assertthat_0.2.1 R6_2.5.1
#> [5] jsonlite_1.8.0 magrittr_2.0.2 evaluate_0.15 highr_0.9
#> [9] httr_1.4.2 stringi_1.7.6 curl_4.3.2 tools_4.1.2
#> [13] stringr_1.4.0 Rlabkey_2.8.3 xfun_0.29 compiler_4.1.2