pepr
This vignette shows how and why to use the derived attributes and implied attributes functionalities of the pepr package together.
For basic information about the PEP concept, see the project website.
Make sure to study the dedicated derived attributes and implied attributes vignettes before reading this one.
While either derived attributes or implied attributes alone is often sufficient to describe your samples efficiently in a PEP, the example below demonstrates how to use derived attributes to simplify and unclutter the columns of the sample_table.csv file after implying attributes for samples that follow certain patterns. Combined, the two functionalities let you build complex yet flexible sample annotation tables with little effort. Note that attribute implication is always performed first, before attributes are derived. This means that the newly created (implied) attributes can be used to construct attributes in the derivation step. Consider the example below:
sample_name | organism | time | file_path |
---|---|---|---|
pig_0h | pig | 0 | /data/lab/project/pig_susScr11_untreated.fastq |
pig_1h | pig | 1 | /data/lab/project/pig_susScr11_treated.fastq |
frog_0h | frog | 0 | /data/lab/project/frog_xenTro9_untreated.fastq |
frog_1h | frog | 1 | /data/lab/project/frog_xenTro9_treated.fastq |
Specifying detailed file paths and names, as above, is cumbersome. To make your life easier, find the patterns that the file names in the file_path column of sample_table.csv follow, imply the needed attributes, and derive the file names. This multi-step process is orchestrated by the project_config.yaml file via the sample_modifiers.derive and sample_modifiers.imply sections:
pep_version: 2.0.0
sample_table: sample_table.csv
output_dir: $HOME/hello_looper_results
sample_modifiers:
  derive:
    attributes: file_path
    sources:
      source1: /data/lab/project/{organism}_{genome}_{condition}.fastq
  imply:
    - if:
        organism: pig
      then:
        genome: susScr11
    - if:
        organism: frog
      then:
        genome: xenTro9
    - if:
        time: 0
      then:
        condition: untreated
    - if:
        time: 1
      then:
        condition: treated
The *_untreated files are clearly associated with the samples labeled with time 0. Therefore, the condition attribute is implied as untreated for samples that have 0 in the time column (and as treated for samples that have 1). Similarly, the codes susScr11 and xenTro9 are associated with the values in the organism column. Therefore, the genome column, which consists of those two codes, is implied from the values in the organism column according to project_config.yaml.
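The implication step described above can be sketched in base R. This is only an illustration of the if/then logic, not how pepr implements it; the helper apply_imply and the cond/set list layout are hypothetical names introduced here.

```r
# Sample table with only the columns that are not implied or derived.
samples <- data.frame(
  sample_name = c("pig_0h", "pig_1h", "frog_0h", "frog_1h"),
  organism    = c("pig", "pig", "frog", "frog"),
  time        = c(0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# The four if/then rules from sample_modifiers.imply.
# `cond` plays the role of "if", `set` the role of "then" (hypothetical layout).
rules <- list(
  list(cond = list(organism = "pig"),  set = list(genome = "susScr11")),
  list(cond = list(organism = "frog"), set = list(genome = "xenTro9")),
  list(cond = list(time = 0),          set = list(condition = "untreated")),
  list(cond = list(time = 1),          set = list(condition = "treated"))
)

# Hypothetical helper: apply each rule to the rows that match its condition.
apply_imply <- function(df, rules) {
  for (rule in rules) {
    # A row matches when every attribute in the condition has the given value.
    hit <- rep(TRUE, nrow(df))
    for (name in names(rule$cond)) {
      hit <- hit & df[[name]] == rule$cond[[name]]
    }
    # Create the implied column if needed, then fill it for matching rows.
    for (name in names(rule$set)) {
      if (is.null(df[[name]])) df[[name]] <- NA_character_
      df[[name]][hit] <- rule$set[[name]]
    }
  }
  df
}

implied <- apply_imply(samples, rules)
print(implied)
```

After this step every sample carries a genome and a condition value, which is what allows the source1 template to refer to {genome} and {condition}.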
Let’s modify the original sample_table.csv file so that the genome and condition attributes can be implied and the appropriate data source from project_config.yaml can be mapped to each entry in the derived column, file_path:
sample_name | organism | time | file_path |
---|---|---|---|
pig_0h | pig | 0 | source1 |
pig_1h | pig | 1 | source1 |
frog_0h | frog | 0 | source1 |
frog_1h | frog | 1 | source1 |
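To make the lookup concrete: each file_path entry in the table above is a key into the sources section, and the matching template is filled in with the sample's own attributes. The base R sketch below illustrates that idea under the assumption that genome and condition have already been implied; fill_template and derive_column are hypothetical helpers, not pepr functions.

```r
# Samples after implication: genome and condition are already present,
# and file_path still holds the source key.
samples <- data.frame(
  sample_name = c("pig_0h", "pig_1h", "frog_0h", "frog_1h"),
  organism    = c("pig", "pig", "frog", "frog"),
  time        = c(0, 1, 0, 1),
  file_path   = rep("source1", 4),
  genome      = c("susScr11", "susScr11", "xenTro9", "xenTro9"),
  condition   = c("untreated", "treated", "untreated", "treated"),
  stringsAsFactors = FALSE
)

# The template from sample_modifiers.derive.sources.
sources <- list(
  source1 = "/data/lab/project/{organism}_{genome}_{condition}.fastq"
)

# Hypothetical helper: replace every {attribute} placeholder with the
# value of that attribute in a one-row data frame.
fill_template <- function(template, row) {
  for (name in names(row)) {
    placeholder <- paste0("{", name, "}")
    template <- gsub(placeholder, as.character(row[[name]]), template, fixed = TRUE)
  }
  template
}

# Hypothetical helper: swap each source key in `column` for its filled template.
derive_column <- function(df, column, sources) {
  for (i in seq_len(nrow(df))) {
    key <- df[[column]][i]
    if (key %in% names(sources)) {
      df[[column]][i] <- fill_template(sources[[key]], df[i, , drop = FALSE])
    }
  }
  df
}

derived <- derive_column(samples, "file_path", sources)
print(derived$file_path)
```

If genome or condition were missing at this point, the placeholders would stay unresolved in the path, which is why implication has to run before derivation.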
Load pepr and read in the project metadata by specifying the path to project_config.yaml:
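library(pepr)
# Branch of the bundled example PEPs; "master" matches the path in the output below
branch = "master"
projectConfig = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_derive_imply",
  "project_config.yaml",
  package = "pepr"
)
p = Project(projectConfig)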
## Loading config file: /private/var/folders/3f/0wj7rs2144l9zsgxd3jn5nxc0000gn/T/RtmpSzdJbG/Rinst4b945a77f0f4/pepr/extdata/example_peps-master/example_derive_imply/project_config.yaml
And inspect it:
sampleTable(p)
## sample_name organism time file_path
## 1: pig_0h pig 0 /data/lab/project/pig_susScr11_untreated.fastq
## 2: pig_1h pig 1 /data/lab/project/pig_susScr11_treated.fastq
## 3: frog_0h frog 0 /data/lab/project/frog_xenTro9_untreated.fastq
## 4: frog_1h frog 1 /data/lab/project/frog_xenTro9_treated.fastq
## genome condition
## 1: susScr11 untreated
## 2: susScr11 treated
## 3: xenTro9 untreated
## 4: xenTro9 treated
As you can see, the resulting samples are annotated the same way as if they had been read from the original, unwieldy annotation file, now enriched with the implied genome and condition attributes.