Author: Mario A. Martínez Araya
Date: 2021-07-07
Url: http://marioma.me/?i=soft
CRAN: https://cran.r-project.org/package=tmplate
This R package is intended to modify general templates by replacing tags with variable content. It was first created to modify the R and Bash scripts needed for parallel computation using MPI. Although it is related to the R package tRnslate, the package tmplate performs a different task, which is enhanced by tRnslate.
In principle, any version of R should work, but I have not tested them all.
Installation is the same as for any other R package. You can download the tar file from CRAN (tmplate) and install it with:
R CMD INSTALL /path/to/tar/file/tmplate_0.0.3.tar.gz
or from the R console:
install.packages("tmplate", lib = "path/to/R/library")
The main function of the package is translate, whose main input arguments are as follows:

translate(vars, ..., template, envir)
template is a character vector where each element is a line of the template, as obtained using readLines. Alternatively, it can be a single string packing all the content of the template, where lines are delimited by the newline character. It can also be a file path, but this requires setting allow_file = TRUE.

... are the variables and their values, which are used to directly modify the content of the template. For instance, in name1 = value1, ..., nameK = valueK, the names name1, …, nameK define the tags <:name1:>, …, <:nameK:> that can be used within the template to modify its content depending on the values of the variables.

envir is an environment where the input variables will be evaluated. Additionally, it can contain its own variables used to modify the template content.

vars is a named list whose elements are taken as variables to be used in the tags within the template. This is useful when there are too many input variables for translate.

As we will see, the output from translate is a character vector where each element corresponds to a line in the output file.
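The basic idea of tag replacement can be illustrated in base R. The following is a minimal sketch, not tmplate's actual implementation. The helpers as_template_lines() and replace_tags() are hypothetical names. The sketch turns a template given as one newline-packed string (or as a vector of lines from readLines) into lines, then replaces each <:name:> tag with the value of the variable of the same name.

```r
# Hypothetical helpers (a sketch, not tmplate's implementation).

# Normalise the template: a single string containing newlines is split into
# lines; a character vector (e.g. from readLines) is returned unchanged.
as_template_lines <- function(template) {
  if (length(template) == 1L && grepl("\n", template, fixed = TRUE)) {
    strsplit(template, "\n", fixed = TRUE)[[1]]
  } else {
    template
  }
}

# Replace every <:name:> tag by the value of the matching element of 'vars'.
replace_tags <- function(lines, vars) {
  for (name in names(vars)) {
    tag <- paste0("<:", name, ":>")
    lines <- gsub(tag, as.character(vars[[name]]), lines, fixed = TRUE)
  }
  lines
}

tpl <- "#!/bin/bash\n#SBATCH --nodes=<:NODES:>\necho <:MSG:>"
out <- replace_tags(as_template_lines(tpl), list(NODES = 2, MSG = "done"))
cat(out, sep = "\n")

# The same template stored in a file and read back with readLines:
tmp <- tempfile(fileext = ".tpl")
writeLines(as_template_lines(tpl), tmp)
out_file <- replace_tags(as_template_lines(readLines(tmp)),
                         list(NODES = 2, MSG = "done"))
```

Both routes give the same three lines: "#!/bin/bash", "#SBATCH --nodes=2" and "echo done".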
Lines starting with @r or @R followed by one space or tab define chunks of R code that are also interpreted and translated by translate. The chunks of R code can be assignation or output chunks, following the rules of the R package tRnslate (see the tRnslate vignette). Assignation chunks are those including <- to assign an object, while output chunks print R output to the template. Thus, several assignation chunks can be placed in adjacent lines; however, assignation and output chunks must be separated by one empty line (the same holds for consecutive output chunks).

Alternatively, inline R code can be entered using <r@ code @> or <R@ code @>. Inline R code containing an assignation does not produce output, so it is replaced by a blank, while inline R code producing output will modify the resulting template. Additionally, inline R code can be used within tag variable definitions to allow different content.
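The rule for inline code can be mimicked in base R: assignations vanish and other expressions are replaced by their value. What follows is a hypothetical sketch, not tmplate's code, and eval_inline() is an illustrative name. It finds each <r@ code @> or <R@ code @>, evaluates it in an environment, and substitutes either an empty string (for an assignment with <- or =) or the printed value.

```r
# Hypothetical helper (a sketch, not tmplate's implementation): evaluate each
# inline <r@ code @> / <R@ code @> in 'envir'. Assignments are replaced by an
# empty string; any other expression is replaced by its value.
eval_inline <- function(line, envir = new.env()) {
  m <- gregexpr("<[rR]@(.*?)@>", line, perl = TRUE)   # lazy: stop at first @>
  chunks <- regmatches(line, m)[[1]]
  values <- vapply(chunks, function(chunk) {
    code <- sub("^<[rR]@(.*)@>$", "\\1", chunk, perl = TRUE)
    expr <- parse(text = code)[[1]]
    result <- eval(expr, envir)
    is_assign <- is.call(expr) &&
      (identical(expr[[1]], as.name("<-")) || identical(expr[[1]], as.name("=")))
    if (is_assign) "" else paste(result, collapse = " ")
  }, character(1), USE.NAMES = FALSE)
  regmatches(line, m) <- list(values)
  line
}

env <- new.env()
res <- eval_inline("mpirun -n <r@ k <- 4 @><r@ 2 * k @>", env)
res   # -> "mpirun -n 8"
```

The chunks are evaluated left to right in the same environment. That is why k, assigned by the first (blank) chunk, is available to the second.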
The R code can use tag variables that point to the values of the argument variables used to modify the template content. For example, in the R chunk
@r # R chunk (printing)
@r ifelse(is_mod=="-1", "# module environment not found", paste("<:MODULES_LOAD:>"))
the tag <:MODULES_LOAD:> is used to point to the value of the argument variable MODULES_LOAD. Similarly, the content of tag variables can be modified using inline R code in the definition of an argument when calling translate. For example, the argument MPI_ASK_N = '<r@ <:SLURM_ASK_NODES:> * <:SLURM_ASK_TASKS:> @>' computes (using inline R code) the number of parallel jobs from the input arguments for the number of nodes and tasks. Note that the NULL definition in the template above is used in an R logical expression to decide whether or not to print <:SLURM_ARRAY:> into the source code. Alternatively, this decision could have been made purely with R code.
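A tag value can itself contain tags, as with MPI_ASK_N above. This means substitution has to be repeated until nothing changes, and only then can the remaining inline code be evaluated. The following hypothetical sketch shows this in base R. It assumes there are no circular references between tags, and resolve_vars() is an illustrative name, not part of tmplate. With 2 nodes and 4 tasks per node, the inline code reduces to 2 * 4 = 8, the value of -n in the outputs shown further below.

```r
# Hypothetical sketch: resolve tags that refer to other tags by substituting
# repeatedly until nothing changes (assumes no circular references).
resolve_vars <- function(vars, max_iter = 10L) {
  vals <- vapply(vars, as.character, character(1))
  for (i in seq_len(max_iter)) {
    old <- vals
    for (name in names(vals)) {
      vals <- gsub(paste0("<:", name, ":>"), vals[[name]], vals, fixed = TRUE)
    }
    if (identical(vals, old)) break
  }
  vals
}

args <- list(SLURM_SBATCH = "#SBATCH ", SLURM_ASK_NODES = 2, SLURM_ASK_TASKS = 4,
             SLURM_NODES = "<:SLURM_SBATCH:>--nodes=<:SLURM_ASK_NODES:>",
             MPI_ASK_N = "<r@ <:SLURM_ASK_NODES:> * <:SLURM_ASK_TASKS:> @>")
res <- resolve_vars(args)
nodes_line <- res[["SLURM_NODES"]]   # -> "#SBATCH --nodes=2"

# MPI_ASK_N is now "<r@ 2 * 4 @>"; evaluate the inline R code it contains.
code <- sub("^<r@(.*)@>$", "\\1", res[["MPI_ASK_N"]])
n_jobs <- eval(parse(text = code))   # -> 8
```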
The evaluation of inline R code and R chunks, which updates the input arguments and replaces the tags in the template, is performed in an environment that can be set by the user. As said before, this environment can contain its own objects, which can also be referenced to update the input arguments and modify the content of the template.
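At its core, evaluating code "in an environment" is base R's eval() with an envir argument. The sketch below shows that mechanism only. Whether translate uses exactly this internally is an assumption, and cores_per_task and nodes_used are hypothetical object names. Objects placed in the environment can be referenced by the code, and assignments made by the code stay inside that environment.

```r
# Base-R illustration of evaluating code in a user-supplied environment
# (assumed to mirror what translate's 'envir' argument is for).
my_env <- new.env()
assign("cores_per_task", 2, envir = my_env)   # user object, hypothetical name

# Code left after tag substitution (e.g. 2 nodes * 4 tasks) may use it:
total_cores <- eval(parse(text = "2 * 4 * cores_per_task"), envir = my_env)
total_cores   # -> 16

# Assignments made while evaluating stay inside that environment.
eval(parse(text = "nodes_used <- 2"), envir = my_env)
exists("nodes_used", envir = my_env, inherits = FALSE)   # -> TRUE
```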
The translate command

Given the template above, we can define the input arguments directly when calling translate, as done below:
## remember to load: library(tmplate) or call tmplate::translate
TT <- translate(
SHELL_CALL='#!/bin/bash',
SLURM_SBATCH=ifelse(.Platform$OS.type=="unix", ifelse(system("clu=$(sinfo --version 2>&1) || clu=`echo -1`; echo $clu",intern = TRUE)=="-1", '<:NULL:>', '#SBATCH '), '<:NULL:>'),
SLURM_PARTITION='<:SLURM_SBATCH:>--partition=defq',
SLURM_ASK_NODES=2,
SLURM_NODES='<:SLURM_SBATCH:>--nodes=<:SLURM_ASK_NODES:>',
SLURM_ASK_TASKS=4,
SLURM_TASKS='<:SLURM_SBATCH:>--ntasks-per-node=<:SLURM_ASK_TASKS:>',
SLURM_MEMORY='<:SLURM_SBATCH:>--memory=2gb',
SLURM_TIME='<:SLURM_SBATCH:>--time=1:00:00',
SLURM_ARRAY="<:NULL:>",
MODULES_LOAD='module load module/for/openmpi module/for/R',
WORKDIR=ifelse('<:SLURM_SBATCH:>'!='#SBATCH','# no slurm machine','cd ${SLURM_SUBMIT_DIR}'),
TASK="<:NULL:>",
PASS_TASK="<:NULL:>",
PASS_TASK_VAR="<:NULL:>",
MPI_N="<:NULL:>",
MPI_ASK_N='<r@ <:SLURM_ASK_NODES:> * <:SLURM_ASK_TASKS:> @>',
R_HOME=R.home("bin"),
R_OPTIONS='--no-save --no-restore',
R_FILE_INPUT='script.R',
R_ARGS='',
R_FILE_OUTPUT='output.Rout',
MPIRUN='mpirun --mca mpi_warn_on_fork 0 -n <:MPI_ASK_N:> <:R_HOME:>/Rscript <:R_OPTIONS:> "<:R_FILE_INPUT:>" <r@ ifelse(!any(grepl("^<:NULL:>$","<:SLURM_ARRAY:>")),"<:PASS_TASK_VAR:>","") @> <:R_ARGS:> > <:R_FILE_OUTPUT:>',
MESSAGE_CLOSE='echo "Job submitted on $(date +%F) at $(date +%T)."',
drop = TRUE,
template = T
)
Here envir was not set, so the arguments are evaluated in a new default environment. The argument drop = TRUE deletes any line containing <:NULL:>.
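The effect of drop = TRUE on the output lines can be sketched in base R. This is a hypothetical illustration, and drop_null_lines() is not a tmplate function. On a machine without SLURM, SLURM_SBATCH is set to '<:NULL:>'. Every line built from it (partition, nodes, tasks and so on) therefore carries the marker and is removed, which is why no #SBATCH lines appear in the first output further below.

```r
# Hypothetical sketch of what drop = TRUE does to the output lines.
drop_null_lines <- function(lines) {
  lines[!grepl("<:NULL:>", lines, fixed = TRUE)]
}

# Lines built from SLURM_SBATCH = '<:NULL:>' keep the marker after substitution.
out <- c("#!/bin/bash",
         "<:NULL:>--partition=defq",
         "<:NULL:>--nodes=2",
         "# no slurm machine")
kept <- drop_null_lines(out)
kept   # -> "#!/bin/bash" "# no slurm machine"
```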
The output from translate is a character vector where each element is a line in the resulting file. We can easily write it to disk using cat (remember to set sep = "\n").
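Writing the result to disk can be wrapped in a small guard that also enforces the warning about never overwriting the template. The following is a hypothetical sketch: write_translated() is an illustrative name, and TT here is a stand-in vector rather than real translate() output. It uses writeLines(), which writes each element followed by "\n", instead of the cat(TT, sep = "\n") call mentioned in the text.

```r
# Hypothetical helper: write translated lines to disk, refusing to overwrite
# the template itself.
write_translated <- function(lines, path, template_path = NULL) {
  if (!is.null(template_path) && file.exists(path) && file.exists(template_path) &&
      normalizePath(path) == normalizePath(template_path)) {
    stop("refusing to overwrite the template file: ", path)
  }
  writeLines(lines, path)   # one element per line, each followed by "\n"
  invisible(path)
}

tpl_file <- tempfile(fileext = ".tpl")
writeLines(c("#!/bin/bash", "echo <:MSG:>"), tpl_file)
TT <- c("#!/bin/bash", "echo done")          # stand-in for translate() output
job_file <- tempfile(fileext = ".sh")
write_translated(TT, job_file, template_path = tpl_file)
readLines(job_file)                          # -> "#!/bin/bash" "echo done"
```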
The final content of the translated template depends on the values of the variables used (which are system dependent). Thus, for a multicore PC with OpenMPI, but without a dynamic environment modules manager such as environment-modules or Lmod and without a job scheduler such as SLURM, the output of cat(TT, sep="\n") will be something like this:
#!/bin/bash
# module environment not found
# no slurm machine
mpirun --mca mpi_warn_on_fork 0 -n 8 /apps/local/resources/svn/R/r-devel/build/bin/Rscript --no-save --no-restore "script.R" > output.Rout
echo "Job submitted on $(date +%F) at $(date +%T)."
On the other hand, for a SLURM-managed HPC system that also has environment-modules, we would obtain:
#!/bin/bash
#SBATCH --partition=defq
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --memory=2gb
#SBATCH --time=1:00:00
module load module/for/openmpi module/for/R
cd ${SLURM_SUBMIT_DIR}
mpirun --mca mpi_warn_on_fork 0 -n 8 /usr/lib/R/bin/Rscript --no-save --no-restore "script.R" > output.Rout
echo "Job submitted on $(date +%F) at $(date +%T)."
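The difference between the two outputs comes from how SLURM_SBATCH is set. The example calls sinfo --version through system() and a shell fallback. A simpler alternative, which is an assumption and not the approach used above, is to check whether the sinfo executable is on the PATH with Sys.which(). Unlike the original, this only detects that sinfo exists, not that it runs successfully.

```r
# Alternative check (an assumption, not the method used above): look for the
# 'sinfo' executable on the PATH. Sys.which() returns "" when it is not found.
has_slurm <- unname(nzchar(Sys.which("sinfo")))
SLURM_SBATCH <- if (has_slurm) "#SBATCH " else "<:NULL:>"
SLURM_SBATCH
```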
Additional rules could be added to control the length of the mpirun line; however, it works fine as it is. Other source code can be generated following the same principles described above.
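One possible rule for the length of the mpirun line is to split it into backslash-continued lines before it is inserted into the template. The following is a hypothetical sketch: wrap_command() is an illustrative name, not part of tmplate. It relies on strwrap(), which splits only at whitespace, so it would also break inside a quoted argument that contains spaces.

```r
# Hypothetical helper: split a long shell command into backslash-continued
# lines of about 'width' characters (before the continuation marker is added).
# strwrap() splits at whitespace only, including inside quoted arguments.
wrap_command <- function(cmd, width = 60) {
  parts <- strwrap(cmd, width = width)
  n <- length(parts)
  if (n > 1) {
    parts[-n] <- paste0(parts[-n], " \\")   # continuation marker for bash
    parts[-1] <- paste0("  ", parts[-1])   # indent continuation lines
  }
  parts
}

cmd <- paste('mpirun --mca mpi_warn_on_fork 0 -n 8 /usr/lib/R/bin/Rscript',
             '--no-save --no-restore "script.R" > output.Rout')
wrapped <- wrap_command(cmd, width = 50)
cat(wrapped, sep = "\n")
```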
For templates with too many variables, translation can be performed by passing a named list containing the template variables through the vars argument. For instance, the previous example could also be written as:
## list with arguments
v <- list(
SHELL_CALL='#!/bin/bash',
SLURM_SBATCH=ifelse(.Platform$OS.type=="unix", ifelse(system("clu=$(sinfo --version 2>&1) || clu=`echo -1`; echo $clu",intern = TRUE)=="-1", '<:NULL:>', '#SBATCH '), '<:NULL:>'),
SLURM_PARTITION='<:SLURM_SBATCH:>--partition=defq',
SLURM_ASK_NODES=2,
SLURM_NODES='<:SLURM_SBATCH:>--nodes=<:SLURM_ASK_NODES:>',
SLURM_ASK_TASKS=4,
SLURM_TASKS='<:SLURM_SBATCH:>--ntasks-per-node=<:SLURM_ASK_TASKS:>',
SLURM_MEMORY='<:SLURM_SBATCH:>--memory=2gb',
SLURM_TIME='<:SLURM_SBATCH:>--time=1:00:00',
SLURM_ARRAY="<:NULL:>",
MODULES_LOAD='module load module/for/openmpi module/for/R',
WORKDIR=ifelse('<:SLURM_SBATCH:>'!='#SBATCH','# no slurm machine','cd ${SLURM_SUBMIT_DIR}'),
TASK="<:NULL:>",
PASS_TASK="<:NULL:>",
PASS_TASK_VAR="<:NULL:>",
MPI_N="<:NULL:>",
MPI_ASK_N='<r@ <:SLURM_ASK_NODES:> * <:SLURM_ASK_TASKS:> @>',
R_HOME=R.home("bin"),
R_OPTIONS='--no-save --no-restore',
R_FILE_INPUT='script.R',
R_ARGS='',
R_FILE_OUTPUT='output.Rout',
MPIRUN=paste('mpirun --mca mpi_warn_on_fork 0 -n <:MPI_ASK_N:> <:R_HOME:>/Rscript <:R_OPTIONS:>',
'"<:R_FILE_INPUT:>" <r@ ifelse(!any(grepl("^<:NULL:>$","<:SLURM_ARRAY:>")),"<:PASS_TASK_VAR:>","") @>',
'<:R_ARGS:> > <:R_FILE_OUTPUT:>'),
MESSAGE_CLOSE='echo "Job submitted on $(date +%F) at $(date +%T)."'
)
## Produce output
## remember to load: library(tmplate) or call tmplate::translate
TT <- translate(vars = v, drop = TRUE, template = T)
## See result
cat(TT, sep="\n")
which produces the same output.
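A practical benefit of the vars list is that variants of the same job script can be derived from one base list. The sketch below uses base R's modifyList() to override a few entries. The list contents are a shortened, hypothetical subset of the arguments above. The translate call is left as a comment so that the snippet runs without tmplate installed.

```r
# Derive a variant argument list by overriding selected entries (base R).
base_args <- list(SLURM_ASK_NODES = 2, SLURM_ASK_TASKS = 4, R_FILE_INPUT = "script.R")
big_args <- modifyList(base_args, list(SLURM_ASK_NODES = 8, R_FILE_INPUT = "big.R"))
str(big_args)

# With tmplate installed, each list could then be passed to translate, e.g.:
# TT <- tmplate::translate(vars = big_args, drop = TRUE, template = T)
```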
Since tmplate uses tRnslate, some of the limitations of the latter also apply to the former (see the tRnslate vignette for more details).
Never replace the content of a template by writing the output to the same file.
Always check the content of the “translated” output before using it for other tasks.
Be cautious.