R Markdown documents support the construction of reproducible documents by allowing authors to insert R code for data processing, exploration, analysis, table-making, and visualization directly into structured, prose documents. In the context of this document, we will refer to the R code in these documents as computational components since they are generated by computational means (namely the R interpreter). The prose in these documents will be referred to as narrative components. They help a reader understand the background, goal, theme, and results of the paper, and they contextualize the computational components.
R Markdown integrates the computational and narrative components of a
document. By itself, this is not novel. The approach was identified as
“Literate Programming”.
Since computational components are, by definition, computationally derived objects and R Markdown is a well-defined standard, it is possible to programmatically create R Markdown documents with computational components. This is the focus of this paper. However, before proceeding down this avenue, we would like to highlight two fundamental limitations to automated R Markdown document construction. First and foremost, without narrative components a document has no context. Quantitative analyses require research questions, hypotheses, reviews, interpretations, and conclusions. Computational components are necessary but not sufficient for constructing an analysis. Second, it is not possible to construct computational components for an arbitrary set of distinct analyses. An analysis itself has a context, and it is built with a set of assumptions and goals. Computational components are constructed for a narrow class of problems.
However, this is not to say the programmatic creation of R Markdown documents is unwarranted or not useful. In fact it has at least two appealing characteristics. The first is convenience. The second is that for fixed analysis pipelines processing uniformly-formatted but domain-distinct data, automated document generation enforces uniformity across those domains being serviced.
The {listdown} package provides functions to programmatically create R Markdown files from named lists. It is intended for data analysis pipelines where the presentation of the results is separated from their creation. For this use case, a data processing step (or analysis) is performed and the results are provided in a single named list, organized hierarchically. With the list and a {listdown} object, a workflowr, pdf, word, or html page can be created. List element names denote sections, subsections, subsubsections, etc., and the list elements contain the data structures to be presented, including graphs and tables. The goal of the package is not to provide a finished, readable document. It is to provide a document with all of the tables and visualizations that will appear (computational components). This serves as a starting point from which a user can organize outputs, describe a study, discuss results, and provide conclusions (narrative components).
listdown provides a reproducible means for producing a document with specified computational components. It is most compatible with data analysis pipelines where the data format is fixed but either the analyses are being updated, which may affect narrative components including the result discussion and conclusion, or the experiment is different, which affects all narrative components. If the narrative components are not changing with the data being pushed through your analysis pipeline, then you may be better off writing the R Markdown code manually.
The package itself is relatively simple with 6 distinct methods that can be easily incorporated into existing analysis pipelines for automatically creating documents that can be used for data exploration and reviewing analysis results as well as a starting point for a more formal write up. These methods include:
listdown() - Create a listdown object to create an R Markdown document.
ld_make_chunks() - Write a listdown object to a string.
ld_chunk_opts() - Apply chunk options to a presentation object.
ld_workflowr_header() - Create a workflowr header.
ld_rmarkdown_header() - Create an R Markdown header.
ld_ioslides_header() - Create an ioslides presentation header.
The rest of this paper is structured as follows. The next section goes over basic usage and commentary. It is meant to convey the basic approach used by the package and shows how to describe an output document using listdown, create a document, and specialize the presentation of computational components using listdown decorators. With the user accustomed to the package’s basic usage, section 3 describes the design of the package. Section 4 goes over advanced usage of the package, including adding initialization code to an output document as well as how to control chunk-level options. Section 5 concludes the paper with a few final remarks.
Suppose we have just completed an analysis and have collected all of the results into a list where the list elements are roughly in the order we would like to present them in a document. It may be noted that this is not always how computational components derived from data analyses are collated. Often individual components are stored in multiple locations on a single machine or across machines. However, it is important to realize that even for analyses on large-scale data, the digital artifacts that will be presented will be relatively small. Centralizing them makes them easier to access, since they don’t need to be found in multiple locations. Also, storing them as a list provides a hierarchical structure that translates directly to a document, as we will see below.
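To make the correspondence between list hierarchy and document structure concrete, the following is a minimal base R sketch. The helper list_to_headings() and the results list are hypothetical and are not part of {listdown}; the sketch assumes that only plain (classless) nested lists become subsections, while any other object is treated as a leaf to be presented.

```r
# Hypothetical helper (not part of listdown): walk a named, nested list and
# return the Markdown headings that its element names would map to.
list_to_headings <- function(x, depth = 1) {
  out <- character(0)
  for (name in names(x)) {
    # One "#" per level of nesting, followed by the element name.
    out <- c(out, paste(strrep("#", depth), name))
    element <- x[[name]]
    # Assumption: only plain lists are subsections. Objects with a class
    # attribute (plots, data frames, models) are leaves.
    if (is.list(element) && !is.object(element)) {
      out <- c(out, list_to_headings(element, depth + 1))
    }
  }
  out
}

# Placeholder results; the strings stand in for plots or tables.
results <- list(
  Summary = "placeholder leaf",
  Models = list(
    Linear = "placeholder leaf",
    `Non Linear` = list(Fit = "placeholder leaf")
  )
)

headings <- list_to_headings(results)
cat(headings, sep = "\n")
```

Each level of nesting adds one level of heading depth, which is the sense in which a hierarchical list "translates directly" to sections and subsections.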
As a first example, we will consider a list of visualizations from the
anscombe data set. The list is composed of four elements (named Linear,
Non Linear, Outlier Vertical, and Outlier Horizontal), each containing a
scatter plot from the famous Anscombe Quartet. From the
computational_components list, we would like to create a document with
four sections with names corresponding to the list names, each
containing its respective visualization.
# Use ggplot2 to create the visualizations.
library(ggplot2)
# Load the Anscombe Quartet.
data(anscombe)
# Create the ggplot objects to display.
computational_components <- list(
  Linear = ggplot(anscombe, aes(x = x1, y = y1)) + geom_point() + theme_bw(),
  `Non Linear` = ggplot(anscombe, aes(x = x2, y = y2)) + geom_point() + theme_bw(),
  `Outlier Vertical` = ggplot(anscombe, aes(x = x3, y = y3)) + geom_point() + theme_bw(),
  `Outlier Horizontal` = ggplot(anscombe, aes(x = x4, y = y4)) + geom_point() + theme_bw())
# Save the file to disk to be read by the output R Markdown document.
saveRDS(computational_components, "comp-comp.rds")
Creating a document from the computational_components list will require
two steps. First, we will create a listdown object that specifies how
the computational_components object will be loaded into the document,
which libraries and code need to be included, the options for the R
chunks, and how the list elements will be presented in the output R
Markdown document.
library(listdown)
ld <- listdown(load_cc_expr = readRDS("comp-comp.rds"),
               package = "ggplot2")
The ld object, along with the computational components in the
comp-comp.rds file, is sufficient to create the sections, subsections,
and R chunks of a document. The only other thing required to create the
document is the header. The listdown package currently supports regular
R Markdown, workflowr, and ioslides headers. A complete document can
then be written to the console using the code shown below. It could just
as easily be written to a file for rendering using the writeLines()
function, for example.
doc <- c(
  as.character(ld_rmarkdown_header("Anscombe's Quartet",
                                   author = "Francis Anscombe",
                                   date = "1973")),
  ld_make_chunks(ld))
cat("\n", paste(doc, collapse = "\n"))
#>
#> ---
#> title: Anscombe's Quartet
#> author: Francis Anscombe
#> date: '1973'
#> output: html_document
#> ---
#>
#> ```{r}
#> library(ggplot2)
#>
#> cc_list <- readRDS("comp-comp.rds")
#> ```
#>
#> # Linear
#>
#> ```{r}
#> cc_list$Linear
#> ```
#>
#> # Non Linear
#>
#> ```{r}
#> cc_list$`Non Linear`
#> ```
#>
#> # Outlier Vertical
#>
#> ```{r}
#> cc_list$`Outlier Vertical`
#> ```
#>
#> # Outlier Horizontal
#>
#> ```{r}
#> cc_list$`Outlier Horizontal`
#> ```
#>
#> # Data
#>
#> ```{r}
#> cc_list$Data
#> ```
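As a sketch of the file-based alternative mentioned above, the character vector can be written with writeLines() and then rendered. The file name anscombe.Rmd is an assumption, and the doc vector below is a shortened placeholder so that the snippet runs on its own; in practice it would be the vector built from the header and ld_make_chunks(). The rendering call is left commented out because it assumes the rmarkdown package and pandoc are installed.

```r
# Shortened placeholder for the `doc` character vector built in the text.
doc <- c("---",
         "title: Anscombe's Quartet",
         "output: html_document",
         "---",
         "",
         "# Linear")

# Hypothetical file name; a temporary directory keeps the example tidy.
rmd_file <- file.path(tempdir(), "anscombe.Rmd")

# Each element of `doc` becomes one line of the file.
writeLines(doc, rmd_file)

# Read the file back to confirm what was written.
written <- readLines(rmd_file)

# Rendering step (assumes rmarkdown and pandoc are available):
# rmarkdown::render(rmd_file)
```

Note that, by default, knitr evaluates chunks with the working directory set to the location of the R Markdown file, so a relative path such as "comp-comp.rds" in the load expression would need to resolve from there.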
The listdown() function provides document-wide R chunk options for
displaying computational components. The chunk options are exactly the
same as those in an R Markdown document and can be used to tailor the
default presentation for a variety of needs. The complete set of options
can be found in the R Markdown Reference Guide. As a concrete example,
the code used to create the plots could be hidden in the output document
using the following code.
ld <- listdown(load_cc_expr = readRDS("comp-comp.rds"),
               package = "ggplot2",
               echo = FALSE)
cat(paste(ld_make_chunks(ld), collapse = "\n"))
#>
#> ```{r echo = FALSE}
#> library(ggplot2)
#>
#> cc_list <- readRDS("comp-comp.rds")
#> ```
#>
#> # Linear
#>
#> ```{r echo = FALSE}
#> cc_list$Linear
#> ```
#>
#> # Non Linear
#>
#> ```{r echo = FALSE}
#> cc_list$`Non Linear`
#> ```
#>
#> # Outlier Vertical
#>
#> ```{r echo = FALSE}
#> cc_list$`Outlier Vertical`
#> ```
#>
#> # Outlier Horizontal
#>
#> ```{r echo = FALSE}
#> cc_list$`Outlier Horizontal`
#> ```
#>
#> # Data
#>
#> ```{r echo = FALSE}
#> cc_list$Data
#> ```
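The chunk headers in the output above follow the usual knitr form of a comma-separated list of name = value pairs. The following base R sketch shows one way such a header line could be assembled from named options. The helper chunk_header() is hypothetical and is not how {listdown} is known to build its output; it is only meant to illustrate the mapping from options to header text.

```r
# Hypothetical helper (not part of listdown): format named chunk options
# as the opening line of an R Markdown chunk.
chunk_header <- function(...) {
  opts <- list(...)
  fence <- strrep("`", 3)  # three backticks, built to avoid a literal fence
  if (length(opts) == 0) {
    return(paste0(fence, "{r}"))
  }
  # deparse() keeps R syntax: FALSE stays FALSE and strings keep their quotes.
  rendered <- vapply(opts,
                     function(v) paste(deparse(v), collapse = ""),
                     character(1))
  paste0(fence, "{r ",
         paste(names(opts), rendered, sep = " = ", collapse = ", "),
         "}")
}

no_opts <- chunk_header()
hidden <- chunk_header(echo = FALSE)
several <- chunk_header(echo = FALSE, fig.width = 6)

cat(no_opts, hidden, several, sep = "\n")
```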
The first example is simple in part because the ggplot objects both
contain the data we want to display and, at the same time, provide the
mechanism for presenting them - rendering them in a graph. However, this
is not always the case. The objects being stored in the list of
computational components may not translate directly to the presentation
in a document. In these cases a function is needed that takes the list
component and returns an object to be displayed. For example, suppose
that, along with showing graphs from the Anscombe Quartet, we would like
to include the data themselves. We could add the data to the
computational_components
list and then create the document
with:
computational_components$Data <- anscombe
saveRDS(computational_components, "comp-comp.rds")
cat(paste(ld_make_chunks(ld), collapse = "\n"))
#>
#> ```{r echo = FALSE}
#> library(ggplot2)
#>
#> cc_list <- readRDS("comp-comp.rds")
#> ```
#>
#> # Linear
#>
#> ```{r echo = FALSE}
#> cc_list$Linear
#> ```
#>
#> # Non Linear
#>
#> ```{r echo = FALSE}
#> cc_list$`Non Linear`
#> ```
#>
#> # Outlier Vertical
#>
#> ```{r echo = FALSE}
#> cc_list$`Outlier Vertical`
#> ```
#>
#> # Outlier Horizontal
#>
#> ```{r echo = FALSE}
#> cc_list$`Outlier Horizontal`
#> ```
#>
#> # Data
#>
#> ```{r echo = FALSE}
#> cc_list$Data
#> ```
In this case, the {listdown} package will show the entire data set, which is the default behavior. However, suppose we do not want to show the entire data set in the document. This is common, especially when the data set is large and requires too much vertical space in the output document, resulting in too much or irrelevant data being shown. Instead, we would like to output to an html document where the data is shown in a datatable, thereby controlling the amount of real estate needed to present the data and, at the same time, providing the user with interactivity to sort and search the data set.
In {listdown}, a function or method that implements the presentation
of a computational component is referred to as a decorator since it
follows the decorator pattern described in the classic software
engineering text “Design Patterns” by Gamma et al. A decorator takes the
element that will be presented as an argument and returns an object for
presentation in the output document. A decorator is specified using the
decorator parameter of the listdown() function using a named list where
the name corresponds to the type and the element corresponds to the
function or method that will decorate an object of that type. For
example, the anscombe data set can be decorated with the DT::datatable()
function as follows. It should be noted that the DT library is both
loaded in the following code and specified in the package option. This
allows ld_make_chunks() to verify the existence of decorators before
generating the chunks.
library(DT)
ld <- listdown(load_cc_expr = readRDS("comp-comp.rds"),
               package = c("ggplot2", "DT"),
               decorator = list(data.frame = datatable))
cat(paste(ld_make_chunks(ld), collapse = "\n"))
#>
#> ```{r}
#> library(ggplot2)
#> library(DT)
#>
#> cc_list <- readRDS("comp-comp.rds")
#> ```
#>
#> # Linear
#>
#> ```{r}
#> cc_list$Linear
#> ```
#>
#> # Non Linear
#>
#> ```{r}
#> cc_list$`Non Linear`
#> ```
#>
#> # Outlier Vertical
#>
#> ```{r}
#> cc_list$`Outlier Vertical`
#> ```
#>
#> # Outlier Horizontal
#>
#> ```{r}
#> cc_list$`Outlier Horizontal`
#> ```
#>
#> # Data
#>
#> ```{r}
#> datatable(cc_list$Data)
#> ```
List names in the decorator argument provide a key to which a function
or method is mapped. The underlying decorator resolution is implemented
for a given computational component by going through the decorator names
sequentially and checking, with the inherits() function, whether the
component inherits from each name. The function or method selected is
the one mapped to the first name that the component inherits from. This
means that when customizing the presentation of objects that inherit
from a common class, the more abstract classes should appear at the end
of the list. This will ensure that specialized classes will be
encountered first in the resolution process.
A separate argument, default_decorator, allows the user to specify the
default decorator for an object whose type does not appear in the
decorator list. This allows the user to specify any class name for the
decorator and avoids a potential type name collision with a default
decorator whose name is determined by convention. By default, this
argument is set to identity, but it can be used to not display a
computational component by default if the argument is set to NULL.
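The resolution order and the role of the default can be illustrated with a small base R sketch. The function resolve_decorator(), the class special_df, and the two presentation functions are hypothetical names introduced for illustration; this is a re-implementation of the lookup as described in the text, not {listdown}'s actual code.

```r
# Hypothetical re-implementation of the lookup described in the text:
# return the decorator mapped to the first name that `x` inherits from,
# or the default when no name matches.
resolve_decorator <- function(x, decorators, default_decorator = identity) {
  for (class_name in names(decorators)) {
    if (inherits(x, class_name)) {
      return(decorators[[class_name]])
    }
  }
  default_decorator
}

# An object with a specialized class that also inherits from data.frame.
special <- structure(data.frame(a = 1:3),
                     class = c("special_df", "data.frame"))

show_special <- function(x) "special presentation"
show_df <- function(x) "generic data frame presentation"

# Specialized class first, abstract class last: the specialized one is found.
good_order <- list(special_df = show_special, data.frame = show_df)
# Abstract class first: it shadows the specialized decorator.
bad_order <- list(data.frame = show_df, special_df = show_special)

good_result <- resolve_decorator(special, good_order)(special)
bad_result <- resolve_decorator(special, bad_order)(special)

# A character vector matches neither name, so the default is returned.
# A NULL default stands for "do not display this component".
skipped <- is.null(resolve_decorator("text", good_order,
                                     default_decorator = NULL))

print(c(good_result, bad_result))
print(skipped)
```

Because the first match wins, placing data.frame ahead of special_df means the specialized presentation is never reached, which is why the more abstract classes belong at the end of the list.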