Background and Concepts

R Markdown documents support the construction of reproducible documents by allowing authors to insert R code for data processing, exploration, analysis, table-making, and visualization directly into structured prose documents. In the context of this document, we will refer to the R code in these documents as computational components since they are generated by computational means (namely the R interpreter). The prose in these documents will be referred to as narrative components. They help a reader understand the background, goal, theme, and results of the paper, and they contextualize the computational components.

R Markdown streamlines the integration of the computational and narrative components of a document. By itself this is not novel. The idea was identified as “Literate Programming” and software tools like Sweave have supported this functionality for over a decade. R Markdown’s popularity has likely been driven by two factors. The first is the relative ease with which these documents can be constructed. While LaTeX is more expressive, it is relatively technical and requires an investment in time to become proficient. By contrast, R Markdown documents are easier to create and format and, when the document is used to create LaTeX output, formatting can be passed through to the underlying .tex file. The second factor driving adoption is likely its support for creating modifiable documents, Microsoft Word in particular. Researchers and analysts, especially those creating applied statistical analyses, often collaborate with domain experts who have less technical knowledge. In these cases, the analyst focuses on creating the computational components and the narrative components related to results and interpretation. After this initial document is created, the domain expert is free to develop narrative components directly in the document without needing to go through the analyst.

Since computational components are, by definition, computationally derived objects and R Markdown is a well-defined standard, it is possible to programmatically create R Markdown documents with computational components. This is the focus of this paper. However, before proceeding down this avenue, we would like to highlight two fundamental limitations to automated R Markdown document construction. First and foremost, without narrative components a document has no context. Quantitative analyses require research questions, hypotheses, reviews, interpretations, and conclusions. Computational components are necessary but not sufficient for constructing an analysis. Second, it is not possible to construct computational components for an arbitrary set of distinct analyses. An analysis itself has a context and it is built with a set of assumptions and goals. Computational components are constructed for a narrow class of problems.

However, this is not to say the programmatic creation of R Markdown documents is unwarranted or not useful. In fact it has at least two appealing characteristics. The first is convenience. The second is that for fixed analysis pipelines processing uniformly-formatted but domain-distinct data, automated document generation enforces uniformity across those domains being serviced.

The {listdown} package provides functions to programmatically create R Markdown files from named lists. It is intended for data analysis pipelines where the presentation of the results is separated from their creation. For this use case, a data processing step (or analysis) is performed and the results are provided in a single named list, organized hierarchically. With the list and a {listdown} object, a workflowr, pdf, word, or html page can be created. List element names denote sections, subsections, subsubsections, etc., and the list elements contain the data structures to be presented, including graphs and tables. The goal of the package is not to provide a finished, readable document. It is to provide a document with all of the tables and visualizations that will appear (computational components). This serves as a starting point from which a user can organize outputs, describe a study, discuss results, and provide conclusions (narrative components).
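The mapping from list names to section levels can be illustrated with a small base R sketch. The helper `list_to_headings()` and the `results` list are hypothetical and written only for illustration; they are not part of {listdown} and are not how the package is implemented. The sketch assumes that plain (unclassed) lists are containers and that anything else, such as a data frame, is a leaf to be presented.

```r
# Hypothetical helper, not part of listdown: walk a nested named list and
# emit one Markdown heading per name, with list depth setting the level.
list_to_headings <- function(x, depth = 1) {
  out <- character(0)
  for (name in names(x)) {
    out <- c(out, paste(strrep("#", depth), name))
    element <- x[[name]]
    # Recurse into plain lists only; data frames and other classed objects
    # are treated as leaves that would be presented in a chunk.
    if (is.list(element) && !is.object(element)) {
      out <- c(out, list_to_headings(element, depth + 1))
    }
  }
  out
}

# Illustrative results list with two sections and three subsections.
results <- list(
  Summary = list(Table = data.frame(n = 10), Plot = "a plot object"),
  Models = list(Linear = "a fitted model")
)

headings <- list_to_headings(results)
cat(headings, sep = "\n")
```

Each top-level name becomes a section heading and each nested name becomes a subsection heading, which is the structure a reader would then fill in with narrative components.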

listdown provides a reproducible means for producing a document with specified computational components. It is most compatible with data analysis pipelines where the data format is fixed but either the analyses are being updated, which may affect narrative components including the result discussion and conclusion, or the experiment is different, which affects all narrative components. If the narrative components are not changing with the data being pushed through your analysis pipeline, then you may be better off writing the R Markdown code manually.

The package itself is relatively simple, with six distinct methods that can be easily incorporated into existing analysis pipelines for automatically creating documents that can be used for data exploration and reviewing analysis results as well as a starting point for a more formal write-up. These methods include:

listdown() - Create a listdown object to create an R Markdown document.
ld_make_chunks() - Write a listdown object to a string.
ld_chunk_opts() - Apply chunk options to a presentation object.
ld_workflowr_header() - Create a workflowr header.
ld_rmarkdown_header() - Create an R Markdown header.
ld_ioslides_header() - Create an ioslides presentation header.

The rest of this paper is structured as follows. The next section goes over basic usage and commentary. This section is meant to convey the basic approach used by the package and shows how to describe an output document using listdown, create a document, and specialize the presentation of computational components using listdown decorators. With the user accustomed to the package’s basic usage, section 3 describes the design of the package. Section 4 goes over advanced usage of the package, including adding initialization code to an output document as well as how to control chunk-level options. Section 5 concludes the paper with a few final remarks.

Basic Usage and Commentary

Suppose we have just completed an analysis and have collected all of the results into a list where the list elements are roughly in the order we would like to present them in a document. It may be noted that this is not always how computational components derived from data analyses are collated. Often individual components are stored in multiple locations on a single machine or across machines. However, it is important to realize that even for analyses on large-scale data, the digital artifacts that will be presented will be relatively small. Centralizing them makes them easier to access, since they don’t need to be found in multiple locations. Also, storing them as a list provides a hierarchical structure that translates directly to a document, as we will see below.

As a first example, we will consider a list of visualizations from the anscombe data set. The list is composed of four elements (named Linear, Non Linear, Outlier Vertical, and Outlier Horizontal), each containing a scatter plot from the famous Anscombe's Quartet. From the computational_components list, we would like to create a document with four sections with names corresponding to the list names, each containing its respective visualization.

# Use ggplot2 to create the visualizations.
library(ggplot2)

# Load the Anscombe Quartet.
data(anscombe)

# Create the ggplot objects to display.
computational_components <- list(
  Linear = ggplot(anscombe, aes(x = x1, y = y1)) + geom_point() + theme_bw(),
  `Non Linear` = ggplot(anscombe, aes(x = x2, y = y2)) + geom_point() + theme_bw(),
  `Outlier Vertical` = ggplot(anscombe, aes(x = x3, y = y3)) + geom_point() + theme_bw(),
  `Outlier Horizontal` = ggplot(anscombe, aes(x = x4, y = y4)) + geom_point() + theme_bw())

# Save the file to disk to be read by the output R Markdown document.
saveRDS(computational_components, "comp-comp.rds")

Creating a Document with listdown

Creating a document from the computational_components list requires two steps. First, we will create a listdown object that specifies how the computational_components object will be loaded into the document, which libraries and code need to be included, the options for the R chunks, and how the list elements will be presented in the output R Markdown document.

library(listdown)

ld <- listdown(load_cc_expr = readRDS("comp-comp.rds"),
               package = "ggplot2")

The ld object, along with the computational components in the comp-comp.rds file, is sufficient to create the sections, subsections, and R chunks of a document. The only other thing required to create the document is the header. The listdown package currently supports regular R Markdown, workflowr, and ioslides headers. A complete document can then be written to the console using the code shown below. It could easily be written to a file for rendering using the writeLines() function, for example.

doc <- c(
  as.character(ld_rmarkdown_header("Anscombe's Quartet",
                                   author = "Francis Anscombe",
                                   date = "1973")),
  ld_make_chunks(ld))

cat("\n", paste(doc, collapse = "\n"))
#> 
#>  ---
#> title: Anscombe's Quartet
#> author: Francis Anscombe
#> date: '1973'
#> output: html_document
#> ---
#> 
#> ```{r}
#> library(ggplot2)
#> 
#> cc_list <- readRDS("comp-comp.rds")
#> ```
#> 
#> # Linear
#> 
#> ```{r}
#> cc_list$Linear
#> ```
#> 
#> # Non Linear
#> 
#> ```{r}
#> cc_list$`Non Linear`
#> ```
#> 
#> # Outlier Vertical
#> 
#> ```{r}
#> cc_list$`Outlier Vertical`
#> ```
#> 
#> # Outlier Horizontal
#> 
#> ```{r}
#> cc_list$`Outlier Horizontal`
#> ```
#> 
#> # Data
#> 
#> ```{r}
#> cc_list$Data
#> ```
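Writing the document to a file, as mentioned above, could look like the following minimal sketch. To keep the example self-contained, `doc` is a short stand-in character vector rather than the one produced by ld_make_chunks(), and the file name `anscombe.Rmd` and the variable `rmd_path` are assumptions made for illustration. Rendering is left as a comment because it assumes the rmarkdown package and pandoc are installed.

```r
# Stand-in for the `doc` character vector built from the header and the
# chunks; in practice, use the vector created in the code above.
doc <- c(
  "---",
  "title: Anscombe's Quartet",
  "output: html_document",
  "---",
  "",
  "# Linear"
)

# Write one element per line to an .Rmd file in a temporary directory.
rmd_path <- file.path(tempdir(), "anscombe.Rmd")
writeLines(doc, rmd_path)

# Read the file back to confirm that it holds the same lines.
written <- readLines(rmd_path)

# Optional rendering step, assuming rmarkdown and pandoc are available:
# rmarkdown::render(rmd_path)
```

Because `doc` is an ordinary character vector, it can also be inspected or edited before it is written, for example to insert narrative text between sections.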

The listdown() function provides document-wide R chunk options for displaying computational components. The chunk options are exactly the same as those in an R Markdown document and can be used to tailor the default presentation for a variety of needs. The complete set of options can be found in the R Markdown Reference Guide. As a concrete example, the code used to present the plots could be hidden in the output document using the following code.

ld <- listdown(load_cc_expr = readRDS("comp-comp.rds"), 
               package = "ggplot2",
               echo = FALSE)

cat(paste(ld_make_chunks(ld), collapse = "\n"))
#> 
#> ```{r echo = FALSE}
#> library(ggplot2)
#> 
#> cc_list <- readRDS("comp-comp.rds")
#> ```
#> 
#> # Linear
#> 
#> ```{r echo = FALSE}
#> cc_list$Linear
#> ```
#> 
#> # Non Linear
#> 
#> ```{r echo = FALSE}
#> cc_list$`Non Linear`
#> ```
#> 
#> # Outlier Vertical
#> 
#> ```{r echo = FALSE}
#> cc_list$`Outlier Vertical`
#> ```
#> 
#> # Outlier Horizontal
#> 
#> ```{r echo = FALSE}
#> cc_list$`Outlier Horizontal`
#> ```
#> 
#> # Data
#> 
#> ```{r echo = FALSE}
#> cc_list$Data
#> ```

Decorators

The first example is simple in part because the ggplot objects both contain the data we want to display and, at the same time, provide the mechanism for presenting them - rendering them in a graph. However, this is not always the case. The objects being stored in the list of computational components may not translate directly to the presentation in a document. In these cases a function is needed that takes the list component and returns an object to be displayed. For example, suppose that, along with showing graphs from the Anscombe Quartet, we would like to include the data themselves. We could add the data to the computational_components list and then create the document with:

computational_components$Data <- anscombe
saveRDS(computational_components, "comp-comp.rds")
cat(paste(ld_make_chunks(ld), collapse = "\n"))
#> 
#> ```{r echo = FALSE}
#> library(ggplot2)
#> 
#> cc_list <- readRDS("comp-comp.rds")
#> ```
#> 
#> # Linear
#> 
#> ```{r echo = FALSE}
#> cc_list$Linear
#> ```
#> 
#> # Non Linear
#> 
#> ```{r echo = FALSE}
#> cc_list$`Non Linear`
#> ```
#> 
#> # Outlier Vertical
#> 
#> ```{r echo = FALSE}
#> cc_list$`Outlier Vertical`
#> ```
#> 
#> # Outlier Horizontal
#> 
#> ```{r echo = FALSE}
#> cc_list$`Outlier Horizontal`
#> ```
#> 
#> # Data
#> 
#> ```{r echo = FALSE}
#> cc_list$Data
#> ```

In this case, the {listdown} package will show the entire data set, which is the default behavior. However, suppose we do not want to show the entire data set in the document. This is common, especially when the data set is large and requires too much vertical space in the output document, resulting in too much or irrelevant data being shown. Instead, we would like to output to an html document where the data is shown in a datatable, thereby controlling the amount of space needed to present the data and, at the same time, providing the user with interactivity to sort and search the data set.

In {listdown}, a function or method that implements the presentation of a computational component is referred to as a decorator since it follows the decorator pattern described in the classic software engineering text “Design Patterns” by Gamma et al. A decorator takes the element that will be presented as an argument and returns an object for presentation in the output document. A decorator is specified using the decorator parameter of the listdown() function using a named list where the name corresponds to the type and the element corresponds to the function or method that will decorate an object of that type. For example, the anscombe data set can be decorated with the DT::datatable() function as follows. It should be noted that the DT library is both loaded in the following code and specified in the package option. This allows ld_make_chunks() to verify the existence of decorators before generating the chunks.

library(DT)
ld <- listdown(load_cc_expr = readRDS("comp-comp.rds"), 
               package = c("ggplot2", "DT"),
               decorator = list(data.frame = datatable))
cat(paste(ld_make_chunks(ld), collapse = "\n"))
#> 
#> ```{r}
#> library(ggplot2)
#> library(DT)
#> 
#> cc_list <- readRDS("comp-comp.rds")
#> ```
#> 
#> # Linear
#> 
#> ```{r}
#> cc_list$Linear
#> ```
#> 
#> # Non Linear
#> 
#> ```{r}
#> cc_list$`Non Linear`
#> ```
#> 
#> # Outlier Vertical
#> 
#> ```{r}
#> cc_list$`Outlier Vertical`
#> ```
#> 
#> # Outlier Horizontal
#> 
#> ```{r}
#> cc_list$`Outlier Horizontal`
#> ```
#> 
#> # Data
#> 
#> ```{r}
#> datatable(cc_list$Data)
#> ```

List names in the decorator argument provide a key to which a function or method is mapped. The underlying decorator resolution is implemented for a given computational component by going through decorator names sequentially to see if the component inherits from the name using the inherits() function. The function or method selected is the one mapped to the first name from which the component inherits. This means that when customizing the presentation of objects that inherit from a common class, the more abstract classes should appear at the end of the list. This will ensure that specialized classes will be encountered first in the resolution process.

A separate argument, default_decorator, allows the user to specify the default decorator for an object whose type does not appear in the decorator list. This allows the user to specify any class name for the decorator and avoids a potential type name collision with a default decorator whose name is determined by convention. By default, this argument is set to identity, but it can be used to not display a computational component by default if the argument is set to NULL.
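The resolution rule and the default fallback described above can be sketched in base R. The function `resolve_decorator()`, the classes `special_fit` and `base_fit`, and the two presentation functions are hypothetical names introduced for illustration; this is not the package's source code, only a minimal sketch of first-match resolution with inherits().

```r
# Hypothetical sketch of the resolution rule; this is not listdown's source.
resolve_decorator <- function(x, decorators, default_decorator = identity) {
  for (class_name in names(decorators)) {
    # The first name the component inherits from wins.
    if (inherits(x, class_name)) {
      return(decorators[[class_name]])
    }
  }
  # No name matched, so fall back to the default decorator.
  default_decorator
}

# A toy object with a specialized class and a more abstract parent class.
fit <- structure(list(), class = c("special_fit", "base_fit"))

show_special <- function(x) "special presentation"
show_base <- function(x) "base presentation"

# Specialized class first: the specialized decorator is selected.
good_order <- list(special_fit = show_special, base_fit = show_base)
# Abstract class first: it shadows the specialized decorator.
bad_order <- list(base_fit = show_base, special_fit = show_special)

good_result <- resolve_decorator(fit, good_order)(fit)
bad_result <- resolve_decorator(fit, bad_order)(fit)

# A component matching no name gets the default decorator; a NULL default
# is read here as "do not display this component".
hidden <- is.null(resolve_decorator(1:3, good_order, default_decorator = NULL))
```

With `good_order` the specialized function is chosen, while with `bad_order` the abstract class matches first and the specialized function is never reached, which is why abstract classes belong at the end of the list.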
