Guide No: 51
Paul Johnson, CRMDA <pauljohn@ku.edu>
Zack Roman, CRMDA <zroman@ku.edu>
Please visit https://pj.freefaculty.org/guides/
Keywords: guides, rmarkdown, rmd2html
October 08 2021
Abstract
This guide describes several key features/functionalities of R Markdown for producing colorful and vivid HTML documents.
This is about preparing Rmarkdown documents that exploit the special features available in Web pages. It is a work in progress.
The stationery package includes a vignette that introduces the markdown philosophy and the Rmarkdown version of it. It shows how to use R (R Core Team 2018) code chunks. This document is focused on the special features that might be obtained with HTML documents.
The stationery package includes a vignette, stationery, that explains the process of compiling the document. The document can be compiled either by starting R and using the stationery function named `rmd2html` or it can be compiled from the command line using the shell script `rmd2html.sh` that we provide with the package.
The rendered output is an HTML file that can be opened using any browser. The HTML document has figures and cascading style sheets embedded in it, so it is nearly self-contained (it relies on the MathJax web server and possibly some external JavaScript).
Rmarkdown intended for an HTML backend can include HTML code. If Rmarkdown is missing syntax to achieve some purpose, then the HTML approach will generally get the job done.
Because many Rmarkdown authors are unfamiliar with HTML code, quite a few syntactic shortcuts have been developed. As we explained in the Rmarkdown vignette, it is preferable to use the Rmarkdown syntax when it is available because this improves the portability of the document. However, when no markdown syntax exists, one must improvise.
In this section, we first emphasize 2 special features that are provided in our cascading style sheet that facilitate use of some pleasant HTML markup strategies. These are 1) colored callouts and 2) tabbed subsections.
The stylesheet includes style code for “callout” sections. These were adapted from the HTML stylesheets in the bootstrap project.
A colored callout must begin as a level-4 markdown heading. The heading begins with `####`, followed by the heading text and then, in braces, a list of CSS class names from our style sheet. The colors we provide are “gray”, “red”, “orange”, “blue”, and “green”.
The gray callout is created by this Rmarkdown code:
```
#### Gray Callout {.bs-callout .bs-callout-gray}
```
Perhaps “gray” is for wisdom. Perhaps it is just a visual separator between exciting colors like red and blue!
Syntax:
```
#### Red Callout {.bs-callout .bs-callout-red}
```
Red callout is for danger, in the eyes of some authors. Other authors just think it is pretty.
Orange might be used for examples.
```
#### Orange Callout {.bs-callout .bs-callout-orange}
```
```
#### Blue Callout {.bs-callout .bs-callout-blue}
```
Blue is for correct answers, at least according to the color Nazis.
```
#### Green Callout {.bs-callout .bs-callout-green}
```
Green is the color of the Earth, of course, so we use it for ideas, suggestions, or whatever we like.
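Since the five callout headings differ only in the color name, the pattern can be sketched as a small R function. This is a minimal illustration, not part of the stationery package: the names `callout_colors` and `callout_heading` are hypothetical, and the class names are copied from the examples above.

```r
## Colors for which this guide says the style sheet provides callout classes
callout_colors <- c("gray", "red", "orange", "blue", "green")

## Hypothetical helper: build the level-4 heading line for a colored callout
callout_heading <- function(title, color = "gray") {
    ## match.arg() stops with an error if the color is not one of the five
    color <- match.arg(color, callout_colors)
    sprintf("#### %s {.bs-callout .bs-callout-%s}", title, color)
}

callout_heading("Gray Callout", "gray")
callout_heading("Red Callout", "red")
```

In a chunk with results="asis", the returned line could be written into the document with cat(), followed by a blank line and the callout text.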
At one time, we were naming these things by their purpose rather than by their colors. The purpose <==> color mapping was
| purpose | color |
|---|---|
| info | blue |
| warning | orange |
| danger | red |
However, we concluded that some people might like to use red for warnings or orange for danger. We are all about diversity and concluded it was superficial to use purpose-based names. Some of us use the colored callout regions simply for decoration, so we don’t name them by purpose anymore.
Some of our older Rmarkdown documents do use that approach, however.
This is an R code chunk embedded inside the red colored callout:
dat <- data.frame(x = rnorm(1000), y = rpois(1000, lambda = 7))
summary(dat)
x y
Min. :-3.128437 Min. : 0.000
1st Qu.:-0.647860 1st Qu.: 5.000
Median : 0.054540 Median : 7.000
Mean : 0.005542 Mean : 6.809
3rd Qu.: 0.691346 3rd Qu.: 8.000
Max. : 2.786802 Max. :17.000
hist(dat$x, xlab = "Monkey Weight (deviations)", main = "Histogram", prob = TRUE, ylim = c(0, 1))
Note that the colored callouts, which are level 4 headings, are terminated when the next heading is declared at level 2.
Tabbed sections are the only feature that truly differentiates the HTML backend from PDF. The user can “interact” with the tabs. The major benefit is that a section with, say, 5 large subsections can be made to seem shorter by “hiding” the subsections under the tabs.
In our style sheet, tabs are created in two steps. First, a level two markdown header (`##`) is introduced with the flag `{.tabset .tabset-fade}`. The tabs within that group are created by level 3 headers (`###`). To close down the tabbed section, it is necessary to introduce a new level 1 or 2 header.
Please note it is VERY IMPORTANT to include a blank line before a new tabbed section begins. If the line is omitted, then the new section will not be created properly.
As demonstrated by this paragraph, commentary before the level-3 tabbed headers is allowed. In fact, one can introduce any number of paragraphs before the first level 3 header is inserted to begin the tabbed subsections.
Items about our fine state
Items about another fine state, which is not quite as good as Kansas
My baby daughter exclaimed “New York stinks!” in 1990. Last time I was there, it was still correct to say that.
If you could retire as a rich person, this might be the right place to go.
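The two-step recipe for a tabbed section can be sketched in R as a function that assembles the markdown lines. This is only an illustration: `tabset_lines` is a hypothetical helper, and the section title and tab captions are placeholders rather than the ones used in this guide. The function starts with a blank line because a blank line is required before a new tabbed section.

```r
## Hypothetical helper: assemble the Rmarkdown lines for one tabbed section.
## 'tabs' is a named character vector: names are tab captions, values are tab text.
tabset_lines <- function(title, tabs, intro = NULL) {
    ## Leading blank line, then the level-2 header that opens the tabset
    out <- c("", sprintf("## %s {.tabset .tabset-fade}", title), "")
    ## Optional commentary that appears before the first tab
    if (!is.null(intro)) out <- c(out, intro, "")
    ## One level-3 header per tab, each followed by its text
    for (caption in names(tabs)) {
        out <- c(out, sprintf("### %s", caption), "", tabs[[caption]], "")
    }
    out
}

tab_text <- c("Kansas" = "Items about our fine state",
              "New York" = "Items about another fine state")
lines <- tabset_lines("Places", tab_text, intro = "Commentary before the tabs.")
writeLines(lines)
```

The tabset stays open until the next level 1 or 2 header, so whatever follows these lines in the document should begin with one.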
The “hidden” subsections are labeled, but not vividly, and our CSS is to blame. Or the CSS inherited from others is inadequate. Also we need to more easily color and dramatize these tabs. As discussed next, some raw HTML markup is needed to obtain colors.
The only way (that we know of) to get colors is to wrap the tab headers in a `<span style>` as shown below. This might be useful to draw attention to the tabs. Blue is the default color.
Note that it is necessary to declare the level-2 header again, to start a new tabset:
## A level-2 heading launches a new tabset, with color via HTML markup {.tabset .tabset-fade}
Followed by the tab captions, which are inside level-3 headers, including color markup:
### <span style="color:orange">An orange tab</span>
Here is the working example:
Commentary about red stuff. We have embedded a red callout box here to have some pizzazz. Click “An orange tab” where we’ve hidden some R output.
Let’s try some R code within this tabbed level 3 section:
dat <- data.frame(x = rgamma(1000, 1.4))
hist(dat$x)
words here!
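The colored tab caption shown earlier follows a fixed pattern, so it can be generated rather than typed. The function name `colored_tab` is hypothetical. The color is passed straight into the style attribute, so any CSS color name should work, although only orange is demonstrated in this guide.

```r
## Hypothetical helper: a level-3 tab header whose caption is wrapped in a
## <span> that sets the text color
colored_tab <- function(caption, color) {
    sprintf('### <span style="color:%s">%s</span>', color, caption)
}

colored_tab("An orange tab", "orange")
```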
Pictures or graphics can be inserted into Rmarkdown documents. The usual markdown syntax for image inserts is
![alt text](image/location/file.png "Image Title Text")
That syntax is somewhat limiting, mostly because we cannot resize the images. Another limitation is that some graphics formats are not allowed. The suggested file formats are svg, png, and jpg, so graphics in pdf will not be usable as is.
To resize images, we need to resort to raw HTML code, which seems somewhat disappointing to many authors. HTML allows rescaling. We can specify both the width and the height of the image. In this example code, a png format file named “plot1.png” is inserted in the document.
<img src="ext_img/plot1.png" alt = "Floating .png"
width = "308"
height = "216">
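When both width and height are given, the author must keep them in proportion or the image will be distorted. The following sketch computes the height from a requested width. It assumes, only for illustration, that 308 by 216 is the native pixel size of plot1.png, which the text does not state. The name `img_tag` is hypothetical.

```r
## Hypothetical helper: build an <img> tag for a requested width, choosing
## the height that preserves the aspect ratio of the original image
img_tag <- function(src, alt, width, orig_width, orig_height) {
    height <- round(width * orig_height / orig_width)
    sprintf('<img src="%s" alt="%s" width="%d" height="%d">',
            src, alt, as.integer(width), as.integer(height))
}

## Half-size version of the image in the example above
tag <- img_tag("ext_img/plot1.png", "Floating .png",
               width = 154, orig_width = 308, orig_height = 216)
cat(tag, "\n")
```

In an Rmarkdown document this would be run in a chunk with results="asis" so that the tag is passed through as HTML.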
Authors who need to use graphics saved in other formats will need to convert them to png, jpg, or svg. The gold standard of format converters is the `convert` program of the ImageMagick suite of tools. It is also possible to open a PDF in some editors, such as the GNU Image Manipulation Program (GIMP), and save it in an image format. There are some websites that might be useful for this purpose, such as https://pdf2png.com.
If you can figure out how to insert characters with accents, they will display correctly. For example, Karl Gustav Jöreskog, Dag Sörbom, and Linda Muthén and Bengt Muthén. These are entered at the keyboard using editor-specific tools.
In the stationery package vignette named code_chunks, we explain the idea that in both LaTeX and Rmarkdown, one can insert R code chunks that will be processed. There, we spell out a list of requirements for any chunk-based system along with examples.
We run the same code chunks here, to compare the HTML output with PDF from the code_chunks vignette.
A chunk that is evaluated, echoed, both input and output. This is a standard chunk, no chunk options are used:
The user will see both the input code and the output, each in a separate box:
set.seed(234234)
x <- rnorm(100)
mean(x)
[1] -0.1004232
Notice the code highlighting is not entirely successful, and is different from results we see in other backends.
A chunk with commands that are echoed into the document, but not evaluated (`eval=F`).
When the document is compiled, the reader will see the depiction of the code, which is (by default) beautified and reformatted:
set.seed(234234)
x <- rnorm(100)
mean(x)
A chunk that is evaluated, with output displayed, but code is not echoed (`echo=F`). It is not necessary to specify `eval=T` because that is a default.
The user will not see any code that runs, but only a result:
[1] 0.2024592
A hidden code chunk. A chunk that is evaluated, but neither the input nor the output is displayed (`include=F`).
What is the grammatically correct way to say “did you see nothing?” You should not even see an empty box. After that, the object `x` exists in the ongoing R session, so it can be put to use.
A chunk that creates a graph, and allows it to be inserted into the document, but the code is not echoed for the reader to see.
Save a graph in a file and display it at a later point.
This can be achieved by specifying fig.show="hold" and echo=F. Optionally, we can specify the height and width of the figure with fig.height and fig.width (which are always in inches). The file will be saved in the current working directory.
hist(x, main = "Another Histogram")
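The chunk types described so far differ only in the options on the chunk's opening line. As a compact summary, this sketch builds those opening lines in R. The helper `chunk_header` and the chunk labels are hypothetical. The option names (eval, echo, include, fig.show, fig.height, fig.width) are the knitr options named in the text, while the figure dimensions are arbitrary.

```r
## Hypothetical helper: build the opening line of an R chunk from its options
chunk_header <- function(label, ...) {
    opts <- list(...)
    fence <- strrep("`", 3)   # the three backticks that open a chunk
    if (length(opts) == 0) return(sprintf("%s{r %s}", fence, label))
    ## deparse() writes FALSE as FALSE and "hold" with its quotation marks
    values <- vapply(opts, deparse, character(1))
    opt_txt <- paste(names(opts), values, sep = "=", collapse = ", ")
    sprintf("%s{r %s, %s}", fence, label, opt_txt)
}

headers <- c(
    chunk_header("standard"),                       # input and output shown
    chunk_header("shown-not-run", eval = FALSE),    # code shown, not run
    chunk_header("run-not-shown", echo = FALSE),    # output shown, code hidden
    chunk_header("hidden", include = FALSE),        # run, nothing shown
    chunk_header("histogram", echo = FALSE, fig.show = "hold",
                 fig.height = 4, fig.width = 6)     # figure only, sized in inches
)
writeLines(headers)
```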
A chunk that shows a series of plotting commands.
This is a named chunk that is not evaluated, but it is displayed to the reader. The same code is then put to use twice in what follows.
par(mar = c(3,2,0.5,0.5))
cax <- 0.7 ## cex.axis
plot(c(0, 1), c(0, 1), xlim = c(0,1), ylim = c(0,1), type = "n", ann = FALSE, axes = FALSE)
rect(0, 0, 1, 1, col = "light grey", border = "grey")
axis(1, tck = 0.01, pos = 0, cex.axis = cax, padj = -2.8, lwd = 0.3,
at = seq(0, 1, by = 0.2), labels = c("", seq(0.2,0.8, by=0.2), ""))
axis(2, tck = 0.01, pos = 0, cex.axis = cax, padj = 2.8, lwd = 0.3,
at = seq(0, 1, by = 0.2), labels = c("", seq(0.2,0.8, by=0.2), ""))
mtext(expression(x), side = 1, line = 0.5, at = .5, cex = cax)
mtext(expression(y), side = 2, line = 0.5, at = .5, cex = cax)
mtext(c("Min x", "Max x"), side = 1, line = -0.5, at = c(0.05, 0.95), cex = cax)
mtext(c("Min y", "Max y"), side = 2, line = -0.5, at = c(0.05, 0.95), cex = cax)
lines(c(.6, .6, 0), c(0, .6, .6), lty = "dashed")
text(.6, .6, expression(paste("The location ",
group("(",list(x[i] == .6, y[i] == .6),")"))), pos = 3, cex = cax + 0.1)
points(.6, .6, pch = 16)
The first re-use of this code simply runs the whole chunk, and keeps the final figure. This figure is a png file that is embedded in the HTML document.
A special feature of knitr is the ability to keep the intermediate plots that are produced by each line. An inspection of the `tmpout` directory shows that this code created several graphs. Observe there are several files:
list.files("tmpout", pattern="p-chunk76.*png")
[1] "p-chunk76-1.png" "p-chunk76-10.png" "p-chunk76-11.png"
[4] "p-chunk76-2.png" "p-chunk76-3.png" "p-chunk76-4.png"
[7] "p-chunk76-5.png" "p-chunk76-6.png" "p-chunk76-7.png"
[10] "p-chunk76-8.png" "p-chunk76-9.png"
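Notice that list.files() returns the names in alphabetical order, so plot 10 is listed before plot 2. When picking intermediate plots by position, it may help to reorder the names by plot number first. This sketch uses the file names printed above, typed in as a literal vector so that it runs without the tmpout directory.

```r
## File names as printed above (alphabetical order from list.files)
files <- c("p-chunk76-1.png", "p-chunk76-10.png", "p-chunk76-11.png",
           "p-chunk76-2.png", "p-chunk76-3.png", "p-chunk76-4.png",
           "p-chunk76-5.png", "p-chunk76-6.png", "p-chunk76-7.png",
           "p-chunk76-8.png", "p-chunk76-9.png")

## Pull out the plot number at the end of each name and sort on it
plot_number <- as.integer(sub("^p-chunk76-([0-9]+)\\.png$", "\\1", files))
files_in_plot_order <- files[order(plot_number)]
files_in_plot_order
```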
In a way that is rather similar to the PDF backend, we use a backend-specific table structure to display four of the images. The display of the table’s caption is controlled by the style sheet.
<table border="0" cellpadding="0">
<caption>Figure: Table Array of Four Graphics</caption>
<tr><td><img src="tmpout/p-chunk76-4.png" height=350 width=350 alt = "a png"></td>
<td><img src="tmpout/p-chunk76-8.png" height=350 width=350 alt = "b png"></td></tr>
<tr><td><img src="tmpout/p-chunk76-9.png" height=350 width=350 alt = "c png"></td>
<td><img src="tmpout/p-chunk76-11.png" height=350 width=350 alt = "d png"> </td></tr>
</table>
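The table above was typed by hand. As an alternative, the same kind of markup could be generated by R in a chunk that uses results="asis". This is a sketch under that assumption. The function `image_table` is hypothetical, and it uses the file name as the alt text instead of the "a png", "b png" labels shown above. It writes the markup without leading spaces.

```r
## Hypothetical helper: arrange image files in an HTML table, 'ncol' per row
image_table <- function(files, caption, ncol = 2, size = 350L) {
    cells <- sprintf('<td><img src="%s" height=%d width=%d alt="%s"></td>',
                     files, size, size, basename(files))
    ## Group the cells into rows of 'ncol' images each
    rows <- split(cells, ceiling(seq_along(cells) / ncol))
    row_txt <- vapply(rows, function(r) {
        paste0("<tr>", paste(r, collapse = ""), "</tr>")
    }, character(1))
    paste(c('<table border="0" cellpadding="0">',
            sprintf("<caption>%s</caption>", caption),
            row_txt, "</table>"), collapse = "\n")
}

four <- file.path("tmpout", paste0("p-chunk76-", c(4, 8, 9, 11), ".png"))
html <- image_table(four, "Figure: Table Array of Four Graphics")
cat(html, "\n")
```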
Markdown includes a rather crude table-making syntax. We used it above to display the purpose-to-color relationship of the colored callout boxes. For most serious analysis, that type of table will not be sufficient.
The chunk option results="asis" is used to display HTML markup that can be created by R functions. The cascading style sheet will play an important role in the final display. If we are unhappy with the rendering of the tables, we should concentrate on fixing the CSS, rather than finger-painting borders and such. While working on this project we discovered a flaw in the pandoc processing engine that caused tables to fail. If the HTML generated by the chunk includes spaces, pandoc can be fooled into thinking the text is markdown rather than HTML.
There are many packages that can create near-publication-quality regression tables. Here is an example of how the rockchalk package can create HTML code for a regression table. In this case, we run the code chunk to generate the HTML code, and then we have to manually purge the extra spaces in the output so that pandoc will not corrupt the output.
cat(or1)
| | Amod | Bmod | Gmod |
|---|---|---|---|
| | Estimate | Estimate | Estimate |
| | (S.E.) | (S.E.) | (S.E.) |
| (Intercept) | 30.245*** | 29.774*** | 30.013*** |
| | (0.618) | (0.522) | (0.490) |
| x1 | 1.546* | _ | 2.217*** |
| | (0.692) | | (0.555) |
| x2 | _ | 3.413*** | 3.717*** |
| | | (0.512) | (0.483) |
| N | 100 | 100 | 100 |
| RMSE | 6.121 | 5.205 | 4.849 |
| R2 | 0.048 | 0.312 | 0.409 |
| adj R2 | 0.039 | 0.305 | 0.397 |
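The purge of extra spaces need not be done by hand. The sketch below assumes that the troublesome spaces are leading indentation, blank lines, and gaps between tags. The text above does not say exactly which spaces confuse pandoc, so treat this as a starting point. The vector `html_raw` is a made-up stand-in for the output of a table-making function, and `squeeze_html` is a hypothetical helper.

```r
## Made-up stand-in for HTML produced by a table-making function
html_raw <- c("<table>",
              "    <tr>",
              "        <td>x1</td>   <td>1.546*</td>",
              "    </tr>",
              "",
              "</table>")

## Hypothetical helper: remove indentation, blank lines, and space between tags
squeeze_html <- function(x) {
    x <- unlist(strsplit(x, "\n", fixed = TRUE))  # one element per line
    x <- trimws(x)                                # drop leading/trailing space
    x <- gsub(">\\s+<", "><", x)                  # close gaps between tags
    x[nzchar(x)]                                  # drop empty lines
}

html_clean <- squeeze_html(html_raw)
cat(html_clean, sep = "\n")
```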
The package pander offers flexibility and functionality. It can display an R table, such as the coefficient table generated by a regression summary
library(pander)
sum <- summary(m1)
pander(sum$coefficients)
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 30.25 | 0.6176 | 48.97 | 1.042e-70 |
| x1 | 1.546 | 0.6924 | 2.232 | 0.02789 |
and it can also display a matrix created by the package psych
library(psych)
pander(describe(dat))
| | vars | n | mean | sd | median | trimmed | mad | min |
|---|---|---|---|---|---|---|---|---|
| x1 | 1 | 100 | -0.1192 | 0.8884 | -0.02763 | -0.09578 | 0.9889 | -2.614 |
| x2 | 2 | 100 | 0.0841 | 1.021 | 0.1955 | 0.1211 | 0.9755 | -2.562 |
| y1 | 3 | 100 | 30.06 | 6.243 | 30.65 | 30.06 | 6.384 | 16.59 |
| y2 | 4 | 100 | 0.2453 | 5.16 | 0.7159 | 0.3888 | 4.532 | -13.2 |

| | max | range | skew | kurtosis | se |
|---|---|---|---|---|---|
| x1 | 1.912 | 4.526 | -0.2958 | -0.1176 | 0.08884 |
| x2 | 2.233 | 4.794 | -0.3634 | -0.2852 | 0.1021 |
| y1 | 48.44 | 31.85 | 0.03705 | -0.2777 | 0.6243 |
| y2 | 11.32 | 24.52 | -0.2685 | -0.2653 | 0.516 |
Some document elements that are available in PDF output are missing in Rmarkdown to HTML conversion. The most serious missing pieces are numbered and labeled “floating” tables, figures, and equations. These losses seem nearly fatal for the HTML backend and are a strong reason why one should prefer PDF.
Nevertheless, for Web pages, some authors truly prefer HTML output (maybe because they like colored callouts and tabbed sections). As a result, we have some workarounds for these problems.
In “display equation” mathematics, we want to insert numbered equations and then refer to them. Unfortunately, Rmarkdown to HTML does not support auto-numbering equations. However, one can number equations manually by adding `\tag{}` to the end of equations. For example,
\[ + - = \approx \ne \ge \lt \pm\tag{1} \]
\[ \pi \approx 3.1415927\tag{2} \]
\[ a_i \ge 0~~~\forall i\tag{3} \]
\[ x \lt 15\tag{4} \]
Unfortunately, when new equations are inserted, it will be necessary to manually renumber these. In addition, there is no HTML backend method to then refer to equation (3) without explicitly typing in the equation number.
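One possible way to reduce the renumbering chore is to let R hand out the numbers. This is a sketch of an idea, not a feature of Rmarkdown or of the stationery package. The function `make_eq_tagger` is hypothetical, and the sketch assumes that inline R expressions placed inside a display equation are evaluated before the mathematics is rendered. Each label receives the next number the first time it is used, and the same label returns the same number afterwards, so a reference can be printed elsewhere in the text.

```r
## Hypothetical helper: returns a function that assigns equation numbers
## in order of first use and remembers them by label
make_eq_tagger <- function() {
    labels <- character(0)
    function(label, ref = FALSE) {
        if (!label %in% labels) labels <<- c(labels, label)
        n <- match(label, labels)
        ## ref = TRUE gives "(n)" for use in text; otherwise a \tag{n}
        if (ref) sprintf("(%d)", n) else sprintf("\\tag{%d}", n)
    }
}

eq <- make_eq_tagger()
eq("symbols")               # first equation
eq("pi")                    # second equation
eq("pi", ref = TRUE)        # reference to the second equation in the text
```

If a new equation is inserted between the two, the later numbers shift on the next compile without any editing.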
HTML does offer its own form of cross referencing by hyperlink anchors, however. Suppose we want the reader to be able to click a link that goes to a figure that we have presented previously. We go to that figure and insert HTML code along these lines:
<a name="specialfig"></a>
When we want to write something like “click here to see the special figure”, the HTML markup is
<a href="#specialfig">click here to see the special figure</a>
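The anchor and the link must use exactly the same name, and a typing mistake in either one breaks the link silently. Generating both from one R object is a simple safeguard. The helper names `anchor` and `link_to` are hypothetical, and the markup they produce is the same as in the example above.

```r
## Hypothetical helpers: an anchor to place at the target, and a link to it
anchor <- function(name) sprintf('<a name="%s"></a>', name)
link_to <- function(name, text) sprintf('<a href="#%s">%s</a>', name, text)

fig_name <- "specialfig"
anchor(fig_name)
link_to(fig_name, "click here to see the special figure")
```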
Use these callouts to attract attention.
Blank rows separate paragraphs.
The character width of rows should be 80 or less. I have no idea how anybody thinks they have a right to impose an infinitely long row, but it’s bad. Edit the document with Emacs, run M-q to get re-positioned text. If your editor cannot do that, quit using it.
Make sure to compile using the kutils.css style sheet located in the stationery package. For example, stationery::rmd2html("filename.Rmd")
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 21.04
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] psych_2.1.9 pander_0.6.4 rockchalk_1.8.144
[4] stationery_1.0
loaded via a namespace (and not attached):
[1] zip_2.2.0 Rcpp_1.0.7 compiler_4.1.1 nloptr_1.2.2.2
[5] jquerylib_0.1.4 plyr_1.8.6 highr_0.9 tools_4.1.1
[9] boot_1.3-28 digest_0.6.28 lme4_1.1-27.1 evaluate_0.14
[13] nlme_3.1-153 lattice_0.20-45 rlang_0.4.11 openxlsx_4.2.4
[17] Matrix_1.3-4 yaml_2.2.1 parallel_4.1.1 xfun_0.26
[21] fastmap_1.1.0 stringr_1.4.0 knitr_1.36 grid_4.1.1
[25] foreign_0.8-81 rmarkdown_2.11 carData_3.0-4 minqa_1.2.4
[29] magrittr_2.0.1 htmltools_0.5.2 MASS_7.3-54 splines_4.1.1
[33] kutils_1.70 mnormt_2.0.2 xtable_1.8-4 stringi_1.7.5
[37] tmvnsim_1.0-2
Available under Creative Commons license 3.0
R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://cran.r-project.org/doc/manuals/r-release/fullrefman.pdf.