Frame Tools

John Mount, Win-Vector LLC

2022-01-26

wrapr supplies a few tools for creating example data.frames. An important use case is: building the control table for cdata::rowrecs_to_blocks() and cdata::blocks_to_rowrecs() (example here).

Let's see how to create an example data frame. The idea is similar to that found in tibble::tribble(): for small tables a row-oriented constructor can be quite legible, and it avoids the cognitive load of taking a transpose.

For example, we can create a typical data.frame as follows:

d <- data.frame(
  names = c("a", "b", "c", "d"),
  x =     c(1,   2,   3,   4  ),
  y =     c(1,   4,   9,   16 ),
  stringsAsFactors = FALSE)

print(d)

##   names x  y
## 1     a 1  1
## 2     b 2  4
## 3     c 3  9
## 4     d 4 16

Notice how the table is specified by columns (which is close to how data.frames are implemented), but printed by rows. utils::str() and tibble::glimpse() both print by columns.

str(d)

## 'data.frame':    4 obs. of  3 variables:
##  $ names: chr  "a" "b" "c" "d"
##  $ x    : num  1 2 3 4
##  $ y    : num  1 4 9 16

wrapr supplies the function draw_frame(), which at first glance appears to be a mere pretty-printer:

library("wrapr")

cat(draw_frame(d))

d <- wrapr::build_frame(
   "names"  , "x", "y" |
     "a"    , 1  , 1   |
     "b"    , 2  , 4   |
     "c"    , 3  , 9   |
     "d"    , 4  , 16  )

However, the above rendering is actually executable R code. If we run it, we re-create the original data.frame.

d2 <- build_frame(
   "names", "x", "y" |
   "a"    , 1  ,  1  |
   "b"    , 2  ,  4  |
   "c"    , 3  ,  9  |
   "d"    , 4  , 16  )

print(d2)

##   names x  y
## 1     a 1  1
## 2     b 2  4
## 3     c 3  9
## 4     d 4 16

The merit is that the input code looks the same as the data.frame does when printed.
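The round trip described above can also be checked programmatically. The following is a minimal sketch, assuming wrapr is installed; the names code and d_rebuilt are introduced here for illustration only. It evaluates the text produced by draw_frame() in a scratch environment and compares the result with the original.

```r
library("wrapr")

d <- data.frame(
  names = c("a", "b", "c", "d"),
  x =     c(1,   2,   3,   4  ),
  y =     c(1,   4,   9,   16 ),
  stringsAsFactors = FALSE)

# draw_frame() returns R source text for a build_frame() call
code <- draw_frame(d)

# Evaluate that text in a scratch environment so nothing in the
# workspace is overwritten; eval() returns the rebuilt data.frame
d_rebuilt <- eval(parse(text = code), envir = new.env())

# Compare the rebuilt frame with the original
all.equal(d, d_rebuilt)
```

Evaluating in a fresh environment is a design choice for this sketch: the drawn code may contain an assignment to d, and the scratch environment keeps that assignment away from the working copy.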

The technique is intended for typing small examples (or cdata control tables) and only builds data.frames with atomic types (characters, numerics, and logicals; no times, factors, or list columns). The specification rule is: the first appearance of an infix two-argument function call (in this case the infix “or” symbol “|”) marks the end of the header, so the arguments before it are column names and the arguments after it are values. Later appearances of “|” are ignored. This means we could also write the frame as follows:

build_frame(
   "names", "x", "y" |
   "a"    , 1  ,  1  ,
   "b"    , 2  ,  4  ,
   "c"    , 3  ,  9  ,
   "d"    , 4  , 16  )

##   names x  y
## 1     a 1  1
## 2     b 2  4
## 3     c 3  9
## 4     d 4 16
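To illustrate the atomic-type restriction mentioned above, here is a small sketch that mixes character, numeric, and logical cells and then inspects the resulting column classes. The frame d_types and its column names are hypothetical, made up for this illustration.

```r
library("wrapr")

# Hypothetical example frame: one character, one numeric,
# and one logical column
d_types <- build_frame(
   "id", "score", "passed" |
   "u1", 0.5    , TRUE     |
   "u2", 0.9    , FALSE    )

# Inspect the class of each column
sapply(d_types, class)
```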

This is more limited than base::dump(), but also more legible.

cat(dump("d", ""))

d <-
structure(list(names = c("a", "b", "c", "d"), x = c(1, 2, 3, 
4), y = c(1, 4, 9, 16)), class = "data.frame", row.names = c(NA, 
-4L))
d

One can use the combination of build_frame() and draw_frame() to neaten up by-hand examples for later use (via copy and paste):

cat(draw_frame(build_frame(
 "names", "x", "y" |
  "a", 1,  1,
  "b", 2,  4,
  "c", 3,  9,
  "d", 4, 16)))

wrapr::build_frame(
   "names"  , "x", "y" |
     "a"    , 1  , 1   |
     "b"    , 2  , 4   |
     "c"    , 3  , 9   |
     "d"    , 4  , 16  )

build_frame() allows for simple substitutions of values. In contrast, the function qchar_frame() builds data.frames containing only character types and doesn’t require quoting (though it does allow it).

qchar_frame(
  col_1, col_2, col_3 |
  a    , b    , c     |
  d    , e    , "f g" )

##   col_1 col_2 col_3
## 1     a     b     c
## 2     d     e   f g
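Because every cell is taken as a string, qchar_frame() is a natural fit for the cdata control tables mentioned at the start. The sketch below builds such a table for the columns of the built-in iris data set; the table layout follows the style used in the cdata documentation, and the name control_table is an assumption made for this example. No cdata call is made here, the sketch only builds the table.

```r
library("wrapr")

# Each unquoted cell names a column of iris; the quoted cells in the
# first column are the keys that will label each block row
control_table <- qchar_frame(
  "flower_part", "Length"     , "Width"     |
  "Petal"      , Petal.Length , Petal.Width |
  "Sepal"      , Sepal.Length , Sepal.Width )

print(control_table)
```

A table like this is what one would pass as the control table to cdata::rowrecs_to_blocks() or cdata::blocks_to_rowrecs().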

build_frame() is intended to capture typed-in examples. It supports only very limited in-place calculation and substitution, and any such expression must be in parentheses:

build_frame(
   "names", "x"     , "y" |
   "a"    , 1       ,  1  |
   "b"    , cos(2)  ,  4  |
   "c"    , (3+2)   ,  9  |
   "d"    , 4       , 16  )

##   names          x  y
## 1     a  1.0000000  1
## 2     b -0.4161468  4
## 3     c  5.0000000  9
## 4     d  4.0000000 16

Expressions not in parentheses (such as “3 + 2”) will confuse the language transform build_frame() uses to detect cell boundaries.
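The same parenthesis rule applies when a cell depends on a value computed earlier. The sketch below is a minimal illustration, assuming build_frame() evaluates parenthesized cells in the calling environment; the variable step_size and the frame d_calc are hypothetical names introduced for this example.

```r
library("wrapr")

# Hypothetical value computed before the frame is typed in
step_size <- 10

# Each calculated cell is wrapped in parentheses so that build_frame()
# sees it as a single cell rather than as an infix separator
d_calc <- build_frame(
   "names", "x"             |
   "a"    , (step_size * 1) |
   "b"    , (step_size * 2) )

print(d_calc)
```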