Introduction to listcompr

Patrick Roocks

2021-10-02

The listcompr package is a lightweight collection of functions for list comprehension. It is intended as “syntactic sugar” for R and is inspired by the list comprehension capabilities of ‘python’. Besides lists, similar structures like vectors (of numeric or character type), data frames, or named lists can be easily composed. The package may be used for generating small data sets for “textbook examples”, for unit tests of your R code, or for tiny mathematical tasks.

The package is not intended for creating large data sets. The evaluation is done row-wise (and not vector-wise), which makes it relatively slow. In return, it offers more flexibility for formulating “mathematical” statements, and it should be easy to use, especially for users who are not used to vector-wise evaluation.

Introductory examples

In this section we present the basic features of listcompr in some simple examples.

A simple example generating a vector

Assume we want to create a vector of all numbers in 1:10 which are divisible by 3 or by 4. We use the function gen.vector, which takes the expression i as its first argument. The range for i and the given condition are passed to the function in the following (arbitrarily many) arguments:

gen.vector(i, i = 1:10, i %% 3 == 0 || i %% 4 == 0)
## [1] 3 4 6 8 9

Note that it doesn’t matter whether we use || or | as the operator, because all conditions are evaluated element by element and not vector-wise.

Additional conditions are implicitly combined with a logical AND. For instance, to explicitly exclude the number 8, we can state:

gen.vector(i, i = 1:10, i %% 3 == 0 || i %% 4 == 0, i != 8)
## [1] 3 4 6 9

Of course, we could also merge both into a single condition via (... || ...) && i != 8.
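To make the difference between row-wise and vector-wise evaluation concrete, here is a base-R sketch (not listcompr code, and not how the package is implemented) that computes the same vector twice: once row-wise with Filter, where the scalar operators || and && are safe, and once vector-wise with | and &, which plain R indexing requires.

```r
# Row-wise: the predicate sees a single value of i at a time,
# so the scalar operators || and && work as in the gen.vector call
row_wise <- Filter(function(i) (i %% 3 == 0 || i %% 4 == 0) && i != 8, 1:10)

# Vector-wise: the condition is evaluated on the whole vector at once,
# so the element-wise operators | and & must be used instead
i <- 1:10
vector_wise <- i[(i %% 3 == 0 | i %% 4 == 0) & i != 8]

print(row_wise)     # 3 4 6 9
print(vector_wise)  # 3 4 6 9
```

In the vector-wise version, replacing | with || would not work as intended, because || expects single values rather than whole vectors.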

Simple examples of lists and data frames with two variables

Assume we want to create a list of all tuples (i, j) where i and j range over 1:3 and i <= j must hold. To this end we use the gen.list function, which takes the base expression c(i, j) as its first argument, followed by the ranges and conditions for i and j. This list of vectors can be expressed as:

gen.list(c(i, j), i = 1:3, j = 1:3, i <= j)
## [[1]]
## [1] 1 1
## 
## [[2]]
## [1] 1 2
## 
## [[3]]
## [1] 2 2
## 
## [[4]]
## [1] 1 3
## 
## [[5]]
## [1] 2 3
## 
## [[6]]
## [1] 3 3
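Conceptually, such a comprehension enumerates all combinations of the ranges and keeps those satisfying the conditions. The following base-R sketch reproduces the list above under that assumption; it only illustrates the semantics and is not listcompr's implementation.

```r
# All candidate pairs; expand.grid varies the first column (i) fastest,
# which matches the order of the gen.list output above
grid <- expand.grid(i = 1:3, j = 1:3)

# Keep only the rows satisfying the condition i <= j
grid <- grid[grid$i <= grid$j, ]

# Turn each remaining row into a vector c(i, j), as in the list above
pairs <- lapply(seq_len(nrow(grid)), function(r) c(grid$i[r], grid$j[r]))
str(pairs)
```

As in the output above, the first variable varies fastest, which is why (1, 2) comes before (2, 2).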

It is also allowed to use the current value of i within the range of j. Thus we can omit the condition i <= j by changing the range of j to j = i:3. Moreover, we now use the similar function gen.data.frame, which puts each generated vector into a row of a data frame:

df <- gen.data.frame(c(i, j), i = 1:3, j = i:3)
knitr::kable(df)
i j
1 1
1 2
2 2
1 3
2 3
3 3

A real-world example

Assume you have an arbitrary number of square tiles and want to arrange them on your terrace under a certain condition: the number of tiles at the border should equal the number of tiles in the inner part (i.e., all tiles except the border). For instance, a terrace with \(6 \times 4\) tiles has \((6-2)\cdot(4-2) = 8\) tiles in the inner part and \((6\cdot4)-8 = 16\) tiles at the border, i.e., it does not fulfill the condition. The maximum size of your terrace is \(20 \times 20\) tiles (e.g., limited by the size of your garden). We get all possible terrace sizes fulfilling this condition via:

df <- gen.data.frame(c(x, y), x = 1:20, y = 1:20, (x-2)*(y-2) == x*y/2)
knitr::kable(df)
x y
12 5
8 6
6 8
5 12

Hence the outer size of the terrace is either \(6 \times 8\) or \(5 \times 12\) tiles (plus the symmetric solutions). Interestingly, it is easy to prove that these are the only solutions, even if your garden and the number of tiles are unlimited.
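The claim can be verified with a short derivation. Assuming \(x, y \geq 3\) (so that the terrace has an inner part at all), the condition can be rewritten as

\[
(x-2)(y-2) = \frac{xy}{2}
\;\Longleftrightarrow\; 2xy - 4x - 4y + 8 = xy
\;\Longleftrightarrow\; xy - 4x - 4y + 8 = 0
\;\Longleftrightarrow\; (x-4)(y-4) = 8 .
\]

Since \(x - 4 \geq -1\) and \(y - 4 \geq -1\), a factorization of 8 into two negative integers is impossible (it always contains a factor \(\leq -2\)), and a zero factor gives a product of 0. So both factors are positive divisors of 8, i.e., \((x-4, y-4) \in \{(1,8), (2,4), (4,2), (8,1)\}\), which yields exactly \((x, y) \in \{(5,12), (6,8), (8,6), (12,5)\}\).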

Now let’s prettify the output. We rename the “x” and “y” columns to “width” and “height” and add a column for the number of inner tiles (which equals the number of tiles at the border).

df <- gen.data.frame(c(width = x, height = y, inner_tiles), x = 1:20, y = 1:20, 
                     inner_tiles = (x-2)*(y-2), inner_tiles == x*y/2)
knitr::kable(df)
width height inner_tiles
12 5 30
8 6 24
6 8 24
5 12 30
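As an empirical cross-check (not a proof) of the claim that no larger solutions exist, the same search can be run in plain base R over a much larger range. This sketch uses expand.grid instead of listcompr and also builds the prettified columns; the range 1:200 is an arbitrary choice.

```r
# Enumerate all sizes up to 200 x 200 (x varies fastest, as in the output above)
grid <- expand.grid(x = 1:200, y = 1:200)

# Condition: the inner tiles (x-2)*(y-2) make up half of all x*y tiles
inner_tiles <- (grid$x - 2) * (grid$y - 2)
hits <- grid[inner_tiles == grid$x * grid$y / 2, ]

# Prettified result with renamed columns and the inner tile count
df <- data.frame(width = hits$x, height = hits$y,
                 inner_tiles = (hits$x - 2) * (hits$y - 2))
print(df)
```

Even with a 200 x 200 search space, only the four sizes from the table above are found.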

Advanced features

Now we present some advanced features that allow creating slightly more complex data sets.

Wildcard ranges

For any variable whose name follows the pattern {varname}_{num} (an underscore followed by a number), e.g., a_2, the range of all these indexed variables can be defined at once via {varname}_.

Assume we want to generate a data frame with all tuples \((a_1, a_2, a_3)\) with \(a_1 < a_2 < a_3\) where \(a_i \in \{1, ..., 4\}\). We simply specify the range for the a_{i} variables by a_ = 1:4:

df <- gen.data.frame(c(a_1, a_2, a_3), a_ = 1:4, a_1 < a_2, a_2 < a_3)
knitr::kable(df)
a_1 a_2 a_3
1 2 3
1 2 4
1 3 4
2 3 4
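For strictly increasing tuples there is also a direct base-R route: the rows of t(combn(4, 3)) are exactly the 3-element subsets of 1:4 in increasing order, and here they come out in the same order as the table above. This is a base-R alternative for this specific condition only, not a listcompr feature.

```r
# combn() lists all 3-element subsets of 1:4 in increasing order;
# after transposing, each row is one strictly increasing tuple (a_1, a_2, a_3)
m <- t(combn(4, 3))
colnames(m) <- paste0("a_", 1:3)
print(m)
```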

Expanded expressions

Now assume we want to get all tuples \((a_1, ..., a_5) \in \mathbb{N}^5\) with the condition \(\sum_{i} a_i = 6\). The range 1:5 below is sufficient, since all \(a_i \geq 1\) and a sum of 6 force \(a_i \leq 2\). We make use of the <operator> ... <operator> notation of listcompr, which expands expressions in an intuitive way:

df <- gen.data.frame(c(a_1, ..., a_5), a_ = 1:5, a_1 + ... + a_5 == 6)
knitr::kable(df)
a_1 a_2 a_3 a_4 a_5
2 1 1 1 1
1 2 1 1 1
1 1 2 1 1
1 1 1 2 1
1 1 1 1 2

The expression expansion also works for function arguments, e.g., we could replace the condition by sum(a_1, ..., a_5) == 6 (cf. next example).
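The following base-R sketch spells out what the expanded condition stands for: it builds the text a_1 + a_2 + ... + a_5 with paste and evaluates it on a grid of all candidate tuples. This only illustrates the meaning of the notation; it is not how listcompr performs the expansion internally.

```r
# Build the expanded sum "a_1 + a_2 + a_3 + a_4 + a_5" as text
vars <- paste0("a_", 1:5)
sum_text <- paste(vars, collapse = " + ")
print(sum_text)

# Enumerate (a_1, ..., a_5) in {1, ..., 5}^5 and keep the rows with sum 6
grid <- expand.grid(setNames(rep(list(1:5), 5), vars))
res <- grid[eval(str2lang(sum_text), grid) == 6, ]
print(res)
```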

Generated conditions

Let’s consider an example similar to the one above. We want to generate all tuples \((a_1, ..., a_5) \in \{1, ..., 5\}^5\) with \(\sum_{i} a_i = 10\) and \(a_i \leq a_{i+1}\). To avoid writing a_1 <= a_2, a_2 <= a_3, ..., there is the function gen.logical.and, a list comprehension helper for generating conditions. We can express this data frame by:

df <- gen.data.frame(c(a_1, ..., a_5), a_ = 1:5, sum(a_1, ..., a_5) == 10, 
                     gen.logical.and(a_i <= a_(i+1), i = 1:4))
knitr::kable(df)
a_1 a_2 a_3 a_4 a_5
2 2 2 2 2
1 2 2 2 3
1 1 2 3 3
1 1 2 2 4
1 1 1 3 4
1 1 1 2 5
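The helper saves you from typing a chain of and-connected comparisons. Assuming it amounts to the conjunction of a_i <= a_(i+1) over i = 1:4 (the exact form of listcompr's return value may differ), the following base-R sketch builds that conjunction with sprintf and str2lang and applies it to all candidate tuples. The name and_expr is illustrative only.

```r
vars <- paste0("a_", 1:5)

# Hypothetical stand-in for gen.logical.and(a_i <= a_(i+1), i = 1:4):
# the text "a_1 <= a_2 & a_2 <= a_3 & a_3 <= a_4 & a_4 <= a_5"
conds <- sprintf("a_%d <= a_%d", 1:4, 2:5)
and_expr <- str2lang(paste(conds, collapse = " & "))
print(and_expr)

# Enumerate {1, ..., 5}^5 and keep the non-decreasing tuples with sum 10
grid <- expand.grid(setNames(rep(list(1:5), 5), vars))
res <- grid[eval(and_expr, grid) & rowSums(grid) == 10, ]
print(res)
```

The sketch joins the terms with & rather than &&, because it evaluates the condition on the whole grid at once instead of row by row.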

Example: calculate all permutations

We can get all permutations of \((1, 2, 3, 4)\) (there are \(4! = 24\) of them) by using the following characterization of permutations: \(\text{perm}(1, ..., n) = \{(a_1, ..., a_n) \mid a_i \in \{1, ..., n\}, a_i \neq a_j \; \text{for} \; i \neq j\}\). This can be expressed as follows (we show only the first 8 permutations):

df <- gen.data.frame(c(a_1, ..., a_4), a_ = 1:4, 
                     gen.logical.and(a_i != a_j, i = 1:4, j = (i+1):4))
knitr::kable(df[1:8,])
a_1 a_2 a_3 a_4
4 3 2 1
3 4 2 1
4 2 3 1
2 4 3 1
3 2 4 1
2 3 4 1
4 3 1 2
3 4 1 2

Note: Don’t use this approach for creating large sets of permutations, e.g., all 5040 permutations of 1:7. It is very slow, since the number of candidate tuples (\(7^7 = 823543\)) is far larger than the number of permutations. We recommend the permn function from the combinat package instead.
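If you prefer to stay in base R, a recursive generator avoids the filtering step entirely. The function name perms below is hypothetical; this is a sketch rather than a replacement for combinat::permn, and its output order differs from the table above. The last line quantifies why the filter approach does not scale.

```r
# Generate all permutations of v recursively: pick each element in turn as
# the first entry and append every permutation of the remaining elements
perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (k in seq_along(v)) {
    for (p in perms(v[-k])) {
      out[[length(out) + 1]] <- c(v[k], p)
    }
  }
  out
}

p4 <- perms(1:4)
length(p4)  # 24

# Search space of the filter approach vs. result size for n = 7
c(candidates = 7^7, permutations = factorial(7))  # 823543 and 5040
```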

Using generated conditions within dplyr

Conditions created with gen.logical.and or gen.logical.or can also be used in other R packages, e.g., dplyr. To use a generated condition in, e.g., dplyr::filter, you need to put the operator !! in front of the function call that generates the condition.

For example, assume you have a data frame with all outcomes of throwing three dice, and you want to get the outcomes where at least two dice show a 6. We show only the first 8 results:

dices <- gen.data.frame(c(a_1, ..., a_3), a_ = 1:6)
res <- dplyr::filter(dices, !!gen.logical.or(a_i == 6 & a_j == 6, i = 1:3, j = (i+1):3))
knitr::kable(res[1:8,])
a_1 a_2 a_3
6 6 1
6 6 2
6 6 3
6 6 4
6 6 5
6 1 6
6 2 6
6 3 6
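To see what the filter condition amounts to, the base-R sketch below builds the same or-connection explicitly and applies it without dplyr or !!. It assumes the generated condition is the disjunction of the pairwise terms for i < j; the names pairs and or_expr are illustrative.

```r
vars <- paste0("a_", 1:3)

# All 6^3 outcomes of three dice; a_1 varies fastest
dice <- expand.grid(setNames(rep(list(1:6), 3), vars))

# Hypothetical stand-in for gen.logical.or(a_i == 6 & a_j == 6, i = 1:3, j = (i+1):3):
# one term per pair i < j, joined with |
pairs <- combn(3, 2)
terms <- sprintf("(a_%d == 6 & a_%d == 6)", pairs[1, ], pairs[2, ])
or_expr <- str2lang(paste(terms, collapse = " | "))

res <- dice[eval(or_expr, dice), ]
head(res, 8)
```

In total there are 16 such outcomes: 15 with exactly two sixes (3 positions for the non-six times 5 values) plus one with three sixes.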

Nested list and vector comprehensions

The list comprehension functions of listcompr can also be nested. In the following example we use gen.data.frame as the outer function and gen.vector (within a sum) to generate a data frame containing the sum of all divisors of a whole number (excluding the number itself):

df <- gen.data.frame(c(a, sumdiv = sum(gen.vector(x, x = 1:(a-1), a %% x == 0))), a = 2:10)
knitr::kable(df)
a sumdiv
2 1
3 1
4 3
5 1
6 6
7 1
8 7
9 4
10 8
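For comparison, the nested comprehension corresponds to an outer loop over a and an inner filtering step over the candidate divisors. The following base-R sketch with sapply mirrors that structure; it is an analogue, not listcompr's implementation.

```r
# Sum of the proper divisors of each a in 2:10: the inner step collects
# all x in 1:(a-1) that divide a, mirroring the nested gen.vector call
a <- 2:10
sumdiv <- sapply(a, function(n) {
  x <- 1:(n - 1)
  sum(x[n %% x == 0])
})
data.frame(a, sumdiv)
```

The range 1:(a-1) only works here because a starts at 2: for a = 1 it would be 1:0, i.e., c(1, 0), instead of an empty range.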

Now we want to add a flag to this data frame indicating whether the number a is a so-called perfect number. A perfect number equals the sum of its divisors (excluding the number itself). To this end we introduce a substitution variable sumdiv for the sum of divisors and use it for checking whether the number is perfect:

df <- gen.data.frame(c(a, sumdiv, perfect = (sumdiv == a)), a = 2:10, 
                     sumdiv = sum(gen.vector(x, x = 1:(a-1), a %% x == 0)))
knitr::kable(df)
a sumdiv perfect
2 1 0
3 1 0
4 3 0
5 1 0
6 6 1
7 1 0
8 7 0
9 4 0
10 8 0

This means the only perfect number in 2:10 is 6. (The flag appears as 0/1 because c(...) coerces the logical value to a number.)
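The same check scales to a larger range in base R. This sketch uses a hypothetical helper sum_proper_divisors and seq_len, which yields an empty range for n = 1, so the search can start at 1; the upper bound 500 is an arbitrary choice.

```r
# Sum of proper divisors; seq_len(n - 1) is empty for n = 1,
# which avoids the 1:0 pitfall
sum_proper_divisors <- function(n) {
  x <- seq_len(n - 1)
  sum(x[n %% x == 0])
}

a <- 1:500
sumdiv <- vapply(a, sum_proper_divisors, integer(1))
perfect <- a[sumdiv == a]
print(perfect)  # 6 28 496
```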

Character compositions

The package also offers functions to compose character strings. They can be used to generate lists or vectors of strings, or the (row) names of the results.

An example for a vector of characters

Consider our example from above, where we search for terrace sizes such that the border and the inner part have the same number of tiles. Assume we want a textual output instead of a data frame. We simply pass a character string to gen.vector in which all expressions in curly braces {} are evaluated for each result of the list comprehension:

gen.vector("size: {x}x{y} tiles, where {x*y/2} tiles are at the border/inner",
           x = 1:20, y = 1:20, (x-2)*(y-2) == x*y/2)
## [1] "size: 12x5 tiles, where 30 tiles are at the border/inner"
## [2] "size: 8x6 tiles, where 24 tiles are at the border/inner" 
## [3] "size: 6x8 tiles, where 24 tiles are at the border/inner" 
## [4] "size: 5x12 tiles, where 30 tiles are at the border/inner"
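For comparison, plain R's sprintf produces the same strings with positional %d placeholders instead of {}-expressions. This base-R sketch is only an analogue of the string composition in gen.vector.

```r
grid <- expand.grid(x = 1:20, y = 1:20)
hits <- grid[(grid$x - 2) * (grid$y - 2) == grid$x * grid$y / 2, ]

# sprintf fills the %d placeholders element-wise, one string per row;
# the integer division keeps the tile count an integer for %d
msgs <- sprintf("size: %dx%d tiles, where %d tiles are at the border/inner",
                hits$x, hits$y, (hits$x * hits$y) %/% 2L)
print(msgs)
```

The parentheses around hits$x * hits$y matter, since %/% binds more tightly than * in R.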

A named list

Let’s revisit the divisor example. Instead of the sum of the divisors, we now want all divisors of a number as a vector. Each vector should be stored in a list entry named divisors_of_{a}. The following gives us the divisors of all numbers in 5:10:

gen.named.list("divisors_of_{a}", gen.vector(x, x = 1:(a-1), a %% x == 0), a = 5:10)
## $divisors_of_5
## [1] 1
## 
## $divisors_of_6
## [1] 1 2 3
## 
## $divisors_of_7
## [1] 1
## 
## $divisors_of_8
## [1] 1 2 4
## 
## $divisors_of_9
## [1] 1 3
## 
## $divisors_of_10
## [1] 1 2 5
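A base-R analogue of the named list uses lapply for the values and sprintf for the names. This is a sketch for comparison, not listcompr's implementation.

```r
a <- 5:10

# One vector of proper divisors per number
divisors <- lapply(a, function(n) {
  x <- 1:(n - 1)
  x[n %% x == 0]
})

# Name the entries like "divisors_of_{a}" in the gen.named.list call
names(divisors) <- sprintf("divisors_of_%d", a)
str(divisors)
```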

Named matrices

We continue with divisors. Assume the numbers a = 90:95 are given and we want a table indicating which numbers in 2:10 are integer divisors of a. The following logical matrix shows this result. We use byrow = TRUE to control how the inner vectors are arranged: as the output shows, the elements of each inner vector (x = 2:10) end up in the rows, so that each number a gets its own column.

m <- gen.named.matrix("divisors_{a}", gen.named.vector("{x}", a %% x == 0, x = 2:10), 
                      a = 90:95, byrow = TRUE)
knitr::kable(m)
divisors_90 divisors_91 divisors_92 divisors_93 divisors_94 divisors_95
2 TRUE FALSE TRUE FALSE TRUE FALSE
3 TRUE FALSE FALSE TRUE FALSE FALSE
4 FALSE FALSE TRUE FALSE FALSE FALSE
5 TRUE FALSE FALSE FALSE FALSE TRUE
6 TRUE FALSE FALSE FALSE FALSE FALSE
7 FALSE TRUE FALSE FALSE FALSE FALSE
8 FALSE FALSE FALSE FALSE FALSE FALSE
9 TRUE FALSE FALSE FALSE FALSE FALSE
10 TRUE FALSE FALSE FALSE FALSE FALSE
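In base R, the same logical table can be sketched with outer, which evaluates a vectorized function on all (x, a) combinations. This is a comparison only; the dimnames mimic the labels above.

```r
x <- 2:10
a <- 90:95

# outer() passes all (x, a) combinations to the function:
# rows follow x, columns follow a
m <- outer(x, a, function(x, a) a %% x == 0)
dimnames(m) <- list(as.character(x), paste0("divisors_", a))
m
```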

Analogously to the example above, we can create a matrix with character content, showing either the integer quotient a/x or simply "no" if x is not an integer divisor of a.

m <- gen.named.matrix("divisors_{a}",
        gen.named.vector("{x}", if (a %% x == 0) "factor = {a/x}" else "no", x = 2:10), 
        a = 90:95, byrow = TRUE)
knitr::kable(m)
divisors_90 divisors_91 divisors_92 divisors_93 divisors_94 divisors_95
2 factor = 45 no factor = 46 no factor = 47 no
3 factor = 30 no no factor = 31 no no
4 no no factor = 23 no no no
5 factor = 18 no no no no factor = 19
6 factor = 15 no no no no no
7 no factor = 13 no no no no
8 no no no no no no
9 factor = 10 no no no no no
10 factor = 9 no no no no no
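The character version can be sketched in base R the same way, with ifelse choosing between the quotient text and "no". Again, this is a base-R comparison, not listcompr code.

```r
x <- 2:10
a <- 90:95

# For each (x, a) combination: "factor = a/x" if x divides a, else "no"
m <- outer(x, a, function(x, a) ifelse(a %% x == 0, paste("factor =", a %/% x), "no"))
dimnames(m) <- list(as.character(x), paste0("divisors_", a))
m
```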