The listcompr package is a light-weight collection of functions for list comprehension. It is intended as “syntactic sugar” for R and it is inspired by the list comprehension capabilities from ‘python’. Next to lists, similar structures like vectors (of numeric or character type), data frames, or named lists can be easily composed. The package may be used for the simple generation of small data sets for “textbook examples”, for unit tests of your R code, or for tiny mathematical tasks.
The package is not intended for creating large data sets. The evaluation is done row-wise (and not vector-wise), which makes the evaluation relatively slow. On the other hand, it offers more flexibility to formulate “mathematical” statements. Especially for users not used to the vector-wise evaluation it should be easy to use.
In this section we present the basic features of listcompr in some simple examples.
Assume we want to create a vector of all numbers in 1:10 which can be divided by 3 or by 4. We use the function gen.vector, taking the number i as first argument. The range for i and the given condition are passed to the function in the (arbitrarily many) following arguments:
gen.vector(i, i = 1:10, i %% 3 == 0 || i %% 4 == 0)
## [1] 3 4 6 8 9
Note that it doesn’t matter if we use || or | as operator. All the conditions are evaluated row by row and not vector-wise.
Additional conditions are implicitly and-connected, i.e., if we want to explicitly exclude the number 8 we can state:
gen.vector(i, i = 1:10, i %% 3 == 0 || i %% 4 == 0, i != 8)
## [1] 3 4 6 9
Of course, we could also combine both into a single condition via (... || ...) && i != 8.
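For comparison, the same selection can be written in vectorised base R. This is only a sketch to contrast the two styles, not a description of how listcompr evaluates the expression internally. Note that the vectorised form needs the element-wise operators | and &, because || and && are meant for single logical values.

```r
# Base-R counterpart of the gen.vector call above (no packages needed)
i <- 1:10
# element-wise operators: | (or), & (and)
res <- i[(i %% 3 == 0 | i %% 4 == 0) & i != 8]
print(res)
## [1] 3 4 6 9
```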
Assume we want to create a list of all tuples (i, j) where i and j are ranging in 1:3 and i <= j must hold. To this end we use the gen.list function, taking the base expression c(i, j) as first argument and putting the ranges and conditions for i and j in the following arguments. This list of vectors can be expressed as:
gen.list(c(i, j), i = 1:3, j = 1:3, i <= j)
## [[1]]
## [1] 1 1
##
## [[2]]
## [1] 1 2
##
## [[3]]
## [1] 2 2
##
## [[4]]
## [1] 1 3
##
## [[5]]
## [1] 2 3
##
## [[6]]
## [1] 3 3
It’s also allowed to use the current value of i within the range of j. We can omit the condition i <= j by changing the range for j to j = i:3. Moreover, we now use the similar function gen.data.frame, which puts each generated vector in a row of a data frame:
df <- gen.data.frame(c(i, j), i = 1:3, j = i:3)
knitr::kable(df)
i | j |
---|---|
1 | 1 |
1 | 2 |
2 | 2 |
1 | 3 |
2 | 3 |
3 | 3 |
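As a cross-check, the same pairs can be produced with base R by building the full grid first and filtering afterwards. This is a sketch using expand.grid, not the way listcompr works internally; the names grid and pairs are chosen here for illustration only.

```r
# Build all 9 combinations; the first column (i) varies fastest
grid <- expand.grid(i = 1:3, j = 1:3)
# Keep only the rows with i <= j (vectorised filter)
pairs <- grid[grid$i <= grid$j, ]
rownames(pairs) <- NULL
print(pairs)
```

The filter runs on whole columns at once, whereas the list comprehension states the condition for a single (i, j) pair.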
Assume you have an arbitrary number of square tiles and want to arrange them on your terrace under a certain condition: the number of tiles at the border should be the same as the number of tiles in the inner (i.e., all tiles without the border). For instance, a terrace with \(6 \times 4\) tiles has \((6-2)\cdot(4-2) = 8\) tiles in the inner and \((6\cdot4)-8 = 16\) tiles at the border, i.e., it does not fulfill the condition. The maximum size of your terrace is \(20 \times 20\) tiles (e.g., the size of your garden). We get all possible sizes of a terrace with that condition via:
df <- gen.data.frame(c(x, y), x = 1:20, y = 1:20, (x-2)*(y-2) == x*y/2)
knitr::kable(df)
x | y |
---|---|
12 | 5 |
8 | 6 |
6 | 8 |
5 | 12 |
The outer size of the terrace with that condition is either \(6 \times 8\) or \(5 \times 12\) tiles (and the symmetric solutions). Interestingly, it’s easy to prove that these are the only solutions even if the size of your garden and the number of tiles are unbounded.
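One possible way to see that no further solutions exist is to rewrite the condition as a product of two integers. The derivation below is a sketch added for illustration, with \(x, y\) denoting the positive integer side lengths used above:

```latex
\begin{aligned}
(x-2)(y-2) = \tfrac{xy}{2}
  &\iff 2xy - 4x - 4y + 8 = xy \\
  &\iff xy - 4x - 4y + 16 = 8 \\
  &\iff (x-4)(y-4) = 8 .
\end{aligned}
```

The positive factorisations \(8 = 1 \cdot 8 = 2 \cdot 4 = 4 \cdot 2 = 8 \cdot 1\) give \((x, y) \in \{(5, 12), (6, 8), (8, 6), (12, 5)\}\). The negative factorisations \((-1)(-8)\), \((-2)(-4)\), \((-4)(-2)\), \((-8)(-1)\) would require \(x \leq 0\) or \(y \leq 0\), so they yield no terrace.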
Now let’s prettify the output. Let’s rename the “x” and “y” columns to “width” and “height”. We also add a column for the number of tiles in the inner (which equals the number of tiles at the border).
df <- gen.data.frame(c(width = x, height = y, inner_tiles), x = 1:20, y = 1:20,
                     inner_tiles = (x-2)*(y-2), inner_tiles == x*y/2)
knitr::kable(df)
width | height | inner_tiles |
---|---|---|
12 | 5 | 30 |
8 | 6 | 24 |
6 | 8 | 24 |
5 | 12 | 30 |
Now we present some advanced features, which allow us to create somewhat more complex data sets.
For any variable with the name pattern {varname}_{num} (underscore and a number), e.g. a_2, the range for all these indexed variables can be defined by {varname}_.
Assume we want to generate a data frame with all tuples \((a_1, a_2, a_3)\) with \(a_1 < a_2 < a_3\) where \(a_i \in \{1, ..., 4\}\). We simply specify the range for the a_{i} variables by a_ = 1:4:
df <- gen.data.frame(c(a_1, a_2, a_3), a_ = 1:4, a_1 < a_2, a_2 < a_3)
knitr::kable(df)
a_1 | a_2 | a_3 |
---|---|---|
1 | 2 | 3 |
1 | 2 | 4 |
1 | 3 | 4 |
2 | 3 | 4 |
Now assume we want to get all tuples \((a_1, ..., a_5) \in \mathbb{N}^5\) with the condition \(\sum_{i} a_i = 6\). We make use of the <operator> ... <operator> notation of listcompr, which expands expressions in an intuitive way:
df <- gen.data.frame(c(a_1, ..., a_5), a_ = 1:5, a_1 + ... + a_5 == 6)
knitr::kable(df)
a_1 | a_2 | a_3 | a_4 | a_5 |
---|---|---|---|---|
2 | 1 | 1 | 1 | 1 |
1 | 2 | 1 | 1 | 1 |
1 | 1 | 2 | 1 | 1 |
1 | 1 | 1 | 2 | 1 |
1 | 1 | 1 | 1 | 2 |
The expression expansion also works for arguments of functions, i.e., we could also replace the condition by sum(a_1, ..., a_5) == 6 (cf. next example).
Let’s consider a similar example to the above one. We want to generate all tuples \((a_1, ..., a_5) \in \{1, ..., 5\}^5\) with \(\sum_{i} a_i = 10\) and \(a_i \leq a_{i+1}\). To avoid writing a_1 <= a_2, a_2 <= a_3, ... there is the function gen.logical.and, which is a list comprehension helper to generate conditions. We can express this data frame by:
df <- gen.data.frame(c(a_1, ..., a_5), a_ = 1:5, sum(a_1, ..., a_5) == 10,
                     gen.logical.and(a_i <= a_(i+1), i = 1:4))
knitr::kable(df)
a_1 | a_2 | a_3 | a_4 | a_5 |
---|---|---|---|---|
2 | 2 | 2 | 2 | 2 |
1 | 2 | 2 | 2 | 3 |
1 | 1 | 2 | 3 | 3 |
1 | 1 | 2 | 2 | 4 |
1 | 1 | 1 | 3 | 4 |
1 | 1 | 1 | 2 | 5 |
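The six rows can be cross-checked with a base-R sketch that enumerates all \(5^5 = 3125\) tuples and filters them. The names g, m and nondecreasing are introduced here for illustration; the code does not reflect listcompr internals.

```r
# All 5^5 tuples; a_1 varies fastest
g <- expand.grid(a_1 = 1:5, a_2 = 1:5, a_3 = 1:5, a_4 = 1:5, a_5 = 1:5)
m <- as.matrix(g)
# TRUE for rows with a_1 <= a_2 <= ... <= a_5
nondecreasing <- apply(m, 1, function(r) all(diff(r) >= 0))
# Combine with the sum condition
res <- g[rowSums(m) == 10 & nondecreasing, ]
rownames(res) <- NULL
print(res)
```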
We can get all permutations of \((1, 2, 3, 4)\) (which are \(4! = 24\)) by using the following property of permutations: \(\text{perm}(1, ..., n) = \{(a_1, ..., a_n) | a_i \in \{1, ..., n\}, a_i \neq a_j \; \text{for} \; i \neq j\}\). This can be expressed via (we show only the first 8 permutations):
df <- gen.data.frame(c(a_1, ..., a_4), a_ = 1:4,
                     gen.logical.and(a_i != a_j, i = 1:4, j = (i+1):4))
knitr::kable(df[1:8,])
a_1 | a_2 | a_3 | a_4 |
---|---|---|---|
4 | 3 | 2 | 1 |
3 | 4 | 2 | 1 |
4 | 2 | 3 | 1 |
2 | 4 | 3 | 1 |
3 | 2 | 4 | 1 |
2 | 3 | 4 | 1 |
4 | 3 | 1 | 2 |
3 | 4 | 1 | 2 |
Note: Don’t use such an approach for creating large data sets of permutations, e.g., for all the 5040 permutations of 1:7. This approach is very slow! We recommend the permn function from the combinat package.
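To illustrate why a dedicated generator scales better than filtering all \(n^n\) tuples, the following sketch builds each permutation directly by recursion. The helper permutations is hypothetical and written only for this illustration; it is neither part of listcompr nor the implementation of permn in combinat.

```r
# Hypothetical helper: returns a list of all permutations of the vector v
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (k in seq_along(v)) {
    # Fix v[k] as the first element and permute the remaining ones
    for (p in permutations(v[-k])) {
      out[[length(out) + 1]] <- c(v[k], p)
    }
  }
  out
}

perms <- permutations(1:4)
length(perms)
## [1] 24
```

Only the \(n!\) valid tuples are ever constructed, so no candidate has to be generated and then rejected.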
Conditions created with gen.logical.and or gen.logical.or can be used in other R packages, e.g., dplyr. To use a generated condition in, e.g., dplyr::filter, you need to put the operator !! in front of the function which generates the condition.
For example, assume you have a data frame with the results of throwing three dice. You want to get the outcomes where at least two dice show the number 6. We show only the first 8 results:
dices <- gen.data.frame(c(a_1, ..., a_3), a_ = 1:6)
res <- dplyr::filter(dices, !!gen.logical.or(a_i == 6 & a_j == 6, i = 1:3, j = (i+1):3))
knitr::kable(res[1:8,])
a_1 | a_2 | a_3 |
---|---|---|
6 | 6 | 1 |
6 | 6 | 2 |
6 | 6 | 3 |
6 | 6 | 4 |
6 | 6 | 5 |
6 | 1 | 6 |
6 | 2 | 6 |
6 | 3 | 6 |
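The condition "at least two sixes" can also be checked in base R by counting the sixes per row. This is a sketch for cross-checking the result without listcompr or dplyr; expand.grid stands in for the generated data frame.

```r
# All 6^3 outcomes of three dice; a_1 varies fastest
dices <- expand.grid(a_1 = 1:6, a_2 = 1:6, a_3 = 1:6)
# Count the sixes in every row and keep rows with two or more
res <- dices[rowSums(dices == 6) >= 2, ]
rownames(res) <- NULL
nrow(res)
## [1] 16
head(res, 8)
```

The count of 16 agrees with the combinatorial argument: \(3 \cdot 5 = 15\) outcomes with exactly two sixes plus one outcome with three sixes.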
The list comprehension functions of listcompr can also be nested. In the following example we use gen.data.frame as the outer function and gen.vector (within a sum) to generate a data frame of the sum of all divisors of a whole number (the number itself excluded):
df <- gen.data.frame(c(a, sumdiv = sum(gen.vector(x, x = 1:(a-1), a %% x == 0))), a = 2:10)
knitr::kable(df)
a | sumdiv |
---|---|
2 | 1 |
3 | 1 |
4 | 3 |
5 | 1 |
6 | 6 |
7 | 1 |
8 | 7 |
9 | 4 |
10 | 8 |
Now we want to add a flag to this data frame if the number a is a so-called perfect number. A perfect number is characterized by being identical to the sum of its divisors (the number itself excluded). To this end we use a substitution variable sumdiv for the sum of divisors. We use this variable for checking if the number is perfect:
df <- gen.data.frame(c(a, sumdiv, perfect = (sumdiv == a)), a = 2:10,
                     sumdiv = sum(gen.vector(x, x = 1:(a-1), a %% x == 0)))
knitr::kable(df)
a | sumdiv | perfect |
---|---|---|
2 | 1 | 0 |
3 | 1 | 0 |
4 | 3 | 0 |
5 | 1 | 0 |
6 | 6 | 1 |
7 | 1 | 0 |
8 | 7 | 0 |
9 | 4 | 0 |
10 | 8 | 0 |
This means the only perfect number between 1 and 10 is 6.
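The divisor sums in the table can be reproduced with a small base-R function. The helper sum_proper_divisors is a hypothetical name introduced for this sketch and is not provided by listcompr.

```r
# Hypothetical helper: sum of all divisors of a, the number itself excluded
sum_proper_divisors <- function(a) {
  x <- seq_len(a - 1)
  sum(x[a %% x == 0])
}

a <- 2:10
sumdiv <- vapply(a, sum_proper_divisors, numeric(1))
# Perfect numbers are those equal to their divisor sum
perfect <- a[sumdiv == a]
print(perfect)
## [1] 6
```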
The package also offers some functions to compose characters. They can be used to generate lists or vectors of characters or (row-)names of the results.
Consider our example from above where we search for the number of tiles, such that the border and the inner have the same number of tiles. Assume we want a textual output instead of a data frame. We simply pass a character to gen.vector where all expressions in {}-brackets are evaluated according to the list comprehension result:
gen.vector("size: {x}x{y} tiles, where {x*y/2} tiles are at the border/inner",
x = 1:20, y = 1:20, (x-2)*(y-2) == x*y/2)
## [1] "size: 12x5 tiles, where 30 tiles are at the border/inner"
## [2] "size: 8x6 tiles, where 24 tiles are at the border/inner"
## [3] "size: 6x8 tiles, where 24 tiles are at the border/inner"
## [4] "size: 5x12 tiles, where 30 tiles are at the border/inner"
Let’s revisit the example with the divisors. Instead of the sum of the divisors, we want to get all divisors of a number as a vector. Each vector should be stored in a list entry named divisors_of_{a}. With the following we get the divisors for all numbers in 5:10:
gen.named.list("divisors_of_{a}", gen.vector(x, x = 1:(a-1), a %% x == 0), a = 5:10)
## $divisors_of_5
## [1] 1
##
## $divisors_of_6
## [1] 1 2 3
##
## $divisors_of_7
## [1] 1
##
## $divisors_of_8
## [1] 1 2 4
##
## $divisors_of_9
## [1] 1 3
##
## $divisors_of_10
## [1] 1 2 5
We continue with calculating divisors. Assume the numbers a = 90:95 are given and we want to show a table indicating which numbers within 2:10 are integer divisors of a. The following logical matrix shows this result. We use byrow = TRUE to specify that the elements of the inner vector refer to the columns and not to the rows.
m <- gen.named.matrix("divisors_{a}", gen.named.vector("{x}", a %% x == 0, x = 2:10),
                      a = 90:95, byrow = TRUE)
knitr::kable(m)
 | divisors_90 | divisors_91 | divisors_92 | divisors_93 | divisors_94 | divisors_95 |
---|---|---|---|---|---|---|
2 | TRUE | FALSE | TRUE | FALSE | TRUE | FALSE |
3 | TRUE | FALSE | FALSE | TRUE | FALSE | FALSE |
4 | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE |
5 | TRUE | FALSE | FALSE | FALSE | FALSE | TRUE |
6 | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
7 | FALSE | TRUE | FALSE | FALSE | FALSE | FALSE |
8 | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE |
9 | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
10 | TRUE | FALSE | FALSE | FALSE | FALSE | FALSE |
Analogously to the example above, we can create a matrix with characters as content, denoting either the integer result a/x or simply "no" if x is not an integer divisor of a.
m <- gen.named.matrix("divisors_{a}",
                      gen.named.vector("{x}", if (a %% x == 0) "factor = {a/x}" else "no", x = 2:10),
                      a = 90:95, byrow = TRUE)
knitr::kable(m)
 | divisors_90 | divisors_91 | divisors_92 | divisors_93 | divisors_94 | divisors_95 |
---|---|---|---|---|---|---|
2 | factor = 45 | no | factor = 46 | no | factor = 47 | no |
3 | factor = 30 | no | no | factor = 31 | no | no |
4 | no | no | factor = 23 | no | no | no |
5 | factor = 18 | no | no | no | no | factor = 19 |
6 | factor = 15 | no | no | no | no | no |
7 | no | factor = 13 | no | no | no | no |
8 | no | no | no | no | no | no |
9 | factor = 10 | no | no | no | no | no |
10 | factor = 9 | no | no | no | no | no |