S3 vectors

This vignette shows you how to create your own S3 vector classes. It focuses on the aspects of making a vector class that every class needs to worry about; you’ll also need to provide methods that actually make the vector useful.

I assume that you’re already familiar with the basic machinery of S3, and the vocabulary I use in Advanced R: constructor, helper, and validator. If not, I recommend reading at least the first two sections of the S3 chapter of Advanced R.

This article refers to “vectors of numbers” as double vectors. Here, “double” stands for “double precision floating point number”, see also double().

library(vctrs)
library(zeallot)

This vignette works through five big topics:

They’re collectively demonstrated with a number of simple S3 classes:

Basics

In this section you’ll learn how to create a new vctrs class by calling new_vctr(). This creates an object with class vctrs_vctr which has a number of methods. These are designed to make your life as easy as possible. For example:

Percent class

In this section, I’ll show you how to make a percent class, i.e., a double vector that is printed as a percentage. We start by defining a low-level constructor that uses vec_assert() to checks types and/or sizes then calls new_vctr().

percent is built on a double vector of any length and doesn’t have any attributes.

new_percent <- function(x = double()) {
  vec_assert(x, double())
  new_vctr(x, class = "vctrs_percent")
}

x <- new_percent(c(seq(0, 1, length.out = 4), NA))
x
#> <vctrs_percent[5]>
#> [1] 0.0000000 0.3333333 0.6666667 1.0000000        NA

str(x)
#>  vctrs_pr [1:5] 0.0000000, 0.3333333, 0.6666667, 1.0000000,        NA

Note that we prefix the name of the class with the name of the package. This prevents conflicting definitions between packages. For packages that implement only one class (such as blob), it’s fine to use the package name without prefix as the class name.

We then follow up with a user friendly helper. Here we’ll use vec_cast() to allow it to accept anything coercible to a double:

percent <- function(x = double()) {
  x <- vec_cast(x, double())
  new_percent(x)
}

Before you go on, check that user-friendly constructor returns a zero-length vector when called with no arguments. This makes it easy to use as a prototype.

new_percent()
#> <vctrs_percent[0]>
percent()
#> <vctrs_percent[0]>

For the convenience of your users, consider implementing an is_percent() function:

is_percent <- function(x) {
  inherits(x, "vctrs_percent")
}

format() method

The first method for every class should almost always be a format() method. This should return a character vector the same length as x. The easiest way to do this is to rely on one of R’s low-level formatting functions like formatC():

format.vctrs_percent <- function(x, ...) {
  out <- formatC(signif(vec_data(x) * 100, 3))
  out[is.na(x)] <- NA
  out[!is.na(x)] <- paste0(out[!is.na(x)], "%")
  out
}
x
#> <vctrs_percent[5]>
#> [1] 0%    33.3% 66.7% 100%  <NA>

(Note the use of vec_data() so format() doesn’t get stuck in an infinite loop, and that I take a little care to not convert NA to "NA"; this leads to better printing.)

The format method is also used by data frames, tibbles, and str():

data.frame(x)
#>       x
#> 1    0%
#> 2 33.3%
#> 3 66.7%
#> 4  100%
#> 5  <NA>

For optimal display, I recommend also defining an abbreviated type name, which should be 4-5 letters for commonly used vectors. This is used in tibbles and in str():

vec_ptype_abbr.vctrs_percent <- function(x, ...) {
  "prcnt"
}

tibble::tibble(x)
#> # A tibble: 5 × 1
#>         x
#>   <prcnt>
#> 1      0%
#> 2   33.3%
#> 3   66.7%
#> 4    100%
#> 5      NA

str(x)
#>  prcnt [1:5] 0%, 33.3%, 66.7%, 100%, <NA>

If you need more control over printing in tibbles, implement a method for pillar::pillar_shaft(). See vignette("pillar", package = "vctrs") for details.

Casting and coercion

The next set of methods you are likely to need are those related to coercion and casting. Coercion and casting are two sides of the same coin: changing the prototype of an existing object. When the change happens implicitly (e.g in c()) we call it coercion; when the change happens explicitly (e.g. with as.integer(x)), we call it casting.

One of the main goals of vctrs is to put coercion and casting on a robust theoretical footing so it’s possible to make accurate predictions about what (e.g.) c(x, y) should do when x and y have different prototypes. vctrs achieves this goal through two generics:

Double dispatch

Both generics use double dispatch which means that the implementation is selected based on the class of two arguments, not just one. S3 does not natively support double dispatch, so we implement our own dispatch mechanism. In practice, this means:

Percent class

We’ll make our percent class coercible back and forth with double vectors.

vec_ptype2() provides a user friendly error message if the coercion doesn’t exist and makes sure NA is handled in a standard way. NA is technically a logical vector, but we want to stand in for a missing value of any type.

vec_ptype2("bogus", percent())
#> Error:
#> ! Can't combine `"bogus"` <character> and `percent()` <vctrs_percent>.
vec_ptype2(percent(), NA)
#> <vctrs_percent[0]>
vec_ptype2(NA, percent())
#> <vctrs_percent[0]>

By default and in simple cases, an object of the same class is compatible with itself:

vec_ptype2(percent(), percent())
#> <vctrs_percent[0]>

However this only works if the attributes for both objects are the same. Also the default methods are a bit slower. It is always a good idea to provide an explicit coercion method for the case of identical classes. So we’ll start by saying that a vctrs_percent combined with a vctrs_percent yields a vctrs_percent, which we indicate by returning a prototype generated by the constructor.

vec_ptype2.vctrs_percent.vctrs_percent <- function(x, y, ...) new_percent()

Next we define methods that say that combining a percent and double should yield a double. We avoid returning a percent here because errors in the scale (1 vs. 0.01) are more obvious with raw numbers.

Because double dispatch is a bit of a hack, we need to provide two methods. It’s your responsibility to ensure that each member of the pair returns the same result: if they don’t you will get weird and unpredictable behaviour.

The double dispatch mechanism requires us to refer to the underlying type, double, in the method name. If we implemented vec_ptype2.vctrs_percent.numeric(), it would never be called.

vec_ptype2.vctrs_percent.double <- function(x, y, ...) double()
vec_ptype2.double.vctrs_percent <- function(x, y, ...) double()

We can check that we’ve implemented this correctly with vec_ptype_show():

vec_ptype_show(percent(), double(), percent())
#> Prototype: <double>
#> 0. (                 , <vctrs_percent> ) = <vctrs_percent>
#> 1. ( <vctrs_percent> , <double>        ) = <double>       
#> 2. ( <double>        , <vctrs_percent> ) = <double>

The vec_ptype2() methods define which input is the richer type that vctrs should coerce to. However, they don’t perform any conversion. This is the job of vec_cast(), which we implement next. We’ll provide a method to cast a percent to a percent:

vec_cast.vctrs_percent.vctrs_percent <- function(x, to, ...) x

And then for converting back and forth between doubles. To convert a double to a percent we use the percent() helper (not the constructor; this is unvalidated user input). To convert a percent to a double, we strip the attributes.

Note that for historical reasons the order of argument in the signature is the opposite as for vec_ptype2(). The class for to comes first, and the class for x comes second.

Again, the double dispatch mechanism requires us to refer to the underlying type, double, in the method name. Implementing vec_cast.vctrs_percent.numeric() has no effect.

vec_cast.vctrs_percent.double <- function(x, to, ...) percent(x)
vec_cast.double.vctrs_percent <- function(x, to, ...) vec_data(x)

Then we can check this works with vec_cast():

vec_cast(0.5, percent())
#> <vctrs_percent[1]>
#> [1] 50%
vec_cast(percent(0.5), double())
#> [1] 0.5

Once you’ve implemented vec_ptype2() and vec_cast(), you get vec_c(), [<-, and [[<- implementations for free.

vec_c(percent(0.5), 1)
#> [1] 0.5 1.0
vec_c(NA, percent(0.5))
#> <vctrs_percent[2]>
#> [1] <NA> 50%
# but
vec_c(TRUE, percent(0.5))
#> Error:
#> ! Can't combine `..1` <logical> and `..2` <vctrs_percent>.

x <- percent(c(0.5, 1, 2))
x[1:2] <- 2:1
#> Error in `vec_restore_dispatch()`:
#> ! Can't convert <integer> to <vctrs_percent>.
x[[3]] <- 0.5
x
#> <vctrs_percent[3]>
#> [1] 50%  100% 50%

You’ll also get mostly correct behaviour for c(). The exception is when you use c() with a base R class:

# Correct
c(percent(0.5), 1)
#> [1] 0.5 1.0
c(percent(0.5), factor(1))
#> Error:
#> ! Can't combine `..1` <vctrs_percent> and `..2` <factor<25c7e>>.

# Incorrect
c(factor(1), percent(0.5))
#> [1] 1.0 0.5

Unfortunately there’s no way to fix this problem with the current design of c().

Again, as a convenience, consider providing an as_percent() function that makes use of the casts defined in your vec_cast.vctrs_percent() methods:

as_percent <- function(x) {
  vec_cast(x, new_percent())
}

Occasionally, it is useful to provide conversions that go beyond what’s allowed in casting. For example, we could offer a parsing method for character vectors. In this case, as_percent() should be generic, the default method should cast, and then additional methods should implement more flexible conversion:

as_percent <- function(x, ...) {
  UseMethod("as_percent")
}

as_percent.default <- function(x, ...) {
  vec_cast(x, new_percent())
}

as_percent.character <- function(x) {
  value <- as.numeric(gsub(" *% *$", "", x)) / 100
  new_percent(value)
}

Decimal class

Now that you’ve seen the basics with a very simple S3 class, we’ll gradually explore more complicated scenarios. This section creates a decimal class that prints with the specified number of decimal places. This is very similar to percent but now the class needs an attribute: the number of decimal places to display (an integer vector of length 1).

We start off as before, defining a low-level constructor, a user-friendly constructor, a format() method, and a vec_ptype_abbr(). Note that additional object attributes are simply passed along to new_vctr():

new_decimal <- function(x = double(), digits = 2L) {
  vec_assert(x, ptype = double())
  vec_assert(digits, ptype = integer(), size = 1)

  new_vctr(x, digits = digits, class = "vctrs_decimal")
}

decimal <- function(x = double(), digits = 2L) {
  x <- vec_cast(x, double())
  digits <- vec_recycle(vec_cast(digits, integer()), 1L)

  new_decimal(x, digits = digits)
}

digits <- function(x) attr(x, "digits")

format.vctrs_decimal <- function(x, ...) {
  sprintf(paste0("%-0.", digits(x), "f"), x)
}

vec_ptype_abbr.vctrs_decimal <- function(x, ...) {
  "dec"
}

x <- decimal(runif(10), 1L)
x
#> <vctrs_decimal[10]>
#>  [1] 0.1 0.8 0.6 0.2 0.0 0.5 0.5 0.3 0.7 0.8

Note that I provide a little helper to extract the digits attribute. This makes the code a little easier to read and should not be exported.

By default, vctrs assumes that attributes are independent of the data and so are automatically preserved. You’ll see what to do if the attributes are data dependent in the next section.

x[1:2]
#> <vctrs_decimal[2]>
#> [1] 0.1 0.8
x[[1]]
#> <vctrs_decimal[1]>
#> [1] 0.1

For the sake of exposition, we’ll assume that digits is an important attribute of the class and should be included in the full type:

vec_ptype_full.vctrs_decimal <- function(x, ...) {
  paste0("decimal<", digits(x), ">")
}

x
#> <decimal<1>[10]>
#>  [1] 0.1 0.8 0.6 0.2 0.0 0.5 0.5 0.3 0.7 0.8

Now consider vec_cast() and vec_ptype2(). Casting and coercing from one decimal to another requires a little thought as the values of the digits attribute might be different, and we need some way to reconcile them. Here I’ve decided to chose the maximum of the two; other reasonable options are to take the value from the left-hand side or throw an error.

vec_ptype2.vctrs_decimal.vctrs_decimal <- function(x, y, ...) {
  new_decimal(digits = max(digits(x), digits(y)))
}
vec_cast.vctrs_decimal.vctrs_decimal <- function(x, to, ...) {
  new_decimal(vec_data(x), digits = digits(to))
}

vec_c(decimal(1/100, digits = 3), decimal(2/100, digits = 2))
#> <decimal<3>[2]>
#> [1] 0.010 0.020

Finally, I can implement coercion to and from other types, like doubles. When automatically coercing, I choose the richer type (i.e., the decimal).

vec_ptype2.vctrs_decimal.double <- function(x, y, ...) x
vec_ptype2.double.vctrs_decimal <- function(x, y, ...) y

vec_cast.vctrs_decimal.double  <- function(x, to, ...) new_decimal(x, digits = digits(to))
vec_cast.double.vctrs_decimal  <- function(x, to, ...) vec_data(x)

vec_c(decimal(1, digits = 1), pi)
#> <decimal<1>[2]>
#> [1] 1.0 3.1
vec_c(pi, decimal(1, digits = 1))
#> <decimal<1>[2]>
#> [1] 3.1 1.0

If type x has greater resolution than y, there will be some inputs that lose precision. These should generate errors using stop_lossy_cast(). You can see that in action when casting from doubles to integers; only some doubles can become integers without losing resolution.

vec_cast(c(1, 2, 10), to = integer())
#> [1]  1  2 10

vec_cast(c(1.5, 2, 10.5), to = integer())
#> Error:
#> ! Can't convert from `c(1.5, 2, 10.5)` <double> to <integer> due to loss of precision.
#> • Locations: 1, 3

Cached sum class

The next level up in complexity is an object that has data-dependent attributes. To explore this idea we’ll create a vector that caches the sum of its values. As usual, we start with low-level and user-friendly constructors:

new_cached_sum <- function(x = double(), sum = 0L) {
  vec_assert(x, ptype = double())
  vec_assert(sum, ptype = double(), size = 1L)

  new_vctr(x, sum = sum, class = "vctrs_cached_sum")
}

cached_sum <- function(x) {
  x <- vec_cast(x, double())
  new_cached_sum(x, sum(x))
}

For this class, we can use the default format() method, and instead, we’ll customise the obj_print_footer() method. This is a good place to display user facing attributes.

obj_print_footer.vctrs_cached_sum <- function(x, ...) {
  cat("# Sum: ", format(attr(x, "sum"), digits = 3), "\n", sep = "")
}

x <- cached_sum(runif(10))
x
#> <vctrs_cached_sum[10]>
#>  [1] 0.87460066 0.17494063 0.03424133 0.32038573 0.40232824 0.19566983
#>  [7] 0.40353812 0.06366146 0.38870131 0.97554784
#> # Sum: 3.83

We’ll also override sum() and mean() to use the attribute. This is easiest to do with vec_math(), which you’ll learn about later.

vec_math.vctrs_cached_sum <- function(.fn, .x, ...) {
  cat("Using cache\n")
  switch(.fn,
    sum = attr(.x, "sum"),
    mean = attr(.x, "sum") / length(.x),
    vec_math_base(.fn, .x, ...)
  )
}

sum(x)
#> Using cache
#> [1] 3.833615

As mentioned above, vctrs assumes that attributes are independent of the data. This means that when we take advantage of the default methods, they’ll work, but return the incorrect result:

x[1:2]
#> <vctrs_cached_sum[2]>
#> [1] 0.8746007 0.1749406
#> # Sum: 3.83

To fix this, you need to provide a vec_restore() method. Note that this method dispatches on the to argument.

vec_restore.vctrs_cached_sum <- function(x, to, ..., i = NULL) {
  new_cached_sum(x, sum(x))
}

x[1]
#> <vctrs_cached_sum[1]>
#> [1] 0.8746007
#> # Sum: 0.875

This works because most of the vctrs methods dispatch to the underlying base function by first stripping off extra attributes with vec_data() and then reapplying them again with vec_restore(). The default vec_restore() method copies over all attributes, which is not appropriate when the attributes depend on the data.

Note that vec_restore.class is subtly different from vec_cast.class.class(). vec_restore() is used when restoring attributes that have been lost; vec_cast() is used for coercions. This is easier to understand with a concrete example. Imagine factors were implemented with new_vctr(). vec_restore.factor() would restore attributes back to an integer vector, but you would not want to allow manually casting an integer to a factor with vec_cast().

Record-style objects

Record-style objects use a list of equal-length vectors to represent individual components of the object. The best example of this is POSIXlt, which underneath the hood is a list of 11 fields like year, month, and day. Record-style classes override length() and subsetting methods to conceal this implementation detail.

x <- as.POSIXlt(ISOdatetime(2020, 1, 1, 0, 0, 1:3))
x
#> [1] "2020-01-01 00:00:01 CET" "2020-01-01 00:00:02 CET"
#> [3] "2020-01-01 00:00:03 CET"

length(x)
#> [1] 3
length(unclass(x))
#> [1] 11

x[[1]] # the first date time
#> [1] "2020-01-01 00:00:01 CET"
unclass(x)[[1]] # the first component, the number of seconds
#> [1] 1 2 3

vctrs makes it easy to create new record-style classes using new_rcrd(), which has a wide selection of default methods.

Rational class

A fraction, or rational number, can be represented by a pair of integer vectors representing the numerator (the number on top) and the denominator (the number on bottom), where the length of each vector must be the same. To represent such a data structure we turn to a new base data type: the record (or rcrd for short).

As usual we start with low-level and user-friendly constructors. The low-level constructor calls new_rcrd(), which needs a named list of equal-length vectors.

new_rational <- function(n = integer(), d = integer()) {
  vec_assert(n, ptype = integer())
  vec_assert(d, ptype = integer())

  new_rcrd(list(n = n, d = d), class = "vctrs_rational")
}

Our user friendly constructor casts n and d to integers and recycles them to the same length.

rational <- function(n = integer(), d = integer()) {
  c(n, d) %<-% vec_cast_common(n, d, .to = integer())
  c(n, d) %<-% vec_recycle_common(n, d)

  new_rational(n, d)
}

x <- rational(1, 1:10)

Behind the scenes, x is a named list with two elements. But those details are hidden so that it behaves like a vector:

names(x)
#> NULL
length(x)
#> [1] 10

To access the underlying fields we need to use field() and fields():

fields(x)
#> [1] "n" "d"
field(x, "n")
#>  [1] 1 1 1 1 1 1 1 1 1 1

Notice that we can’t print() or str() the new rational vector x yet. Printing causes an error:

x
#> <vctrs_rational[10]>
#> Error in `format()` at vctrs/R/print-str.R:44:2:
#> ! `format.vctrs_rational()` not implemented.

str(x)
#> Error in `format()` at vctrs/R/print-str.R:129:2:
#> ! `format.vctrs_rational()` not implemented.

This is because we haven’t defined how our class can be printed from the underlying data. Note that if you want to look under the hood during development, you can always call vec_data(x).

vec_data(x)
#>    n  d
#> 1  1  1
#> 2  1  2
#> 3  1  3
#> 4  1  4
#> 5  1  5
#> 6  1  6
#> 7  1  7
#> 8  1  8
#> 9  1  9
#> 10 1 10

str(vec_data(x))
#> 'data.frame':    10 obs. of  2 variables:
#>  $ n: int  1 1 1 1 1 1 1 1 1 1
#>  $ d: int  1 2 3 4 5 6 7 8 9 10

It is generally best to define a formatting method early in the development of a class. The format method defines how to display the class so that it can be printed in the normal way:

format.vctrs_rational <- function(x, ...) {
  n <- field(x, "n")
  d <- field(x, "d")

  out <- paste0(n, "/", d)
  out[is.na(n) | is.na(d)] <- NA

  out
}

vec_ptype_abbr.vctrs_rational <- function(x, ...) "rtnl"
vec_ptype_full.vctrs_rational <- function(x, ...) "rational"

x
#> <rational[10]>
#>  [1] 1/1  1/2  1/3  1/4  1/5  1/6  1/7  1/8  1/9  1/10

vctrs uses the format() method in str(), hiding the underlying implementation details from the user:

str(x)
#>  rtnl [1:10] 1/1, 1/2, 1/3, 1/4, 1/5, 1/6, 1/7, 1/8, 1/9, 1/10

For rational, vec_ptype2() and vec_cast() follow the same pattern as percent(). We allow coercion from integer and to doubles.

vec_ptype2.vctrs_rational.vctrs_rational <- function(x, y, ...) new_rational()
vec_ptype2.vctrs_rational.integer <- function(x, y, ...) new_rational()
vec_ptype2.integer.vctrs_rational <- function(x, y, ...) new_rational()

vec_cast.vctrs_rational.vctrs_rational <- function(x, to, ...) x
vec_cast.double.vctrs_rational <- function(x, to, ...) field(x, "n") / field(x, "d")
vec_cast.vctrs_rational.integer <- function(x, to, ...) rational(x, 1)

vec_c(rational(1, 2), 1L, NA)
#> <rational[3]>
#> [1] 1/2  1/1  <NA>

Decimal2 class

The previous implementation of decimal was built on top of doubles. This is a bad idea because decimal vectors are typically used when you care about precise values (i.e., dollars and cents in a bank account), and double values suffer from floating point problems.

A better implementation of a decimal class would be to use pair of integers, one for the value to the left of the decimal point, and the other for the value to the right (divided by a scale). The following code is a very quick sketch of how you might start creating such a class:

new_decimal2 <- function(l, r, scale = 2L) {
  vec_assert(l, ptype = integer())
  vec_assert(r, ptype = integer())
  vec_assert(scale, ptype = integer(), size = 1L)

  new_rcrd(list(l = l, r = r), scale = scale, class = "vctrs_decimal2")
}

decimal2 <- function(l, r, scale = 2L) {
  l <- vec_cast(l, integer())
  r <- vec_cast(r, integer())
  c(l, r) %<-% vec_recycle_common(l, r)
  scale <- vec_cast(scale, integer())

  # should check that r < 10^scale
  new_decimal2(l = l, r = r, scale = scale)
}

format.vctrs_decimal2 <- function(x, ...) {
  val <- field(x, "l") + field(x, "r") / 10^attr(x, "scale")
  sprintf(paste0("%.0", attr(x, "scale"), "f"), val)
}

decimal2(10, c(0, 5, 99))
#> <vctrs_decimal2[3]>
#> [1] 10.00 10.05 10.99

Equality and comparison

vctrs provides four “proxy” generics. Two of these let you control how your class determines equality and comparison:

Two other proxy generic are used for sorting for unordered data types and for accessing the raw data for exotic storage formats:

The default behavior is as follows:

You should only implement these proxies when some preprocessing on the data is needed to make elements comparable. In that case, defining these methods will get you a lot of behaviour for relatively little work.

These proxy functions should always return a simple object (either a bare vector or a data frame) that possesses the same properties as your class. This permits efficient implementation of the vctrs internals because it allows dispatch to happen once in R, and then efficient computations can be written in C.

Rational class

Let’s explore these ideas by with the rational class we started on above. By default, vec_proxy() converts a record to a data frame, and the default comparison works column by column:

x <- rational(c(1, 2, 1, 2), c(1, 1, 2, 2))
x
#> <rational[4]>
#> [1] 1/1 2/1 1/2 2/2

vec_proxy(x)
#>   n d
#> 1 1 1
#> 2 2 1
#> 3 1 2
#> 4 2 2

x == rational(1, 1)
#> [1]  TRUE FALSE FALSE FALSE

This makes sense as a default but isn’t correct here because rational(1, 1) represents the same number as rational(2, 2), so they should be equal. We can fix that by implementing a vec_proxy_equal() method that divides n and d by their greatest common divisor:

# Thanks to Matthew Lundberg: https://stackoverflow.com/a/21504113/16632
gcd <- function(x, y) {
  r <- x %% y
  ifelse(r, gcd(y, r), y)
}

vec_proxy_equal.vctrs_rational <- function(x, ...) {
  n <- field(x, "n")
  d <- field(x, "d")
  gcd <- gcd(n, d)

  data.frame(n = n / gcd, d = d / gcd)
}
vec_proxy_equal(x)
#>   n d
#> 1 1 1
#> 2 2 1
#> 3 1 2
#> 4 1 1

x == rational(1, 1)
#> [1]  TRUE FALSE FALSE  TRUE

vec_proxy_equal() is also used by unique():

unique(x)
#> <rational[3]>
#> [1] 1/1 2/1 1/2

We now need to fix the comparison operations similarly, since comparison currently happens lexicographically by n, then by d:

rational(1, 2) < rational(2, 3)
#> [1] TRUE
rational(2, 4) < rational(2, 3)
#> [1] TRUE

The easiest fix is to convert the fraction to a floating point number and use this as a proxy:

vec_proxy_compare.vctrs_rational <- function(x, ...) {
  field(x, "n") / field(x, "d")
}

rational(2, 4) < rational(2, 3)
#> [1] TRUE

This also fixes sort(), because the default implementation of vec_proxy_order() calls vec_proxy_compare().

sort(x)
#> <rational[4]>
#> [1] 1/2 1/1 2/2 2/1

(We could have used the same approach in vec_proxy_equal(), but when working with floating point numbers it’s not necessarily true that x == y implies that d * x == d * y.)

Polynomial class

A related problem occurs if we build our vector on top of a list. The following code defines a polynomial class that represents polynomials (like 1 + 3x - 2x^2) using a list of integer vectors (like c(1, 3, -2)). Note the use of new_list_of() in the constructor.

new_poly <- function(x) {
  new_list_of(x, ptype = integer(), class = "vctrs_poly")
}

poly <- function(...) {
  x <- list(...)
  x <- lapply(x, vec_cast, integer())
  new_poly(x)
}

vec_ptype_full.vctrs_poly <- function(x, ...) "polynomial"
vec_ptype_abbr.vctrs_poly <- function(x, ...) "poly"

format.vctrs_poly <- function(x, ...) {
  format_one <- function(x) {
    if (length(x) == 0) {
      return("")
    } else if (length(x) == 1) {
      format(x)
    } else {
      suffix <- c(paste0("\u22C5x^", seq(length(x) - 1, 1)), "")
      out <- paste0(x, suffix)
      out <- out[x != 0L]
      paste0(out, collapse = " + ")
    }
  }
  vapply(x, format_one, character(1))
}

obj_print_data.vctrs_poly <- function(x, ...) {
  if (length(x) == 0)
    return()
  print(format(x), quote = FALSE)
}

p <- poly(1, c(1, 0, 0, 0, 2), c(1, 0, 1))
p
#> <polynomial[3]>
#> [1] 1         1⋅x^4 + 2 1⋅x^2 + 1

The resulting objects will inherit from the vctrs_list_of class, which provides tailored methods for $, [[, the corresponding assignment operators, and other methods.

class(p)
#> [1] "vctrs_poly"    "vctrs_list_of" "vctrs_vctr"    "list"
p[2]
#> <polynomial[1]>
#> [1] 1⋅x^4 + 2
p[[2]]
#> [1] 1 0 0 0 2

Equality works out of the box because we can tell if two integer vectors are equal:

p == poly(c(1, 0, 1))
#> [1] FALSE FALSE  TRUE

We can’t compare individual elements, because by default lists are not comparable:

p < p[2]
#> Error in `vec_proxy_compare()`:
#> ! `vec_proxy_compare.vctrs_poly()` not supported.

To enable comparison, we implement a vec_proxy_compare() method:

vec_proxy_compare.vctrs_poly <- function(x, ...) {
  x_raw <- vec_data(x)
  # First figure out the maximum length
  n <- max(vapply(x_raw, length, integer(1)))

  # Then expand all vectors to this length by filling in with zeros
  full <- lapply(x_raw, function(x) c(rep(0L, n - length(x)), x))

  # Then turn into a data frame
  as.data.frame(do.call(rbind, full))
}

p < p[2]
#> [1]  TRUE FALSE  TRUE

Often, this is sufficient to also implement sort(). However, for lists, there is already a default vec_proxy_order() method that sorts by first occurrence:

sort(p)
#> <polynomial[3]>
#> [1] 1         1⋅x^4 + 2 1⋅x^2 + 1
sort(p[c(1:3, 1:2)])
#> <polynomial[5]>
#> [1] 1         1         1⋅x^4 + 2 1⋅x^4 + 2 1⋅x^2 + 1

To ensure consistency between ordering and comparison, we forward vec_proxy_order() to vec_proxy_compare():

vec_proxy_order.vctrs_poly <- function(x, ...) {
  vec_proxy_compare(x, ...)
}

sort(p)
#> <polynomial[3]>
#> [1] 1         1⋅x^2 + 1 1⋅x^4 + 2

Arithmetic

vctrs also provides two mathematical generics that allow you to define a broad swath of mathematical behaviour at once:

Both generics define the behaviour for multiple functions because sum.vctrs_vctr(x) calls vec_math.vctrs_vctr("sum", x), and x + y calls vec_math.x_class.y_class("+", x, y). They’re accompanied by vec_math_base() and vec_arith_base() which make it easy to call the underlying base R functions.

vec_arith() uses double dispatch and needs the following standard boilerplate:

vec_arith.MYCLASS <- function(op, x, y, ...) {
  UseMethod("vec_arith.MYCLASS", y)
}
vec_arith.MYCLASS.default <- function(op, x, y, ...) {
  stop_incompatible_op(op, x, y)
}

Cached sum class

I showed an example of vec_math() to define sum() and mean() methods for cached_sum. Now let’s talk about exactly how it works. Most vec_math() functions will have a similar form. You use a switch statement to handle the methods that you care about and fall back to vec_math_base() for those that you don’t care about.

vec_math.vctrs_cached_sum <- function(.fn, .x, ...) {
  switch(.fn,
    sum = attr(.x, "sum"),
    mean = attr(.x, "sum") / length(.x),
    vec_math_base(.fn, .x, ...)
  )
}

Meter class

To explore the infix arithmetic operators exposed by vec_arith() I’ll create a new class that represents a measurement in meters:

new_meter <- function(x) {
  stopifnot(is.double(x))
  new_vctr(x, class = "vctrs_meter")
}

format.vctrs_meter <- function(x, ...) {
  paste0(format(vec_data(x)), " m")
}

meter <- function(x) {
  x <- vec_cast(x, double())
  new_meter(x)
}

x <- meter(1:10)
x
#> <vctrs_meter[10]>
#>  [1]  1 m  2 m  3 m  4 m  5 m  6 m  7 m  8 m  9 m 10 m

Because meter is built on top of a double vector, basic mathematic operations work:

sum(x)
#> <vctrs_meter[1]>
#> [1] 55 m
mean(x)
#> <vctrs_meter[1]>
#> [1] 5.5 m

But we can’t do arithmetic:

x + 1
#> Error in `vec_arith()` at vctrs/R/type-vctr.R:656:4:
#> ! <vctrs_meter> + <double> is not permitted
meter(10) + meter(1)
#> Error in `vec_arith()` at vctrs/R/type-vctr.R:656:4:
#> ! <vctrs_meter> + <vctrs_meter> is not permitted
meter(10) * 3
#> Error in `vec_arith()` at vctrs/R/type-vctr.R:671:2:
#> ! <vctrs_meter> * <double> is not permitted

To allow these infix functions to work, we’ll need to provide vec_arith() generic. But before we do that, let’s think about what combinations of inputs we should support:

vec_arith() is another function that uses double dispatch, so as usual we start with a template.

vec_arith.vctrs_meter <- function(op, x, y, ...) {
  UseMethod("vec_arith.vctrs_meter", y)
}
vec_arith.vctrs_meter.default <- function(op, x, y, ...) {
  stop_incompatible_op(op, x, y)
}

Then write the method for two meter objects. We use a switch statement to cover the cases we care about and stop_incompatible_op() to throw an informative error message for everything else.

vec_arith.vctrs_meter.vctrs_meter <- function(op, x, y, ...) {
  switch(
    op,
    "+" = ,
    "-" = new_meter(vec_arith_base(op, x, y)),
    "/" = vec_arith_base(op, x, y),
    stop_incompatible_op(op, x, y)
  )
}

meter(10) + meter(1)
#> <vctrs_meter[1]>
#> [1] 11 m
meter(10) - meter(1)
#> <vctrs_meter[1]>
#> [1] 9 m
meter(10) / meter(1)
#> [1] 10
meter(10) * meter(1)
#> Error in `vec_arith()` at vctrs/R/type-vctr.R:671:2:
#> ! <vctrs_meter> * <vctrs_meter> is not permitted

Next we write the pair of methods for arithmetic with a meter and a number. These are almost identical, but while meter(10) / 2 makes sense, 2 / meter(10) does not (and neither do addition and subtraction). To support both doubles and integers as operands, we dispatch over numeric here instead of double.

vec_arith.vctrs_meter.numeric <- function(op, x, y, ...) {
  switch(
    op,
    "/" = ,
    "*" = new_meter(vec_arith_base(op, x, y)),
    stop_incompatible_op(op, x, y)
  )
}
vec_arith.numeric.vctrs_meter <- function(op, x, y, ...) {
  switch(
    op,
    "*" = new_meter(vec_arith_base(op, x, y)),
    stop_incompatible_op(op, x, y)
  )
}

meter(2) * 10
#> <vctrs_meter[1]>
#> [1] 20 m
meter(2) * as.integer(10)
#> <vctrs_meter[1]>
#> [1] 20 m
10 * meter(2)
#> <vctrs_meter[1]>
#> [1] 20 m
meter(20) / 10
#> <vctrs_meter[1]>
#> [1] 2 m
10 / meter(20)
#> Error in `vec_arith()` at vctrs/R/type-vctr.R:676:2:
#> ! <double> / <vctrs_meter> is not permitted
meter(20) + 10
#> Error in `vec_arith()` at vctrs/R/type-vctr.R:656:4:
#> ! <vctrs_meter> + <double> is not permitted

For completeness, we also need vec_arith.vctrs_meter.MISSING for the unary + and - operators:

vec_arith.vctrs_meter.MISSING <- function(op, x, y, ...) {
  switch(op,
    `-` = x * -1,
    `+` = x,
    stop_incompatible_op(op, x, y)
  )
}
-meter(1)
#> <vctrs_meter[1]>
#> [1] -1 m
+meter(1)
#> <vctrs_meter[1]>
#> [1] 1 m

Implementing a vctrs S3 class in a package

Defining S3 methods interactively is fine for iteration and exploration, but if your class lives in a package, you need to do a few more things:

Let’s assume that the percent class is implemented in the pizza package in the file R/percent.R. Here we walk through the major sections of this hypothetical file. You’ve seen all of this code before, but now it’s augmented by the roxygen2 directives that produce the correct NAMESPACE entries and help topics.

Getting started

First, the pizza package needs to include vctrs in the Imports section of its DESCRIPTION (perhaps by calling usethis::use_package("vctrs"). While vctrs is under very active development, it probably makes sense to state a minimum version.

Imports:
    a_package,
    another_package,
    ...
    vctrs (>= x.y.z),
    ...

Then we make all vctrs functions available within the pizza package by including the directive #' @import vctrs somewhere. Usually, it’s not good practice to @import the entire namespace of a package, but vctrs is deliberately designed with this use case in mind.

Where should we put #' @import vctrs? There are two natural locations:

We also must use one of these locations to dump some internal documentation that’s needed to avoid R CMD check complaints. We don’t expect any human to ever read this documentation. Here’s how this dummy documentation should look, combined with the #' @import vctrs directive described above.

#' Internal vctrs methods
#'
#' @import vctrs
#' @keywords internal
#' @name pizza-vctrs
NULL

This should appear in R/pizza-doc.R (package-level docs) or in R/percent.R (class-focused file).

Remember to call devtools::document() regularly, as you develop, to regenerate NAMESPACE and the .Rd files.

From this point on, the code shown is expected to appear in R/percent.R.

Low-level and user-friendly constructors

Next we add our constructor:

new_percent <- function(x = double()) {
  vec_assert(x, double())
  new_vctr(x, class = "pizza_percent")
}

Note that the name of the package must be included in the class name (pizza_percent), but it does not need to be included in the constructor name. You do not need to export the constructor, unless you want people to extend your class.

We can also add a call to setOldClass() for compatibility with S4:

# for compatibility with the S4 system
methods::setOldClass(c("pizza_percent", "vctrs_vctr"))

Because we’ve used a function from the methods package, you’ll also need to add methods to Imports, with (e.g.) usethis::use_package("methods"). This is a “free” dependency because methods is bundled with every R install.

Next we implement, export, and document a user-friendly helper: percent().

#' `percent` vector
#'
#' This creates a double vector that represents percentages so when it is
#' printed, it is multiplied by 100 and suffixed with `%`.
#'
#' @param x A numeric vector
#' @return An S3 vector of class `pizza_percent`.
#' @export
#' @examples
#' percent(c(0.25, 0.5, 0.75))
percent <- function(x = double()) {
  x <- vec_cast(x, double())
  new_percent(x)
}

(Again note that the package name will appear in the class, but does not need to occur in the function, because we can already do pizza::percent(); it would be redundant to have pizza::pizza_percent().)

Other helpers

It’s a good idea to provide a function that tests if an object is of this class. If you do so, it makes sense to document it with the user-friendly constructor percent():

#' @export
#' @rdname percent
is_percent <- function(x) {
  inherits(x, "pizza_percent")
}

You’ll also need to update the percent() documentation to reflect that x now means two different things:

#' @param x
#'  * For `percent()`: A numeric vector
#'  * For `is_percent()`: An object to test.

Next we provide the key methods to make printing work. These are S3 methods, so they don’t need to be documented, but they do need to be exported.

#' @export
format.pizza_percent <- function(x, ...) {
  out <- formatC(signif(vec_data(x) * 100, 3))
  out[is.na(x)] <- NA
  out[!is.na(x)] <- paste0(out[!is.na(x)], "%")
  out
}

#' @export
vec_ptype_abbr.pizza_percent <- function(x, ...) {
  "prcnt"
}

Finally, we implement methods for vec_ptype2() and vec_cast().

#' @export
vec_ptype2.vctrs_percent.vctrs_percent <- function(x, y, ...) new_percent()
#' @export
vec_ptype2.double.vctrs_percent <- function(x, y, ...) double()

#' @export
vec_cast.pizza_percent.pizza_percent <- function(x, to, ...) x
#' @export
vec_cast.pizza_percent.double <- function(x, to, ...) percent(x)
#' @export
vec_cast.double.pizza_percent <- function(x, to, ...) vec_data(x)

Testing

It’s good practice to test your new class. Specific recommendations:

If your tests pass when run by devtools::test(), but fail when run in R CMD check, it is very likely to reflect a problem with S3 method registration. Carefully check your roxygen2 comments and the generated NAMESPACE.

Existing classes

Before you build your own class, you might want to consider using, or subclassing existing classes. You can check awesome-vctrs for a curated list of R vector classes, some of which are built with vctrs.

If you’ve built or extended a class, consider adding it to that list so other people can use it.