valaddin is a lightweight R package that enables you to transform an existing
function into a function with input validation checks. It does so without
requiring you to modify the body of the function, in contrast to doing input
validation using `stop` or `stopifnot`, and is therefore suitable for both
programmatic and interactive use.
This document illustrates the use of valaddin, by example. For usage details,
see the main documentation page, `?firmly`.
The workhorse of valaddin is the function `firmly`, which applies input
validation to a function, in situ. Its main uses are illustrated below.
For example, to require that all arguments of the function
f <- function(x, h) (sin(x + h) - sin(x)) / h
are numerical, apply `firmly` with the check formula `~is.numeric` [1]:
ff <- firmly(f, ~is.numeric)
`ff` behaves just like `f`, but with a constraint on the type of its arguments:
ff(0.0, 0.1)
#> [1] 0.9983342
ff("0.0", 0.1)
#> Error: ff(x = "0.0", h = 0.1)
#> FALSE: is.numeric(x)
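For contrast, here is a minimal sketch of the in-body approach that `firmly` lets you avoid, using only base R. The name `f_checked` is made up for this illustration and is not part of valaddin; its error message comes from `stopifnot`, so it differs from the one shown above.

```r
# Hypothetical hand-written counterpart of firmly(f, ~is.numeric):
# the validation is hard-coded into the function body.
f_checked <- function(x, h) {
  stopifnot(is.numeric(x), is.numeric(h))
  (sin(x + h) - sin(x)) / h
}

ok <- f_checked(0.0, 0.1)

# Capture the error message instead of halting the script
msg <- tryCatch(f_checked("0.0", 0.1), error = function(e) conditionMessage(e))
```

With this approach, adding or removing a check means editing the function itself, which is the step that `firmly` makes unnecessary.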
[1]: The inspiration to use `~` as a quoting operator came from the vignette
Non-standard evaluation, by Hadley Wickham.
For example, use `firmly` to put a cap on potentially long-running computations:
fib <- function(n) {
if (n <= 1L) return(1L)
Recall(n - 1L) + Recall(n - 2L)
}
capped_fib <- firmly(fib, list("n capped at 30" ~ ceiling(n)) ~ {. <= 30L})
capped_fib(10)
#> [1] 89
capped_fib(50)
#> Error: capped_fib(n = 50)
#> n capped at 30
The role of each part of the value-constraining formula is evident:
The right-hand side `{. <= 30L}` is the constraint itself, expressed as a
condition on `.`, a placeholder argument.
The left-hand side `list("n capped at 30" ~ ceiling(n))` specifies the
expression for the placeholder, namely `ceiling(n)`, along with a message to
produce if the constraint is violated.
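To make the roles of the two sides concrete, the following base R sketch spells out roughly what the check amounts to. It is an illustration under simplifying assumptions, not a description of how valaddin is implemented; the helper `cap_at_30` and the local names `placeholder` and `constraint` are hypothetical.

```r
fib <- function(n) {
  if (n <= 1L) return(1L)
  Recall(n - 1L) + Recall(n - 2L)
}

# Hypothetical wrapper that mimics the value-constraining formula by hand
cap_at_30 <- function(f) {
  function(n) {
    placeholder <- ceiling(n)            # expression from the left-hand side
    constraint <- function(.) . <= 30L   # condition from the right-hand side
    if (!isTRUE(constraint(placeholder))) {
      stop("n capped at 30", call. = FALSE)  # message from the left-hand side
    }
    f(n)
  }
}

capped <- cap_at_30(fib)
small <- capped(10)
big <- tryCatch(capped(50), error = function(e) conditionMessage(e))
```

The check runs before `fib` is ever called, so an oversized `n` fails immediately instead of starting a long recursion.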
If the default behavior of a function is problematic, or unexpected, you can use
`firmly` to warn you. Consider the function `as.POSIXct`, which creates a
date-time object:
Sys.setenv(TZ = "CET")
(d <- as.POSIXct("2017-01-01 09:30:00"))
#> [1] "2017-01-01 09:30:00 CET"
The problem is that `d` is a potentially ambiguous object (with hidden state),
because it's not assigned a time zone, explicitly. If you compute the local hour
of `d` using `as.POSIXlt`, you get an answer that interprets `d` according to
your current time zone; another user (or you, in another country, in the
future) may get a different result.
If you're in the CET time zone:
as.POSIXlt(d, tz = "EST")$hour
#> [1] 3
If you were to change to the EST time zone and rerun the code:
Sys.setenv(TZ = "EST")
d <- as.POSIXct("2017-01-01 09:30:00")
as.POSIXlt(d, tz = "EST")$hour
#> [1] 9
To warn yourself about this pitfall, you can modify `as.POSIXct` to complain
when you've forgotten to specify a time zone:
as.POSIXct <- firmly(as.POSIXct, .warn_missing = "tz")
Now when you call `as.POSIXct`, you get a cautionary reminder:
as.POSIXct("2017-01-01 09:30:00")
#> Warning: Argument(s) expected but not specified in call as.POSIXct(x =
#> "2017-01-01 09:30:00"): `tz`
#> [1] "2017-01-01 09:30:00 CET"
as.POSIXct("2017-01-01 09:30:00", tz = "CET")
#> [1] "2017-01-01 09:30:00 CET"
NB: The missing-argument warning is implemented by wrapping functions. The
underlying function `base::as.POSIXct` is called unmodified.
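As a rough picture of what wrapping means here, the following base R sketch warns when `tz` is not supplied and otherwise defers to `base::as.POSIXct`. The function `as_posixct_warn` is a hypothetical stand-in written for this illustration; it is not valaddin's wrapper, which is more general.

```r
# Hypothetical wrapper: warn about a missing `tz`, then call the
# untouched base function.
as_posixct_warn <- function(x, tz, ...) {
  if (missing(tz)) {
    warning("Argument expected but not specified: `tz`", call. = FALSE)
    return(base::as.POSIXct(x, ...))
  }
  base::as.POSIXct(x, tz = tz, ...)
}

with_tz <- as_posixct_warn("2017-01-01 09:30:00", tz = "UTC")

# Capture the warning message instead of printing it
no_tz <- tryCatch(
  as_posixct_warn("2017-01-01 09:30:00"),
  warning = function(w) conditionMessage(w)
)
```

The base function is never edited; the extra behavior lives entirely in the wrapper around it.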
Use `loosely` to access the original function

Though reassigning `as.POSIXct` may seem risky, it is not, for the behavior is
unchanged (aside from the extra precaution), and the original `as.POSIXct`
remains accessible, either directly as `base::as.POSIXct`, or by using `loosely`
to strip input validation, as in `loosely(as.POSIXct)`:
loosely(as.POSIXct)("2017-01-01 09:30:00")
#> [1] "2017-01-01 09:30:00 CET"
identical(loosely(as.POSIXct), base::as.POSIXct)
#> [1] TRUE
R tries to help you express your ideas as concisely as possible. Suppose you
want to truncate negative values of a vector `w`:
w <- {set.seed(1); rnorm(5)}
ifelse(w > 0, w, 0)
#> [1] 0.0000000 0.1836433 0.0000000 1.5952808 0.3295078
`ifelse` assumes (correctly) that you intend the `0` to be repeated 5
times, and does that for you, automatically.
Nonetheless, R's good intentions have a darker side:
z <- rep(1, 6)
pos <- 1:5
neg <- -6:-1
ifelse(z > 0, pos, neg)
#> [1] 1 2 3 4 5 1
This smells like a coding error. Instead of complaining that `pos` is too short,
`ifelse` recycles it to line it up with `z`. The result is probably not what you
wanted.
In this case, you don't need a helping hand, but rather a firm one:
chk_length_type <- list(
"'yes', 'no' differ in length" ~ length(yes) == length(no),
"'yes', 'no' differ in type" ~ typeof(yes) == typeof(no)
) ~ isTRUE
ifelse_f <- firmly(ifelse, chk_length_type)
`ifelse_f` is more pedantic than `ifelse`. But it also spares you the
consequences of invalid inputs:
ifelse_f(w > 0, w, 0)
#> Error: ifelse_f(test = w > 0, yes = w, no = 0)
#> 'yes', 'no' differ in length
ifelse_f(w > 0, w, rep(0, length(w)))
#> [1] 0.0000000 0.1836433 0.0000000 1.5952808 0.3295078
ifelse(z > 0, pos, neg)
#> [1] 1 2 3 4 5 1
ifelse_f(z > 0, pos, neg)
#> Error: ifelse_f(test = z > 0, yes = pos, no = neg)
#> 'yes', 'no' differ in length
ifelse(z > 0, as.character(pos), neg)
#> [1] "1" "2" "3" "4" "5" "1"
ifelse_f(z > 0, as.character(pos), neg)
#> Error: ifelse_f(test = z > 0, yes = as.character(pos), no = neg)
#> 1) 'yes', 'no' differ in length
#> 2) 'yes', 'no' differ in type
When R makes a function call, say, `f(a)`, the value of the argument `a` is not
materialized in the body of `f` until it is actually needed. Usually, you can
safely ignore this as a technicality of R's evaluation model; but in some
situations, it can be problematic if you're not mindful of it.
Consider a bank that waives fees for students. A function to make deposits might look like this[2]:
deposit <- function(account, value) {
if (is_student(account)) {
account$fees <- 0
}
account$balance <- account$balance + value
account
}
is_student <- function(account) {
if (isTRUE(account$is_student)) TRUE else FALSE
}
Suppose Bob is an account holder, currently not in school:
bobs_acct <- list(balance = 10, fees = 3, is_student = FALSE)
If Bob were to deposit an amount to cover a future fee payment, his account balance would be updated to:
deposit(bobs_acct, bobs_acct$fees)$balance
#> [1] 13
Bob goes back to school and informs the bank, so that his fees will be waived:
bobs_acct$is_student <- TRUE
But now suppose that, somewhere in the bowels of the bank's software, the type of Bob's account object is converted from a list to an environment:
bobs_acct <- list2env(bobs_acct)
If Bob were to deposit an amount to cover a future fee payment, his account balance would now be updated to:
deposit(bobs_acct, bobs_acct$fees)$balance
#> [1] 10
Becoming a student has cost Bob money. What happened to the amount deposited?
The culprit is lazy evaluation and the modify-in-place semantics of
environments. In the call `deposit(account = bobs_acct, value = bobs_acct$fees)`,
the value of the argument `value` is only set when it's used, which comes after
the object `fees` in the environment `bobs_acct` has already been zeroed out.
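The mechanism can be isolated in a few lines of base R. In this sketch, the functions `zero_then_read` and `force_then_zero` are hypothetical names introduced for illustration; the second one uses base R's `force`, an alternative that the example above does not rely on, to show that evaluating the argument early changes the outcome.

```r
# The argument `value` is a promise: it is evaluated at first use,
# which here happens after the environment has been modified in place.
zero_then_read <- function(env, value) {
  env$fees <- 0
  value
}
acct <- list2env(list(fees = 3))
lazy_result <- zero_then_read(acct, acct$fees)

# Forcing the promise first captures the value before the modification.
force_then_zero <- function(env, value) {
  force(value)
  env$fees <- 0
  value
}
acct2 <- list2env(list(fees = 3))
eager_result <- force_then_zero(acct2, acct2$fees)
```

Here `lazy_result` is 0 while `eager_result` is 3, even though both calls look identical from the caller's side.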
To minimize such risks, forbid `account` from being an environment:
err_msg <- "`account` should not be an environment"
deposit <- firmly(deposit, list(err_msg ~ account) ~ Negate(is.environment))
This makes Bob a happy customer, and reduces the bank's liability:
bobs_acct <- list2env(list(balance = 10, fees = 3, is_student = TRUE))
deposit(bobs_acct, bobs_acct$fees)$balance
#> Error: deposit(account = bobs_acct, value = bobs_acct$fees)
#> `account` should not be an environment
deposit(as.list(bobs_acct), bobs_acct$fees)$balance
#> [1] 13
[2]: Adapted from an example in Section 6.3 of Chambers, Extending R, CRC Press, 2016. For the sake of the example, ignore the fact that logic to handle fees does not belong in a function for deposits!
You don't mean to shoot yourself, but sometimes it happens, nonetheless:
x <- "An expensive object"
save(x, file = "my-precious.rda")
x <- "Oops! A bug or lapse has tarnished your expensive object"
# Many computations later, you again save x, oblivious to the accident ...
save(x, file = "my-precious.rda")
`firmly` can safeguard you from such mishaps: implement a safety procedure
# Argument `gear` is a list with components:
# fun: Function name
# ns : Namespace of `fun`
# chk: Formula that specifies input checks
hardhat <- function(gear, env = .GlobalEnv) {
for (. in gear) {
safe_fun <- firmly(getFromNamespace(.$fun, .$ns), .$chk)
assign(.$fun, safe_fun, envir = env)
}
}
gather your safety gear
protection <- list(
list(
fun = "save",
ns = "base",
chk = list("Won't overwrite `file`" ~ file) ~ Negate(file.exists)
),
list(
fun = "load",
ns = "base",
chk = list("Won't load objects into current environment" ~ envir) ~
{!identical(., parent.frame(2))}
)
)
then put it on
hardhat(protection)
Now `save` and `load` engage safety features that prevent you from inadvertently
destroying your data:
x <- "An expensive object"
save(x, file = "my-precious.rda")
x <- "Oops! A bug or lapse has tarnished your expensive object"
save(x, file = "my-precious.rda")
#> Error: save(x, file = "my-precious.rda")
#> Won't overwrite `file`
# Inspecting x, you notice it's changed, so you try to retrieve the original ...
x
#> [1] "Oops! A bug or lapse has tarnished your expensive object"
load("my-precious.rda")
#> Error: load(file = "my-precious.rda")
#> Won't load objects into current environment
# Keep calm and carry on
loosely(load)("my-precious.rda")
x
#> [1] "An expensive object"
NB: Input validation is implemented by wrapping functions; thus, if the
arguments are valid, the underlying functions `base::save`, `base::load` are
called unmodified.
valaddin provides a collection of over 50 pre-made input checkers to
facilitate typical kinds of argument checks. These checkers are prefixed by
`vld_`, for convenient browsing and look-up in editors and IDEs that support
name completion.
For example, to create a type-checked version of the function `upper.tri`, which
returns an upper-triangular logical matrix, apply the checkers `vld_matrix`,
`vld_boolean` (here “boolean” is shorthand for “logical vector of length 1”):
upper_tri <- firmly(upper.tri, vld_matrix(~x), vld_boolean(~diag))
# upper.tri assumes you mean a vector to be a column matrix
upper.tri(1:2)
#> [,1]
#> [1,] FALSE
#> [2,] FALSE
upper_tri(1:2)
#> Error: upper_tri(x = 1:2, diag = FALSE)
#> Not matrix: x
# But say you actually meant (1, 2) to be a diagonal matrix
upper_tri(diag(1:2))
#> [,1] [,2]
#> [1,] FALSE TRUE
#> [2,] FALSE FALSE
upper_tri(diag(1:2), diag = "true")
#> Error: upper_tri(x = diag(1:2), diag = "true")
#> Not boolean: diag
upper_tri(diag(1:2), TRUE)
#> [,1] [,2]
#> [1,] TRUE TRUE
#> [2,] FALSE TRUE
vld_true
Any input validation can be expressed as an assertion that “such and such must
be true”; to apply it as such, use `vld_true` (or its complement, `vld_false`).
For example, the above hardening of `ifelse` can be redone as:
chk_length_type <- vld_true(
"'yes', 'no' differ in length" ~ length(yes) == length(no),
"'yes', 'no' differ in type" ~ typeof(yes) == typeof(no)
)
ifelse_f <- firmly(ifelse, chk_length_type)
z <- rep(1, 6)
pos <- 1:5
neg <- -6:-1
ifelse_f(z > 0, as.character(pos), neg)
#> Error: ifelse_f(test = z > 0, yes = as.character(pos), no = neg)
#> 1) 'yes', 'no' differ in length
#> 2) 'yes', 'no' differ in type
ifelse_f(z > 0, c(pos, 6), neg)
#> Error: ifelse_f(test = z > 0, yes = c(pos, 6), no = neg)
#> 'yes', 'no' differ in type
ifelse_f(z > 0, c(pos, 6L), neg)
#> [1] 1 2 3 4 5 6
localize
A check formula such as `~ is.numeric` (or `"Not number" ~ is.numeric`, if you
want a custom error message) imposes its condition “globally”:
difference <- firmly(function(x, y) x - y, ~ is.numeric)
difference(3, 1)
#> [1] 2
difference(as.POSIXct("2017-01-01", "UTC"), as.POSIXct("2016-01-01", "UTC"))
#> Error: difference(x = as.POSIXct("2017-01-01", "UTC"), y = as.POSIXct("2016-01-01", "UTC"))
#> 1) FALSE: is.numeric(x)
#> 2) FALSE: is.numeric(y)
With `localize`, you can concentrate a globally applied check formula to
specific expressions. The result is a reusable custom checker:
chk_numeric <- localize("Not numeric" ~ is.numeric)
secant <- firmly(function(f, x, h) (f(x + h) - f(x)) / h, chk_numeric(~x, ~h))
secant(sin, 0, .1)
#> [1] 0.9983342
secant(sin, "0", .1)
#> Error: secant(f = sin, x = "0", h = 0.1)
#> Not numeric: x
(In fact, `chk_numeric` is equivalent to the pre-built checker `vld_numeric`.)
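The idea of binding one predicate and message to selected arguments can be approximated in base R with a closure. The sketch below is only an analogy under simplifying assumptions: `make_checker` and `check_numeric` are hypothetical helpers, not valaddin functions, and unlike `firmly` this version places the check inside the function body.

```r
# Hypothetical checker factory: pairs a message with a predicate and
# returns a function that applies them to the named arguments only.
make_checker <- function(msg, pred) {
  function(args, arg_names) {
    passed <- vapply(args[arg_names], pred, logical(1))
    bad <- arg_names[!passed]
    if (length(bad) > 0) {
      stop(paste0(msg, ": ", bad, collapse = "; "), call. = FALSE)
    }
    invisible(TRUE)
  }
}

check_numeric <- make_checker("Not numeric", is.numeric)

secant <- function(f, x, h) {
  check_numeric(list(x = x, h = h), c("x", "h"))  # `f` is left unchecked
  (f(x + h) - f(x)) / h
}

good <- secant(sin, 0, 0.1)
bad <- tryCatch(secant(sin, "0", 0.1), error = function(e) conditionMessage(e))
```

As with `chk_numeric(~x, ~h)`, only `x` and `h` are tested, so passing the function `sin` as `f` does not trigger the numeric check.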
Conversely, apply `globalize` to impose your localized checker globally:
difference <- firmly(function(x, y) x - y, globalize(chk_numeric))
difference(3, 1)
#> [1] 2
difference(as.POSIXct("2017-01-01", "UTC"), as.POSIXct("2016-01-01", "UTC"))
#> Error: difference(x = as.POSIXct("2017-01-01", "UTC"), y = as.POSIXct("2016-01-01", "UTC"))
#> 1) Not numeric: `x`
#> 2) Not numeric: `y`
assertive, assertthat, and checkmate provide extensive collections
of predicate functions that you can use in conjunction with `firmly` and
`localize`.
ensurer and assertr provide ways to validate function values.
argufy takes a different approach to input validation, using roxygen comments to specify checks.
ensurer provides an experimental replacement for `function` that builds
functions with type-validated arguments.
typeCheck, together with Types for R, enables the creation of functions with type-validated arguments by means of special type annotations. This approach is orthogonal to that of valaddin: whereas valaddin specifies input checks as predicate functions with scope (predicates are primary), typeCheck specifies input checks as arguments with type (arguments are primary).