By far the most common use of assertions is for checking input to functions. Consider this function for calculating the geometric mean:
geomean <- function(x, na.rm = FALSE)
{
exp(mean(log(x), na.rm = na.rm))
}
In a statically-typed language, we could enforce x
being a numeric vector. R’s dynamic typing (while mostly helping us be more productive) gives us some rope to hang ourselves with: x
and na.rm
can be absolutely anything. We need to handle the cases when x
is not numeric, or when x contains negative values, or when na.rm
is not a single logical value.
The built-in functions exp
, mean
and log
have some of their own logic for handling bad inputs, and it is possible to simply rely on that logic rather than writing your own. For demonstration purposes, let’s see how they behave, and then see if we can improve upon it.
If we pass a non-numeric x
, we see:
geomean("a")
## Error in log(x): non-numeric argument to mathematical function
The error message is OK, but it since it is appearing to come from log(x)
, it isn’t totally clear to the user where the problem originates. The assertive fix isto include assert_is_numeric(x)
in the function.
Where should this line go? In accordance with the first clause of the programming principle “fail early, fail often”, the assertion belongs at the start of the function.
geomean2 <- function(x, na.rm = FALSE)
{
assert_is_numeric(x)
exp(mean(log(x), na.rm = na.rm))
}
geomean2("a")
## Error in geomean2("a"): is_numeric : x is not of class 'numeric'; it has class 'character'.
The geometric mean doesn’t make any mathematical sense for (real) negative numbers, and will return NaN
if the input contains any.
geomean(rnorm(20))
## Warning in log(x): NaNs produced
## [1] NaN
The warning here is not so informative (why were the NaNs produced?), and appears to come from log(x)
. We could be strict and throw an error if there are negative values by adding a call to assert_all_are_non_negative
. To replicate the base-R behaviour we can define custom actions based upon the result of is_non_negative
:
geomean3 <- function(x, na.rm = FALSE)
{
assert_is_numeric(x)
if(!all(is_non_negative(x), na.rm = TRUE)) # Don't worry about NAs here
{
warning("x contains negative values, so the geometric mean makes no sense.")
return(NaN)
}
exp(mean(log(x), na.rm = na.rm))
}
geomean3(rnorm(20))
## Warning in geomean3(rnorm(20)): x contains negative values, so the geometric
## mean makes no sense.
## [1] NaN
For na.rm
, the mean
function coerces its input to be a logical value, warning if the value’s length is more than one.
x <- rlnorm(20)
x[sample(20, 5)] <- NA
geomean(x, c(1.5, 0))
## Warning in if (na.rm) x <- x[!is.na(x)]: the condition has length > 1 and only
## the first element will be used
## [1] 1.428326
The warning about the length is OK, but again its source is not totally clear for users. The coercion to logical happens silently, which isn’t ideal.
Again, we could be strict and throw an error when na.rm
isn’t a scalar logical value using assert_is_a_bool
. (This is a compound assertion checking both the type and the length of the object). In this case, to replicate the base-R behaviour, we will use some utility functions provided by assertive.
geomean4 <- function(x, na.rm = FALSE)
{
assert_is_numeric(x)
if(!all(is_non_negative(x), na.rm = TRUE)) # Don't worry about NAs here
{
warning("x contains negative values, so the geometric mean makes no sense.")
return(NaN)
}
na.rm <- coerce_to(use_first(na.rm), "logical")
exp(mean(log(x), na.rm = na.rm))
}
use_first
takes the first element of an object, warning if it has length more than one. coerce_to
checks and object’s class
, then converts it to the requested type, with a warning, using an appropriate as.*
function if it exists, or as
if it doesn’t.
geomean4(x, c(1.5, 0))
## Warning: Only the first value of na.rm (= 1.5) will be used.
## Warning: Coercing use_first(na.rm) to class 'logical'.
## [1] 1.428326
Whether to throw an error on bad input or fix it is personal preference and depends upon context. For end-user functions should should usually try to be flexible and fix things unless they’ve done something very silly. For lower-level functions, you can often afford to be stricter.
assert_
functions to throw errors for inputs that you can’t salvage.is_
functions in an if
condition to provide custom behaviour.use_first
and coerce_to
where possible.mad
function in the stats
package calculates the median absolute deviation. Type mad
to see its contents, and run example(mad)
to get a feel for how it works.fixInNamespace
if you are feeling fancy) to include some assertions checking the inputs. Hint: Some of the inputs should be numeric; others should be logical. Some should be only allow a single value. [15 mins].