{sperrorest} is a generic framework which aims to work with all R models/packages. In statistical learning, model setups, their formulas and error measures all depend on the family of the response variable. Various families exist (numeric, binary, multiclass) which again include sub-families (e.g. gaussian or poisson distribution of a numeric response).
This detail needs to be specified via the respective function, e.g. when using glm()
with a binary response, one needs to set family = "binomial"
to make sure that the model does something meaningful. Most of the time, the same applies to the generic predict()
function. For the glm()
case, one would need to set type = "response"
if the predicted values should reflect probabilities instead of log-odds.
These settings can be specified using model_args
and pred_args
in sperrorest()
. So fine, “why do we need to write all these wrappers and custom model/predict functions then?!”
model_fun
expects at least formula argument and a data.frame with the learning sample. All arguments, including the additional ones provided via model_args
, are getting passed to model_fun
via a do.call()
call. However, if model_fun
does not have an argument named formula
but e.g. fixed
(like it is the case for glmmPQL()
) the do.call()
call will fail because sperrorest()
tries to pass an argument named formula
but glmmPQL
expects an argument named fixed
.
In this case, we need to write a wrapper function for glmmPQL
(named glmmPQL_modelfun
here) which accounts for this naming problem. Here, we are passing the formula
argument to our custom model function which then does the actual call to glmmPQL()
using the supplied formula
object as the fixed
argument of glmmPQL
. By default, glmmPQL()
has further arguments like family
or random
. If we want to use these, we pass them to model_args
which then appends these to the arguments of glmmPQL_modelfun
.
<- function(formula = NULL, data = NULL, random = NULL,
glmmPQL_modelfun family = NULL) {
<- glmmPQL(fixed = formula, data = data, random = random, family = family)
fit return(fit)
}
Unless specified explicitly, sperrorest()
tries to use the generic predict()
function. This function works differently depending on the class of the provided fitted model, i.e. many models slightly differ in the naming (and availability) of their arguments. For example, when fitting a Support Vector Machine (SVM) with a binary response variable, package kernlab
expects an argument type = "probabilities"
in its predict()
call to receive predicted probabilities while in package e1071
it is "probability = TRUE"
. Similar to model_args
, this can be accounted for in the pred_args
of sperrorest()
.
However, sperrorest()
expects that the predicted values (of any response type) are stored directly in the returned object of the predict()
function. While this is the case for many models, mainly with a numeric response, classification cases often behave differently. Here, the predicted values (classes in this case) are often stored in a sub-object named class
or predicted
.
Since there is no way to account for this in a general way (when every package may return the predicted values in a different format/column), we need to account for it by providing a custom predict function which returns only the predicted values so that sperrorest()
can continue properly. This time we are showing two examples. The first takes again a binary classification using randomForest
.
When calling predict on a fitted randomForest
model with a binary response variable, the predicted values are actually stored in the resulting object returned by predict()
(here called pred
). So why do we have trouble here then?
Simply because pred
is a matrix containing both probabilities for the FALSE
(= 0) and TRUE
(= 1) case. sperrorest()
needs a vector containing only the predicted values of the TRUE
case to pass these further onto err_fun()
which then takes care of calculating all the error measures. So the important part is to subset the resulting matrix in the pred
object to TRUE
cases only and return the result.
<- function(object = NULL, newdata = NULL, type = NULL) {
rf_predfun <- predict(object = object, newdata = newdata, type = type)
pred <- pred[, 2]
pred }
The same case (binary response) using svm
from the e1071
package. Here, the predicted probabilities are stored in a sub-object of pred
. We can address it using the attr()
function. Then again, we only need the TRUE
cases for sperrorest()
.
<- function(object = NULL, newdata = NULL, probability = NULL) {
svm_predfun <- predict(object, newdata = newdata, probability = TRUE)
pred <- attr(pred, "probabilities")[, 2]
pred }