Out-of-bag predictions and evaluation

Out-of-bag predictions and error

Each tree in the random forest can make predictions for its out-of-bag data, and the out-of-bag predictions can be aggregated to make an ensemble out-of-bag prediction. Since the out-of-bag data are not used to grow the tree, the accuracy of the ensemble out-of-bag predictions approximate the generalization error of the random forest. Out-of-bag prediction error plays a central role for some routines that estimate variable importance, e.g. negation importance.

Let’s fit an oblique random survival forest and plot the distribution of the ensemble out-of-bag predictions.


fit <- orsf(data = pbc_orsf, 
            formula = Surv(time, status) ~ . - id,
            oobag_pred_horizon = 3500)

hist(fit$pred_oobag, 
     main = 'Ensemble out-of-bag survival predictions at t=3,500')

Not surprisingly, all of the survival predictions are between 0 and 1. Next, let’s check the out-of-bag accuracy of fit:


# what function is used to evaluate out-of-bag predictions?
fit$eval_oobag$stat_type
#> [1] "Harrell's C-statistic"

# what is the output from this function?
fit$eval_oobag$stat_values
#>           [,1]
#> [1,] 0.8419983

The out-of-bag estimate of Harrell’s C-statistic (the default method to evaluate out-of-bag predictions) is 0.8419983.

Monitoring out-of-bag error

As each out-of-bag data set contains about one-third of the training set, the out-of-bag error estimate usually converges to a stable value as more trees are added to the forest. If you want to monitor the convergence of out-of-bag error for your own oblique random survival forest, you can set oobag_eval_every to compute out-of-bag error at every oobag_eval_every tree. For example, let’s compute out-of-bag error after fitting each tree in a forest of 50 trees:


fit <- orsf(data = pbc_orsf,
            formula = Surv(time, status) ~ . - id,
            n_tree = 50,
            oobag_pred_horizon = 3500,
            oobag_eval_every = 1)

plot(
 x = seq(1, 50, by = 1),
 y = fit$eval_oobag$stat_values, 
 main = 'Out-of-bag C-statistic computed after each new tree is grown.',
 xlab = 'Number of trees grown',
 ylab = fit$eval_oobag$stat_type
)

In general, at least 500 trees are recommended for a random forest fit. We’re just using 50 in this case for better illustration of the out-of-bag error curve. Also, it helps to make run-times low whenever I need to re-compile the package vignettes.

User-supplied out-of-bag evaluation functions

In some cases, you may want to use your own function to compute out-of-bag error. For example, here is a simple (and incorrect) way to compute the Brier score. (It is incorrect because it does not account for censoring)


oobag_fun_brier <- function(y_mat, s_vec){

 # risk = 1 - survival 
 r_vec <- 1 - s_vec

 # mean of the squared differences between predicted and observed risk
 mean( (y_mat[, 'status'] - r_vec)^2 )
 
}

There are two ways to apply your own function to compute out-of-bag error. First, you can apply your function to the out-of-bag survival predictions that are stored in ‘aorsf’ objects, e.g:


oobag_fun_brier(y_mat = fit$data[, c('time', 'status')],
                s_vec = fit$pred_oobag)
#> [1] 0.1921042

Second, you can pass your function into orsf(), and it will be used in place of Harrell’s C-statistic:


fit <- orsf(data = pbc_orsf,
            formula = Surv(time, status) ~ . - id,
            n_tree = 50,
            oobag_pred_horizon = 3500,
            oobag_fun = oobag_fun_brier,
            oobag_eval_every = 1)

plot(
 x = seq(1, 50, by = 1),
 y = fit$eval_oobag$stat_values, 
 main = 'Out-of-bag error computed after each new tree is grown.',
 sub = 'For the Brier score, lower values indicate more accurate predictions',
 xlab = 'Number of trees grown',
 ylab = "Brier score"
)

Let’s run one more example showing how this can be done using functions from other packages, e.g., survivalROC from the survivalROC package:


oobag_fun_sroc <- function(y_mat, s_vec){

 score <- survivalROC::survivalROC(
  Stime = y_mat[, 'time'],
  status = y_mat[, 'status'],
  # risk = 1 - survival
  marker = 1 - s_vec,
  # important!! Make sure this matches the time you used in orsf
  predict.time = 3500,
  # nearest neighbor estimation for censoring
  method = "NNE",
  # value taken from ?survivalROC examples
  span = 0.25 * nrow(y_mat)^(-0.20)
 )
 
 # oobag_fun needs to return a numeric value of length 1
 score$AUC

}

fit <- orsf(data = pbc_orsf,
            formula = Surv(time, status) ~ . - id,
            n_tree = 50,
            oobag_pred_horizon = 3500,
            oobag_fun = oobag_fun_sroc,
            oobag_eval_every = 1)

plot(
 x = seq(50),
 y = fit$eval_oobag$stat_values, 
 main = 'Out-of-bag time-dependent AUC\ncomputed after each new tree is grown.',
 xlab = 'Number of trees grown',
 ylab = "AUC at t = 3,500"
)

Specific instructions on user-supplied functions

User-supplied functions must:

have exactly two arguments named y_mat and s_vec.
return a numeric output of length 1

If either of these conditions is not true, an error will occur. A simple test to make sure your user-supplied function will work with the aorsf package is below:


# Helper code to make sure your oobag_fun function will work with aorsf

# time and status values
test_time <- seq(from = 1, to = 5, length.out = 100)
test_status <- rep(c(0,1), each = 50)

# y-matrix is presumed to contain time and status (with column names)
y_mat <- cbind(time = test_time, status = test_status)
# s_vec is presumed to be a vector of survival probabilities
s_vec <- seq(0.9, 0.1, length.out = 100)

# see 1 in the checklist above
names(formals(oobag_fun_sroc)) == c("y_mat", "s_vec")
#> [1] TRUE TRUE

test_output <- oobag_fun_sroc(y_mat = y_mat, s_vec = s_vec)

# test output should be numeric
is.numeric(test_output)
#> [1] TRUE
# test_output should be a numeric value of length 1
length(test_output) == 1
#> [1] TRUE

User-supplied functions for negation importance.

Negation importance is based on the out-of-bag error, so of course you may be curious about what negation importance would be if it were computed using different statistics. The workflow for doing this is exactly the same as the example above, except we have to specify importance = 'negate' when we fit our model. Also, to speed up computations, I am not going to monitor out-of-bag error here.


fit_sroc <- orsf(data = pbc_orsf,
                 formula = Surv(time, status) ~ . - id,
                 n_tree = 50,
                 oobag_pred_horizon = 3500,
                 oobag_fun = oobag_fun_sroc,
                 importance = 'negate')

fit_sroc$importance
#>       protime          bili        copper      alk.phos         stage 
#>  0.0651135074  0.0449999962  0.0194099922  0.0193764812  0.0185630139 
#>         sex_f           ast          trig      hepato_1     ascites_1 
#>  0.0163615399  0.0130692067  0.0059584829  0.0049877856  0.0046248488 
#>       edema_1     spiders_1   trt_placebo     edema_0.5       albumin 
#>  0.0033905063  0.0025556974 -0.0001527407 -0.0020913109 -0.0084313113 
#>          chol      platelet           age 
#> -0.0116799262 -0.0140180095 -0.0160896260

I really need to use package X

You can use any package whatsoever to evaluate out-of-bag predictions. You can access the final out-of-bag survival predictions from an aorsf model like so:


pred_oobag <- fit$pred_oobag

pred_oobag[1:5, ]
#> [1] 0.005882353 0.386951908 0.232629282 0.240005660 0.315733470

Some notes to remember when evaluating out-of-bag error with pred_oobag:

the oobag_pred_horizon input in orsf() determines the prediction horizon for out-of-bag predictions. The prediction horizon is a critical input for evaluation of predictions with time-to-event outcomes.
Some functions expect predicted risk (i.e., 1 - predicted survival), others expect predicted survival.

In most cases, you should also be able to use any package whatsoever to compute negation importance. One exception at this point is riskRegression, one of my favorite packages. I have experimented with riskRegression but found its functions do not work as I would expect them to when I try to run them from C++. I think this may be due to riskRegression’s internal use of data.table and modification by reference, but I do not have certainty on that yet.