The survex
package strives to include the functionality
for the automatic creation of explainers for as many survival models as
possible. However, it is impossible to include them all because of the
great variety of available packages and models. We provide the
functionality to create an explainer for any survival model
manually.
In the best case, creating an explainer for your desired model is
already implemented. This means that everything can be extracted from
the model object (sometimes, you need to set additional parameters). For
example, this is the case for the proportional hazards model from the
survival
package:
library(survex)
library(survival)
<- coxph(Surv(time, status) ~ ., data = veteran, model = TRUE, x = TRUE)
cph
<- explain(cph)
auto_cph_explainer #> Preparation of a new explainer is initiated
#> -> model label : coxph ( [33m default [39m )
#> -> data : 137 rows 6 cols ( extracted from the model )
#> -> target variable : 137 values ( 128 events and 9 censored ) ( extracted from the model )
#> -> times : 94 unique time points , min = 1 , mean = 119.9706 , max = 845.56
#> -> times : ( generated from y with method quantiles )
#> -> predict function : predict.coxph with type = 'risk' will be used ( [33m default [39m )
#> -> predict survival function : predictSurvProb.coxph will be used ( [33m default [39m )
#> -> predict cumulative hazard function : -log(predict_survival_function) will be used ( [33m default [39m )
#> -> model_info : package survival , ver. 3.3.1 , task survival ( [33m default [39m )
#> A new explainer has been created!
It can be seen that the only required parameter of the
explain()
function is the proportional hazards model
itself. However, we needed to set model = TRUE, x = TRUE
while creating it. If you forget to set these arguments, an error will
be shown.
The next base case is when all types of predictions can be made using
your desired model. It is then possible to manually create an explainer
using the explain_survival()
function. Let’s look at an
example - how to set all parameters of a coxph
explainer by
hand:
<- coxph(Surv(time, status) ~ ., data=veteran)
cph
# data must not include the target columns
<- veteran[, -c(3,4)]
veteran_data <- Surv(veteran$time, veteran$status)
veteran_y
# set prediction functions of the required format
<- function(model, newdata) predict(model, newdata, type = "risk")
risk_pred <- function(model, newdata, times) pec::predictSurvProb(model, newdata, times)
surv_pred <- function(model, newdata, times) -log(surv_pred(model, newdata, times))
chf_pred
<- explain_survival(model = cph,
manual_cph_explainer data = veteran_data,
y = veteran_y,
predict_function = risk_pred,
predict_survival_function = surv_pred,
predict_cumulative_hazard_function = chf_pred,
label="manual coxph")
#> Preparation of a new explainer is initiated
#> -> model label : manual coxph
#> -> data : 137 rows 6 cols
#> -> target variable : 137 values ( 128 events and 9 censored )
#> -> times : 94 unique time points , min = 1 , mean = 119.9706 , max = 845.56
#> -> times : ( generated from y with method quantiles )
#> -> predict function : risk_pred
#> -> predict survival function : surv_pred
#> -> predict cumulative hazard function : chf_pred
#> -> model_info : package survival , ver. 3.3.1 , task survival ( [33m default [39m )
#> A new explainer has been created!
Sometimes, the survival model provides only one type of prediction, for example, the survival function. This package provides helpful utilities for converting between types of prediction and standardizing them for explainer creation.
For example, the random survival forest from the
randomForestSRC
package only provides the survival function
predictions at the unique times from the training data set
(predict.rsfrc()$survival
). Let’s try to create an
explainer for it using the utility function for converting this type of
prediction to a step function that can be used for prediction at any
time points.
<- transform_to_stepfunction(predict,
surv_pred_rsf type="survival",
prediction_element = "survival",
times_element = "time.interest")
Because the predict.rsfrc()
returns a list with the
times at which the survival function was evaluated, we provide the name
of the list element which contains the prediction and the evaluation
times. If in your case, the prediction function returns only a matrix,
you can pass the times at which the predictions were evaluated in the
eval_times
argument.
We could use the same utility to get the cumulative hazard predictions (commented code below). Still, to demonstrate another function, we make use of the mathematical relationship between the survival function (\(\hat{S}\)) and the cumulative hazard function (\(\hat{H}\)), i.e., \(\hat{H} = \exp(-\hat{S})\).
# would also work
# chf_pred_rsf <- transform_to_stepfunction(predict,
# type="chf",
# prediction_element = "chf",
# times_element = "time.interest")
<- function(model, newdata, times) {
chf_pred_rsf survival_to_cumulative_hazard(surv_pred_rsf(model, newdata, times))
}
The reverse utility (cumulative_hazard_to_survival()
)
also exists.
If no risk prediction is provided for your model, you can use a utility to sum the cumulative hazard function for each observation to achieve risk scores. This approach is recommended by Ishwaran et al. (1) and Sonabend et al. (2).
<- unique(veteran$times)
times <- risk_from_chf(chf_pred_rsf, times) risk_pred_rsf
Now that we have all predictions prepared, we can finally create the explainer:
library(randomForestSRC)
#>
#> randomForestSRC 3.1.0
#>
#> Type rfsrc.news() to see new features, changes, and bug fixes.
#>
<- rfsrc(Surv(time, status) ~ ., data = veteran)
rsf
<- explain_survival(model = rsf,
manual_rsf_explainer data = veteran_data,
y = veteran_y,
predict_function = risk_pred_rsf,
predict_survival_function = surv_pred_rsf,
predict_cumulative_hazard_function = chf_pred_rsf,
label = "manual rsf")
#> Preparation of a new explainer is initiated
#> -> model label : manual rsf
#> -> data : 137 rows 6 cols
#> -> target variable : 137 values ( 128 events and 9 censored )
#> -> times : 94 unique time points , min = 1 , mean = 119.9706 , max = 845.56
#> -> times : ( generated from y with method quantiles )
#> -> predict function : risk_pred_rsf
#> -> predict survival function : surv_pred_rsf
#> -> predict cumulative hazard function : chf_pred_rsf
#> -> model_info : package randomForestSRC , ver. 3.1.0 , task survival ( [33m default [39m )
#> A new explainer has been created!