Load the data into R. The last argument of PhecapData, 0.4, is the fraction (not a percentage) of labels reserved as the held-out test set, which is used later for validation.
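To make the meaning of 0.4 concrete, here is a minimal base-R sketch of holding out 40% of the labeled rows. It is only an illustration: the `label` vector and the row counts are hypothetical, and PheCAP performs its own split internally, which may differ in detail.

```r
# Toy illustration only; not how PhecapData() is implemented.
set.seed(1)

n <- 20
# Hypothetical label column: NA means the patient was never chart-reviewed.
label <- c(rep(c(0, 1), each = 5), rep(NA, n - 10))

labeled_rows <- which(!is.na(label))
n_test <- round(0.4 * length(labeled_rows))  # 0.4 is a fraction: 4 of 10 labels

# Reserve the test rows at random; the remaining labels are used for training.
test_rows <- sample(labeled_rows, n_test)
train_rows <- setdiff(labeled_rows, test_rows)

length(test_rows)   # 4
length(train_rows)  # 6
```

Unlabeled rows are untouched by the split; only the labeled rows are divided.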
Specify the surrogates used for surrogate-assisted feature extraction (SAFE). The typical choice is a main ICD code, a main NLP CUI, and the combination of the two. In some cases a surrogate can also be defined from a lab test. By default, lower_cutoff is 1 and upper_cutoff is 10; adjust these cutoffs based on domain knowledge.
surrogates <- list(
  PhecapSurrogate(
    variable_names = "main_ICD",
    lower_cutoff = 1, upper_cutoff = 10),
  PhecapSurrogate(
    variable_names = "main_NLP",
    lower_cutoff = 1, upper_cutoff = 10),
  PhecapSurrogate(
    variable_names = c("main_ICD", "main_NLP"),
    lower_cutoff = 1, upper_cutoff = 10))
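The role of the two cutoffs can be sketched in base R. This sketch assumes that counts at or below lower_cutoff are treated as likely controls, counts at or above upper_cutoff as likely cases, and everything in between is left undetermined. It also assumes that a surrogate with several variable names uses the sum of their counts. The helper `silver_label` and the counts are hypothetical; check the PhecapSurrogate documentation for the exact rules.

```r
# Hypothetical per-patient counts of the main ICD code and the main NLP CUI.
main_ICD <- c(0, 1, 3, 12, 0, 25)
main_NLP <- c(0, 0, 2, 4, 1, 9)

# Hypothetical helper, not part of PheCAP.
# Assumption: both cutoffs are inclusive.
silver_label <- function(count, lower_cutoff = 1, upper_cutoff = 10) {
  ifelse(count >= upper_cutoff, 1,
         ifelse(count <= lower_cutoff, 0, NA))
}

silver_label(main_ICD)             # 0 0 NA 1 0 1
silver_label(main_NLP)             # 0 0 NA NA 0 NA
# Assumption: the combined surrogate is the sum of the two counts.
silver_label(main_ICD + main_NLP)  # 0 0 NA 1 0 1
```

Raising upper_cutoff makes the likely-case group smaller but purer; lowering it does the opposite, which is why domain knowledge about coding frequency matters here.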
Run surrogate-assisted feature extraction (SAFE) and show the result.
Train the phenotyping model and show the fitted model, along with its AUC on the training set and on random splits.
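The difference between the training-set AUC and the random-split AUC can be illustrated with a small base-R sketch. It uses simulated data and an ordinary logistic regression (`glm`) as a stand-in, so it is not the model PheCAP fits; the feature names, sample size, and number of splits are all assumptions chosen for illustration.

```r
set.seed(42)

# Simulated data: two hypothetical features and a binary gold-standard label.
n <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- rbinom(n, 1, plogis(1.5 * x1 - 1.0 * x2))
d <- data.frame(y = y, x1 = x1, x2 = x2)

# Rank-based AUC: probability that a random case outscores a random control.
auc <- function(score, label) {
  r <- rank(score)
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# AUC on the same rows used for fitting; this tends to be optimistic.
fit <- glm(y ~ x1 + x2, data = d, family = binomial)
train_auc <- auc(predict(fit, type = "response"), d$y)

# AUC on rows left out of the fit, over 50 random train/holdout splits.
n_holdout <- 60
split_auc <- replicate(50, {
  holdout <- sample(n, n_holdout)
  fit_k <- glm(y ~ x1 + x2, data = d[-holdout, ], family = binomial)
  auc(predict(fit_k, newdata = d[holdout, ], type = "response"), d$y[holdout])
})

train_auc
mean(split_auc)
```

The split-based average is usually the more honest of the two numbers, because each holdout set plays no part in fitting the model that scores it.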
Validate the phenotyping model using the validation labels, and show the AUC and ROC curve.
validation <- phecap_validate_phenotyping_model(data, model)
validation
phecap_plot_roc_curves(validation)
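For readers who want to see what an ROC curve and its AUC are made of, the following base-R sketch builds both from a handful of hypothetical scores and labels. It is independent of PheCAP and does not reproduce the structure of the `validation` object.

```r
# Hypothetical predicted probabilities and gold-standard labels.
score <- c(0.95, 0.80, 0.70, 0.55, 0.40, 0.30, 0.20, 0.10)
label <- c(1,    1,    0,    1,    0,    1,    0,    0)

# Sweep the cutoff from high to low and record (FPR, TPR) after each patient.
ord <- order(score, decreasing = TRUE)
tpr <- c(0, cumsum(label[ord] == 1) / sum(label == 1))
fpr <- c(0, cumsum(label[ord] == 0) / sum(label == 0))

# Area under the piecewise-linear ROC curve (trapezoidal rule).
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc  # 0.8125

# One row per cutoff, in the order the curve is traced.
roc_points <- cbind(cutoff = c(Inf, score[ord]), FPR = fpr, TPR = tpr)
roc_points
```

The value 0.8125 equals 13/16: of the 16 case-control pairs, the case has the higher score in 13.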
Apply the model to all patients to obtain the predicted phenotype.