* `cv_vim` handles an odd number of outer folds being passed with pre-computed regression function estimates. Now, you can use an odd number of folds (e.g., 5) to estimate the full and reduced regression functions and still obtain cross-validated variable importance estimates (see the sketch below)
* Added the `vrc01` data as an exported object
* Updated examples and vignettes to use the `vrc01` data
* Allowed `C` to not be specified in `make_folds`
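As a quick illustration, a minimal `cv_vim` call with an odd number of folds (`V = 5`); the simulated data and small learner library are assumptions for illustration, not part of this change:

```r
# Sketch only: an odd number of folds (V = 5) with the regressions estimated
# internally. Simulated data and learners are illustrative assumptions.
library(vimp)
library(SuperLearner)
set.seed(4747)
n <- 500
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- 1 + 0.5 * x$x1 + rnorm(n)
est <- cv_vim(Y = y, X = x, indx = 2, V = 5, type = "r_squared",
              run_regression = TRUE, SL.library = c("SL.glm", "SL.mean"))
est$est  # cross-validated importance estimate for x2
```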
* Updated `measure_auc` to hew more closely to `ROCR` and `cvAUC`, using computational tricks to speed up weighted AUC and EIF computation
* Added `cross_fitted_se` to `cv_vim` and `sp_vim`; this logical option allows the standard error to be estimated using cross-fitting. This can improve performance in cases where flexible algorithms are used to estimate the full and reduced regressions (see the sketch below)
* Added a bootstrap option to `vim` and `cv_vim`; currently, this option is only available for non-sample-split calls (i.e., with `sample_splitting = FALSE`)
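A minimal sketch of requesting cross-fitted standard errors; the data-generating code is an illustrative assumption:

```r
# Sketch only: cross-fitted standard errors via cross_fitted_se = TRUE.
library(vimp)
library(SuperLearner)
set.seed(1234)
n <- 500
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- 1 + 0.5 * x$x1 + rnorm(n)
est <- cv_vim(Y = y, X = x, indx = 1, V = 4, type = "r_squared",
              run_regression = TRUE, SL.library = c("SL.glm", "SL.mean"),
              cross_fitted_se = TRUE)
est$se  # standard error estimated using cross-fitting
```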
* Point estimates returned by `vim` are based on the entire dataset, while the full and reduced predictiveness (`predictiveness_full` and `predictiveness_reduced`, along with the corresponding confidence intervals) are evaluated using separate portions of the data for the full and reduced regressions
* Added `sample_splitting` to `vim`, `cv_vim`, and `sp_vim`; if `FALSE`, sample splitting is not used to estimate predictiveness. Note that we recommend using the default, `TRUE`, in all cases, since inference using `sample_splitting = FALSE` will be invalid for variables with truly null variable importance
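A minimal sketch of turning sample splitting off (again, not recommended when hypothesis testing matters); the simulated data are an illustrative assumption:

```r
# Sketch only: sample_splitting = FALSE skips sample splitting; the resulting
# hypothesis test is invalid for variables with truly null importance.
library(vimp)
library(SuperLearner)
set.seed(2021)
n <- 500
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- 1 + 0.5 * x$x1 + rnorm(n)
est <- vim(Y = y, X = x, indx = 1, type = "r_squared",
           run_regression = TRUE, SL.library = c("SL.glm", "SL.mean"),
           sample_splitting = FALSE)
```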
* Updated the behavior when `sample_splitting = TRUE` to match more closely with theoretical results (and improve power!). In this case, we first split the data into \(2K\) cross-fitting folds, and split these folds equally into two sample-splitting folds. For the nuisance regression using all covariates, for each \(k \in \{1, \ldots, K\}\) we set aside the data in sample-splitting fold 1 and cross-fitting fold \(k\) [this comprises \(1 / (2K)\) of the data]. We train using the remaining observations [comprising \((2K-1)/(2K)\) of the data] not in this testing fold, and we test on the originally withheld data. We repeat for the nuisance regression using the reduced set of covariates, but withhold data in sample-splitting fold 2. This update affects both `cv_vim` and `sp_vim`. If `sample_splitting = FALSE`, then we use standard cross-fitting (a worked fold-assignment sketch appears after this list)
* Now use `>=` when computing the numerator of the AUC with inverse probability weights
* Updated the `roxygen2` documentation for the wrappers (`vimp_*`) to inherit parameters and details from `cv_vim` (reduces the potential for documentation mismatches)
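The fold arithmetic described above can be illustrated with a few lines of base R; this is a sketch of the scheme, not vimp's internal code:

```r
# Illustration of the 2K-fold scheme: with K = 5, the data are split into
# 2K = 10 cross-fitting folds, divided equally between two sample-splitting folds.
set.seed(20210503)
n <- 1000
K <- 5
# assign each observation to one of the 2K cross-fitting folds
cross_fitting_fold <- sample(rep(seq_len(2 * K), length.out = n))
# here, odd cross-fitting folds make up sample-splitting fold 1 (full regression)
# and even folds make up sample-splitting fold 2 (reduced regression)
sample_splitting_fold <- ifelse(cross_fitting_fold %% 2 == 1, 1, 2)
# each held-out set (one cross-fitting fold within one sample-splitting fold)
# contains roughly n / (2K) = 100 observations, i.e., 1 / (2K) of the data
table(cross_fitting_fold, sample_splitting_fold)
```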
* Choose a default `family` if it isn't specified: use `stats::binomial()` if there are only two unique outcome values, otherwise use `stats::gaussian()`
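A sketch of this default-family rule; the helper name below is hypothetical, not a vimp function:

```r
# Sketch of the default-family rule described above.
choose_default_family <- function(y) {
  if (length(unique(y)) == 2) {
    stats::binomial()  # binary outcome
  } else {
    stats::gaussian()  # otherwise, treat as continuous
  }
}
choose_default_family(c(0, 1, 1, 0))$family  # "binomial"
choose_default_family(rnorm(10))$family      # "gaussian"
```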
* Compute cross-validated AUC estimates using `cvAUC`
* Added `ipc_est_type` (available in `vim`, `cv_vim`, and `sp_vim`; also in the corresponding wrapper functions for each VIM and the corresponding internal estimation functions)
* Updated `testthat/` to use `glm` rather than `xgboost` (increases speed)
* Updated the vignettes to use `glm` rather than `xgboost` or `ranger` (increases speed, even though the regression is now misspecified for the truth)
* Removed `forcats` from the vignette
* Updated `measure_accuracy` and `measure_auc` for project-wide consistency
* Updated `testthat/` to not explicitly load `xgboost`
* Use `stats::qlogis` and `stats::plogis` rather than bespoke functions
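For reference, these base-R functions implement the logit and expit transforms directly:

```r
# qlogis is the logit and plogis is the expit (inverse logit):
stats::qlogis(0.5)                  # 0
stats::plogis(0)                    # 0.5
stats::plogis(stats::qlogis(0.73))  # 0.73 (round trip)
```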
* Updated the introductory vignette, “Introduction to `vimp`”: specify a library of candidate learners, and `vimp` will handle the rest
* Examples now use `run_regression = TRUE` for simplicity
* Added `verbose` to `sp_vim`; if `TRUE`, messages are printed throughout fitting that display progress, and `verbose` is passed to `SuperLearner`
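A minimal `sp_vim` call with progress messages enabled; the simulated data and learner library are illustrative assumptions:

```r
# Sketch only: progress messages from sp_vim via verbose = TRUE.
library(vimp)
library(SuperLearner)
set.seed(91)
n <- 500
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- 1 + 0.5 * x$x1 + rnorm(n)
est <- sp_vim(Y = y, X = x, V = 2, type = "r_squared",
              SL.library = c("SL.glm", "SL.mean"), verbose = TRUE)
```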
* Renamed `cv_predictiveness_point_est` and `predictiveness_point_est` to `est_predictiveness_cv` and `est_predictiveness`, respectively
* Removed `cv_predictiveness_update`, `cv_vimp_point_est`, `cv_vimp_update`, `predictiveness_update`, `vimp_point_est`, and `vimp_update`; this functionality is now in `est_predictiveness_cv` and `est_predictiveness` (for the `*update*` functions) or directly in `vim` or `cv_vim` (for the `*vimp*` functions)
* Removed `predictiveness_se` and `predictiveness_ci` (functionality is now in `vimp_se` and `vimp_ci`, respectively)
* Renamed the `weights` argument to `ipc_weights`, clarifying that these weights are meant to be used as inverse probability of coarsening (e.g., censoring) weights (see the sketch below)
* Added functions `sp_vim`, `sample_subsets`, `spvim_ics`, and `spvim_se`; these allow computation of the Shapley Population Variable Importance Measure (SPVIM)
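A sketch of how inverse probability of coarsening (here, censoring) weights might be constructed; the data-generating mechanism is an illustrative assumption, and `ipc_est_type` (mentioned above) controls how the weights are used in estimation:

```r
# Sketch only: inverse probability of coarsening (censoring) weights.
set.seed(42)
n <- 200
x <- rnorm(n)
c_ind <- rbinom(n, 1, plogis(0.5 + 0.2 * x))  # 1 = fully observed
# estimate P(C = 1 | X) and invert it to obtain the weights
prob_obs <- fitted(glm(c_ind ~ x, family = binomial))
ipc_w <- 1 / prob_obs
# these would then be passed along, e.g., vim(..., C = c_ind, ipc_weights = ipc_w)
```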
* Removed `sp_vim` and the helper functions `run_sl`, `sample_subsets`, `spvim_ics`, and `spvim_se`; these will be added in a future release
* Removed `cv_vim_nodonsker`, since `cv_vim` supersedes this function
* Added `sp_vim` and the helper functions `run_sl`, `sample_subsets`, `spvim_ics`, and `spvim_se`; these functions allow computation of the Shapley Population Variable Importance Measure (SPVIM)
* `cv_vim` and `vim` now use an outer layer of sample splitting for hypothesis testing
* Added functions `vimp_auc`, `vimp_accuracy`, `vimp_deviance`, and `vimp_rsquared`
* `vimp_regression` is now deprecated; use `vimp_anova` instead
* Added `vim`; each variable importance function is now a wrapper function around `vim` with the `type` argument filled in (see the sketch below)
* `cv_vim_nodonsker` is now deprecated; use `cv_vim` instead
* Hypothesis testing results are returned for all importance types except ANOVA-based importance (`vimp_anova`)
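A sketch of the wrapper relationship; the two calls below target the same AUC-based importance (fold assignment is random, so the numbers can differ slightly), and the simulated binary data are an illustrative assumption:

```r
# Sketch only: vimp_* wrappers fill in the `type` argument of vim.
library(vimp)
library(SuperLearner)
set.seed(1996)
n <- 500
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- rbinom(n, 1, plogis(0.5 * x$x1))
est_wrapper <- vimp_auc(Y = y, X = x, indx = 1, run_regression = TRUE,
                        SL.library = c("SL.glm", "SL.mean"))
est_direct <- vim(Y = y, X = x, indx = 1, type = "auc", run_regression = TRUE,
                  SL.library = c("SL.glm", "SL.mean"))
```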
* Fixed a bug caused by the `gam` package update by switching the library to `SL.xgboost`, `SL.step`, and `SL.mean`
* Fixed bugs caused by the `gam` package update in the unit tests
* `cv_vim` and `cv_vim_nodonsker` now return the cross-validation folds used within the function
* Allow specification of the `family` for the top-level SuperLearner if `run_regression = TRUE`; in all cases, the second-stage SuperLearner uses a `gaussian` family
* If the top-level SuperLearner selects `SL.mean` as the best-fitting algorithm, the second-stage regression is now run using the original outcome, rather than the first-stage fitted values
* Added `cv_vim_nodonsker`, which computes the cross-validated naive estimator and the update on the same, single validation fold; this does not allow for relaxation of the Donsker class conditions
* Added `two_validation_set_cv`, which sets up folds for V-fold cross-validation with two validation sets per fold
* Updated `cv_vim`: now, the cross-validated naive estimator is computed on the first validation set, while the update for the corrected estimator is computed using the second validation set (both created from `two_validation_set_cv`); this allows for relaxation of the Donsker class conditions necessary for asymptotic convergence of the corrected estimator, while making sure that the initial CV naive estimator is not biased high (due to a higher \(R^2\) on the training data)
* Updated `cv_vim`: now, the cross-validated naive estimator is computed on the training data for each fold, while the update for the corrected cross-validated estimator is computed using the test data; this allows for relaxation of the Donsker class conditions necessary for asymptotic convergence of the corrected estimator
* Removed `vim`, replaced with individual-parameter functions
* Renamed `vimp_regression` to match the Python package
* `cv_vim` can now compute regression estimators
* Added `vimp_ci`, `vimp_se`, `vimp_update`, and `onestep_based_estimator`
* Bug fixes, etc.