rfPermute estimates the significance of importance metrics for a Random Forest model by permuting the response variable. It will produce null distributions of importance metrics for each predictor variable and p-values of observed importances. The package also includes several summary and visualization functions for randomForest and rfPermute results. See rfPermuteTutorial() in the package for a guide on running, summarizing, and diagnosing rfPermute and randomForest models.
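The permutation idea described above can be sketched directly with randomForest. This is only an illustration of the general technique, not the package's own implementation: the helper `get_imp`, the variable names, the use of the iris data, and the p-value convention (count of null values at least as large as the observed value, plus one, divided by replicates plus one) are all assumptions made for this example.

```r
library(randomForest)

set.seed(1)
x <- iris[, 1:4]
y <- iris$Species

# Hypothetical helper: fit a forest and return the unscaled
# permutation-based importance (type = 1) of each predictor.
get_imp <- function(x, y) {
  rf <- randomForest(x, y, ntree = 100, importance = TRUE)
  importance(rf, type = 1, scale = FALSE)[, 1]
}

# Observed importances from the real response
obs_imp <- get_imp(x, y)

# Null distributions: refit after shuffling the response each time.
# Result is a matrix with one row per predictor, one column per replicate.
num_rep <- 20
null_imp <- replicate(num_rep, get_imp(x, sample(y)))

# One common convention for a permutation p-value (an assumption here)
p_val <- (rowSums(null_imp >= obs_imp) + 1) / (num_rep + 1)
print(p_val)
```

With only 20 replicates the smallest attainable p-value is 1/21, so a real analysis would use many more permutations.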
To install the stable version from CRAN:
install.packages('rfPermute')
To install the latest version from GitHub:
# make sure you have devtools installed
if (!require('devtools')) install.packages('devtools')
# install from GitHub
devtools::install_github('EricArcher/rfPermute')
rfPermute: Estimate Permutation p-values for Random Forest Importance Metrics
importance: Extract rfPermute Importance Scores and p-values
plotNull: Plot Random Forest Importance Null Distributions
plotImpPreds: Distribution of Important Variables
summary: Summarize rfPermute and randomForest models
confusionMatrix: Confusion Matrix
casePredictions: Return predictions and votes for training cases
pctCorrect: Percent Correctly Classified
plotInbag: Distribution of sample inbag rates
plotPredictedProbs: Distribution of prediction assignment probabilities
plotProximity: Plot Random Forest Proximity Scores
plotTrace: Trace of cumulative error rates in forest
plotVotes: Vote Distribution
combineRP: Combine rfPermute models
balancedSampsize: Balanced Sample Size
cleanRFdata: Clean Random Forest Input Data

pct.correct argument to plotTrace(). Default is now to have y-axis as 1 - OOB error rate.

NOTE: v2.5 is a large redevelopment of the package. The structure of rfPermute model objects has changed, making them incompatible with previous versions. Also, the names and functionality of several functions have changed to make them more consistent with one another. A tutorial (under construction) is available within the package as rfPermuteTutorial().
exptdErrRate
threshold argument in classConfInt and confusionMatrix to NULL
exptdErrRate and confusionMatrix
pctCorrect
casePredictions
plotConfMat, plotOOBtimes, plotRFtrace, plotInbag, and plotImpVarDist visualizations.
confusionMatrix so it will work when randomForest model doesn’t have a $confusion element, like when model is result of combine-ing multiple models.
num.cores to NULL.
type argument to plotVotes to choose between area and bar charts.
plot.rfPermute to plotNull to avoid clashes and maintain functionality of randomForest::plot.randomForest.
proximity.plot to proximityPlot, exptd.err.rate to exptdErrRate, and clean.rf.data to cleanRFdata to make camelCase naming scheme more consistent in package.
plotNull from base graphics to ggplot2.
symb.metab data set.
n argument to impHeatmap.
classConfInt, confusionMatrix, plotVotes, pctCorrect.
plot.rfPermute that was reporting the p-value incorrectly at the top of the figure.
rfPermute so it works on Windows too.
impHeatmap function.
proximity.plot to use ggplot2 graphics.
rfPermute has separate $null.dist and $pval elements, each with results for unscaled and scaled importance measures. See ?rfPermute for more information.
rp.importance and plot.rfPermute now take a scale argument to specify whether or not importance values should be scaled by standard deviations.
nrep = 0 for rfPermute, a randomForest object is returned.
grid name clashes.
clean.rf.data where fixed predictors were not removed.
main argument in plot.rp.importance.
num.cores argument to rfPermute to take advantage of multi-threading.
calc.imp.pval to keep it from indexing