The following steps are described in the basic-use vignette, refer to it if the following code is not clear:
library(SDMtune)
library(zeallot)
# Prepare data
<- list.files(path = file.path(system.file(package = "dismo"), "ex"),
files pattern = "grd", full.names = TRUE)
<- raster::stack(files)
predictors <- prepareSWD(species = "Virtual species", p = virtualSp$presence,
data a = virtualSp$background, env = predictors,
categorical = "biome")
c(train, test) %<-% trainValTest(data, test = 0.2, only_presence = TRUE,
seed = 25)
# Train model
<- train("Maxent", data = train)
model
# Train cross validation model
<- randomFolds(data, k = 4, only_presence = TRUE, seed = 25)
folds <- train("Maxent", data = data, folds = folds) cv_model
For a Maxent model we can get the variable importance values from the output of the MaxEnt Java software. These values are stored in the model object and can be displayed using the following command:
@model@results model
The function maxentVarImp
extracts the variable
importance values from the previous output and formats them in a more
human readable way:
<- maxentVarImp(model)
vi vi
As you can see the function returns a data.frame
with
the variable name, the percent contribution and the permutation
importance.
You can plot the variable importance as a bar chart using the
function plotVarImp
. For example you can plot the percent
contribution using:
plotVarImp(vi[, 1:2])
or the permutation importance with:
plotVarImp(vi[, c(1,3)])
SDMtune has its own function to compute the permutation importance that iterates through several permutations and return an averaged value together with the standard deviation. We will use this function to compute the permutation importance of a Maxnet model.
For this example we train a Maxnet model:
<- train("Maxnet", data = train) maxnet_model
Now we can calculate the variable importance with the function
varImp()
using 5 permutations:
<- varImp(maxnet_model, permut = 5)
vi_maxnet vi_maxnet
And plot it with:
plotVarImp(vi_maxnet)
Next we compute the permutation importance for the Maxent model using 10 permutations and compare the results with the Maxent output:
# Compute the permutation importance
<- varImp(model, permut = 10)
vi_maxent # Print it
vi_maxent# Compare with Maxent output
maxentVarImp(model)
The difference is probably due to a different shuffling of the presence and background locations during the permutation process and because in this example we performed 10 permutations and averaged the values.
Another method to estimate the variable importance is the leave one
out Jackknife test. The test removes one variable at time and records
the change in the chosen metric. We use the function doJk
,
the AUC as evaluation metric and the maxnet_model
:
<- doJk(maxnet_model, metric = "auc", test = test)
jk jk
We can also plot the output using the function plotJk
.
In the following example we plot the previous result and we add a line
representing the AUC of the full model trained using all the variables.
First we plot the Jackknife test for the training AUC:
plotJk(jk, type = "train", ref = auc(maxnet_model))
and the Jackknife test for the testing AUC:
plotJk(jk, type = "test", ref = auc(maxnet_model, test = test))
With the function plotResponse
is possible to plot the
marginal and the univariate response curve. Let’s plot the
cloglog univariate response curve of
bio1:
plotResponse(maxnet_model, var = "bio1", type = "cloglog", only_presence = TRUE,
marginal = FALSE, rug = TRUE)
On top is displayed the rug of the presence locations and on bottom the rug of the background locations. As another example we can plot the logistic marginal response curve of biome that is a categorical variable, keeping the other variables at the mean value:
plotResponse(maxnet_model, var = "biome", type = "logistic",
only_presence = TRUE, marginal = TRUE, fun = mean, color = "blue")
In the case of an SDMmodelCV
the response curve shows
the averaged value of the prediction together with one Standard
Deviation error interval:
plotResponse(cv_model, var = "bio1", type = "cloglog", only_presence = TRUE,
marginal = TRUE, fun = mean, rug = TRUE)
All what you have learned till now con be saved and summarized
calling the function modelReport
. The function will:
.Rds
extension that can be loaded
in R using the readRDS
function.The function is totally inspired by the default output of the MaxEnt
Java software (Phillips, Anderson, and Schapire
2006) and extends it to other methods. You can decide what to
include in the report using dedicated function arguments, like
response_curves
, jk
and env
but
the function cannot be used with SDMmodelCV
objects. Run
the following code to create a report of the Maxnet
model we trained before:
modelReport(maxnet_model, type = "cloglog", folder = "virtual-sp", test = test,
response_curves = TRUE, only_presence = TRUE, jk = TRUE,
env = predictors)
The output is displayed in the browser and all the files are saved in the virtual-sp folder.
To explore correlation among the variables we extract 10000
background locations using the functionrandomPoints
included in the dismo
package (we set the seed to have
reproducible results). After we create an SWD
object using
the prepareSWD
function:
set.seed(25)
<- dismo::randomPoints(predictors, 10000)
bg <- prepareSWD(species = "Bgs", a = bg, env = predictors,
bg categorical = "biome")
The environmental variables we downloaded have a coarse resolution and the function can extract a bit less than 10000 random locations (see the warning message).
With the function plotCor
you can create an heat map
showing the degree of autocorrelation:
plotCor(bg, method = "spearman", cor_th = 0.7)
You can select a different correlation method or set a different
correlation threshold. Another useful function is corVar
that instead of creating a heat map prints the pairs of correlated
variables according to the given method and correlation threshold:
corVar(bg, method = "spearman", cor_th = 0.7)
As you can see there are few variables that have a correlation coefficient greater than 0.7 in absolute value.
There are cases in which a model has some environmental variables
ranked with very low contribution and you may want to remove some of
them to reduce the model complexity. SDMtune offers two
different strategies implemented in the function reduceVar
.
We will use the Maxent model trained with all the
variables. Let’s first check the permutation importance (we use only one
permutation to save time):
varImp(model, permut = 1)
In the first example we want to remove all the environmental variables that have a permutation importance lower than 6%, no matter if the model performance decreases. The function removes the last ranked environmental variable, trains a new model and computes a new rank. The process is repeated until all the remaining environmental variables have an importance greater than 6%:
cat("Testing AUC before: ", auc(maxnet_model, test = test))
<- reduceVar(maxnet_model, th = 6, metric = "auc",
reduced_variables_model test = test, permut = 1)
cat("Testing AUC after: ", auc(reduced_variables_model, test = test))
In the second example we want to remove the environmental variables that have a permutation importance lower than 15% only if removing the variables the model performance does not decrease, according to the given metric. In this case the function performs a leave one out Jackknife test and remove the environmental variables in a step-wise fashion as described in the previous example, but only if the model performance doesn’t drop:
cat("Testing AUC before: ", auc(maxnet_model, test = test))
<- reduceVar(maxnet_model, th = 15, metric = "auc",
reduced_variables_model test = test, permut = 1, use_jk = TRUE)
cat("Testing AUC after: ", auc(reduced_variables_model, test = test))
As you can see in this case several variables have been removed and the AUC in the testing dataset didn’t decrease.