Abstract
sjSDM is a scalable joint species distribution model based on Monte-Carlo intergration of the joint likelihood.
Load the package via
library(sjSDM)
sjSDM depends on the Anaconda distribution of python and pytorch. You will get a warning if you don’t have python installed. sjSDM depends on Anaconda which will be installed automatically if it is not available.
To install the CPU version, run:
install_sjSDM()
For the GPU version (NVIDIA GPUs are only supported), run:
install_sjSDM(method = "gpu")
More details on and trouble-shooting for installing the necessary dependencies is explained in the vignette dependencies, call
vignette("Dependencies", package = "sjSDM")
To cite sjSDM in a publication, use
citation("sjSDM")
sjSDM has a function to create test data. Here, we create a dataset with 3 environmental predictors, 5 species and 100 sites.
= simulate_SDM(env = 3L, species = 5L, sites = 100L) com
The model is then fit with the function sjSDM(). You have to provide predictors (can be also a data.frame) and response as matrices.
= sjSDM(Y = com$response, env = com$env_weights, iter = 100L, se=TRUE) model
Things you can do with the model output
coef(model)
summary(model)
getCov(model)
sjSDM supports formula description for the predictors.
E.g. interaction with intercept and calculate p-values:
= sjSDM(Y = com$response,
model env = linear(data = com$env_weights, formula = ~ X1*X2,), iter = 50L, se = TRUE)
summary(model)
E.g. quadratic effect without intercept:
= sjSDM(Y = com$response,
model env = linear(data = com$env_weights, formula = ~0+ I(X1^2)), iter = 50L, se = TRUE)
summary(model)
jSDMs account for correlation between species within communities (sites), in real datasets, however, communities (sites) are often additionally correlated (== spatial autocorrelation).
Let’s first simulate test data: 1) Simulate jSDM without a link (normal response)
= simulate_SDM(env = 3L, species = 5L, sites = 100L,
com link = "identical", response = "identical")
= com$env_weights
X = com$response Y
= matrix(rnorm(200), 100, 2)+2
XYcoords = as.matrix(dist(XYcoords))
WW = mvtnorm::rmvnorm( 5L, sigma = exp(-WW)) spatialResiduals
= Y + t(spatialResiduals)
Ysp = ifelse(Ysp < 0, 0, 1) # multivariate probit model Y
Using spatial eigenvectors as additional predictors is a common approach:
= generateSpatialEV(XYcoords)
SPeigen
= sjSDM(Y, env = linear(X, ~.), spatial = linear(SPeigen, ~0+.), iter = 100L)
model summary(model)
Type I analysis of variance for the two/three different components (environment, biotic associations, and space (optional)).
= anova(model)
an plot(an)
We can also visualize the internal meta-community structure (only for models with space):
plot(an, internal = TRUE)
We can also partition the effects of the different components. This
function will be deprecated in future. Please use
plot(anova(model), internal = TRUE)
= importance(model)
imp plot(imp)
sjSDM supports l1 (lasso) and l2 (ridge) regularization: * alpha is the weighting between lasso and ridge * alpha = 0.0 corresponds to pure lasso * alpha = 1.0 corresponds to pure ridge
= sjSDM(Y = com$response, env = linear(data = com$env_weights, formula = ~0+ I(X1^2),lambda = 0.5), iter = 50L)
model summary(model)
We can do the same for the species associations:
= sjSDM(Y = com$response,
model env = linear(data = com$env_weights, formula = ~0+ I(X1^2),lambda = 0.5),
biotic = bioticStruct(lambda =0.1),
iter = 50L)
summary(model)
= sjSDM(Y, env = linear(X, ~X1+X2), spatial = linear(XYcoords, ~0+X1:X2, lambda = 0.4))
model summary(model)
= simulate_SDM(env = 3L, species = 5L, sites = 100L)
com = com$env_weights
X = com$response
Y
# three fully connected layers with relu as activation function
= sjSDM(Y = Y,
model env = DNN(data = X, formula = ~., hidden = c(10L, 10L, 10L), activation = "relu"),
iter = 50L, se = TRUE)
summary(model)
The methods for sjSDM() work also for the non-linear model:
getCov(model) # species association matrix
= predict(model) # predict on fitted data
pred = predict(model, newdata = X) # predict on new data pred
Extract and set weights of model:
= getWeights(model) # get layer weights and sigma
weights setWeights(model, weights)
Plot the training history:
plot(model)
= sjSDM(Y, env = linear(X, ~X1+X2), spatial = DNN(XYcoords, ~0+X1:X2, lambda = 0.4))
model summary(model)
= sjSDM(Y, env = DNN(X, ~X1+X2), spatial = DNN(XYcoords, ~0+X1:X2, lambda = 0.4))
model summary(model)
sjSDM supports other responses than presence-absence data: Simulate non-presence-absence data:
= simulate_SDM(env = 3L, species = 5L, sites = 100L,link = "identical", response = "count") # samples from a Poisson distribution
com = com$env_weights
X = com$response Y
= sjSDM(Y, env = linear(X, ~.), se = TRUE, iter = 50L, family = poisson("log"))
model summary(model)
= sjSDM(log(Y+0.01), env = linear(X, ~.), se = TRUE, iter = 50L, family = gaussian("identity"))
model summary(model)