Introduction
The cxr package provides a general interface to obtain estimates of species vital rates and interaction coefficients between species pairs from empirical data. These estimations are critical to parameterize population models describing the dynamics of interacting species. They also allow computing a series of metrics associated with modern coexistence theory that inform about the likelihood of species to coexist. These metrics are 1) niche differences that stabilize coexistence between competing species and 2) average fitness differences that drive competitive dominance and, in the absence of niche differences, determine the superior competitor.
The package also allows exploring how environmental variation modifies both the intrinsic ability of species to produce offspring and the interaction coefficients between pairs of species (including intraspecific interactions). This feature makes it possible to explore how stabilizing niche differences and average fitness differences vary across environmental gradients, and therefore to analyze whether the likelihood of species to coexist changes across environmental conditions.
Here we demonstrate the basic functionality of the package using a published observational dataset (see Lanuza et al. (2018) and vignette 2 (Data formats) for a description of the dataset). With this example, we will specifically estimate seed production in the absence of neighbors (lambda) and the strength and sign of species interactions between species pairs (alpha matrix). These values are the basis for estimating the degree of niche overlap (i.e. 1 - niche differences) and average fitness differences between species pairs, which are covered in vignette 3 (Coexistence metrics).
Finally, these estimations of lambda and alpha parameters are the basis for analyzing more complex dynamics such as the stability of the dynamics of multispecies communities (see Godoy et al. 2017) and multitrophic coexistence (see Godoy et al. 2018).
Fitting a single species
First, we load the package and the associated data. The included dataset contains, for each individual, its reproductive success and the number of neighbors per species in a 7.5 cm buffer (see vignette 2).
library(cxr)
data("neigh_list")
Next, we extract the data of a single focal species.
my.sp <- "HOMA"
# get data from the list
obs_homa <- neigh_list[[my.sp]]
# no need for ID column
obs_homa <- subset(obs_homa, select = -c(obs_ID))
# For each observation, we need the individual plant fitness
# and the number of neighbours per species (in columns).
head(obs_homa)
## fitness BEMA CETE CHFU CHMI HOMA LEMA MEEL MESU PAIN PLCO POMA POMO PUPA SASO
## 1 12 2 0 0 0 35 0 0 0 0 0 0 0 0 0
## 2 12 0 0 0 0 34 2 0 0 0 0 0 0 0 0
## 3 12 0 0 0 0 42 1 0 0 0 0 0 0 0 0
## 4 12 0 0 0 0 42 2 0 0 0 0 0 0 0 0
## 5 12 1 0 0 0 38 0 0 0 0 0 0 0 0 0
## 6 12 1 0 0 0 52 1 0 0 0 0 0 0 0 0
## SCLA SOAS SPRU
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 0 0 0
Next, we estimate both the reproductive success in the absence of neighbors (lambda) and the competition matrix (alpha). This is done by fitting a model that mathematically relates reproductive success to the number of neighbors observed. In this first example, we fit the selected species with a Ricker model (‘RK’ model family, see vignette 4) and fairly standard initial values. The default optimization method (Nelder-Mead) does not allow for lower or upper bounds on model parameters, so these arguments are commented out. In ecological terms, this unbounded optimization allows estimating the strength of both competitive and facilitative interactions; bounded optimization algorithms can be used to restrict the analysis to either competition or facilitation (i.e. positive or negative alpha values). We can also specify whether we want to compute standard errors numerically, by setting the argument bootstrap_samples to the number of bootstrap replications for the calculation.
#?cxr_pm_fit #check the help file for a description of the arguments
fit_homa <- cxr_pm_fit(data = obs_homa,
                       focal_column = my.sp,
                       model_family = "RK",
                       covariates = NULL,
                       optimization_method = "Nelder-Mead",
                       alpha_form = "pairwise",
                       lambda_cov_form = "none",
                       alpha_cov_form = "none",
                       initial_values = list(lambda = 1,
                                             alpha_intra = .1,
                                             alpha_inter = .1),
                       # not applicable to this optimization method
                       # lower_bounds = list(lambda = 0,
                       #                     alpha_intra = 0,
                       #                     alpha_inter = 0),
                       # upper_bounds = list(lambda = 10,
                       #                     alpha_intra = 1,
                       #                     alpha_inter = 1),
                       fixed_terms = NULL,
                       # a low number of bootstrap samples
                       # for demonstration purposes,
                       # increase it for robust results.
                       bootstrap_samples = 3)
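To make the idea of "fitting a model that relates reproductive success to the number of neighbors" more concrete, the following base R sketch fits a Ricker-type model to simulated data with the same unbounded optimizer. It does not use cxr: the functional form (lambda multiplied by the exponential of minus the summed alpha-weighted neighbour densities), the log-normal error, the function name `ricker_nll` and all simulated values are assumptions made for illustration only; see vignette 4 for the models actually implemented in the package.

```r
# Simulate a toy dataset (hypothetical values, not the vignette data)
set.seed(42)
n_obs <- 200
neighbours <- cbind(sp_a = rpois(n_obs, 5), sp_b = rpois(n_obs, 3))
true_lambda <- 20
true_alpha <- c(sp_a = 0.10, sp_b = 0.05)
expected <- true_lambda * exp(-as.vector(neighbours %*% true_alpha))
fitness <- expected * exp(rnorm(n_obs, sd = 0.1))

# Negative log-likelihood of a Ricker-type model with log-normal errors
# (assumed form). par = c(lambda, one alpha per neighbour column, sigma)
ricker_nll <- function(par, fitness, neighbours) {
  n_sp <- ncol(neighbours)
  lambda <- par[1]
  alpha <- par[2:(n_sp + 1)]
  sigma <- par[n_sp + 2]
  # Nelder-Mead is unbounded, so invalid values are penalised explicitly
  if (lambda <= 0 || sigma <= 0) return(1e10)
  pred <- lambda * exp(-as.vector(neighbours %*% alpha))
  -sum(dnorm(log(fitness), mean = log(pred), sd = sigma, log = TRUE))
}

# Same kind of starting values as in the call above
start <- c(lambda = 1, alpha_a = 0.1, alpha_b = 0.1, sigma = 1)
start_nll <- ricker_nll(start, fitness, neighbours)

toy_fit <- optim(par = start, fn = ricker_nll,
                 fitness = fitness, neighbours = neighbours,
                 method = "Nelder-Mead", control = list(maxit = 5000))

round(toy_fit$par, 3)  # fitted lambda, alphas and sigma
toy_fit$value          # negative log-likelihood at the optimum
```

Because alpha values are not constrained here, the optimizer is free to return negative coefficients, which under this assumed form would correspond to facilitation.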
For a quick summary of the fit, we can run a summary on the resulting object.
summary(fit_homa)
##
## model: 'RK_pm_alpha_pairwise_lambdacov_none_alphacov_none'
## optimization method: 'Nelder-Mead'
## ----------
## focal taxa ID: HOMA
## observations: 288
## neighbours: 17
## covariates: 0
## ----------
## focal lambda: 1.400425
## alpha_intra: -0.04528702
## mean alpha_inter: NA
## mean lambda_cov: - not fit -
## mean alpha_cov: - not fit -
## negative log-likelihood of the fit: 544.4854
## ----------
This object is actually a list with several elements. We can thus access these elements as usual:
names(fit_homa) #list of all available elements.
## [1] "model_name" "model"
## [3] "data" "focal_ID"
## [5] "optimization_method" "initial_values"
## [7] "fixed_terms" "lambda"
## [9] "alpha_intra" "alpha_inter"
## [11] "lambda_cov" "alpha_cov"
## [13] "lambda_standard_error" "alpha_intra_standard_error"
## [15] "alpha_inter_standard_error" "lambda_cov_standard_error"
## [17] "alpha_cov_standard_error" "log_likelihood"
#reproduction success in the absence of neighbors
fit_homa$lambda
## lambda
## 1.400425
# intraspecific interaction
fit_homa$alpha_intra
## HOMA
## -0.04528702
# interspecific interactions
fit_homa$alpha_inter
## BEMA CETE CHFU CHMI LEMA
## -0.6122006013 NA NA NA -0.1272862948
## MEEL MESU PAIN PLCO POMA
## 0.7402077398 0.0429329620 -0.0008210413 0.0034516557 -0.5177692811
## POMO PUPA SASO SCLA SOAS
## -1.4909587241 0.2775812126 0.1213981101 -0.4398662135 NA
## SPRU
## 1.1584545222
Note that some interaction coefficients are set to NA. This happens for species that never co-occur with the focal species but are nevertheless listed as neighbours, with densities equal to zero in all focal observations.
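A quick way to anticipate which coefficients cannot be estimated is to look for neighbour columns that are zero in every observation. The sketch below does this on a small hypothetical data frame with the same layout as obs_homa (a fitness column followed by one column per neighbour species); the object names and values are illustrative assumptions.

```r
# Toy observations for one focal species (hypothetical values)
toy_obs <- data.frame(fitness = c(12, 8, 15, 10),
                      BEMA = c(2, 0, 1, 0),
                      CETE = c(0, 0, 0, 0),
                      HOMA = c(35, 34, 42, 38),
                      SOAS = c(0, 0, 0, 0))

# Neighbour columns are all columns except the fitness one
neigh_cols <- setdiff(names(toy_obs), "fitness")

# Species that never appear next to the focal individuals
absent <- neigh_cols[colSums(toy_obs[, neigh_cols] > 0) == 0]
absent
```

Applied to real data, the same check indicates which neighbour species carry no information about their effect on the focal species.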
Fitting several species at once
Most likely users will want to fit model parameters to data from two or more focal species. In order to do that with a single call, we provide the function cxr_pm_multifit, which has a very similar interface to cxr_pm_fit. Here we show how multiple species can be fit using this function. For this multispecies case, rows in the alpha element of the returned list correspond to species i and columns to species j for each \(\alpha_{ij}\) coefficient. The diagonal corresponds to intraspecific coefficients. In order to showcase other capabilities of the package, we include in this example the effect of a covariate over the fitted lambda and alpha parameters. This covariate, soil salinity, is also included as a dataset in the package (see vignette 2). We consider that the covariate has a linear effect in both the modification of lambda and alpha parameters.
my.sp <- c("BEMA","CETE","LEMA")
obs_3sp <- neigh_list[my.sp]
# discard ID column
for(i in 1:length(obs_3sp)){
  obs_3sp[[i]] <- obs_3sp[[i]][,2:length(obs_3sp[[i]])]
}
# load covariates: salinity
data("salinity_list")
salinity <- salinity_list[my.sp]
# keep only salinity column
for(i in 1:length(salinity)){
  salinity[[i]] <- as.matrix(salinity[[i]][,2:length(salinity[[i]])])
  colnames(salinity[[i]]) <- "salinity"
}
Note how the data is passed in a list with as many elements as focal species. Each element is a dataframe with observations of the corresponding focal species. The covariates follow the same structure: they must be passed as a list with as many elements as focal species, where each element is a dataframe (or a matrix) with a column for each covariate (one, in this case) and the same number of observations as its associated species data.
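The two requirements just described (one list element per focal species, and matching numbers of observations) can be checked before fitting. The helper below, `check_multifit_inputs`, is a hypothetical convenience function written for this illustration and is not part of cxr; the toy lists are likewise invented.

```r
# Hypothetical helper: check that observation and covariate lists line up
check_multifit_inputs <- function(data, covariates) {
  stopifnot(is.list(data), is.list(covariates))
  same_length <- length(data) == length(covariates)
  # compare row counts element by element, only if the lengths agree
  same_rows <- same_length &&
    all(mapply(function(d, co) nrow(d) == nrow(co), data, covariates))
  c(same_length = same_length, same_rows = same_rows)
}

# Toy inputs for two focal species (hypothetical values)
toy_data <- list(
  sp1 = data.frame(fitness = c(3, 5, 2), sp1 = c(1, 0, 2), sp2 = c(0, 1, 1)),
  sp2 = data.frame(fitness = c(7, 4), sp1 = c(0, 3), sp2 = c(2, 0))
)
toy_cov <- list(
  sp1 = matrix(c(0.9, 1.0, 0.8), ncol = 1, dimnames = list(NULL, "salinity")),
  sp2 = matrix(c(1.1, 0.7), ncol = 1, dimnames = list(NULL, "salinity"))
)

checks <- check_multifit_inputs(toy_data, toy_cov)
checks

# A covariate matrix with the wrong number of rows is flagged
toy_cov_bad <- toy_cov
toy_cov_bad$sp2 <- rbind(toy_cov_bad$sp2, 0.5)
checks_bad <- check_multifit_inputs(toy_data, toy_cov_bad)
checks_bad
```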
names(obs_3sp)
## [1] "BEMA" "CETE" "LEMA"
# observation data
head(obs_3sp[[1]])
## # A tibble: 6 x 18
## fitness BEMA CETE CHFU CHMI HOMA LEMA MEEL MESU PAIN PLCO POMA
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 116 1 0 0 0 0 0 0 0 0 0 34
## 2 68 0 0 0 0 0 1 0 0 0 0 47
## 3 36 0 0 0 0 0 0 0 0 0 0 38
## 4 64 0 0 0 0 0 5 0 0 0 0 21
## 5 144 2 0 0 0 0 0 0 0 0 0 55
## 6 56 1 0 0 0 0 4 0 0 0 0 17
## # … with 6 more variables: POMO <dbl>, PUPA <dbl>, SASO <dbl>, SCLA <dbl>,
## # SOAS <dbl>, SPRU <dbl>
# number of fitness observations
nrow(obs_3sp[[1]])
## [1] 287
# salinity data
head(salinity[[1]])
## salinity
## [1,] 0.9860
## [2,] 1.0280
## [3,] 1.0000
## [4,] 0.8830
## [5,] 0.8420
## [6,] 0.9121
# number of covariate observations
nrow(salinity[[1]])
## [1] 287
We fit the model as above, but using the cxr_pm_multifit function.
fit_3sp <- cxr_pm_multifit(data = obs_3sp,
                           focal_column = my.sp,
                           model_family = "RK",
                           # here we use a bounded method for demonstration purposes
                           optimization_method = "bobyqa",
                           covariates = salinity,
                           alpha_form = "pairwise",
                           lambda_cov_form = "global", # effect of covariates over lambda
                           alpha_cov_form = "global", # effect of covariates over alpha
                           initial_values = list(lambda = 1,
                                                 alpha_intra = 0.1,
                                                 alpha_inter = 0.1,
                                                 lambda_cov = 0.1,
                                                 alpha_cov = 0.1),
                           lower_bounds = list(lambda = 0,
                                               alpha_intra = 0,
                                               alpha_inter = -1,
                                               lambda_cov = 0,
                                               alpha_cov = 0),
                           upper_bounds = list(lambda = 100,
                                               alpha_intra = 1,
                                               alpha_inter = 1,
                                               lambda_cov = 1,
                                               alpha_cov = 1),
                           # no standard errors
                           bootstrap_samples = 0)
We can also have a glimpse of this multispecies fit with the summary function:
summary(fit_3sp)
## model: 'RK_pm_alpha_pairwise_lambdacov_global_alphacov_global'
## optimization method: 'bobyqa'
## ----------
## sp observations neighbours covariates lambda lambda_cov_salinity
## 1 BEMA 287 17 1 17.07055 0.9999988
## 2 CETE 10 17 1 39.98296 1.0000000
## 3 LEMA 273 17 1 15.12572 1.0000000
## mean_alpha_cov_salinity
## 1 0.0006556123
## 2 0.0000000000
## 3 0.1208871773
##
## ----------
## alpha matrix:
## BEMA CETE CHFU CHMI HOMA LEMA MEEL
## BEMA 0.0000000 NA NA NA -0.02050387 -0.1744143 -0.4012841
## CETE -0.4157086 0.008773895 NA NA 0.16596698 -1.0000000 NA
## LEMA -0.3908039 NA NA NA -0.01723074 0.0000000 -0.2907515
## MESU PAIN PLCO POMA POMO PUPA
## BEMA -0.2303013 0.04660636 -0.05456346 -0.02128597 -0.4045222 -0.9992533
## CETE NA -0.18386228 1.00000000 NA NA NA
## LEMA -0.2544182 -0.34141296 -0.16619114 -0.00553322 -0.2613790 NA
## SASO SCLA SOAS SPRU
## BEMA -0.10719613 -0.5205668 -0.3812140 0.00367967
## CETE -0.12539673 -0.4954568 NA -0.44113740
## LEMA -0.02118463 -0.3337160 -0.6116377 -0.01094160
The numerical estimation of parameters depends on the model with which to estimate fitness values, the optimization method, and the underlying data. In our example dataset, some species are better represented than others, and the function will raise warnings if the estimation can be improved or, for example, if any fitted parameter is equal to the lower or upper bounds provided. The cxr_pm_multifit function will behave conservatively and return NULL if parameter fitting fails for any of the focal species passed as arguments, printing an informative message about which species failed to fit. In such cases, users may either fit each species separately or call cxr_pm_multifit without the problematic species.
Importantly, bounded methods can be very sensitive to the initial values and bounds, so, as in any numerical optimization problem, you should double-check the values obtained, at least by computing standard errors on your parameters with an adequate number of bootstrap samples, and preferably by performing sensitivity analyses on the different parameters (in this and other vignettes, we have not included any of these checks, as our examples are merely for demonstration purposes). Aside from these recommendations, the cxr objects returned from our functions also include the negative log-likelihood of the fit, which may help users choose an optimization algorithm for a particular dataset. Remember that, because the reported value is the negative log-likelihood, lower values indicate a better fit. In general, it is recommended to test different optimization algorithms, as they may produce fairly different results (Mullen 2014). In this example, the method used (“bobyqa”, a well-established bounded optimization algorithm) returns the following negative log-likelihood values:
fit_3sp$log_likelihood
## BEMA CETE LEMA
## 314.09376 12.83698 309.06174
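The recommendation to try several optimization algorithms and compare their negative log-likelihoods can be sketched in base R without cxr. The example below fits the same toy objective with an unbounded method (Nelder-Mead) and a bounded one. Note that it uses L-BFGS-B from `stats::optim` as a stand-in for bobyqa, and that the model form, the simulated data and all object names are assumptions made for illustration.

```r
# Toy data: one neighbour species, Ricker-type decline (hypothetical values)
set.seed(1)
n_neigh <- rpois(150, 4)
fitness <- 10 * exp(-0.2 * n_neigh) * exp(rnorm(150, sd = 0.2))

# Negative log-likelihood; par = c(lambda, alpha, sigma)
nll <- function(par) {
  pred <- par[1] * exp(-par[2] * n_neigh)
  -sum(dnorm(log(fitness), mean = log(pred), sd = par[3], log = TRUE))
}

start <- c(lambda = 1, alpha = 0.1, sigma = 1)

# Bounded fit: lambda and sigma are kept strictly positive by the bounds
fit_bounded <- optim(start, nll, method = "L-BFGS-B",
                     lower = c(0.01, -1, 0.01), upper = c(100, 1, 10))

# Unbounded fit: invalid values are penalised inside the objective
nll_penalised <- function(par) {
  if (par[1] <= 0 || par[3] <= 0) return(1e10)
  nll(par)
}
fit_unbounded <- optim(start, nll_penalised, method = "Nelder-Mead")

# Lower negative log-likelihood means a better fit
nll_by_method <- c("L-BFGS-B" = fit_bounded$value,
                   "Nelder-Mead" = fit_unbounded$value)
nll_by_method
best_method <- names(which.min(nll_by_method))
best_method
```

If the two methods disagree markedly, that is a signal to revisit the initial values and bounds rather than to trust either result blindly.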
Including environmental variability
In the above example for fitting multiple species at once, we have already offered a glimpse on how to include the effect of environmental covariates over lambda and alpha values. The relevant arguments in cxr_pm_fit and cxr_pm_multifit are ‘lambda_cov_form’ and ‘alpha_cov_form’. If these are set to ‘none’, no effect of covariates is considered. Otherwise, there are a couple of options to model this effect (all the options consider linear effects, for now). First, the effect of covariates over lambda can be ‘global’, meaning that each covariate has a global parameter affecting lambda. This is formulated as
\(\lambda (1 + \sum_{k=1}^{s} \theta_k c_k)\)
where \(s\) is the number of environmental covariates, \(c_k\) is the observed value of the k-th covariate and \(\theta_k\) is the ‘lambda_cov’ parameter of the cxr functions.
The effect over alpha values can likewise be ‘global’, so that, for focal species \(i\), each covariate affects the alpha values \(\alpha_{ij}\) equally through a global parameter:
\(\alpha_* + \sum_{k=1}^{s} \psi_k c_k\)
where \(*\) represents the set of all pairwise interactions for a given focal species, and \(\psi_k\) is the global parameter relating covariate \(k\) to the alpha values. The last possibility included in cxr is for the effect of covariates over alpha values to be interaction-specific, which is coded by specifying ‘alpha_cov_form’ as ‘pairwise’:
\(\alpha_{ij} + \sum_{k=1}^{s} \psi_{ijk} c_k\)
In this case, each covariate \(c_k\) will have a specific parameter \(\psi_{ijk}\) for its effect over each interaction \(\alpha_{ij}\).
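As a worked example of the three formulations, the following base R snippet evaluates each expression for one focal species, three neighbour species and two covariates. All numbers and object names are hypothetical and chosen only to make the arithmetic easy to follow.

```r
# Baseline parameters for one focal species (hypothetical values)
lambda <- 10
alpha <- c(A = 0.1, B = 0.2, C = 0.3)   # alpha_ij for neighbours A, B, C
covs <- c(salinity = 0.9, cov2 = 2)     # observed covariate values c_k

# 'global' effect over lambda: lambda * (1 + sum_k theta_k * c_k)
theta <- c(salinity = 0.5, cov2 = -0.1)
lambda_env <- lambda * (1 + sum(theta * covs))
lambda_env        # 10 * (1 + 0.45 - 0.2) = 12.5

# 'global' effect over alpha: the same shift is added to every alpha_ij
psi_global <- c(salinity = 0.1, cov2 = 0.05)
alpha_global <- alpha + sum(psi_global * covs)
alpha_global      # each alpha shifted by 0.09 + 0.10 = 0.19

# 'pairwise' effect over alpha: one psi_ijk per interaction and covariate
psi_pairwise <- rbind(A = c(0.1, 0),
                      B = c(0, 0.05),
                      C = c(-0.1, 0.1))
colnames(psi_pairwise) <- names(covs)
alpha_pairwise <- alpha + as.vector(psi_pairwise %*% covs)
alpha_pairwise    # shifts of 0.09, 0.10 and 0.11 respectively
```

The global form needs one parameter per covariate, whereas the pairwise form needs one per covariate and neighbour species, which is why the latter is considerably more demanding on the data.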
We can retrieve the fitted values for ‘lambda_cov’ (\(\theta\)) and ‘alpha_cov’ (\(\psi\)) simply from the output of the function:
fit_3sp$lambda_cov
## salinity
## BEMA 0.9999988
## CETE 1.0000000
## LEMA 1.0000000
fit_3sp$alpha_cov
## $salinity
## BEMA CETE CHFU CHMI HOMA
## BEMA 0.0006556123 0.0006556123 0.0006556123 0.0006556123 0.0006556123
## CETE 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
## LEMA 0.1208871773 0.1208871773 0.1208871773 0.1208871773 0.1208871773
## LEMA MEEL MESU PAIN PLCO
## BEMA 0.0006556123 0.0006556123 0.0006556123 0.0006556123 0.0006556123
## CETE 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
## LEMA 0.1208871773 0.1208871773 0.1208871773 0.1208871773 0.1208871773
## POMA POMO PUPA SASO SCLA
## BEMA 0.0006556123 0.0006556123 0.0006556123 0.0006556123 0.0006556123
## CETE 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
## LEMA 0.1208871773 0.1208871773 0.1208871773 0.1208871773 0.1208871773
## SOAS SPRU
## BEMA 0.0006556123 0.0006556123
## CETE 0.0000000000 0.0000000000
## LEMA 0.1208871773 0.1208871773
Parameter boundaries
cxr includes the possibility of providing boundaries for your parameters, so that the fitted values fall inside a certain interval. This is done via the arguments lower_bounds and upper_bounds of the fitting functions. These arguments accept a list whose elements must be named and correspond to one or more of the function parameters, i.e. lambda, alpha_intra, alpha_inter, lambda_cov, and alpha_cov. If these boundaries are of length 1, as in this example:
lower_bounds = list(lambda = 0,
                    alpha_intra = 0,
                    alpha_inter = -1,
                    lambda_cov = 0,
                    alpha_cov = 0)
upper_bounds = list(lambda = 100,
                    alpha_intra = 1,
                    alpha_inter = 1,
                    lambda_cov = 1,
                    alpha_cov = 1)
the same boundaries will be used for all values in the appropriate set, e.g. each interspecific alpha term (alpha_inter) will have the same boundaries. Otherwise, if you are interested in fitting parameters for more than one covariate, you may want to specify varying boundaries for lambda_cov and alpha_cov depending on the different covariates. In that case, the functions accept, for these elements, vectors with as many values as covariates. For example, if you have two covariates for which you want to calculate associated lambda_cov and alpha_cov values, but these covariates have very different magnitudes, you may provide two lambda_cov and alpha_cov boundaries.
# fit three species, as in the previous example
data <- neigh_list[1:3]

# keep only fitness and neighbours columns
for(i in 1:length(data)){
  data[[i]] <- data[[i]][,2:length(data[[i]])]
}
focal_column <- names(data)

# covariates
# in this example, we use salinity
# and two more covariates, randomly generated
data("salinity_list")
# observations for the first three species
cov_list <- salinity_list[1:3]

for(i in 1:length(cov_list)){
  # keep only salinity column
  cov_list[[i]] <- as.matrix(cov_list[[i]][,2:length(cov_list[[i]])])

  # add two random covariates
  cov_list[[i]] <- cbind(cov_list[[i]],
                         runif(nrow(cov_list[[i]]),1,10),
                         runif(nrow(cov_list[[i]]),10,100))
  colnames(cov_list[[i]]) <- c("salinity","cov2","cov3")
}
# this is how each element of the covariates list looks like
head(cov_list[[1]])
## salinity cov2 cov3
## [1,] 0.9860 1.456240 87.55923
## [2,] 1.0280 2.786870 94.29064
## [3,] 1.0000 6.676219 99.25791
## [4,] 0.8830 8.415803 20.75340
## [5,] 0.8420 8.099523 73.16040
## [6,] 0.9121 9.670322 51.47348
# function parameters
model_family <- "BH"
covariates <- cov_list
# bobyqa is generally more robust than other bounded methods
optimization_method <- "bobyqa"
alpha_form <- "pairwise"
lambda_cov_form <- "global"
alpha_cov_form <- "pairwise"

# note how lambda_cov and alpha_cov
# have different initial values for each covariate effect
# the commented assignments are also possible,
# giving equal initial values to all parameters
initial_values = list(lambda = 1,
                      alpha_intra = 0.1,
                      alpha_inter = 0.1,
                      lambda_cov = c(0.1,0.2,0.1),
                      alpha_cov = c(0.1,0.2,0.1))
                      # lambda_cov = c(0.1),
                      # alpha_cov = c(0.1))
# same with boundaries
lower_bounds = list(lambda = 0,
                    alpha_intra = 0,
                    alpha_inter = -1,
                    lambda_cov = c(-1,0,-1),
                    alpha_cov = c(-1,0,-1))
                    # lambda_cov = c(-1),
                    # alpha_cov = c(-1))
upper_bounds = list(lambda = 100,
                    alpha_intra = 1,
                    alpha_inter = 1,
                    lambda_cov = c(1,2,1),
                    alpha_cov = c(1,2,1))
                    # lambda_cov = c(1),
                    # alpha_cov = c(1))
fixed_terms <- NULL
bootstrap_samples <- 3
# this fit is fairly complex, it may take a while,
# it also raises warnings, suggesting that either the data,
# model, or initial values/boundaries can be improved.
# This is consistent with having observational data
# and, furthermore, random covariates
fit_multi_cov <- cxr_pm_multifit(data = data,
                                 focal_column = focal_column,
                                 model_family = model_family,
                                 covariates = covariates,
                                 optimization_method = optimization_method,
                                 alpha_form = alpha_form,
                                 lambda_cov_form = lambda_cov_form,
                                 alpha_cov_form = alpha_cov_form,
                                 initial_values = initial_values,
                                 lower_bounds = lower_bounds,
                                 upper_bounds = upper_bounds,
                                 fixed_terms = fixed_terms,
                                 bootstrap_samples = bootstrap_samples)
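The rule for boundaries described in this section (a single value applies to every term, a vector supplies one value per covariate) can be illustrated with a small base R helper. `expand_bound` is a hypothetical function written for this sketch; it shows the logic of the rule and is not the code cxr uses internally.

```r
# Hypothetical helper: expand a boundary to one value per term
expand_bound <- function(bound, n_terms) {
  if (length(bound) == 1) {
    rep(bound, n_terms)          # same boundary for all terms
  } else if (length(bound) == n_terms) {
    bound                        # one boundary per term, used as given
  } else {
    stop("bound must have length 1 or one value per term")
  }
}

cov_names <- c("salinity", "cov2", "cov3")
n_cov <- length(cov_names)

# A single value is recycled across the three covariates
lower_same <- setNames(expand_bound(-1, n_cov), cov_names)
lower_same

# Covariate-specific boundaries, as in the lists above
lower_each <- setNames(expand_bound(c(-1, 0, -1), n_cov), cov_names)
upper_each <- setNames(expand_bound(c(1, 2, 1), n_cov), cov_names)

# Sanity check before fitting: every lower bound is below its upper bound
bounds_ok <- all(lower_each < upper_each)
bounds_ok
```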
Keeping parameters fixed
There is the option to provide fixed values for model parameters, using the fixed_terms argument in the functions. As an example, if you want to keep the lambda parameter fixed, for a single species fit, you would modify the function arguments accordingly:
fixed_terms <- list(lambda = 1)

# now lambda does not appear in 'initial_values'
initial_values <- list(alpha_intra = 0,
                       alpha_inter = 0,
                       lambda_cov = 0,
                       alpha_cov = 0)
# lower and upper bounds should, likewise,
# not contain lambda
This is extended to the cxr_pm_multifit function, by having fixed_terms be a list with as many elements as focal species. For example, having three focal species, where we want to keep lambda fixed:
fixed_terms <- list(list(lambda = 1), # focal sp 1
                    list(lambda = 1.2), # focal sp 2
                    list(lambda = 1.3)) # focal sp 3
In this release of cxr, you must specify the same initial values for every focal species when using the cxr_pm_multifit function. In practice, this means that if you want to fit species with different sets of fixed parameters (e.g. only lambda for one species, but lambda and alpha_intra for another), or with different initial values, you should fit them separately, using cxr_pm_fit.
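Since fixed parameters must not appear in the initial values or in the boundaries, it can be convenient to derive those lists from a full set of parameters. The helper `drop_fixed` below is a hypothetical base R function written for this illustration; it is not part of cxr.

```r
# Hypothetical helper: remove fixed parameters from a named list
drop_fixed <- function(values, fixed_terms) {
  values[setdiff(names(values), names(fixed_terms))]
}

fixed_terms <- list(lambda = 1)

# Full lists, as they would be written without fixed terms
all_initial <- list(lambda = 1, alpha_intra = 0, alpha_inter = 0,
                    lambda_cov = 0, alpha_cov = 0)
all_lower <- list(lambda = 0, alpha_intra = 0, alpha_inter = -1,
                  lambda_cov = 0, alpha_cov = 0)
all_upper <- list(lambda = 100, alpha_intra = 1, alpha_inter = 1,
                  lambda_cov = 1, alpha_cov = 1)

# Versions suitable for a fit in which lambda is kept fixed
initial_values <- drop_fixed(all_initial, fixed_terms)
lower_bounds <- drop_fixed(all_lower, fixed_terms)
upper_bounds <- drop_fixed(all_upper, fixed_terms)

names(initial_values)
```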
References
Godoy, O., Stouffer, D. B., Kraft, N. J., & Levine, J. M. (2017). Intransitivity is infrequent and fails to promote annual plant coexistence without pairwise niche differences. Ecology, 98(5), 1193-1200.
Lanuza, J. B., Bartomeus, I., & Godoy, O. (2018). Opposing effects of floral visitors and soil conditions on the determinants of competitive outcomes maintain species diversity in heterogeneous landscapes. Ecology Letters, 21(6), 865-874.
Godoy, O., Bartomeus, I., Rohr, R. P., & Saavedra, S. (2018). Towards the integration of niche and network theories. Trends in Ecology & Evolution, 33(4), 287-300.
Mullen, K. M. (2014). Continuous global optimization in R. Journal of Statistical Software, 60(6), 1-45.