Getting started with cxr

David Garcia-Callejas and cxr team

Introduction

The cxr package provides a general interface to obtain estimates of species vital rates and interaction coefficients between species pairs from empirical data. These estimates are needed to parameterize population models describing the dynamics of interacting species. They also allow computing a series of metrics associated with modern coexistence theory that indicate how likely species are to coexist. These metrics are 1) niche differences, which stabilize coexistence between competing species, and 2) average fitness differences, which drive competitive dominance and, in the absence of niche differences, determine the superior competitor.

The package also allows exploring how environmental variation modifies both the intrinsic ability of species to produce offspring and the interaction coefficients between pairs of species (including intraspecific interactions). This feature makes it possible to explore how stabilizing niche differences and average fitness differences vary across environmental gradients and, therefore, to analyze whether the likelihood of coexistence changes across environmental conditions.

Here we demonstrate the basic functionality of the package using a published observational dataset (see Lanuza et al. (2018) and vignette 2 (Data formats) for a description of the dataset). With this example, we will specifically estimate seed production in the absence of neighbors (lambda) and the strength and sign of interactions between species pairs (alpha matrix). These values are the basis for estimating the degree of niche overlap (i.e. 1 - niche differences) and average fitness differences between species pairs, which are covered in vignette 3 (Coexistence metrics).

Finally, these estimations of lambda and alpha parameters are the basis for analyzing more complex dynamics such as the stability of the dynamics of multispecies communities (see Godoy et al. 2017) and multitrophic coexistence (see Godoy et al. 2018).

Fitting a single species

First, we load the package and the associated data. The included dataset contains, for each individual, its reproductive success and the number of neighbors per species in a 7.5 cm buffer (see vignette 2).

library(cxr)
data("neigh_list")

Next, we extract the observations of a single focal species.

my.sp <- "HOMA" 
# get data from the list
obs_homa <- neigh_list[[my.sp]]
# no need for ID column
obs_homa <- subset(obs_homa,select = -c(obs_ID))
# For each observation, we need the individual plant fitness 
# and the number of neighbours per species (in columns).
head(obs_homa)

##   fitness BEMA CETE CHFU CHMI HOMA LEMA MEEL MESU PAIN PLCO POMA POMO PUPA SASO
## 1      12    2    0    0    0   35    0    0    0    0    0    0    0    0    0
## 2      12    0    0    0    0   34    2    0    0    0    0    0    0    0    0
## 3      12    0    0    0    0   42    1    0    0    0    0    0    0    0    0
## 4      12    0    0    0    0   42    2    0    0    0    0    0    0    0    0
## 5      12    1    0    0    0   38    0    0    0    0    0    0    0    0    0
## 6      12    1    0    0    0   52    1    0    0    0    0    0    0    0    0
##   SCLA SOAS SPRU
## 1    0    0    0
## 2    0    0    0
## 3    0    0    0
## 4    0    0    0
## 5    0    0    0
## 6    0    0    0

Next, we estimate both the reproductive success in the absence of neighbors (lambda) and the interaction coefficients (alpha). This is done by fitting a model that mathematically relates the reproductive success of each individual to the number of neighbors observed around it. In this first example, we fit the selected species with a Ricker model (‘RK’ model family, see vignette 4) and fairly standard initial values. The default optimization method (Nelder-Mead) does not allow for lower or upper bounds in model parameters, so these arguments are commented out. In ecological terms, an unbounded optimization allows estimating the strength of both competitive and facilitative interactions, whereas bounded optimization algorithms can be used to restrict the analysis to either competition or facilitation (i.e. positive or negative alpha values). Standard errors can also be computed numerically by setting the argument bootstrap_samples to the number of bootstrap replicates used in the calculation.

#?cxr_pm_fit #check the help file for a description of the arguments
fit_homa <- cxr_pm_fit(data = obs_homa,
                       focal_column = my.sp,
                       model_family = "RK",
                       covariates = NULL,
                       optimization_method = "Nelder-Mead",
                       alpha_form = "pairwise",
                       lambda_cov_form = "none",
                       alpha_cov_form = "none",
                       initial_values = list(lambda = 1,
                                             alpha_intra = .1,
                                             alpha_inter = .1),
                       # not applicable to this optimization method
                       # lower_bounds = list(lambda = 0, 
                       #                     alpha_intra = 0,
                       #                     alpha_inter = 0),
                       # upper_bounds = list(lambda = 10,
                       #                     alpha_intra = 1,
                       #                     alpha_inter = 1),
                       fixed_terms = NULL,
                       # a low number of bootstrap samples
                       # for demonstration purposes, 
                       # increase it for robust results.
                       bootstrap_samples = 3)
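To make the idea of "a model that relates reproductive success to the number of neighbors" more concrete, the following is a minimal sketch in base R, independent of cxr. It assumes a Ricker-type form, fitness = lambda * exp(-sum_j alpha_j * N_j), with log-normal errors; the functions `ricker_fitness` and `ricker_nll` and the toy data are hypothetical and written only for illustration. The exact parameterization and sign convention used by cxr are described in vignette 4 and may differ from this sketch.

```r
# Hypothetical Ricker-type model (assumed form, not taken from the cxr source):
# expected fitness given a matrix of neighbour counts.
ricker_fitness <- function(lambda, alpha, neigh) {
  as.numeric(lambda * exp(-(neigh %*% alpha)))
}

# Negative log-likelihood, assuming log-normal errors around the prediction.
# par = c(lambda, one alpha per neighbour species, sigma)
ricker_nll <- function(par, fitness, neigh) {
  lambda <- par[1]
  alpha <- par[2:(ncol(neigh) + 1)]
  sigma <- par[length(par)]
  # keep the unbounded search inside the valid region
  if (lambda <= 0 || sigma <= 0) return(1e10)
  pred <- ricker_fitness(lambda, alpha, neigh)
  -sum(dnorm(log(fitness), mean = log(pred), sd = sigma, log = TRUE))
}

# Toy data: 200 focal individuals, two neighbour species, known parameters.
set.seed(42)
neigh <- cbind(sp1 = rpois(200, 3), sp2 = rpois(200, 2))
fitness <- ricker_fitness(20, c(0.10, 0.05), neigh) * exp(rnorm(200, 0, 0.1))

# Unbounded optimization with Nelder-Mead, as in the call above.
toy_fit <- optim(par = c(10, 0.1, 0.1, 0.5), fn = ricker_nll,
                 fitness = fitness, neigh = neigh,
                 method = "Nelder-Mead", control = list(maxit = 5000))
toy_fit$par    # lambda, alpha for sp1 and sp2, sigma
toy_fit$value  # negative log-likelihood at the optimum
```

Because the toy data are simulated from known parameters, the estimates in `toy_fit$par` can be compared with the values used in the simulation (20, 0.10, 0.05 and 0.1).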

For a quick summary of the fit, we can run a summary on the resulting object.

summary(fit_homa)

## 
## model: 'RK_pm_alpha_pairwise_lambdacov_none_alphacov_none'
## optimization method: 'Nelder-Mead'
## ----------
## focal taxa ID: HOMA
## observations: 288
## neighbours: 17
## covariates: 0
## ----------
## focal lambda: 1.400425
## alpha_intra: -0.04528702
## mean alpha_inter: NA
## mean lambda_cov: - not fit - 
## mean alpha_cov: - not fit - 
## negative log-likelihood of the fit: 544.4854
## ----------

This object is actually a list with several elements. We can thus access these elements as usual:

names(fit_homa) #list of all available elements.

##  [1] "model_name"                 "model"                     
##  [3] "data"                       "focal_ID"                  
##  [5] "optimization_method"        "initial_values"            
##  [7] "fixed_terms"                "lambda"                    
##  [9] "alpha_intra"                "alpha_inter"               
## [11] "lambda_cov"                 "alpha_cov"                 
## [13] "lambda_standard_error"      "alpha_intra_standard_error"
## [15] "alpha_inter_standard_error" "lambda_cov_standard_error" 
## [17] "alpha_cov_standard_error"   "log_likelihood"

# reproductive success in the absence of neighbors
fit_homa$lambda

##   lambda 
## 1.400425

# intraspecific interaction
fit_homa$alpha_intra

##        HOMA 
## -0.04528702

# interspecific interactions
fit_homa$alpha_inter

##          BEMA          CETE          CHFU          CHMI          LEMA 
## -0.6122006013            NA            NA            NA -0.1272862948 
##          MEEL          MESU          PAIN          PLCO          POMA 
##  0.7402077398  0.0429329620 -0.0008210413  0.0034516557 -0.5177692811 
##          POMO          PUPA          SASO          SCLA          SOAS 
## -1.4909587241  0.2775812126  0.1213981101 -0.4398662135            NA 
##          SPRU 
##  1.1584545222

Note that some interaction coefficients are NA. This happens for species that never co-occur with the focal species: they are listed as neighbours, but their densities are zero in all focal observations, so their effect cannot be estimated.
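Neighbours that never co-occur with the focal species can be identified before fitting. The sketch below uses a small, made-up table (`toy_obs`, a hypothetical object with the same layout as obs_homa: a fitness column followed by one column of counts per neighbour species); the same column sums could be computed on the real observations.

```r
# Toy observation table (illustrative values, not from the dataset).
toy_obs <- data.frame(fitness = c(12, 8, 15),
                      HOMA = c(35, 34, 42),
                      BEMA = c(2, 0, 1),
                      CETE = c(0, 0, 0))

# Neighbour columns are all columns except the fitness column.
neigh_cols <- setdiff(names(toy_obs), "fitness")

# Species whose density is zero in every observation.
absent <- neigh_cols[colSums(toy_obs[, neigh_cols]) == 0]
absent
```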

Fitting several species at once

Most likely, users will want to fit model parameters to data from two or more focal species. In order to do that with a single call, we provide the function cxr_pm_multifit, which has a very similar interface to cxr_pm_fit. Here we show how multiple species can be fit using this function. For this multispecies case, rows in the alpha element of the returned list correspond to species i and columns to species j for each \(\alpha_{ij}\) coefficient. The diagonal corresponds to intraspecific coefficients. In order to showcase other capabilities of the package, we include in this example the effect of a covariate over the fitted lambda and alpha parameters. This covariate, soil salinity, is also included as a dataset in the package (see vignette 2). We assume that the covariate has a linear effect on both lambda and alpha parameters.

my.sp <- c("BEMA","CETE","LEMA")
obs_3sp <- neigh_list[my.sp]
# discard ID column
for(i in seq_along(obs_3sp)){
  obs_3sp[[i]] <- obs_3sp[[i]][,2:length(obs_3sp[[i]])]
}
# load covariates: salinity
data("salinity_list")
salinity <- salinity_list[my.sp]
# keep only salinity column
for(i in seq_along(salinity)){
  salinity[[i]] <- as.matrix(salinity[[i]][,2:length(salinity[[i]])])
  colnames(salinity[[i]]) <- "salinity"
}

Note how the data is passed as a list with as many elements as focal species, where each element is a dataframe with the observations of the corresponding focal species. The covariates follow the same structure: they must be passed as a list with as many elements as focal species, where each element is a dataframe (or a matrix) with a column for each covariate (one, in this case) and the same number of observations as its associated species data.

names(obs_3sp)

## [1] "BEMA" "CETE" "LEMA"

# observation data
head(obs_3sp[[1]])

## # A tibble: 6 x 18
##   fitness  BEMA  CETE  CHFU  CHMI  HOMA  LEMA  MEEL  MESU  PAIN  PLCO  POMA
##     <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1     116     1     0     0     0     0     0     0     0     0     0    34
## 2      68     0     0     0     0     0     1     0     0     0     0    47
## 3      36     0     0     0     0     0     0     0     0     0     0    38
## 4      64     0     0     0     0     0     5     0     0     0     0    21
## 5     144     2     0     0     0     0     0     0     0     0     0    55
## 6      56     1     0     0     0     0     4     0     0     0     0    17
## # … with 6 more variables: POMO <dbl>, PUPA <dbl>, SASO <dbl>, SCLA <dbl>,
## #   SOAS <dbl>, SPRU <dbl>

# number of fitness observations
nrow(obs_3sp[[1]])

## [1] 287

# salinity data
head(salinity[[1]])

##      salinity
## [1,]   0.9860
## [2,]   1.0280
## [3,]   1.0000
## [4,]   0.8830
## [5,]   0.8420
## [6,]   0.9121

# number of covariate observations
nrow(salinity[[1]])

## [1] 287
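The manual checks above (same species names, same number of rows in observations and covariates) can be bundled in a small helper. The function `check_inputs` below is hypothetical and not part of cxr; it is shown on toy lists so that the example is self-contained.

```r
# Hypothetical helper (not part of cxr): verify that two named lists
# refer to the same focal species and have matching numbers of rows.
check_inputs <- function(data, covariates) {
  stopifnot(is.list(data), is.list(covariates),
            identical(names(data), names(covariates)))
  n_obs <- vapply(data, nrow, integer(1))
  n_cov <- vapply(covariates, nrow, integer(1))
  data.frame(sp = names(data), n_obs = n_obs, n_cov = n_cov,
             match = n_obs == n_cov, row.names = NULL)
}

# Toy inputs with the same structure as obs_3sp and salinity.
toy_data <- list(
  sp1 = data.frame(fitness = c(3, 5), sp1 = c(1, 0), sp2 = c(2, 2)),
  sp2 = data.frame(fitness = c(4, 1, 2), sp1 = c(0, 1, 1), sp2 = c(0, 3, 1)))
toy_cov <- list(
  sp1 = matrix(c(0.9, 1.1), ncol = 1, dimnames = list(NULL, "salinity")),
  sp2 = matrix(c(1.0, 0.8, 1.2), ncol = 1, dimnames = list(NULL, "salinity")))

check_inputs(toy_data, toy_cov)
```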

We fit the model as above, but using the cxr_pm_multifit function.

fit_3sp <- cxr_pm_multifit(data = obs_3sp,
                           focal_column = my.sp,
                           model_family = "RK",
                           # here we use a bounded method for demonstration purposes
                           optimization_method = "bobyqa", 
                           covariates = salinity,
                           alpha_form = "pairwise",
                           lambda_cov_form = "global", # effect of covariates over lambda
                           alpha_cov_form = "global", # effect of covariates over alpha
                           initial_values = list(lambda = 1,
                                                 alpha_intra = 0.1,
                                                 alpha_inter = 0.1,
                                                 lambda_cov = 0.1,
                                                 alpha_cov = 0.1),
                           lower_bounds = list(lambda = 0,
                                               alpha_intra = 0,
                                               alpha_inter = -1,
                                               lambda_cov = 0,
                                               alpha_cov = 0),
                           upper_bounds = list(lambda = 100,
                                               alpha_intra = 1,
                                               alpha_inter = 1,
                                               lambda_cov = 1,
                                               alpha_cov = 1),
                           # no standard errors
                           bootstrap_samples = 0)

We can also have a glimpse of this multispecies fit with the summary function:

summary(fit_3sp)

## model: 'RK_pm_alpha_pairwise_lambdacov_global_alphacov_global'
## optimization method: 'bobyqa'
## ----------
##     sp observations neighbours covariates   lambda lambda_cov_salinity
## 1 BEMA          287         17          1 17.07055           0.9999988
## 2 CETE           10         17          1 39.98296           1.0000000
## 3 LEMA          273         17          1 15.12572           1.0000000
##   mean_alpha_cov_salinity
## 1            0.0006556123
## 2            0.0000000000
## 3            0.1208871773
## 
## ----------
## alpha matrix:
##            BEMA        CETE CHFU CHMI        HOMA       LEMA       MEEL
## BEMA  0.0000000          NA   NA   NA -0.02050387 -0.1744143 -0.4012841
## CETE -0.4157086 0.008773895   NA   NA  0.16596698 -1.0000000         NA
## LEMA -0.3908039          NA   NA   NA -0.01723074  0.0000000 -0.2907515
##            MESU        PAIN        PLCO        POMA       POMO       PUPA
## BEMA -0.2303013  0.04660636 -0.05456346 -0.02128597 -0.4045222 -0.9992533
## CETE         NA -0.18386228  1.00000000          NA         NA         NA
## LEMA -0.2544182 -0.34141296 -0.16619114 -0.00553322 -0.2613790         NA
##             SASO       SCLA       SOAS        SPRU
## BEMA -0.10719613 -0.5205668 -0.3812140  0.00367967
## CETE -0.12539673 -0.4954568         NA -0.44113740
## LEMA -0.02118463 -0.3337160 -0.6116377 -0.01094160

The numerical estimation of parameters depends on the model with which to estimate fitness values, the optimization method, and the underlying data. In our example dataset, some species are better represented than others, and the function will raise warnings if the estimation can be improved or, for example, if any fitted parameter is equal to the lower or upper bounds provided. The cxr_pm_multifit function will behave conservatively and return NULL if parameter fitting fails for any of the focal species passed as arguments, printing an informative message about which species failed to fit. In such cases, users may either fit each species separately or call cxr_pm_multifit without the problematic species.
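Fitted values that sit exactly on a boundary are worth inspecting, since they suggest that the optimizer was constrained by the bounds rather than by the data. As a minimal sketch, the code below takes a few interspecific values copied from the alpha matrix printed above (as displayed, so subject to rounding) and flags those equal to the alpha_inter bounds of -1 and 1, up to a small tolerance. The object names and the tolerance are assumptions of this example.

```r
# Excerpt of the printed alpha matrix (rows: focal species, columns: neighbours).
alpha_excerpt <- matrix(c(-0.02050387, -0.1744143, -0.05456346,
                           0.16596698, -1.0000000,  1.00000000,
                          -0.01723074, -0.2907515, -0.16619114),
                        nrow = 3, byrow = TRUE,
                        dimnames = list(c("BEMA", "CETE", "LEMA"),
                                        c("HOMA", "LEMA_or_MEEL", "PLCO")))
# NOTE: the middle column mixes LEMA (rows BEMA, CETE) and MEEL (row LEMA)
# so that only interspecific coefficients are included in this excerpt.

lower <- -1    # alpha_inter lower bound used in the fit
upper <- 1     # alpha_inter upper bound used in the fit
tol <- 1e-6    # tolerance for "equal to the bound" (an arbitrary choice)

at_bound <- abs(alpha_excerpt - lower) < tol | abs(alpha_excerpt - upper) < tol
which(at_bound, arr.ind = TRUE)
```

In this excerpt, the flagged coefficients both belong to CETE, the focal species with the fewest observations in the summary above.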

Importantly, bounded methods can be very sensitive to the initial values and bounds. As in any numerical optimization problem, you should double-check the values obtained, at least by computing standard errors on your parameters with an adequate number of bootstrap samples, and preferably by performing sensitivity analyses on the different parameters (in this and other vignettes we have not included any of these checks, as our examples are merely for demonstration purposes). In addition, the cxr objects returned from our functions include the negative log-likelihood of the fit, which may help users choose an optimization algorithm for a particular dataset. Remember that, because the value reported is the negative log-likelihood, lower values indicate a better fit, and values are only comparable between fits of the same model to the same data. In general, it is recommended to test different optimization algorithms, as they may produce fairly different results (Mullen 2014). In this example, the method used (“bobyqa”, a well-established bounded optimization algorithm) returns the following negative log-likelihood values:

fit_3sp$log_likelihood

##      BEMA      CETE      LEMA 
## 314.09376  12.83698 309.06174
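The recommendation to try several optimization algorithms can be sketched with base R alone. The example below does not use cxr: `toy_nll` is a made-up objective standing in for the negative log-likelihood of a model, and the three methods are those available in `optim`. Since the objective is a negative log-likelihood, the fit with the smallest value is preferred.

```r
# Toy objective standing in for a negative log-likelihood
# (minimum value 10 at par = c(2, 0.5)).
toy_nll <- function(par) sum((par - c(2, 0.5))^2) + 10

# Fit the same objective, from the same starting point, with several methods.
methods <- c("Nelder-Mead", "BFGS", "L-BFGS-B")
nll <- vapply(methods,
              function(m) optim(c(1, 0.1), toy_nll, method = m)$value,
              numeric(1))
nll

# Method reaching the lowest negative log-likelihood.
best <- names(which.min(nll))
best
```

With a real dataset the differences among methods can be much larger than in this smooth toy problem, which is why the comparison is worthwhile.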

Including environmental variability

In the above example for fitting multiple species at once, we have already offered a glimpse of how to include the effect of environmental covariates over lambda and alpha values. The relevant arguments in cxr_pm_fit and cxr_pm_multifit are ‘lambda_cov_form’ and ‘alpha_cov_form’. If these are set to ‘none’, no effect of covariates is considered. Otherwise, there are a couple of options to model this effect (all the options consider linear effects, for now). First, the effect of covariates over lambda can be ‘global’, meaning that each covariate has a global parameter affecting lambda. This is formulated as

\(\lambda (1 + \sum_{k=1}^{s} \theta_k c_k)\)

where \(s\) is the number of environmental covariates, \(c_k\) is the observed value of the k-th covariate and \(\theta_k\) is the ‘lambda_cov’ parameter of the cxr functions.

The effect over alpha values can likewise be ‘global’, so that, for focal species \(i\), each covariate affects the alpha values \(\alpha_{ij}\) equally through a global parameter:

\(\alpha_* + \sum_{k=1}^{s} \psi_k c_k\)

where \(*\) represents the set of all pairwise interactions for a given focal species, and \(\psi_k\) is the global parameter relating covariate \(k\) to the alpha values. The last possibility included in cxr is for the effect of covariates over alpha values to be interaction-specific, which is coded by specifying ‘alpha_cov_form’ as ‘pairwise’:

\(\alpha_{ij} + \sum_{k=1}^{s} \psi_{ijk} c_k\)

In this case, each covariate \(c_k\) will have a specific parameter \(\psi_{ijk}\) for its effect over each interaction \(\alpha_{ij}\).
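As a worked example of the three expressions above, the following sketch evaluates them in base R for one focal species, two neighbour species and a single covariate. All numbers and object names are illustrative assumptions, not fitted estimates; how the modified lambda and alpha values enter each model family is described in vignette 4.

```r
# Illustrative values only.
lambda <- 15
alpha <- c(sp1 = 0.2, sp2 = 0.05)     # alpha_ij for one focal species i
cov_obs <- c(salinity = 0.9)          # observed covariate value c_k

theta <- c(salinity = 0.5)            # lambda_cov, 'global' form
psi_global <- c(salinity = 0.1)       # alpha_cov, 'global' form
psi_pairwise <- matrix(c(0.1, -0.02), nrow = 2,   # alpha_cov, 'pairwise' form:
                       dimnames = list(names(alpha), "salinity"))  # one row per alpha_ij

# lambda * (1 + sum_k theta_k * c_k)
lambda_eff <- lambda * (1 + sum(theta * cov_obs))

# alpha_* + sum_k psi_k * c_k: the same shift for every interaction
alpha_global <- alpha + sum(psi_global * cov_obs)

# alpha_ij + sum_k psi_ijk * c_k: a different shift for each interaction
alpha_pairwise <- alpha + as.numeric(psi_pairwise %*% cov_obs)

lambda_eff
alpha_global
alpha_pairwise
```

With more covariates, `theta`, `psi_global` and `cov_obs` simply gain one element per covariate, and `psi_pairwise` one column per covariate.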

We can retrieve the fitted values for ‘lambda_cov’ (\(\theta\)) and ‘alpha_cov’ (\(\psi\)) simply from the output of the function:

fit_3sp$lambda_cov

##       salinity
## BEMA 0.9999988
## CETE 1.0000000
## LEMA 1.0000000

fit_3sp$alpha_cov

## $salinity
##              BEMA         CETE         CHFU         CHMI         HOMA
## BEMA 0.0006556123 0.0006556123 0.0006556123 0.0006556123 0.0006556123
## CETE 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
## LEMA 0.1208871773 0.1208871773 0.1208871773 0.1208871773 0.1208871773
##              LEMA         MEEL         MESU         PAIN         PLCO
## BEMA 0.0006556123 0.0006556123 0.0006556123 0.0006556123 0.0006556123
## CETE 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
## LEMA 0.1208871773 0.1208871773 0.1208871773 0.1208871773 0.1208871773
##              POMA         POMO         PUPA         SASO         SCLA
## BEMA 0.0006556123 0.0006556123 0.0006556123 0.0006556123 0.0006556123
## CETE 0.0000000000 0.0000000000 0.0000000000 0.0000000000 0.0000000000
## LEMA 0.1208871773 0.1208871773 0.1208871773 0.1208871773 0.1208871773
##              SOAS         SPRU
## BEMA 0.0006556123 0.0006556123
## CETE 0.0000000000 0.0000000000
## LEMA 0.1208871773 0.1208871773

Parameter boundaries

cxr includes the possibility of providing boundaries for your parameters, so that the fitted values fall inside a certain interval. This is done via the arguments lower_bounds and upper_bounds of the fitting functions. These arguments accept a list whose elements must be named and correspond to one or more of the function parameters, i.e. lambda, alpha_intra, alpha_inter, lambda_cov, and alpha_cov. If these boundaries are of length 1, as in this example:

lower_bounds = list(lambda = 0,
                    alpha_intra = 0,
                    alpha_inter = -1,
                    lambda_cov = 0,
                    alpha_cov = 0)
upper_bounds = list(lambda = 100,
                    alpha_intra = 1,
                    alpha_inter = 1,
                    lambda_cov = 1,
                    alpha_cov = 1)

the same boundaries will be used for all values in the appropriate set, e.g. each interspecific alpha term (alpha_inter) will have the same boundaries. Otherwise, if you are interested in fitting parameters for more than one covariate, you may want to specify varying boundaries for lambda_cov and alpha_cov depending on the different covariates. In that case, the functions accept, for these elements, vectors with as many values as covariates. For example, if you have two covariates for which you want to calculate associated lambda_cov and alpha_cov values, but these covariates have very different magnitudes, you may provide two lambda_cov and alpha_cov boundaries.

# fit the first three species in the list
data <- neigh_list[1:3]

# keep only fitness and neighbours columns
for(i in seq_along(data)){
  data[[i]] <- data[[i]][,2:length(data[[i]])]
}
focal_column <- names(data)

# covariates
# in this example, we use salinity
# and two more covariates, randomly generated

data("salinity_list")

# observations for the first three species
cov_list <- salinity_list[1:3]

for(i in seq_along(cov_list)){
  # keep only salinity column
  cov_list[[i]] <- as.matrix(cov_list[[i]][,2:length(cov_list[[i]])])
  
  # add two random covariates
  cov_list[[i]] <- cbind(cov_list[[i]],
                         runif(nrow(cov_list[[i]]),1,10),
                         runif(nrow(cov_list[[i]]),10,100))
  colnames(cov_list[[i]]) <- c("salinity","cov2","cov3")
}

# this is what each element of the covariates list looks like
head(cov_list[[1]])

##      salinity     cov2     cov3
## [1,]   0.9860 1.456240 87.55923
## [2,]   1.0280 2.786870 94.29064
## [3,]   1.0000 6.676219 99.25791
## [4,]   0.8830 8.415803 20.75340
## [5,]   0.8420 8.099523 73.16040
## [6,]   0.9121 9.670322 51.47348

# function parameters
model_family <- "BH"
covariates <- cov_list
# bobyqa is generally more robust than other bounded methods
optimization_method <- "bobyqa"
alpha_form <- "pairwise"
lambda_cov_form <- "global"
alpha_cov_form <- "pairwise"

# note how lambda_cov and alpha_cov
# have different initial values for each covariate effect.
# The commented assignments are also possible,
# giving equal initial values to all covariate effects
initial_values = list(lambda = 1,
                      alpha_intra = 0.1,
                      alpha_inter = 0.1,
                      lambda_cov = c(0.1,0.2,0.1),
                      alpha_cov = c(0.1,0.2,0.1))
# lambda_cov = c(0.1),
# alpha_cov = c(0.1))

# same with boundaries
lower_bounds = list(lambda = 0,
                    alpha_intra = 0,
                    alpha_inter = -1,
                    lambda_cov = c(-1,0,-1),
                    alpha_cov = c(-1,0,-1))
# lambda_cov = c(-1),
# alpha_cov = c(-1))

upper_bounds = list(lambda = 100,
                    alpha_intra = 1,
                    alpha_inter = 1,
                    lambda_cov = c(1,2,1),
                    alpha_cov = c(1,2,1))
# lambda_cov = c(1),
# alpha_cov = c(1))

fixed_terms <- NULL
bootstrap_samples <- 3

# this fit is fairly complex, it may take a while, 
# it also raises warnings, suggesting that either the data,
# model, or initial values/boundaries can be improved.
# This is consistent with having observational data
# and, furthermore, random covariates

fit_multi_cov <- cxr_pm_multifit(data = data,
                                 focal_column = focal_column,
                                 model_family = model_family,
                                 covariates = covariates,
                                 optimization_method = optimization_method,
                                 alpha_form = alpha_form,
                                 lambda_cov_form = lambda_cov_form,
                                 alpha_cov_form = alpha_cov_form,
                                 initial_values = initial_values,
                                 lower_bounds = lower_bounds,
                                 upper_bounds = upper_bounds,
                                 fixed_terms = fixed_terms,
                                 bootstrap_samples = bootstrap_samples)
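Before launching a long fit, it can be useful to verify that every initial value lies within its boundaries and that vector-valued entries have either length 1 or one value per covariate. The helper `check_bounds` below is hypothetical (not part of cxr) and is only a sketch of such a check, applied to the same values used in the call above.

```r
# Hypothetical helper (not part of cxr): for each parameter, check that
# lengths are 1 or n_cov and that lower <= initial <= upper.
check_bounds <- function(initial_values, lower_bounds, upper_bounds, n_cov) {
  vapply(names(initial_values), function(p) {
    init <- initial_values[[p]]
    lo <- lower_bounds[[p]]
    up <- upper_bounds[[p]]
    ok_len <- all(lengths(list(init, lo, up)) %in% c(1, n_cov))
    ok_len && all(lo <= init) && all(init <= up)
  }, logical(1))
}

initial_values <- list(lambda = 1, alpha_intra = 0.1, alpha_inter = 0.1,
                       lambda_cov = c(0.1, 0.2, 0.1),
                       alpha_cov = c(0.1, 0.2, 0.1))
lower_bounds <- list(lambda = 0, alpha_intra = 0, alpha_inter = -1,
                     lambda_cov = c(-1, 0, -1), alpha_cov = c(-1, 0, -1))
upper_bounds <- list(lambda = 100, alpha_intra = 1, alpha_inter = 1,
                     lambda_cov = c(1, 2, 1), alpha_cov = c(1, 2, 1))

# One logical value per parameter; FALSE marks an inconsistent entry.
bounds_ok <- check_bounds(initial_values, lower_bounds, upper_bounds, n_cov = 3)
bounds_ok
```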

Keeping parameters fixed

There is the option to provide fixed values for model parameters, using the fixed_terms argument in the functions. As an example, if you want to keep the lambda parameter fixed, for a single species fit, you would modify the function arguments accordingly:

fixed_terms <- list(lambda = 1)

# now lambda does not appear in 'initial_values'
initial_values <- list(alpha_intra = 0,
                       alpha_inter = 0,
                       lambda_cov = 0,
                       alpha_cov = 0)
# lower and upper bounds should, likewise, 
# not contain lambda
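One way to keep the initial values and boundaries consistent with the fixed terms is to drop the fixed parameters programmatically. The helper `drop_fixed` below is a hypothetical convenience function, not part of cxr, and the bounds shown are illustrative.

```r
# Hypothetical helper (not part of cxr): remove fixed parameters from a
# named list of initial values or boundaries.
drop_fixed <- function(params, fixed_terms) {
  params[setdiff(names(params), names(fixed_terms))]
}

fixed_terms <- list(lambda = 1)

# Full sets of boundaries, including the parameter that will be fixed.
all_lower <- list(lambda = 0, alpha_intra = 0, alpha_inter = -1)
all_upper <- list(lambda = 100, alpha_intra = 1, alpha_inter = 1)

lower_bounds <- drop_fixed(all_lower, fixed_terms)
upper_bounds <- drop_fixed(all_upper, fixed_terms)
names(lower_bounds)
```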

This is extended to the cxr_pm_multifit function, by having fixed_terms be a list with as many elements as focal species. For example, having three focal species, where we want to keep lambda fixed:

fixed_terms <- list(list(lambda = 1), # focal sp 1
                    list(lambda = 1.2), # focal sp 2
                    list(lambda = 1.3)) # focal sp 3
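With many focal species, the same nested list can be built from a vector of fixed values instead of being typed by hand. This is a small base R sketch; the vector `fixed_lambda` and its species labels are hypothetical, and the order of its elements is assumed to match the order of the focal species.

```r
# Fixed lambda per focal species (illustrative values and labels).
fixed_lambda <- c(sp1 = 1, sp2 = 1.2, sp3 = 1.3)

# One list(lambda = ...) element per focal species, in the same order.
fixed_terms <- lapply(unname(fixed_lambda), function(l) list(lambda = l))
str(fixed_terms)
```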

In this release of cxr, you must specify the same initial values for every focal species when using cxr_pm_multifit. In practice, this means that if you want to fit species with different sets of fixed parameters (e.g. only lambda for one species, but lambda and alpha_intra for another), or with different initial values, you should fit them separately using cxr_pm_fit.

References

Godoy, O., Stouffer, D. B., Kraft, N. J., & Levine, J. M. (2017). Intransitivity is infrequent and fails to promote annual plant coexistence without pairwise niche differences. Ecology, 98(5), 1193-1200.

Godoy, O., Bartomeus, I., Rohr, R. P., & Saavedra, S. (2018). Towards the integration of niche and network theories. Trends in Ecology & Evolution, 33(4), 287-300.

Lanuza, J. B., Bartomeus, I., & Godoy, O. (2018). Opposing effects of floral visitors and soil conditions on the determinants of competitive outcomes maintain species diversity in heterogeneous landscapes. Ecology Letters, 21(6), 865-874.

Mullen, K. M. (2014). Continuous global optimization in R. Journal of Statistical Software, 60(6), 1-45.