As a first example of the functions in this package we will fit a model to the pied flycatcher data. We will start by fitting a model with fixed effects on the mean and dispersion components. Specifically, we will model the load returned on each trip by the adult flycatchers a independent normal random variables with mean given by a linear combination of the logarithm of the inter-visit interval (IVI), the brood size manipulation (broodsize), and the adult’s sex, and standard deviation given by a linear combination of broodsize and sex, on the log scale. We will then incorporate individual random effects into this model.
First you have to load the package:
## Load package
library(dalmatian)
The raw data for this example is provided in the package and can be accessed with:
## Load pied flycatcher data
data(pied_flycatchers_1)
This data contains records on 5795 different food deliveries recorded during the pied flycatcher experiment. The response variable, load
, records the load rounded to the nearest .1 gram. We want to model the logarithm of the load, and we also want to account for the rounding. To do this we will create two new variables in the data frame, lower
and upper
, which bound the logarithm of the true load. Not that we add .001 when the observed value is zero to avoid problems with the logarithm.
## Create variables bounding the true load
$lower=ifelse(pfdata$load==0,log(.001),log(pfdata$load-.049)) pfdata
## Warning in log(pfdata$load - 0.049): NaNs produced
$upper=log(pfdata$load+.05) pfdata
Our initial model for the logarithm of the load returned on the \(j\)th trip by adult \(i\) will be \(N(\mu_{ij},\sigma_{ij}^2)\) where: \[\mu=\beta_0 + \beta_1 \mathrm{log(IVI)}_{ij} + \beta_2 \mathrm{broodsize}_i + \beta_3 \mathrm{sex}_i\] and \[\log(\sigma_{ij}^2)=\psi_0 + \psi_1 \mathrm{broodsize}_i + \psi_2\mathrm{sex}_i.\] In the framework of generalized linear models, the variance, \(\sigma^2_i\), is equal to the dispersion parameter, which we denote generically with \(\phi\). Hence, the model can then be written as: \(Y_{ij} \sim N(\mu_{ij},\phi_{ij})\) where: \[\mu=\beta_0 + \beta_1 \mathrm{log(IVI)}_{ij} + \beta_2 \mathrm{broodsize}_i + \beta_3 \mathrm{sex}_i\] and \[\log(\phi_{ij})=\psi_0 + \psi_1 \mathrm{broodsize}_i + \psi_2\mathrm{sex}_i.\]
The first step is to define the models for the mean and dispersion components. These models are simply lists with (up to) two named components, fixed
and random
, which provide the details on the fixed and random effects. Both lists must contain a variable name
specifying the basename for the coefficients and a variable formula
specifying the effects. The fixed
list should also contain an object called priors
which specifies the priors for coefficients in the fixed effects portion of the model. Random effects are currently assumed to be normal with mean zero and unknown variances which are assigned the half \(t\)-distribution with ? degrees of freedom and scale ?. The optional parameter link
can also be used to specify a link function for either component.
The components of the model for the pied flycatcher data specified above would be generated with:
## Mean model
<- list(fixed=list(name="alpha",
mymean formula=~ log(IVI) + broodsize + sex,
priors=list(c("dnorm",0,.001))))
## Dispersion model
<- list(fixed=list(name="psi",
mydisp formula=~broodsize + sex,
priors=list(c("dnorm",0,.001))),
link="log")
These two objects will now be used to generate the JAGS code, data, and initial values for running the model.
The primary function in the package dalmatian
automates the creation of the JAGS
code, data, and initial values and then passes these to JAGS
via the functions in the rjags
package. The dalmatian
function takes requires several arguments including the data frame for the analysis, the model objects created above, and the name of the JAGS
model file. It also requires two lists containing named arguments for the functions jags.model
and coda.samples
from the rjags
package. Descriptions of the arguments for these two functions are available in their own help pages. Any arguments that are not specified in these lists will take the default values given by rjags
. The two exceptions are the file
argument of jags.model
and the n.iter
argument of coda.samples
which do not have default values and must be specified. The number of parallel chains will be taken from the jags.model
list. If this value is not specified then three chains will be run in order that convergence diagnostics can be computed. Note that the chains have been run for a relatively small number of iterations and heavily subsampled to satisfy the size requirements for packages on . Neither of these is recommended in practice.
## Set working directory
## By default uses a system temp directory. You probably want to change this.
<- tempdir()
workingDir
## Define list of arguments for jags.model()
<- list(file=file.path(workingDir,"pied_flycatcher_1_jags.R"),n.adapt=1000)
jm.args
## Define list of arguments for coda.samples()
<- list(n.iter=5000,thin=20)
cs.args
## Run the model using dalmatian
<- dalmatian(df=pfdata,
pfmcmc1 mean.model=mymean,
dispersion.model=mydisp,
jags.model.args=jm.args,
coda.samples.args=cs.args,
rounding=TRUE,
lower="lower",
upper="upper",
residuals = FALSE,
debug=FALSE)
}
The function dalmatian
returns list containing three objects: 1. jags.model.args
The full list of arguments passed to jags.model
. This includes the variables in the input list along with the name of the model file, the formatted data, and the initial values. 2. coda.samples.args
The full list of arguments passed to coda.samples
. This includes the variables in the input list along with the complete object returned by jags.model.args
and the character vector including the names of the monitored parameters. 3. coda
An object of class mcmc.list
and length n.chains
containing the samples generated by JAGS
.
Inference is based on the output from the MCMC chain which is stored in the coda
object. Summaries of the MCMC sampler and the posterior distribution can be constructed from coda
with a variety of tools designed to work with objects of the mcmc.list
class. For example, we can use the tools in the package coda
to generate traceplots, density plots and histograms, numerical summaries of the posterior, and convergence diagnostics. The ggmcmc
package can also be used to create fancier looking traceplots, density plots, and plots summarizing the posterior distribution in terms of posterior means and quantiles (credible intervals). However, several wrapper functions exist within the package to automate some of these tasks.
The first thing we should do is to check the convergence of the chains. Summary statistics computed from the output in coda
are only valid if the chain has converged and is sampling from the correct distribution. The wrapper function convergence()
applies three of the convergence diagnostics from the coda
package.
## Compute convergence diagnostics
<- convergence(pfmcmc1) pfconvergence1
## Gelman-Rubin diagnostics
$gelman pfconvergence1
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## alpha.(Intercept) 1.003 1.015
## alpha.log(IVI) 1.004 1.016
## alpha.broodsizeIncreased 1.005 1.006
## alpha.sexMale 0.999 1.000
## psi.(Intercept) 0.999 0.999
## psi.broodsizeIncreased 1.002 1.009
## psi.sexMale 1.003 1.014
## Raftery diagnostics
$raftery pfconvergence1
##
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95
##
## You need a sample size of at least 3746 with these values of q, r and s
## Effective sample size
$effectiveSize pfconvergence1
## alpha.(Intercept) alpha.log(IVI) alpha.broodsizeIncreased
## 777.2150 789.0462 786.6472
## alpha.sexMale psi.(Intercept) psi.broodsizeIncreased
## 846.6271 518.5562 640.6306
## psi.sexMale
## 663.3225
Details of these measures can be found in the help files of the coda
package. Not surprisingly, these all suggest that the chains need to be run for longer.
Convergence and mixing of the chains can also be examined visually through traceplots. The wrapper function traceplots
generates traces using the functions from the ggmcmc
package and separates the variables by the model components. One of the advantages of using ggmcmc
which is based on ggplot2
is that the plots can be saved and easily reproduced. Here I have thinned the chains further prior to plotting to reduce the size of the vignette. Note that thinning is conducted relative to the original chain, not the already thinned chain. I.e., thinning by 20 selects the output from every 20th iteration of the original chain (every 2nd iteration from the retained samples).
## Generate traceplots
<- traceplots(pfmcmc1,show=FALSE,nthin=20) pfraceplots1
## Fixed effects for mean
$meanFixed pftraceplots1
## Fixed effects for dispersion
$dispersionFixed pftraceplots1
Numerical summaries of the posterior distribution computed from the output in coda
can be obtained with the summary function.
## Compute numerical summaries
<- summary(pfmcmc1) pfsummary1
## Print numerical summaries
pfsummary1
##
## Iterations = 1020:6000
## Thinning interval = 20
## Number of chains = 3
## Sample size per chain = 250
##
## Posterior Summary Statistics for Each Model Component
##
## Mean Model: Fixed Effects
## Mean Median SD Lower 95% Lower 50% Upper 50% Upper 95%
## (Intercept) -3.31 -3.31 0.07 -3.45 -3.35 -3.26 -3.20
## log(IVI) 0.15 0.15 0.01 0.13 0.14 0.16 0.18
## broodsizeIncreased 0.11 0.11 0.03 0.06 0.09 0.12 0.16
## sexMale -0.07 -0.07 0.02 -0.13 -0.09 -0.06 -0.03
##
## Dispersion Model: Fixed Effects
## Mean Median SD Lower 95% Lower 50% Upper 50% Upper 95%
## (Intercept) -0.70 -0.70 0.06 -0.81 -0.74 -0.66 -0.60
## broodsizeIncreased 0.04 0.05 0.06 -0.07 0.01 0.08 0.16
## sexMale 0.03 0.03 0.06 -0.09 -0.02 0.06 0.12
For each component of the model the tables provide statistics including the estimated posterior mean, median, and standard deviation and the end points of the 50% and 95% highest posterior density credible intervals. Note that these summaries are only valid if the previous diagnostics have suggested that the chain has converged and the effective sample size is large enough. Graphical summaries can also be computed with the caterpillar
which, like traceplots
, is a simple wrapper for the functions in the ggmcmc
package.
## Generate caterpillar
<- caterpillar(pfmcmc1,show = FALSE) pfcaterpillar1
## Fixed effects for mean
$meanFixed pfcaterpillar1
## Fixed effects for dispersion
$dispersionFixed pfcaterpillar1
We will now consider adding individual (bird) random effects to both the fixed and random components of our model. This can be done by simply adding to the model components created above and then re-runnig the model. The new model will have mean and variance \[\mu_{ij}=\beta_0 + \beta_1 \mathrm{log(IVI)}_{ij} + \beta_2 \mathrm{broodsize}_i + \beta_3 \mathrm{sex}_i + \epsilon_{i}}\] and \[\log(\phi_{ij})=\psi_0 + \psi_1 \mathrm{broodsize}_i + \psi_2\mathrm{sex}_i + \xi_j\] where \(\epsilon_i \sim N(0,\tau^2_\mu)\) and \(\xi_i \sim N(0,\tau^2_\phi)\). By default the variances \(\tau^2_\epsilon\) and \(\tau^2_\xi\) are assigned half-\(t\) prior distributions.
Instead of creating new model objects we can simply add to the old objects by creating the random
fields. As with the fixed effects, these fields are lists which must include a basename for the random effects and formula specifying the random effects model.
# Random component of mean
$random <- list(name="epsilon",formula=~-1 + indidx)
mymean
# Random component of dispersion
$random <- list(name="xi",formula=~-1 + indidx) mydisp
The new model can now be run using dalmatian
in exactly the same way as above. However, we will make one change. In order to shorten the convergence time, we will base initial values on the results of the fixed effects model and then provide these as input. In particular, we will define initial values for the fixed effects of both the mean and dispersion components by taking the values from the final iterations of each of the chains run for the fixed effects model. I have also found that it is best to set initial values for the response variable, y
, if the response is rounded. Otherwise, JAGS often fails to generate initial values that fall between the rounding limits. For convenience, we set the random effects variances equal to 1 in all three chains. Note that the chains have been run for a relatively small number of iterations and heavily subsampled to satisfy the size requirements for packages on . Neither of these is recommended in practice.
## Define initial values
<- lapply(1:3,function(i){
inits setJAGSInits(mymean,
mydisp,y = runif(nrow(pfdata),pfdata$lower,pfdata$upper),
fixed.mean = tail(pfmcmc1$coda[[i]],1)[1:4],
fixed.dispersion = tail(pfmcmc1$coda[[i]],1)[5:7],
sd.mean = 1,
sd.dispersion=1)
})
## Define list of arguments for jags.model()
<- list(file=file.path(workingDir,"pied_flycatcher_2_jags.R"),inits=inits,n.adapt=1000)
jm.args
## Define list of arguments for coda.samples()
<- list(n.iter=5000,thin=10)
cs.args
## Run the model using dalmatian
<- dalmatian(df=pfdata,
pfmcmc2 mean.model=mymean,
dispersion.model=mydisp,
jags.model.args=jm.args,
coda.samples.args=cs.args,
rounding=TRUE,
lower="lower",
upper="upper",
residuals=FALSE,
debug=FALSE)
Once again, we can use the wrapper functions within the package to examine the convergence and compute summaries of the posterior distribution. Note that the chains have been thinned in creating the traceplots to reduce the size of the package.
## Compute convergence diagnostics
<- convergence(pfmcmc2) pfconvergence2
## Gelman-Rubin diagnostics
$gelman pfconvergence2
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## alpha.(Intercept) 1.00 1.01
## alpha.log(IVI) 1.00 1.00
## alpha.broodsizeIncreased 1.01 1.04
## alpha.sexMale 1.00 1.00
## psi.(Intercept) 1.01 1.02
## psi.broodsizeIncreased 1.00 1.00
## psi.sexMale 1.01 1.02
## sd.epsilon.indidx 1.00 1.00
## sd.xi.indidx 1.00 1.01
## Raftery diagnostics
$raftery pfconvergence2
##
## Quantile (q) = 0.025
## Accuracy (r) = +/- 0.005
## Probability (s) = 0.95
##
## You need a sample size of at least 3746 with these values of q, r and s
## Effective sample size
$effectiveSize pfconvergence2
## alpha.(Intercept) alpha.log(IVI) alpha.broodsizeIncreased
## 992.9394 1320.7793 663.6353
## alpha.sexMale psi.(Intercept) psi.broodsizeIncreased
## 590.6142 407.2076 518.9162
## psi.sexMale sd.epsilon.indidx sd.xi.indidx
## 554.5618 1407.2474 750.2283
## Generate traceplots
<- traceplots(pfmcmc2, show = FALSE, nthin=20) pftraceplots2
## Fixed effects for mean
$meanFixed pftraceplots2
## Fixed effects for dispersion
$dispersionFixed pftraceplots2
## Random effects variances for mean
$meanRandom pftraceplots2
## Random effects variances for dispersion
$dispersionRandom pftraceplots2
## Compute numerical summaries
<- summary(pfmcmc2) pfsummary2
## Print numerical summaries
pfsummary2
##
## Iterations = 1010:6000
## Thinning interval = 10
## Number of chains = 3
## Sample size per chain = 500
##
## Posterior Summary Statistics for Each Model Component
##
## Mean Model: Fixed Effects
## Mean Median SD Lower 95% Lower 50% Upper 50% Upper 95%
## (Intercept) -3.09 -3.08 0.09 -3.28 -3.15 -3.03 -2.90
## log(IVI) 0.11 0.11 0.01 0.09 0.10 0.12 0.14
## broodsizeIncreased 0.11 0.11 0.07 -0.05 0.07 0.16 0.24
## sexMale -0.10 -0.10 0.07 -0.24 -0.15 -0.05 0.04
##
## Mean Model: Random Effects
## Mean Median SD Lower 95% Lower 50% Upper 50% Upper 95%
## indidx 0.23 0.23 0.03 0.18 0.21 0.25 0.29
##
## Dispersion Model: Fixed Effects
## Mean Median SD Lower 95% Lower 50% Upper 50% Upper 95%
## (Intercept) -0.80 -0.80 0.08 -0.95 -0.83 -0.73 -0.64
## broodsizeIncreased 0.02 0.02 0.09 -0.15 -0.04 0.07 0.18
## sexMale 0.10 0.10 0.09 -0.07 0.04 0.15 0.28
##
## Dispersion Model: Random Effects
## Mean Median SD Lower 95% Lower 50% Upper 50% Upper 95%
## indidx 0.21 0.21 0.05 0.12 0.18 0.24 0.31
## Generate caterpillar
<- caterpillar(pfmcmc2,show = FALSE) pfcaterpillar2
## Fixed effects for mean
$meanFixed pfcaterpillar2
## Fixed effects for dispersion
$dispersionFixed pfcaterpillar2
By using ‘prediction’ function, predictions for the mean and dispersion models can be calculated. The ‘prediction’ function should take a data frame including all variables required and their names defined for the usage of ‘dalmatian’ function. For the Pied Flycatcher data, previously ‘log(IVI)’ was used to define the mean and dispersion models. Hence, it should be included in the data frame as following:
## Add 'log(IVI)' variable in pfdata
$'log(IVI)' <- log(pfdata$IVI) pfdata
As an example we computed the fitted means and dispersion parameters from both models for the first 5 observations.
## Compute predictions for model 1
<- predict(pfmcmc,
pffitted1 newdata = pfdata[1:5,])
$mean pffitted1
## visitid boxco tape sparent sex indid visitno broodsize year date age IVI
## 366 366 2 4 2 Female 2 1 3 Decreased 1 29 7 99
## 367 367 2 4 2 Female 2 1 4 Decreased 1 29 7 11
## 368 368 2 4 2 Female 2 1 5 Decreased 1 29 7 108
## 369 369 2 4 2 Female 2 1 6 Decreased 1 29 7 331
## 370 370 2 4 2 Female 2 1 7 Decreased 1 29 7 188
## load fecal delivery boxcoidx indidx Predicted se 25%
## 366 0.1 0 0.06060606 2 2 1 -2.624661 0.02328064 -2.640194
## 367 0.0 0 0.00000000 2 2 1 -2.953695 0.03866243 -2.977745
## 368 0.4 0 0.22222222 2 2 1 -2.611631 0.02321193 -2.627012
## 369 0.2 0 0.03625378 2 2 1 -2.443913 0.02706617 -2.461623
## 370 0.1 0 0.03191489 2 2 1 -2.528623 0.02408767 -2.544806
## 2.5% 75% 97.5%
## 366 -2.669605 -2.608158 -2.581793
## 367 -3.035030 -2.928130 -2.881263
## 368 -2.657179 -2.595442 -2.568956
## 369 -2.495788 -2.425943 -2.391577
## 370 -2.574577 -2.511392 -2.483377
$dispersion pffitted1
## visitid boxco tape sparent sex indid visitno broodsize year date age IVI
## 366 366 2 4 2 Female 2 1 3 Decreased 1 29 7 99
## 367 367 2 4 2 Female 2 1 4 Decreased 1 29 7 11
## 368 368 2 4 2 Female 2 1 5 Decreased 1 29 7 108
## 369 369 2 4 2 Female 2 1 6 Decreased 1 29 7 331
## 370 370 2 4 2 Female 2 1 7 Decreased 1 29 7 188
## load fecal delivery boxcoidx indidx Predicted se 25%
## 366 0.1 0 0.06060606 2 2 1 -0.6987869 0.05669698 -0.7368451
## 367 0.0 0 0.00000000 2 2 1 -0.6987869 0.05669698 -0.7368451
## 368 0.4 0 0.22222222 2 2 1 -0.6987869 0.05669698 -0.7368451
## 369 0.2 0 0.03625378 2 2 1 -0.6987869 0.05669698 -0.7368451
## 370 0.1 0 0.03191489 2 2 1 -0.6987869 0.05669698 -0.7368451
## 2.5% 75% 97.5%
## 366 -0.806478 -0.6603081 -0.588949
## 367 -0.806478 -0.6603081 -0.588949
## 368 -0.806478 -0.6603081 -0.588949
## 369 -0.806478 -0.6603081 -0.588949
## 370 -0.806478 -0.6603081 -0.588949
## Compute predictions for model 2
<- predict(pfmcmc2,
pffitted2 newdata = pfdata[1:5,]))
$mean pffitted2
## visitid boxco tape sparent sex indid visitno broodsize year date age IVI
## 366 366 2 4 2 Female 2 1 3 Decreased 1 29 7 99
## 367 367 2 4 2 Female 2 1 4 Decreased 1 29 7 11
## 368 368 2 4 2 Female 2 1 5 Decreased 1 29 7 108
## 369 369 2 4 2 Female 2 1 6 Decreased 1 29 7 331
## 370 370 2 4 2 Female 2 1 7 Decreased 1 29 7 188
## load fecal delivery boxcoidx indidx Predicted se 25%
## 366 0.1 0 0.06060606 2 2 1 -2.614349 0.07973811 -2.670031
## 367 0.0 0 0.00000000 2 2 1 -2.862684 0.08542810 -2.922970
## 368 0.4 0 0.22222222 2 2 1 -2.604515 0.07971765 -2.660335
## 369 0.2 0 0.03625378 2 2 1 -2.477931 0.08089394 -2.533739
## 370 0.1 0 0.03191489 2 2 1 -2.541865 0.07996776 -2.597742
## 2.5% 75% 97.5%
## 366 -2.770356 -2.561110 -2.468056
## 367 -3.032711 -2.803750 -2.703029
## 368 -2.761241 -2.551992 -2.458249
## 369 -2.641477 -2.421581 -2.331423
## 370 -2.699441 -2.487788 -2.394424
$dispersion pffitted2
## visitid boxco tape sparent sex indid visitno broodsize year date age IVI
## 366 366 2 4 2 Female 2 1 3 Decreased 1 29 7 99
## 367 367 2 4 2 Female 2 1 4 Decreased 1 29 7 11
## 368 368 2 4 2 Female 2 1 5 Decreased 1 29 7 108
## 369 369 2 4 2 Female 2 1 6 Decreased 1 29 7 331
## 370 370 2 4 2 Female 2 1 7 Decreased 1 29 7 188
## load fecal delivery boxcoidx indidx Predicted se 25%
## 366 0.1 0 0.06060606 2 2 1 -0.9162287 0.1502658 -1.017949
## 367 0.0 0 0.00000000 2 2 1 -0.9162287 0.1502658 -1.017949
## 368 0.4 0 0.22222222 2 2 1 -0.9162287 0.1502658 -1.017949
## 369 0.2 0 0.03625378 2 2 1 -0.9162287 0.1502658 -1.017949
## 370 0.1 0 0.03191489 2 2 1 -0.9162287 0.1502658 -1.017949
## 2.5% 75% 97.5%
## 366 -1.214208 -0.8148692 -0.6329203
## 367 -1.214208 -0.8148692 -0.6329203
## 368 -1.214208 -0.8148692 -0.6329203
## 369 -1.214208 -0.8148692 -0.6329203
## 370 -1.214208 -0.8148692 -0.6329203