This document is intended as a short introduction to fitting MCMC models using JAGS and runjags, focussing on learning by example. The intention is to keep this document as brief as possible, but a slightly longer version with more examples is available from here. Additional information and more details are available in the user guide. For detailed documentation of individual functions, see the runjags manual for that function.
The runjags packages requires a working installation of JAGS. To test the installation, it is recommended to use the testjags() function:
testjags()
## You are using R version 4.1.2 (2021-11-01; macOS arm64) on a unix
## (macOS arm64) machine, with the X11 GUI
## JAGS version 4.3.0 (macOS universal build) found successfully using the
## command '/usr/local/bin/jags'
## The rjags package is installed
This should verify that your JAGS installation has been found. The rjags and modeest packages are not strictly required, but are reccomended packages and should be installed where possible.
A recommended introductory text for individuals not familiar with the BUGS language is:
Lunn D, Jackson C, Best N, Thomas A, Spiegelhalter D (2012). The BUGS book: A practical introduction to Bayesian analysis. CRC press.
To illustrate how to use run.jags we will look at the Salmonella example from chapter 6.5.2 of this book, with thanks to Lunn, Jackson, Best, Thomas, and Spiegelhalter 2012, for kind permission to reproduce this code. The model below was taken from the WinBUGS example, modified to include a mandatory pair of curly brackets to demarcate the Data and Inits blocks, and adding an optional second Inits block with different starting values for 2 chains:
filestring <- "
Poisson model...
model {
for (i in 1:6) {
for (j in 1:3) {
y[i,j] ~ dpois(mu[i])
}
log(mu[i]) <- alpha + beta*log(x[i] + 10) + gamma*x[i]
}
for (i in 1:6) {
y.pred[i] ~ dpois(mu[i])
}
alpha ~ dnorm(0, 0.0001)
beta ~ dnorm(0, 0.0001)
gamma ~ dnorm(0, 0.0001)
}
Data{
list(y = structure(.Data = c(15,21,29,16,18,21,16,26,33,
27,41,60,33,38,41,20,27,42),
.Dim = c(6, 3)),
x = c(0, 10, 33, 100, 333, 1000))
}
Inits{
list(alpha = 0, beta = 1, gamma = -0.1)
}
Inits{
list(alpha = 10, beta = 0, gamma = 0.1)
}
"
The following call to run.jags reads, compiles, runs and returns the model information along with MCMC samples and summary statistics:
results <- run.jags(filestring, monitor=c('alpha','beta','gamma'))
## Warning: Convergence cannot be assessed with only 1 chain
Note that this model is run from an embedded character vector here for convinience, but run.jags will also accept a file path as the main argument in which case the information will be read from the specified file. There is a print method associated with runjags class objects:
results
##
## JAGS model summary statistics from 10000 samples (adapt+burnin = 5000):
##
## Lower95 Median Upper95 Mean SD Mode
## alpha 1.7815 2.1851 2.6042 2.1792 0.2093 2.1946
## beta 0.21195 0.31588 0.42439 0.3179 0.054427 0.30989
## gamma -0.0014861 -0.001008 -0.00056192 -0.0010115 0.00023611 -0.00098606
##
## MCerr MC%ofSD SSeff AC.10 psrf
## alpha 0.020443 9.8 105 0.79817 --
## beta 0.0054986 10.1 98 0.81745 --
## gamma 0.000018389 7.8 165 0.55523 --
##
## Total time taken: 0.4 seconds
This method displays relevant summary statistics - the effective sample size (SSeff) and convergence diagnostic (psrf) is shown for each stochastic variable. It is also advisable to plot the results using the default plot method to assess convergence graphically:
plot(results, layout=c(3,4))
In this case the chains have converged, but there is a large amount of auto-correlation and therefore a small effective sample size, which can be reduced by loading the GLM module in JAGS:
resultswithglm <- run.jags(filestring, monitor=c('alpha','beta','gamma'), modules='glm')
## module glm loaded
resultswithglm
##
## JAGS model summary statistics from 20000 samples (chains = 2; adapt+burnin = 5000):
##
## Lower95 Median Upper95 Mean SD Mode
## alpha 1.72 2.1553 2.5824 2.1543 0.22044 2.1614
## beta 0.21515 0.32463 0.43986 0.32457 0.057315 0.32424
## gamma -0.0015228 -0.0010379 -0.00055605 -0.001038 0.00024478 -0.0010428
##
## MCerr MC%ofSD SSeff AC.10 psrf
## alpha 0.0029778 1.4 5480 0.014321 1.0012
## beta 0.00072382 1.3 6270 0.011373 1.0009
## gamma 2.8022e-06 1.1 7631 -0.0015343 1.0002
##
## Total time taken: 0.6 seconds
This second model has a smaller AC.10 (autocorrelation with lag 10) and therefore larger SSeff (effective sample size), so there will be much less Monte Carlo error associated with these estimates comapred to the first model.
To facilitate running models using data that is already in R, runjags will look for a number of comment tags in the model file to indicate the variables that should be extracted from the R working environment. For example the following model uses N, Y and X as data, and coef, int and precision are given initial values, all of which are taken from R. In addition, the model indicates to monitor coef, int and precision so that no monitor vector is required in the call to run.jags:
# Simulate the data:
set.seed(1)
X <- 1:100
Y <- rnorm(length(X), 2*X + 10, 1)
N <- length(X)
model <- "
model {
for(i in 1 : N){ #data# N
Y[i] ~ dnorm(true.y[i], precision) #data# Y
true.y[i] <- (coef * X[i]) + int #data# X
}
coef ~ dunif(-1000,1000)
int ~ dunif(-1000,1000)
precision ~ dexp(1)
#inits# coef, int, precision
#monitor# coef, int, precision
}"
# A function to return initial values for each chain:
coef <- function(chain) return(switch(chain, "1"= -10, "2"= 10))
int <- function(chain) return(switch(chain, "1"= -10, "2"= 10))
precision <- function(chain) return(switch(chain, "1"= 0.01, "2"= 100))
# Run the simulation:
results <- run.jags(model, n.chains = 2)
The output of this function can be extended using:
results <- extend.jags(results, sample=5000)
This function call takes an additional 5000 samples and combines them with the original simulation. It is also possible to automatically extend a model until it has converged (as assessed by Gelman and Rubin's convergence diagnostic) using the autorun.jags and autoextend.jags functions, although manual inspection of trace plots is always recommended to ensure chains have really converged. Note also that each of the functions run.jags, extend.jags, autorun.jags and autoextend.jags allows a method argument to be specified, which allows separate chains to be run on parallel processors where available. Full details are provided are in the user guide, but the most used options are 'rjags', 'interrputible', 'parallel' and 'background'.
One of the most difficult things about getting started with JAGS or BUGS is the requirement to write the model yourself. Looking at examples is a good way to learn, but it is even easier when the example is tailored towards your specific model. The template.jags function creates a model template file based on an lme4-style formula interface. We can look at an example from the help file for lm:
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
D9 <- data.frame(weight, group)
# The JAGS equivalent:
model <- template.jags(weight ~ group, D9, n.chains=2, family='gaussian')
The model is written to disk by template.jags, and includes all of the data and initial values necessary to run the model. Examining this model file is a good way to learn how the BUGS syntax works, as well as some of the options allowed by the runjags package. To run the models:
JAGS.D9 <- run.jags(model)
lm.D9 <- lm(weight ~ group, data=D9)
And to compare results:
JAGS.D9
##
## JAGS model summary statistics from 20000 samples (chains = 2; adapt+burnin = 5000):
##
## Lower95 Median Upper95 Mean SD Mode
## regression_precision 0.79911 1.9942 3.4035 2.0655 0.69044 1.8823
## intercept 4.5659 5.033 5.4881 5.0338 0.23258 5.0242
## group_effect[1] 0 0 0 0 0 0
## group_effect[2] -1.0184 -0.37341 0.27707 -0.3725 0.32906 -0.35997
## deviance 40.196 42.722 48.656 43.408 2.6489 41.62
## resid.sum.sq 8.7293 9.4188 12.178 9.8114 1.2275 9.0282
##
## MCerr MC%ofSD SSeff AC.10 psrf
## regression_precision 0.0054184 0.8 16237 -0.00067184 1.0003
## intercept 0.0016238 0.7 20516 -0.0065401 0.99997
## group_effect[1] -- -- -- -- --
## group_effect[2] 0.0022554 0.7 21287 -0.019725 1.0002
## deviance 0.021074 0.8 15800 0.0098344 1.0004
## resid.sum.sq 0.0094602 0.8 16838 0.0010838 1.0007
##
## Model fit assessment:
## DIC = 46.73117 (range between chains: 46.70924 - 46.7531)
## [PED not available from the stored object]
## Estimated effective number of parameters: pD = 3.32309
##
## Total time taken: 0.3 seconds
summary(lm.D9)
##
## Call:
## lm(formula = weight ~ group, data = D9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.0710 -0.4938 0.0685 0.2462 1.3690
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.0320 0.2202 22.850 9.55e-15 ***
## groupTrt -0.3710 0.3114 -1.191 0.249
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6964 on 18 degrees of freedom
## Multiple R-squared: 0.07308, Adjusted R-squared: 0.02158
## F-statistic: 1.419 on 1 and 18 DF, p-value: 0.249
Note that lm reports sigma and JAGS the precision - to make them more comparable we could use a mutate function:
JAGS.D9 <- run.jags(model, mutate=list(prec2sd, 'precision'))
And focus on the results from the summary that we are interested in:
summary(JAGS.D9, vars=c('regression_precision.sd', 'intercept', 'group_effect'))
## Lower95 Median Upper95 Mean SD
## regression_precision.sd 0.5060187 0.7093546 0.9947975 0.7265715 0.1293490
## intercept 4.5690358 5.0324026 5.4880989 5.0307522 0.2330697
## group_effect[1] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## group_effect[2] -1.0177257 -0.3745063 0.2835764 -0.3720167 0.3291646
## Mode MCerr MC%ofSD SSeff AC.10
## regression_precision.sd 0.6891279 0.001033877 0.8 15653 0.003758877
## intercept 5.0286923 0.001648052 0.7 20000 -0.010297850
## group_effect[1] 0.0000000 NA NA NA NA
## group_effect[2] -0.3752499 0.002327545 0.7 20000 -0.005821649
## psrf
## regression_precision.sd 0.9999540
## intercept 0.9999971
## group_effect[1] NA
## group_effect[2] 1.0000784
We can also compare the estimated residuals using for example:
summary(residuals(lm.D9) - residuals(JAGS.D9, output='mean'))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.538e-03 -1.538e-03 -8.002e-04 -8.002e-04 -6.207e-05 -6.207e-05
The Deviance Information Criterion (DIC) is typically used to compare models fit using JAGS and BUGS, and is displayed as part of the print method shwon above. This can also be extracted from a runjags model using the extract method, along with other information such as the samplers used:
extract(JAGS.D9, what='samplers')
## Index.No Sampler Node
## 1 1 bugs::ConjugateGamma regression_precision
## 2 2 glm::Generic group_effect[2]
## 3 2 glm::Generic intercept
Another possible method to assess the fit of Bayesian models is using leave-one-out cross-validation, or a drop-k study. A utility function is included within runjags to facilitate this - for more details see the help file for the drop.k function or the extended version of this vignette.
There are a large number of global options that can be controlled using the runjags.options function. For an explanation of these see the help file for the relevant function:
?runjags.options
There is a sourceforge repository associated with the runjags package, which contains additional versions of the package and internal JAGS module. This page also contains the extended version of this vignette which contains additional material and examples not found in the version hosted on CRAN.
This package is in constant development, so if you find any issues or the code does not behave as expected please either email the pacckage developer or visit the bug-tracker page to report the problem. General comments and suggestions on how you find using this package are also very welcome.
Devleopment of this R package has consumed a great deal of time and energy, and is a somewhat peripheral activity in my current academic position. In order to justify continued devleopment of the software I need to be able to demonstrate that it is being used within the general research community. So, if you use this software as part of a publication, please remember to cite the software as follows:
citation('runjags')
##
## To cite runjags in publications use:
##
## Matthew J. Denwood (2016). runjags: An R Package Providing Interface
## Utilities, Model Templates, Parallel Computing Methods and Additional
## Distributions for MCMC Models in JAGS. Journal of Statistical
## Software, 71(9), 1-25. doi:10.18637/jss.v071.i09
##
## A BibTeX entry for LaTeX users is
##
## @Article{,
## title = {{runjags}: An {R} Package Providing Interface Utilities, Model Templates, Parallel Computing Methods and Additional Distributions for {MCMC} Models in {JAGS}},
## author = {Matthew J. Denwood},
## journal = {Journal of Statistical Software},
## year = {2016},
## volume = {71},
## number = {9},
## pages = {1--25},
## doi = {10.18637/jss.v071.i09},
## }
It is also important to cite JAGS and R!
sessionInfo()
## R version 4.1.2 (2021-11-01)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.3.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
##
## locale:
## [1] C/en_US.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] parallel stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] rjags_4-12 coda_0.19-4 runjags_2.2.1-7
##
## loaded via a namespace (and not attached):
## [1] lattice_0.20-45 stabledist_0.7-1 grid_4.1.2
## [4] fBasics_3042.89.1 magrittr_2.0.2 evaluate_0.14
## [7] statip_0.2.3 stringi_1.7.6 spatial_7.3-14
## [10] rmutil_1.1.9 timeDate_3043.102 rpart_4.1-15
## [13] modeest_2.4.0 tools_4.1.2 stringr_1.4.0
## [16] xfun_0.29 compiler_4.1.2 clue_0.3-60
## [19] cluster_2.1.2 stable_1.1.6 timeSeries_3062.100
## [22] knitr_1.37