Abstract
This vignette provides an overview of the r3PG R package functions and options. We provide a working examples that (i) demonstrates the basic functionality and use of the package, (ii) performs a sensitivity analysis and a Bayesian calibration of the model, and (iii) uses the calibrated model to simulate spatial patterns of forest growth across Switzerland.
r3PG
provides an implementation of the Physiological
Processes Predicting Growth (3-PG) model (Landsberg &
Waring, 1997). The r3PG
serves as a flexible and
easy-to-use interface for the 3-PGpjs
(Sands, 2010) and the
3-PGmix
(Forrester & Tang, 2016) models written in
Fortran
. The package, allows for fast and easy interaction
with the model, and Fortran
re-implementation facilitates
computationally intensive sensitivity analysis and calibration. The user
can flexibly switch between various options and submodules to use the
original 3-PGpjs
model version for monospecific, even-aged
and evergreen forests and the 3-PGmix
model, which can also
simulate multi-cohort stands (e.g. mixtures, uneven-aged) that contain
deciduous species.
To demonstrate the basic functionality of the r3PG
R
package, we will perform a simple simulation with the
3-PGmix
model. The central function to run
3-PG
from R is run_3PG
. When called, the
function will:
Before using run_3PG
the user must prepare the input
data as described in the help of run_3PG
. The inputs
include information about site conditions, species initial conditions,
climate data, and parameters (if they need to be modified).
In this example, we run a simulation for mixed Fagus
sylvatica and Pinus sylvestris stands (Forrester et al.,
2017). The input data are provided as internal data in
r3PG
.
library(r3PG)
library(dplyr)
library(ggplot2)
<- run_3PG(
out_3PG site = d_site,
species = d_species,
climate = d_climate,
thinning = d_thinning,
parameters = d_parameters,
size_dist = d_sizeDist,
settings = list(light_model = 2, transp_model = 2, phys_model = 2,
height_model = 1, correct_bias = 0, calculate_d13c = 0),
check_input = TRUE, df_out = TRUE)
head( out_3PG )
#> date species group variable value
#> 1 2001-01-31 Fagus sylvatica climate tmp_min -1.4669526
#> 2 2001-02-28 Fagus sylvatica climate tmp_min -0.8616548
#> 3 2001-03-31 Fagus sylvatica climate tmp_min 3.8046124
#> 4 2001-04-30 Fagus sylvatica climate tmp_min 2.7947164
#> 5 2001-05-31 Fagus sylvatica climate tmp_min 9.7551870
#> 6 2001-06-30 Fagus sylvatica climate tmp_min 10.0666666
The output of the run_3PG
function can be either a long
format dataframe or a 4-dimentional array where each row corresponds to
one month of simulations. The second dimension corresponds to species,
where each column represents one species. The third dimension
corresponds to variable groups and the variables themselves To get
information about output variables and their group, please look at
data('i_output')
.
As an example for how to visualize the output, we select six
variables: basal area, dbh, height, stem
biomass, foliage biomass, and root biomass. For
visualization we are using a ggplot
function, which allows
us to visualize all the outputs side-by-side.
<- c('stems_n', 'dbh', 'height', 'biom_stem', 'biom_root', 'biom_foliage')
i_var <- c('Stem density', 'DBH', 'Height', 'Stem biomass', 'Root biomass', 'Foliage biomass')
i_lab
%>%
out_3PG filter(variable %in% i_var) %>%
mutate(variable = factor(variable, levels = i_var)) %>%
ggplot( aes(date, value))+
geom_line( aes(color = species), size = 0.5)+
facet_wrap( ~ variable, scales = 'free_y', ncol = 3,
labeller = labeller(variable = setNames(i_lab, i_var) )) +
scale_color_brewer('', palette = 'Dark2') +
theme_classic()+
theme(legend.position="bottom")+
xlab("Calendar date") + ylab('Value')
As a second case study, we want do a slightly more ambitious project, which is a sensitivity analysis and calibration of the model.
For both tasks, we use freely available forest growth data from the
PROFOUND database.
The data can be extracted from the sql database with the ProfoundData R
package, which is available on CRAN. For convenience, however, we have
included the extracted and cleaned data in the required format in the
r3PG
package. All the data used in this vignette can be
downloaded from r3PG/vignettes_build
load('vignette_data/solling.rda')
Loading necessary packages
library(tidyr)
library(purrr)
library(BayesianTools)
library(sensitivity)
As a second example, we will first performed a Morris
screening and then perform the Bayesian calibration
of the
3-PGpjs model. We will use the Solling
forest flux site
located in Germany at an elevation of 508 m.a.s.l., and dominated by
Picea abies. We obtained data for this site via the PROFOUND database that
was set up for evaluating vegetation models and simulating climate
impacts on forests (Reyer et al., 2019). There are almost 50 years of
records for various stand and flux variables for Solling
site.
To perform a morris
sensitivity analysis and
Bayesian calibration
we construct the log Likelihood
function. The log Likelihood function was calculated using normal error
assumptions for six variables that describe stand stocks and
characteristics: basal area, dbh, height,
stem biomass, foliage biomass, and root
biomass. To calculate the stand-level stocks, we applied the
biomass equations developed for European forests following Forrester et
al. (2017) for each measured tree, and summed it up to the stand level
in Mg dry matter \(ha^{-1}\).
<- function( par_df, ...){
r3pg_sim #' @description function to simulate the model for a given parameter
#' @param par_df data frame of the parameters with two columns: parameter, piab
<- run_3PG(
sim.df site = site_solling,
species = species_solling,
climate = climate_solling,
thinning = thinn_solling,
parameters = par_df,
size_dist = NULL,
settings = list(light_model = 1, transp_model = 1, phys_model = 1),
check_input = TRUE, ...)
return( sim.df )
}
<- function( par_v ){
r3pg_ll #' @param par_v a vector of parameters for the calibration, including errors
#' par_v = par_cal_best
# replace the default values for the selected parameters
= grep('err', par_cal_names)
err_id
= data.frame(parameter = par_cal_names[-err_id], piab = par_v[-err_id])
par_df = par_v[err_id]
err_v
# simulate the model
<- r3pg_sim(par_df, df_out = FALSE)
sim.df <- cbind(sim.df[,,2,c(3,5,6)], sim.df[,,4,c(1,2,3)])
sim.df
# calculate the log likelihood
<- sapply(1:6, function(i) {
logpost likelihoodIidNormal( sim.df[,i], observ_solling_mat[,i], err_v[i] )
%>% sum(.)
})
if(is.nan(logpost) | is.na(logpost) | logpost == 0 ){logpost <- -Inf}
return( logpost )
}
For Morris sensitivity analysis and Bayesian calibration we define each parameters range, for which we will evaluate the sensitivity and later perform the calibration. In addition, we specified error parameters for the observational data.
# default parameters
<- setNames(param_solling$default, param_solling$param_name)
par_def <- setNames(error_solling$default, error_solling$param_name)
err_def
# parameters for calibration and their ranges
<- bind_rows(param_solling, error_solling) %>% filter(!is.na(min))
param_morris.df <- param_morris.df$param_name
par_cal_names <- param_morris.df$min
par_cal_min <- param_morris.df$max
par_cal_max <- param_morris.df$default
par_cal_best
r3pg_ll( par_cal_best)
#> [1] -191.9541
Using the settings specified below, we will run a total of 30500 model runs (N (length(factors)+1)) and visualize the results. The morris sensitivity run will take approximately 4 minutes on the working computer.
<- createBayesianSetup(
morris_setup likelihood = r3pg_ll,
prior = createUniformPrior(par_cal_min, par_cal_max, par_cal_best),
names = par_cal_names)
set.seed(432)
<- morris(
morrisOut model = morris_setup$posterior$density,
factors = par_cal_names,
r = 500,
design = list(type = "oat", levels = 20, grid.jump = 3),
binf = par_cal_min,
bsup = par_cal_max,
scale = TRUE)
# summarise the moris output
<- data.frame(
morrisOut.df parameter = par_cal_names,
mu.star = apply(abs(morrisOut$ee), 2, mean, na.rm = T),
sigma = apply(morrisOut$ee, 2, sd, na.rm = T)
%>%
) arrange( mu.star )
load('vignette_data/morris.rda')
We visualize the results of the morris
analysis by
listing all the parameters and error parameters. A high \(\mu^{*}\) indicates a factor with an
important overall influence on model output; a high \(\alpha\) indicates either a factor
interacting with other factors or a factor whose effects are
non-linear.
%>%
morrisOut.df gather(variable, value, -parameter) %>%
ggplot(aes(reorder(parameter, value), value, fill = variable), color = NA)+
geom_bar(position = position_dodge(), stat = 'identity') +
scale_fill_brewer("", labels = c('mu.star' = expression(mu * "*"), 'sigma' = expression(sigma)), palette="Dark2") +
theme_classic() +
theme(
axis.text = element_text(size = 6),
axis.text.x = element_text(angle=90, hjust=1, vjust = 0.5),
axis.title = element_blank(),
legend.position = c(0.05 ,0.95),legend.justification = c(0.05,0.95)
)
We will run the Bayesian calibration for the 20 most sensitive
parameters defined by the morris
sensitivity plus six
errors parameters for each of the observational variables. We will run 3
parallel chains, each on a separate computer node for 4e+06 iterations.
This can easily be done on a computer server, and takes approximately 21
hours.
# which parameters to calibrate
<- morrisOut.df$parameter %>% .[-grep('err', .)] %>% tail(., 20) %>% as.character()
par_select <- which(par_cal_names %in% c(par_select, error_solling$param_name) )
par_id
<- par_cal_names[par_id]
par_cal_names <- par_cal_min[par_id]
par_cal_min <- par_cal_max[par_id]
par_cal_max <- par_cal_best[par_id]
par_cal_best
<- createBayesianSetup(
mcmc_setup likelihood = r3pg_ll,
prior = createUniformPrior(par_cal_min, par_cal_max, par_cal_best),
names = par_cal_names)
<- runMCMC(
mcmc_out bayesianSetup = mcmc_setup,
sampler = "DEzs",
settings = list(iterations = 4e+03, nrChains = 3))
load('vignette_data/mcmc.rda')
As the first step, we would look at the Gelman-Rubin diagnostic. The convergence can be accepted because the multivariate potential scale reduction factor was \(\le\) 1.1.
gelmanDiagnostics( mcmc_out )
#> Potential scale reduction factors:
#>
#> Point est. Upper C.I.
#> pFS20 1.01 1.01
#> aWS 1.01 1.02
#> nWS 1.01 1.03
#> pRn 1.02 1.04
#> Tmin 1.01 1.02
#> Topt 1.02 1.05
#> Tmax 1.02 1.03
#> fN0 1.02 1.03
#> fNn 1.01 1.02
#> MaxAge 1.02 1.04
#> rAge 1.03 1.05
#> gammaN1 1.01 1.01
#> thinPower 1.02 1.05
#> mS 1.01 1.01
#> alphaCx 1.06 1.12
#> rhoMin 1.01 1.02
#> rhoMax 1.01 1.03
#> aH 1.02 1.04
#> nHB 1.03 1.06
#> nHC 1.03 1.06
#> err_basal_area 1.06 1.11
#> err_dbh 1.03 1.05
#> err_height 1.02 1.04
#> err_biom_stem 1.02 1.03
#> err_biom_root 1.02 1.03
#> err_biom_foliage 1.01 1.03
#>
#> Multivariate psrf
#>
#> 1.11
To evaluate the model performance, we draw ~500 samples from the mcmc object and run model simulations for each of the parameter combinations to understand model uncertainties.
<- select(param_solling, parameter = param_name, piab = default)
par_def.df
<- getSample(mcmc_out, numSamples = 500, coda = F, whichParameters = 1:20) %>%
param.draw as.data.frame() %>%
mutate(mcmc_id = 1:n()) %>%
nest_legacy(-mcmc_id, .key = 'pars') %>%
mutate(
pars = map(pars, unlist),
pars = map(pars, ~tibble::enframe(.x, 'parameter', 'piab')),
pars = map(pars, ~bind_rows(.x, filter(par_def.df, !parameter %in% par_cal_names))))
We then simulate forest growth using the defaults and the drawn parameters combinations. Afterwards, we visualize the posteriori prediction range by drawing the \(5^{th}\) and \(95^{th}\) range of predictions. The calibrated model simulates the observational data well.
# default run
<- r3pg_sim( par_df = par_def.df, df_out = T) %>%
def_run.df select(date, variable, value) %>%
mutate(run = 'default')
# calibrated run
<- param.draw %>%
post_run.df mutate( sim = map( pars, ~r3pg_sim(.x, df_out = T)) ) %>%
select(mcmc_id, sim) %>%
unnest_legacy() %>%
group_by(date, variable) %>%
summarise(
q05 = quantile(value, 0.05, na.rm = T),
q95 = quantile(value, 0.95, na.rm = T),
value = quantile(value, 0.5, na.rm = T)) %>%
ungroup() %>%
mutate(run = 'calibrated')
<- bind_rows( def_run.df, post_run.df)
sim.df
# Visualize
<- c('basal_area', 'dbh', 'height', 'biom_stem', 'biom_root', 'biom_foliage')
i_var <- c('Basal area', 'DBH', 'Height', 'Stem biomass', 'Root biomass', 'Foliage biomass')
i_lab
<- observ_solling %>%
observ_solling.df gather(variable, value, -date) %>%
filter(variable %in% i_var) %>%
filter(!is.na(value)) %>%
mutate(variable = factor(variable, levels = i_var))
%>%
sim.df filter(variable %in% i_var) %>%
mutate(variable = factor(variable, levels = i_var)) %>%
ggplot(aes(date, value))+
geom_ribbon(aes(ymin = q05, ymax = q95, fill = run), alpha = 0.5)+
geom_line( aes(color = run), size = 0.2)+
geom_point( data = observ_solling.df, color = 'grey10', size = 0.1) +
facet_wrap( ~ variable, scales = 'free_y', nrow = 2, labeller = labeller(variable = setNames(i_lab, i_var) )) +
scale_color_manual('', values = c('calibrated' = '#1b9e77', 'default' = '#d95f02')) +
scale_fill_manual('', values = c('calibrated' = '#1b9e77', 'default' = '#d95f02'), guide = F) +
theme_classic() +
theme(legend.position="bottom")+
xlab("Calendar date") + ylab('Value')
3-PG is a cohort level model. Thus, to make a simulation across a landscapes, we need to explicitly simulate each individual grid point. For this purpose, we will simulate Picea abies stand biomass across Switzerland on a 1x1 km grid. In total, this sums to ~18’000 grid points over Switzerland.
We have prepared the input data, including climate and soil information for each of the grid points. We used interpolated meteorological data from the Landscape Dynamics group (WSL, Switzerland) based on data from MeteoSwiss stations (Swiss Federal Office of Meteorology and Climatology) by employing the DAYMET method (Thornton, Running, & White, 1997). Grid-specific information on soil type and plant available soil water was retrieved from European Soil Database Derived data (Hiederer 2013). In addition, we use ~500 samples drawn from the previously obtained mcmc object and run model simulations for each of the parameter combinations to understand simulation uncertainties.
load('vignette_data/grid_input.rda')
We construct a function, which requires site and climate information
as input, and calculates the \(5^{th}\), \(50^{th}\) and \(95^{th}\) percentile from ~500 simulated
runs. To do so, we use the multidplyr
package for
distributing computing. In total, it takes less than 1 hour on a
computer cluster with 50 processes (CPU time ~20 hours).
library(multidplyr)
<- function(site, forc){
r3pg_grid #' @description simulate the n runs for a given site with the drawn parameter combination
<- function( par_df){
r3pg_int #' @description function to run one site and return required output on standing biomass
#' @param par_df a data.frame of parameters
<- run_3PG(site, species.grid, forc, thinn.grid, par_df, NULL,
out list(light_model = 1, transp_model = 1, phys_model = 1), check_input = TRUE,
df_out = F)[,,4,1]
return( last( out ) )
}
<- param.draw %>%
site_out mutate( sim = map( pars, r3pg_int)) %>%
select(mcmc_id, sim) %>%
unnest_legacy() %>%
summarise(
q05 = quantile(sim, 0.05, na.rm = T),
q95 = quantile(sim, 0.95, na.rm = T),
value = quantile(sim, 0.5, na.rm = T))
return( site_out )
}
<- new_cluster(n = 2) %>% # or more cores if desired
cl_in cluster_library(c('r3PG', 'purrr', 'dplyr', 'tidyr')) %>%
cluster_copy(names = c("r3pg_grid", "species.grid", "thinn.grid", "param.draw"))
#' `hide the sample_n`
<- inner_join(site.grid, climate.grid, by = 'grid_id') %>%
sim.grid sample_n(10) %>% # remove this for full simulation
partition(cluster = cl_in) %>%
mutate( out = map2(site, forc, ~r3pg_grid(.x, .y))) %>%
select( grid_id, out) %>%
collect() %>%
unnest_legacy() %>%
ungroup()
load('vignette_data/grid_sim.rda')
Once the simulations are done, we visualize the results for average standing biomass and associated uncertainties for the whole landscape.
%>%
sim.grid mutate( range = q95 - q05) %>%
select( grid_id, mean = value, range) %>%
gather( variable, value, -grid_id) %>%
inner_join( ., coord.grid, by = 'grid_id') %>%
ggplot( aes(x, y, fill = value) ) +
geom_raster()+
facet_wrap(~variable)+
theme_void()+
coord_equal() +
scale_fill_distiller( '', palette = 'Spectral', limits = c(0, 250))
Forrester, D. I., Tachauer, I. H. H., Annighoefer, P., Barbeito, I., Pretzsch, H., Ruiz-Peinado, R., … Sileshi, G. W. (2017). Generalized biomass and leaf area allometric equations for European tree species incorporating stand structure, tree age and climate. Forest Ecology and Management, 396, 160–175. https://doi.org/10.1016/j.foreco.2017.04.011
Forrester, David I., & Tang, X. (2016). Analysing the spatial and temporal dynamics of species interactions in mixed-species forests and the effects of stand density using the 3-PG model. Ecological Modelling, 319, 233–254. https://doi.org/10.1016/j.ecolmodel.2015.07.010
Hiederer, R. 2013. Mapping Soil Properties for Europe - Spatial Representation of Soil Database Attributes. Luxembourg: Publications Office of the European Union – 2013 – 47pp. – EUR26082EN Scientific and Technical Research series, ISSN 1831-9424, https://doi.org/10.2788/94128
Reyer, C. P. O., Silveyra Gonzalez, R., Dolos, K., Hartig, F., Hauf, Y., Noack, M., … Frieler, K. (2019). The PROFOUND database for evaluating vegetation models and simulating climate impacts on forests. Earth System Science Data Discussions, 1–47. https://doi.org/10.5194/essd-2019-220
Thornton, P. E., Running, S. W., & White, M. A. (1997). Generating surfaces of daily meteorological variables over large regions of complex terrain. Journal of Hydrology, 190(3), 214–251. https://doi.org/10.1016/S0022-1694(96)03128-9
sessionInfo()
#> R version 4.2.0 (2022-04-22)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur/Monterey 10.16
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] sensitivity_1.27.0 BayesianTools_0.1.7 purrr_0.3.4
#> [4] tidyr_1.2.0 ggplot2_3.3.5 dplyr_1.0.8
#> [7] r3PG_0.1.4 knitr_1.38
#>
#> loaded via a namespace (and not attached):
#> [1] Brobdingnag_1.2-7 tidyselect_1.1.2 xfun_0.30
#> [4] bslib_0.3.1 splines_4.2.0 lattice_0.20-45
#> [7] colorspace_2.0-3 vctrs_0.4.1 generics_0.1.2
#> [10] htmltools_0.5.2 yaml_2.3.5 utf8_1.2.2
#> [13] rlang_1.0.2 nloptr_2.0.1 jquerylib_0.1.4
#> [16] pillar_1.7.0 glue_1.6.2 withr_2.5.0
#> [19] DBI_1.1.2 RColorBrewer_1.1-3 foreach_1.5.2
#> [22] lifecycle_1.0.1 stringr_1.4.0 munsell_0.5.0
#> [25] gtable_0.3.0 mvtnorm_1.1-3 codetools_0.2-18
#> [28] coda_0.19-4 evaluate_0.15 labeling_0.4.2
#> [31] fastmap_1.1.0 numbers_0.8-2 fansi_1.0.3
#> [34] highr_0.9 Rcpp_1.0.8.3 scales_1.2.0
#> [37] DHARMa_0.4.5 jsonlite_1.8.0 farver_2.1.0
#> [40] lme4_1.1-29 digest_0.6.29 stringi_1.7.6
#> [43] grid_4.2.0 cli_3.3.0 tools_4.2.0
#> [46] magrittr_2.0.3 sass_0.4.1 tibble_3.1.6
#> [49] crayon_1.5.1 pkgconfig_2.0.3 ellipsis_0.3.2
#> [52] MASS_7.3-56 Matrix_1.4-1 iterators_1.0.14
#> [55] bridgesampling_1.1-2 assertthat_0.2.1 minqa_1.2.4
#> [58] rmarkdown_2.14 rstudioapi_0.13 R6_2.5.1
#> [61] boot_1.3-28 nlme_3.1-157 compiler_4.2.0