library(dissever)
library(raster)
library(viridis)
library(dplyr)
library(magrittr)
Here is the coarse model to dissever:
plot(edgeroi$carbon, col = viridis(100))
and here are the covariates that will be used for the disseveration:
plot(edgeroi$predictors, col = viridis(100))
In this section, we will dissever the sample dataset with the available environmental covariates, using 4 different regression methods:
"rf"
)"cubist"
)"earth"
)"lm"
)It is simple to switch between regression methods using the method
option of dissever
.
But first let’s setup some parameters:
min_iter <- 5 # Minimum number of iterations
max_iter <- 10 # Maximum number of iterations
p_train <- 0.025 # Subsampling of the initial data
We can then launch the 4 different disseveration procedures:
# Random Forest
res_rf <- dissever(
coarse = edgeroi$carbon, # stack of fine resolution covariates
fine = edgeroi$predictors, # coarse resolution raster
method = "rf", # regression method used for disseveration
p = p_train, # proportion of pixels sampled for training regression model
min_iter = min_iter, # minimum iterations
max_iter = max_iter # maximum iterations
)
# Cubist
res_cubist <- dissever(
coarse = edgeroi$carbon,
fine = edgeroi$predictors,
method = "cubist",
p = p_train,
min_iter = min_iter,
max_iter = max_iter
)
# GAM
res_gam <- dissever(
coarse = edgeroi$carbon,
fine = edgeroi$predictors,
method = "gamSpline",
p = p_train,
min_iter = min_iter,
max_iter = max_iter
)
# Linear model
res_lm <- dissever(
coarse = edgeroi$carbon,
fine = edgeroi$predictors,
method = "lm",
p = p_train,
min_iter = min_iter,
max_iter = max_iter
)
The object returned by dissever
is an object of class dissever
. It’s basically a list
with 3 elements: * fit
: a train
object storing the regression model used in the final disseveration * map
: a RasterLayer
object storing the dissevered map * perf
: a data.frame
object storing the evolution of the RMSE (and confidence intervals) with the disseveration iterations.
The dissever
result has a plot
function that can plot either the map or the performance results, depending on the type
option.
Below are the dissevered maps for all 4 regression methods:
# Plotting maps
par(mfrow = c(2, 2))
plot(res_rf, type = 'map', main = "Random Forest")
plot(res_cubist, type = 'map', main = "Cubist")
plot(res_gam, type = 'map', main = "GAM")
plot(res_lm, type = 'map', main = "Linear Model")
We can also analyse and plot the optimisation of the dissevering model:
# Plot performance
par(mfrow = c(2, 2))
plot(res_rf, type = 'perf', main = "Random Forest")
plot(res_cubist, type = 'perf', main = "Cubist")
plot(res_gam, type = 'perf', main = "GAM")
plot(res_lm, type = 'perf', main = "Linear Model")
The tools provided by the caret
package can also be leveraged, e.g. to plot the observed vs. predicted values:
# Plot preds vs obs
preds <- extractPrediction(list(res_rf$fit, res_cubist$fit, res_gam$fit, res_lm$fit))
plotObsVsPred(preds)
The models can be compared using the preds
object that we just derived using the extractPrediction
function. In this case I’m computing the R2, the RMSE and the CCC for each model:
# Compare models
perf <- preds %>%
group_by(model, dataType) %>%
summarise(
rsq = cor(obs, pred)^2,
rmse = sqrt(mean((pred - obs)^2))
)
perf
## # A tibble: 4 x 4
## # Groups: model [?]
## model dataType rsq rmse
## <fct> <fct> <dbl> <dbl>
## 1 cubist Training 0.260 5.35
## 2 gamSpline Training 0.312 4.66
## 3 lm Training 0.349 4.52
## 4 rf Training 0.951 1.53
We can either take the best option (in this case Random Forest – but the results probably show that we should add some kind of validation process), or use a simple ensemble-type approach. For the latter, I am computing weights for each of the maps based on the CCC of the 4 different regression models:
# We can weight results with Rsquared
w <- perf$rsq / sum(perf$rsq)
# Make stack of weighted predictions and compute sum
l_maps <- list(res_cubist$map, res_gam$map, res_lm$map, res_rf$map)
ens <- lapply(1:4, function(x) l_maps[[x]] * w[x]) %>%
stack %>%
sum
Ensemble modelling seem to work better when the models are not too correlated – i.e. when they capture different facets of the data:
s_res <- stack(l_maps)
names(s_res) <- c('Cubist', 'GAM', 'Linear Model', 'Random Forest')
s_res %>%
as.data.frame %>%
na.exclude %>%
cor
## Cubist GAM Linear.Model Random.Forest
## Cubist 1.0000000 0.7725831 0.7027710 0.7421504
## GAM 0.7725831 1.0000000 0.8880034 0.7831751
## Linear.Model 0.7027710 0.8880034 1.0000000 0.8094299
## Random.Forest 0.7421504 0.7831751 0.8094299 1.0000000
Here is the resulting map:
# Plot result
plot(ens, col = viridis(100), main = "CCC Weighted Average")