The “1, 2, 3” of module building

The process of making a module is essentially:

  1. Write an R function
  2. Run BuildModule() with the function and metadata
  3. Optional – Upload to the zoon modules repository

Each module type is written in a slightly different format, though the same three basic steps apply. Here we show the required inputs and outputs (I/O) for each module while explaining how they fit into a workflow(), and then follow with in-depth examples of how to write each of the module types. We also link to pre-existing modules that you can use as templates.


Module I/O definitions

The default input arguments and return values (outputs) of modules are strict to ensure compatibility with zoon’s workflow() function. Each module type has its own I/O requirements; however, any module type can have additional named input arguments, provided they have default values. Many of the data frames include '+ covariates'. This indicates that the number of covariate columns is flexible.

The module I/O definitions are as follows:

Occurrence

Occurrence modules have no default inputs and return a data.frame of species observation records in the required format.

In: No default inputs.

Out: A data.frame with column names (in this order):

  • longitude: The longitude of the observation.

  • latitude: The latitude of the observation.

  • value: The response value for the observation when used in a model. This can be 1 or 0 for presence/absence, an integer for abundance (e.g. 1, 3, 67), or a decimal number between 0 and 1 for proportions (e.g. 0.12, 0.5, 0.98).

  • type: This is linked to value and dictates, for each row of the data.frame, the type of value given. This can be one of the following: 'presence', 'absence', 'background', 'abundance', 'proportion'.

  • fold: Folds are used to test your model. If we have, for example, 3 folds (1, 2, 3) then we can use the PerformanceMeasures output module to test the performance of the model. A common method, implemented by PerformanceMeasures, is to build the model using all but one fold, and then test the model’s ability to predict the fold that was held back.

  • crs (optional): A column specifying the coordinate reference system in proj4string format if the points are not in latitude/longitude (WGS84).
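As a rough illustration of this format, the sketch below builds a small made-up occurrence data.frame in base R, assigns three folds at random, and checks the names and order of the required columns. The coordinates and the helper name has_occurrence_format are assumptions made for illustration; neither comes from zoon.

```r
# Made-up presence/absence records (coordinates are illustrative only)
occ <- data.frame(longitude = c(0.89, 0.66, 0.98, 0.71, 0.79, 0.52),
                  latitude  = c(51.99, 52.21, 51.98, 52.34, 52.24, 52.05),
                  value     = c(1, 1, 0, 1, 0, 1))

# 'type' mirrors 'value': 1 is a presence, 0 is an absence
occ$type <- ifelse(occ$value == 1, 'presence', 'absence')

# Assign each record to one of three folds at random
set.seed(42)
occ$fold <- sample(rep(1:3, length.out = nrow(occ)))

# Hypothetical helper (not a zoon function): check that the first five
# columns have the required names in the required order
has_occurrence_format <- function(df) {
  required <- c('longitude', 'latitude', 'value', 'type', 'fold')
  is.data.frame(df) &&
    ncol(df) >= length(required) &&
    identical(names(df)[seq_along(required)], required)
}

has_occurrence_format(occ)
```

A check like this only covers column names and order; it says nothing about whether the values themselves are sensible.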

Covariate

Covariate modules have no default inputs and return a Raster* object containing covariate data.

In: No default inputs.

Out: A RasterLayer, RasterBrick or RasterStack object.

Process

Process modules perform the pre-processing steps of the analysis. Since they modify their inputs instead of generating new outputs their input and output definitions are identical.

In: A list named .data with 2 named elements:

  • df: A data.frame with columns: 'longitude', 'latitude', 'value', 'type', 'fold' plus additional named columns giving associated covariate values. See occurrence module for details on these columns. df has an attribute covCols naming the covariate columns.
  • ras: A RasterLayer, RasterBrick or RasterStack object of covariate rasters.

Out: A list with 2 named elements:

  • df: A data.frame with columns: 'longitude', 'latitude', 'value', 'type', 'fold' plus additional named columns giving associated covariate values. df has an attribute covCols naming the covariate columns.
  • ras: A RasterLayer, RasterBrick or RasterStack object of covariate rasters.

Model

Model modules are where the species distribution model is fitted to the data. These modules return an output type unique to zoon: a ZoonModel object.

In: A data.frame called .df. .df has an attribute covCols naming the covariate columns.

Out: A ZoonModel object (built by the function ZoonModel). This is a list with three elements.

  • model: Your model object (e.g. a glm object).
  • code: A section of code inside { } that will perform predictions using model [your model] and newdata [a new set of covariate data], to return a vector of predicted values, one for each row of newdata.
  • packages: A character vector naming the packages needed to run code.
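To make the role of the three elements more concrete, here is a minimal base R sketch of the idea: a model object is stored alongside a block of prediction code, and that code is later run with model and newdata bound to real objects. The plain list zoon_like and the use of quote() and eval() are assumptions for illustration only; how zoon stores and evaluates the code internally may differ, and real modules should call ZoonModel().

```r
# Made-up training data: one covariate, no perfect separation
train <- data.frame(value = c(0, 0, 0, 1, 0, 1, 0, 1, 1, 1),
                    temp  = 1:10)
m <- glm(value ~ temp, data = train, family = binomial)

# A plain list standing in for a ZoonModel object (illustration only)
zoon_like <- list(model    = m,
                  code     = quote({
                    predict(model, newdata, type = 'response')
                  }),
                  packages = 'stats')

# Run the stored code with 'model' and 'newdata' bound to our objects
newdata <- data.frame(temp = c(2, 5, 9))
p <- eval(zoon_like$code,
          list(model = zoon_like$model, newdata = newdata))

# One predicted probability for each row of newdata
p
```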

Output

Output modules are where we can visualise our data or results in some manner. There are set inputs to these modules, but they are open-ended so you can have any output format you desire.

In: There are two inputs to Output modules:

  • .model: A list with 2 named elements:
    • model: A ZoonModel object from a model module
    • data: A data.frame with the columns: 'longitude', 'latitude', 'value', 'type', 'fold', 'predictions', plus additional named columns giving associated covariate values. data has an attribute covCols naming the covariate columns.
  • .ras: A RasterLayer, RasterBrick or RasterStack object.

Out: Anything!
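As a rough sketch of how these inputs might be used, the hypothetical output function below returns the mean prediction for each record type. The function name MeanPredictionByType and the mock input are assumptions made for illustration; in a real workflow .model$model would be a ZoonModel object and .ras a Raster* object.

```r
# Hypothetical output module: mean prediction for each record type.
# '.ras' is accepted because it is a default argument, but is unused here.
MeanPredictionByType <- function(.model, .ras) {
  dat <- .model$data
  tapply(dat$predictions, dat$type, mean)
}

# Mock input with made-up values
mock_model <- list(model = NULL,
                   data = data.frame(longitude   = c(0.1, 0.2, 0.3, 0.4),
                                     latitude    = c(51, 52, 53, 54),
                                     value       = c(1, 1, 0, 0),
                                     type        = c('presence', 'presence',
                                                     'background', 'background'),
                                     fold        = 1,
                                     predictions = c(0.8, 0.6, 0.3, 0.1)))

res <- MeanPredictionByType(.model = mock_model, .ras = NULL)
res
```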


How to write an occurrence module

The aim of an occurrence module is to return a data.frame of occurrence data which can be used for modelling a species’ distribution. The example we’re going to show gets data from a fictional survey we have undertaken. The data was saved as a .csv and to share it we have placed it on Figshare.

# Load zoon

library(zoon)

# Start building our function

Lorem_ipsum_UK <- function(){

In this case we have not given our function any arguments as we simply want to return the online dataset. However, you could add arguments here to modify what your function returns (for an example see the SpOcc module).

# First we retrieve the data from figshare
# Here is the URL

URL <- "https://ndownloader.figshare.com/files/2519918"

# Here is the data

out <- read.csv(URL)

head(out)
##    startDate latitude longitude
## 1 2014-06-25 51.98917 0.8917427
## 2 2014-06-25 51.98917 0.8917427
## 3 2007-08-28 52.21136 0.6602159
## 4       <NA> 51.97564 0.9833449
## 5 1973-01-01 52.34187 0.7142953
## 6 2013-04-12 52.23719 0.7877316

Now it is time to think about how we return our data. The output format for occurrence modules is very important: if the format is not correct, your module will not work properly when entered into a workflow. An occurrence module must return a data.frame with the columns 'longitude', 'latitude', 'value', 'type' and 'fold'; the details are given in the module I/O definitions. The order of these columns is important. An optional column is crs (Coordinate Reference System), which specifies the coordinate system of your points if they are not in latitude/longitude (the default). This is specified in the proj4string format. You can also supply extra columns that might be used further down the workflow.

Our occurrence data does not have all of these columns, so we need to do a little reformatting to add them.

# Keep only Lat Long columns

out <- out[, c("longitude", "latitude")]

# Add in the columns we don't have

out$value <- 1 # all our data are presences
out$type <- 'presence'
out$fold <- 1 # we don't add any folds

# Now the data is in the correct format we can return it

return(out)

We have now written the R code for our occurrence module. This is what it looks like when you put it all together:

Lorem_ipsum_UK <- function(){
  
  # Get data
  
  URL <- "https://ndownloader.figshare.com/files/2519918"
  out <- read.csv(URL)
  out <- out[, c("longitude", "latitude")]
  
  # Add in the columns we don't have
  
  out$value <- 1 # all our data are presences
  out$type <- 'presence'
  out$fold <- 1 # we won't add any folds
  
  return(out)
}
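Before running the function in a workflow, the reformatting steps can be checked without a network connection. The sketch below writes a small made-up CSV with the same three columns as the survey file, applies the same steps as Lorem_ipsum_UK, and inspects the result. The temporary file and its values are assumptions for illustration, not the Figshare data.

```r
# Write a small made-up CSV with the same columns as the survey file
tmp <- tempfile(fileext = '.csv')
write.csv(data.frame(startDate = c('2014-06-25', '2007-08-28', NA),
                     latitude  = c(51.98917, 52.21136, 51.97564),
                     longitude = c(0.8917427, 0.6602159, 0.9833449)),
          tmp, row.names = FALSE)

# Same reformatting steps as Lorem_ipsum_UK, reading the local file
out <- read.csv(tmp)
out <- out[, c('longitude', 'latitude')]
out$value <- 1
out$type <- 'presence'
out$fold <- 1

# Column names and order should match the occurrence format
names(out)
```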

Now that we have our function written we can test it very simply in a workflow like this.

work1 <- workflow(occurrence = Lorem_ipsum_UK,
                  covariate = UKBioclim,
                  process = OneHundredBackground,
                  model = LogisticRegression,
                  output = PrintMap)
## Occurrence data does not have a "crs" column, zoon will assume it is in the same projection as the covariate data

[Figure: map printed by the PrintMap output module]

This is a nice way to debug your function and ensure you are getting the results you expect.

Once you are happy that your function is working as you expect, you can build your code into a module using the BuildModule function in zoon. This function adds metadata including the type of module, authors’ names, a brief description and documentation for the arguments it accepts (though this one doesn’t accept any arguments). Once BuildModule has created your module it will run it through checks to make sure it has the required features, outputs data in the correct format, etc. Checking can be turned off by setting the argument check = FALSE.

# Let's build our module

BuildModule(Lorem_ipsum_UK,
            type = 'occurrence',
            title = 'A dataset of Lorem ipsum occurrences',
            description = paste('The module retrieves a dataset of',
                                'Lorem ipsum records from figshare. This dataset contains',
                                'presence only data and was collected between 1990 and',
                                '2000 by members of the Lorem ipsum appreciation society'),
            details = 'This dataset is fake, Lorem ipsum does not exist',
            version = 0.1,
            author = 'A.B. Ceidi',
            email = 'ABCD@anemail.com',
            dataType = 'presence-only')
## Starting checks...
## done
## [1] "Lorem_ipsum_UK"

This function is fairly self-explanatory; however, it is worth noting the dataType field. This must be any of 'presence-only', 'presence/absence', 'presence/background', 'abundance' or 'proportion'. This is important so that people using your module in the future will know what it is going to output.

BuildModule has now written an R file in our working directory containing the function and metadata, so that it can be shared with others.

# First we remove the function from our workspace

rm(list = 'Lorem_ipsum_UK')

# This is how you would use a module that a colleague has sent you

LoadModule(module = 'Lorem_ipsum_UK.R')

work2 <- workflow(occurrence = Lorem_ipsum_UK,
                  covariate = UKAir,
                  process = OneHundredBackground,
                  model = LogisticRegression,
                  output = PrintMap)

Once we’re happy with the module, we will hopefully upload it to the zoon repository. The repository is currently under development. Visit the development pages for more information.


How to write a covariate module

The aim of a covariate module is to provide spatial information that will help to explain the distribution of a species. For example, this could be climate data, habitat data or topography.

A covariate module, like an occurrence module, does not have to take any arguments but must return a RasterLayer, RasterBrick or RasterStack object.

In this example we will create a covariate module that can provide a number of different climate layers for the area covering Australia.

# Our function will take an argument to set the variables
# the user wants returned

AustraliaAir <- function(variables = 'rhum'){

When your module has arguments, as here, it is important to include defaults for all arguments. This makes it easier for other users to use your modules and allows your module to be tested effectively when you upload it to the zoon repository.

The first step is to load the R packages that your code is going to need. It is important that you use the GetPackage() function rather than library() or require() as it will also install the package if the user does not already have it installed.

In this example we do not need any external packages as the data we are downloading is a RasterStack object, and zoon already loads the raster package to deal with RasterStacks.

To share our covariate data we have saved the raster object as an R data file and placed it on Figshare, attributing those who created the data.

In our module we download this data into R:

# Load in the data

URL <- "http://files.figshare.com/2527274/aus_air.rdata"
load(url(URL, method = 'libcurl')) # The object is called 'ras'

# Subset the data according to the variables parameter

ras <- subset(ras, variables)

return(ras)
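Because variables is passed straight to subset(), a misspelled layer name would only fail after the data has been downloaded. One option is to validate the argument first. The sketch below does this in base R; the helper name check_variables is hypothetical, and the list of allowed names is taken from the variables used elsewhere in this example.

```r
# Hypothetical helper (not part of zoon): stop early if any requested
# variable is not one of the layers in the dataset
check_variables <- function(variables,
                            allowed = c('air', 'hgt', 'rhum', 'shum',
                                        'omega', 'uwnd', 'vwnd')) {
  bad <- setdiff(variables, allowed)
  if (length(bad) > 0) {
    stop('Unknown variable(s): ', paste(bad, collapse = ', '))
  }
  invisible(variables)
}

# Valid names pass through unchanged
check_variables(c('air', 'rhum'))

# An unknown name raises an error, caught here with try()
result <- try(check_variables('temperature'), silent = TRUE)
inherits(result, 'try-error')
```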

We can test our function works by running it in a workflow with other modules:

AustraliaAir <- function(variables = 'rhum'){

  URL <- "http://files.figshare.com/2527274/aus_air.rdata"
  load(url(URL, method = 'libcurl')) # The object is called 'ras'
  ras <- subset(ras, variables)
  return(ras)
  
}

# Select the variables we want

myVariables <- c('air','hgt','rhum','shum','omega','uwnd','vwnd')

work3 <- workflow(occurrence = SpOcc(extent = c(111, 157, -46, -6),
                                     species = 'Varanus varius',
                                     limit = 500),
                  covariate = AustraliaAir(variables = myVariables),
                  process = OneHundredBackground,
                  model = LogisticRegression,
                  output = PrintMap)

[Figure: map printed by the PrintMap output module]

Once we are happy with the function we have written we need to use the BuildModule() function to convert our function into a module by adding in the necessary metadata. Once BuildModule() has created your module it will run it through checks to make sure it has the required features, outputs data in the correct format, etc. Checking can be turned off by setting the argument check = FALSE.

# Build our module

BuildModule(AustraliaAir,
            type = 'covariate',
            title = 'Australia Air data from NCEP',
            description = paste('This module provides access to the',
                                'NCEP air data for Australia provided by',
                                'NCEP and should be attributed to Climatic',
                                'Research Unit, University of East Anglia'),
            details = paste('These data are redistributed under the terms of',
                            'the Open Database License',
                            'http://opendatacommons.org/licenses/odbl/1.0/'),
            version = 0.1,
            author = 'Z.O. Onn',
            email = 'zoon@zoon-zoon.com',
            paras = list(variables = paste('A character vector of air variables',
                         'you wish to return. This can include any number of',
                         "the following: 'air','hgt','rhum','shum','omega',",
                         "'uwnd','vwnd'")))
## Starting checks...
## done
## [1] "AustraliaAir"

BuildModule() is fairly self-explanatory, but it is worth noting the paras argument. This takes a named list of the parameters the module takes, with the following structure: list(parameterName = 'Parameter description.', anotherParameter = 'Another description.').

Once BuildModule() has been run there will be an R file in our working directory that represents our module and can be shared with others. This R script can be used as follows.

# remove the original function from our environment

rm(list = 'AustraliaAir')

# Load the module script

LoadModule('AustraliaAir.R')

work4 <- workflow(occurrence = SpOcc(extent = c(111, 157, -46, -6),
                                     species = 'Varanus varius',
                                     limit = 500),
                  covariate = AustraliaAir,
                  process = OneHundredBackground,
                  model = LogisticRegression,
                  output = PrintMap)

Once we’re happy with the module, we will hopefully upload it to the zoon repository. The repository is currently under development. Visit the development pages for more information.


How to write a process module

The aim of a process module is to modify the occurrence data and/or the covariate data prior to modelling. Examples include adding background points, or adding folds for cross-validation.

A process module returns data in exactly the same format that it receives it, taking and returning a list of two elements. The first element is a data.frame with the columns 'longitude', 'latitude', 'value', 'type', 'fold', and additional covariate columns (see Occurrence module output). The covariate columns are added internally in the zoon workflow by combining the output of the covariate module. The data.frame has an attribute covCols which details which columns these are. The second element of the list is a RasterBrick, RasterLayer, or RasterStack object as returned by a covariate module.

In this example we are going to create a process module that cuts down our occurrence data to a user supplied extent.

When writing a module it is useful to have example input to test with. One way to do this is to run a similar workflow and use the outputs of that workflow to test yours. Here is an example:

# We run a very simple workflow so that we can get example input
# for our module

work5 <- workflow(occurrence = UKAnophelesPlumbeus,
                  covariate  = UKAir,
                  process    = OneHundredBackground,
                  model      = LogisticRegression,
                  output     = PrintMap)
## Occurrence data does not have a "crs" column, zoon will assume it is in the same projection as the covariate data
## There are fewer than 100 cells in the environmental raster.
## Using all available cells (81) instead

[Figure: map printed by the PrintMap output module]

# The output from a process module is in the same format as the 
# input, so we can use the output of OneHundredBackground as the testing
# input for our module. Note that this object should be called .data

.data <- Process(work5)

str(.data, 2)
## List of 2
##  $ df :'data.frame': 269 obs. of  6 variables:
##   ..$ value    : num [1:269] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ type     : Factor w/ 2 levels "background","presence": 2 2 2 2 2 2 2 2 2 2 ...
##   ..$ fold     : num [1:269] 1 1 1 1 1 1 1 1 1 1 ...
##   ..$ longitude: num [1:269] 1.01 -0.16 -2.83 -0.63 -3.53 ...
##   ..$ latitude : num [1:269] 52.4 51.6 53.4 51.6 56 ...
##   ..$ layer    : num [1:269] 271 272 272 272 271 ...
##   ..- attr(*, "call_path")=List of 1
##   ..- attr(*, "covCols")= chr "layer"
##  $ ras:Formal class 'RasterLayer' [package "raster"] with 12 slots
##  - attr(*, "call_path")=List of 3
##   ..$ occurrence: chr "UKAnophelesPlumbeus"
##   ..$ covariate : chr "UKAir"
##   ..$ process   : chr "OneHundredBackground"

It is important to note that the list object that is passed into a process module is named .data, and so when writing our module we need to adhere to this convention.

First, let’s have a look at the input:

# The first element is the occurrence data

head(.data$df)
##   value     type fold   longitude latitude    layer
## 1     1 presence    1  1.01287600 52.37696 271.4658
## 2     1 presence    1 -0.16003467 51.57146 272.2655
## 3     1 presence    1 -2.83497900 53.40813 271.6481
## 4     1 presence    1 -0.62955210 51.55540 272.2655
## 5     1 presence    1 -3.52534680 56.04848 271.2964
## 6     1 presence    1  0.01144066 51.58168 272.2655
# The attribute covCols gives the covariate columns

attr(.data$df, 'covCols')
## [1] "layer"
# If we want to modify these covariate columns in
# our process module we can select them using this
# attribute

head(.data$df[attr(.data$df, 'covCols')])
##      layer
## 1 271.4658
## 2 272.2655
## 3 271.6481
## 4 272.2655
## 5 271.2964
## 6 272.2655
# The second element is the raster

plot(.data$ras)

[Figure: plot of the covariate raster .data$ras]

Let’s start writing our new process module.

# Start writing our module

ClipOccurrence <- function(.data,
                           extent = c(-180, 180, -180, 180)){

Here we have remembered to give .data as an argument as this is a default for process modules. In addition we have supplied an argument for the extent and set the default to the entire globe (i.e. no clipping). It is important that all of your arguments have defaults (even if the default might not be a good idea in practice), as this allows the zoon system to perform automatic testing on your modules when you share them online.

# Write the body of our function
# extract the occurrence data from the .data object

occDF <- .data$df

# Subset by longitude

occSub <- occDF[occDF$longitude >= extent[1] &
                occDF$longitude <= extent[2], ]

# Subset by latitude

occSub <- occSub[occSub$latitude >= extent[3] &
                 occSub$latitude <= extent[4], ]

# assign this data.frame back to the .data object

.data$df <- occSub

So our simple process function looks like this:

ClipOccurrence <- function(.data, extent = c(-180, 180, -180, 180)){
  
  # Write the body of our function
  # extract the occurrence data from the .data object
  
  occDF <- .data$df
  
  occSub <- occDF[occDF$longitude >= extent[1] &
                  occDF$longitude <= extent[2], ]
 
  occSub <- occSub[occSub$latitude >= extent[3] &
                   occSub$latitude <= extent[4], ]
  
  .data$df <- occSub
  
  return(.data)
  
}
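The function can also be checked outside a workflow with a small mock input. In the sketch below the three records, the covariate column layer and the NULL placeholder for ras are all made up for illustration; a real .data list would carry a Raster* object in ras. The function is repeated so that the example runs on its own.

```r
# Copy of the function above so that this sketch runs on its own
ClipOccurrence <- function(.data, extent = c(-180, 180, -180, 180)){
  occDF <- .data$df
  occSub <- occDF[occDF$longitude >= extent[1] &
                  occDF$longitude <= extent[2], ]
  occSub <- occSub[occSub$latitude >= extent[3] &
                   occSub$latitude <= extent[4], ]
  .data$df <- occSub
  return(.data)
}

# Mock input: three made-up records and a NULL placeholder where a
# real workflow would supply a Raster* object
mock <- list(df = data.frame(longitude = c(-4, 0, 1),
                             latitude  = c(52, 51, 56),
                             value     = 1,
                             type      = 'presence',
                             fold      = 1,
                             layer     = c(271.3, 272.3, 271.5)),
             ras = NULL)
attr(mock$df, 'covCols') <- 'layer'

# Only the second record lies inside this extent
clipped <- ClipOccurrence(mock, extent = c(-3, 2, 50, 53))
clipped$df
```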

Our next step is to test that this function will work in a workflow. Once we have read in our function so that it is available in our working environment we can then include it in a workflow as we would a normal module.

# Run a workflow with our new process

# In this example we first add background points, then clip the data

work6 <- workflow(occurrence = UKAnophelesPlumbeus,
                  covariate  = UKAir,
                  process    = Chain(OneHundredBackground,
                                     ClipOccurrence(extent = c(-3, 2, 50, 53))),
                  model      = LogisticRegression,
                  output     = PrintMap)

[Figure: map printed by the PrintMap output module]

We can see that the data has been clipped to the extent we specified in the map printed by the output module.

The next stage is to turn this function into a module which is shareable. To do this we need to add metadata to our function using the BuildModule() function. Once BuildModule() has created your module it will run it through checks to make sure it has the required features, outputs data in the correct format, etc. Checking can be turned off by setting the argument check = FALSE.

# Build our module

BuildModule(ClipOccurrence,
            type = 'process',
            title = 'Clip occurrence data to extent',
            description = paste('This process module clips the occurrence',
                                'data that is returned from the occurrence',
                                'module to a user defined extent'),
            details = paste('The extent is a rectangular region which denotes the',
                            'area within which observations will be kept.',
                            'All data that falls outside of the extent will',
                            'be removed and will not be used in the',
                            'modelling process'),
            version = 0.1,
            author = 'Z.O. Onn',
            email = 'zoon@zoon-zoon.com',
            paras = list(extent = paste('A numeric vector of length four',
                                        'giving (in this order) the minimum',
                                        'longitude, maximum longitude, minimum',
                                        'latitude, maximum latitude.')),
            dataType = c('presence-only', 'presence/absence',
                         'presence/background', 'abundance',
                         'proportion'))
## Starting checks...
## done
## [1] "ClipOccurrence"

Much of how to use BuildModule() is self-explanatory, but two parameters are worth mentioning here. The paras argument takes a named list of the parameters the module takes. It should have the following structure: list(parameterName = 'Parameter description.', anotherParameter = 'Another description.'), but should not include the defaults (i.e. we do not include .data). dataType describes the types of occurrence data that this module will work with. Certain modules might only work with presence-only data, for example. In our case, our module will work with any type of data and so we list all the data types in the dataType field.
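One way to see which arguments still need an entry in paras is to compare the function’s arguments with the names of the list. The helper below is a hypothetical base R sketch, not a zoon function; it assumes, as described here, that default arguments are the ones whose names start with a dot.

```r
# Hypothetical helper (not part of zoon): list the arguments of a
# module function that still need an entry in 'paras'
undocumented_args <- function(fun, paras = list()) {
  args <- names(formals(fun))
  # default arguments such as .data or .df start with a dot
  user_args <- args[!startsWith(args, '.')]
  setdiff(user_args, names(paras))
}

# Stand-in with the same arguments as ClipOccurrence
example_module <- function(.data, extent = c(-180, 180, -180, 180)) .data

# 'extent' is reported until it is described in paras
undocumented_args(example_module)
undocumented_args(example_module,
                  paras = list(extent = 'Minimum and maximum coordinates.'))
```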

Once BuildModule() has been run there will be an R file in our working directory that represents our module and can be shared with others. This R script can be used as follows:

# remove the original function from our environment

rm(list = 'ClipOccurrence')

# Load the module script

LoadModule('ClipOccurrence.R')
## [1] "ClipOccurrence"
work7 <- workflow(occurrence = AnophelesPlumbeus,
                  covariate = UKBioclim,
                  process = Chain(OneHundredBackground,
                                  ClipOccurrence(extent = c(-5, 5, 50, 55))),
                  model = LogisticRegression,
                  output = PrintMap)
## 152 records found
## 0-
## 152 records downloaded
## Occurrence data does not have a "crs" column, zoon will assume it is in the same projection as the covariate data

[Figure: map printed by the PrintMap output module]

Once we’re happy with the module, we will hopefully upload it to the zoon repository. The repository is currently under development. Visit the development pages for more information.


How to write a model module

Here is a simple function that will become our module. It is a model module that uses generalised additive models. We will work through it one element at a time.

First we start our function by declaring all the parameters we need, including all the defaults.

GamGam <- function(.df){

Since this is a model module the only default is .df. To find out more about defaults see the section Module I/O definitions. .df is a data.frame with columns: 'longitude', 'latitude', 'value', 'type', 'fold', and additional covariate columns (see Occurrence module output). The names of the covariate columns are given as an attribute of the table: attr(.df, 'covCols').

Next we specify the packages our function needs. These should be specified using the GetPackage() function in the zoon package. This function will load the package if the user of your module already has it, or will install it from CRAN if they don’t. For this reason make sure your module only uses packages that are on CRAN.

# Specify the packages we need using the function
# GetPackage

zoon::GetPackage("gam")

Next we can add the code that does our modelling. Here we create a simple GAM (Generalised Additive Model) using the package gam.

# Create a data.frame of covariate data

covs <- .df[colnames(.df) %in% attr(.df, 'covCols')]

# do a bit of copy-pasting to define smooth terms for each covariate

f <- sprintf('.df$value ~ s(%s)',
             paste(colnames(covs),
                   collapse = ') + s('))

# Run our gam model

m <- gam::gam(formula = formula(f),
              data = covs,
              family = binomial)
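To see what the sprintf() and paste() step produces, here is a small standalone sketch. The covariate names bio1 and bio12 are made up and simply stand in for colnames(covs).

```r
# Hypothetical covariate names standing in for colnames(covs)
cov_names <- c('bio1', 'bio12')

# Join the names with ') + s(' and wrap the result in 's(...)'
f <- sprintf('.df$value ~ s(%s)',
             paste(cov_names, collapse = ') + s('))
f
# [1] ".df$value ~ s(bio1) + s(bio12)"

# formula() turns the string into a formula object
class(formula(f))
```

Each covariate therefore gets its own smooth term, however many covariate columns the workflow supplies.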

The final stage of building a model module is to write some code within the function to create a ZoonModel object. This is important as it standardises all outputs from model modules and crucially enables zoon to make predictions from them in a predictable and standard way.

We build a ZoonModel object by using the function ZoonModel(). This takes three parameters:

  • model: Your model object
  • code: A section of code that will use model [your model] and newdata [a new set of covariate data], to return a vector of predicted values, one for each row of newdata
  • packages: A character vector naming the packages needed to run code

# Create a ZoonModel object to return.
# this includes our model, predict method
# and the packages we need.

ZoonModel(model = m,
          code = {
          
          # create empty vector of predictions

          p <- rep(NA, nrow(newdata))
          
          # omit NAs in new data

          newdata_clean <- na.omit(newdata)
          
          # get NA indices

          na_idx <- attr(newdata_clean, 'na.action')
          
          # if there are no NAs then the index should 
          # include all rows, else it should name the 
          # rows to ignore

          if (is.null(na_idx)){
            idx <- 1:nrow(newdata)
          } else {
            idx <- -na_idx
          }
          
          # Use the predict function in gam to predict
          # our new values

          p[idx] <- gam::predict.gam(model,
                                     newdata_clean,
                                     type = 'response')
          return (p)
          },
          packages = 'gam')
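The NA handling in this prediction code can be tried on its own. The sketch below follows the same steps but swaps in a base R glm so that no extra packages are needed; the training data and the new covariate values are made up for illustration.

```r
# Made-up training data and a simple logistic regression (glm is used
# here only so that the sketch needs no extra packages)
train <- data.frame(value = c(0, 0, 0, 1, 0, 1, 0, 1, 1, 1),
                    temp  = 1:10)
model <- glm(value ~ temp, data = train, family = binomial)

# New covariate data with a missing value in the second row
newdata <- data.frame(temp = c(2, NA, 9))

# Same steps as the code block passed to ZoonModel()
p <- rep(NA, nrow(newdata))
newdata_clean <- na.omit(newdata)
na_idx <- attr(newdata_clean, 'na.action')
if (is.null(na_idx)) {
  idx <- 1:nrow(newdata)
} else {
  idx <- -na_idx
}
p[idx] <- predict(model, newdata_clean, type = 'response')

# One value per row of newdata, with NA where a covariate was missing
p
```

Filling a vector of NAs in this way keeps the predictions aligned with the rows of newdata, which matters because the code must return one value for each row.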

With all these elements in place we now have our module complete. All together it looks like this:

GamGam <- function(.df){

  # Specify the packages we need using the function
  # GetPackage

  zoon::GetPackage("gam")
  
  # Create a data.frame of covariate data

  covs <- .df[colnames(.df) %in% attr(.df, 'covCols')]

  # do a bit of copy-pasting to define smooth terms for each covariate

  f <- sprintf('.df$value ~ s(%s)',
               paste(colnames(covs),
                     collapse = ') + s('))
  
  # Run our gam model

  m <- gam::gam(formula = formula(f),
                data = covs,
                family = binomial)
  
  # Create a ZoonModel object to return.
  # this includes our model, predict method
  # and the packages we need.

  ZoonModel(model = m,
            code = {
            
            # create empty vector of predictions

            p <- rep(NA, nrow(newdata))
            
            # omit NAs in new data

            newdata_clean <- na.omit(newdata)
            
            # get their indices

            na_idx <- attr(newdata_clean, 'na.action')
            
            # if there are no NAs then the index should 
            # include all rows, else it should name the 
            # rows to ignore

            if (is.null(na_idx)){
              idx <- 1:nrow(newdata)
            } else {
              idx <- -na_idx
            }
            
            # Use the predict function in gam to predict
            # our new values

            p[idx] <- gam::predict.gam(model,
                                       newdata_clean,
                                       type = 'response')
            return (p)
            },
            packages = 'gam')
  
}

We then run BuildModule() on our function, adding the required metadata. As this module has no parameters other than .df, which is not user specified, we don’t need to set the paras argument, which would normally be used to document arguments. Default arguments, like .df, are all signified by starting with a . and don’t need to be documented as this will be written into the module documentation automatically. It is worth noting the dataType field. This must be any of 'presence-only', 'presence/absence', 'presence/background', 'abundance' or 'proportion'. This is important so that people using your module in the future will know what types of data can be used as inputs. Once BuildModule() has created your module it will run it through checks to make sure it has the required features, outputs data in the correct format, etc. Checking can be turned off by setting the argument check = FALSE.

BuildModule(object = GamGam,
            type = 'model',
            title = 'GAM sdm model',
            description = 'This is my mega cool new model.',
            details = paste('This module performs GAMs (Generalised Additive',
                            'Models) using the gam function from the package gam.'),
            version = 0.1,
            author = 'Z. Oon',
            email = 'zoon@zoon.com',
            dataType = c('presence-only', 'presence/absence'))
## Starting checks...
## done
## [1] "GamGam"

This is now a runnable module.

# remove the function in our workspace else
# this will cause problems

rm(GamGam)

# Load in the module we just built

LoadModule('GamGam.R')
## [1] "GamGam"
# Run a workflow using our module

work8 <- workflow(occurrence = UKAnophelesPlumbeus,
                  covariate = UKAir,
                  process  = OneHundredBackground,
                  model = GamGam,
                  output = PrintMap)

[Figure: map printed by the PrintMap output module]

Once we’re happy with the module, we will hopefully upload it to the zoon repository. The repository is currently under development. Visit the development pages for more information.


How to write an output module

An output module is the last module in a zoon workflow and is an opportunity to summarise the model results, make predictions, or otherwise visualise the data or results. The input to output modules is a combination of the outputs of occurrence, process and model modules providing many possible output types. The specific inputs are .model, a list with the two named elements model (a ZoonModel object) and data (data.frame with the columns: 'longitude', 'latitude', 'value', 'type', 'fold', 'predictions', plus additional named columns giving associated covariate values), and .ras, a RasterLayer, RasterBrick or RasterStack object. Depending on your desired output you may not utilise some of these inputs.
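
To make the shape of .model concrete, here is a small base R sketch that checks a list against the structure described above and reports which columns are covariates. The helper check_model_input() is hypothetical and not part of zoon, and the mock object uses a placeholder string where a real ZoonModel would be.

```r
# Hypothetical helper, not part of zoon: confirm that a list looks like
# the `.model` input and return the names of the covariate columns
check_model_input <- function(.model) {
  required_cols <- c("longitude", "latitude", "value", "type", "fold",
                     "predictions")
  stopifnot(is.list(.model),
            all(c("model", "data") %in% names(.model)),
            is.data.frame(.model$data))
  missing_cols <- setdiff(required_cols, names(.model$data))
  if (length(missing_cols) > 0) {
    stop("missing columns in .model$data: ",
         paste(missing_cols, collapse = ", "))
  }
  # Anything beyond the required columns is treated as a covariate
  setdiff(names(.model$data), required_cols)
}

# Mock input with made-up values; `model` is a placeholder, not a ZoonModel
mock_model <- list(
  model = "placeholder",
  data  = data.frame(longitude = c(-1.2, 0.5),
                     latitude = c(51.3, 52.8),
                     value = c(1, 0),
                     type = c("presence", "background"),
                     fold = c(1, 1),
                     predictions = c(0.8, 0.1),
                     layer = c(280.4, 279.9))
)

covariate_names <- check_model_input(mock_model)
```

A mock like this can be handy for quick tests of code that only touches the data element, though real testing should use the output of an actual workflow.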

In this example we will create an output module that uses the model output to predict the species occurrence in a new location given by a user-provided raster.

When writing a module it is useful to have example input to test with. One way to do this is to run a similar workflow and use the outputs of that workflow to test yours. Here is an example:

# We run a very simple workflow so that we can get example input
# for our module

work9 <- workflow(occurrence = UKAnophelesPlumbeus,
                  covariate  = UKAir,
                  process    = OneHundredBackground,
                  model      = LogisticRegression,
                  output     = PrintMap)

# The input to an output module is a combination of the output
# from the model module and the covariate module. We can recreate
# it for this work flow like this

.model <- Model(work9)
.ras <- Covariate(work9)

Both .model and .ras are default arguments for an output module, so it is important that you include them as arguments for your module, even if you don’t use them both. It is also important that you stick to the same naming conventions.

# Our output module takes the default parameters and a user-defined
# Raster* object that has the same structure as the raster layer output
# by the covariate module

PredictNewRasterMap <- function(.model, .ras, raster = .ras){

It is important to have default values for all user-defined parameters so that your module can be tested when you upload it to the zoon website. Here we set our default ‘new area’ raster to be the same as the raster used to create the model. Clearly this is not how we envisage the module being used in a real application (unless the user genuinely wants to predict back to the same area), however this ensures that this module will always work with its default arguments, no matter what workflow it is placed in.
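
The default raster = .ras works because R evaluates default arguments lazily inside the function, so a default can refer to another argument. A toy illustration, where ToyOutput() and its extra argument are made up for this sketch and strings stand in for rasters:

```r
# Toy stand-in for an output module: `extra` defaults to `.ras`
ToyOutput <- function(.model, .ras, extra = .ras) {
  extra
}

# With no user value, the default falls back to whatever `.ras` is
default_result <- ToyOutput(.model = NULL, .ras = "covariate raster")

# A user-supplied value overrides the default
custom_result <- ToyOutput(.model = NULL, .ras = "covariate raster",
                           extra = "new area raster")
```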

# The first step is to load in the packages we need

zoon::GetPackage("raster") 
  
# Then extract the covariate values
# from the user provided raster

vals <- data.frame(getValues(raster))
colnames(vals) <- names(raster)

Once we have these new values we can predict using the ZoonPredict() function. This function is very useful as it simplifies the process of making predictions from the output of a model module. See the InteractiveMap module for an innovative visualisation using predicted values.

# Make predictions to the new values

pred <- ZoonPredict(.model$model,
                    newdata = vals)

# Create a copy of the user's raster...
# (just a single layer)

pred_ras <- raster[[1]]
    
# ... and assign the predicted values to it

pred_ras <- setValues(pred_ras, pred)

Once we have the raster of predicted values we can plot it and return the results to the user.

# Plot the predictions as a map

plot(pred_ras)

# Return the raster of predictions

return(pred_ras)

Our function now looks like this:

PredictNewRasterMap <- function(.model, .ras, raster = .ras){
  
  zoon::GetPackage("raster")
  
  # Extract the values from the user provided raster
  
  vals <- data.frame(getValues(raster))
  colnames(vals) <- names(raster)
  
  # Make predictions to the new values
  
  pred <- ZoonPredict(.model$model,
                      newdata = vals)
  
  pred_ras <- raster[[1]]
  pred_ras <- setValues(pred_ras, pred)
  
  # Print the predictions as a map
  
  plot(pred_ras)
  
  return(pred_ras)
}
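
The predict-then-fill pattern in this function can be tried without zoon or raster. The sketch below is a base R analogue under stated assumptions: a glm() fit stands in for the ZoonModel object, predict() stands in for ZoonPredict(), and plain matrices stand in for raster layers. The covariate names temp and rain and all data are simulated for illustration.

```r
set.seed(1)

# Simulated training data: presence/absence against two made-up covariates
train <- data.frame(temp = rnorm(50), rain = rnorm(50))
train$value <- rbinom(50, 1, plogis(1.5 * train$temp))

# A plain logistic regression stands in for the ZoonModel object
fit <- glm(value ~ temp + rain, data = train, family = binomial)

# One 3 x 4 grid per covariate, stored as matrices in place of raster layers
temp_grid <- matrix(rnorm(12), nrow = 3)
rain_grid <- matrix(rnorm(12), nrow = 3)

# Flatten the grids to one row per cell and one column per covariate
vals <- data.frame(temp = as.vector(temp_grid),
                   rain = as.vector(rain_grid))

# Predict for every cell, then pour the values back into the grid shape
pred <- predict(fit, newdata = vals, type = "response")
pred_grid <- matrix(pred, nrow = nrow(temp_grid), ncol = ncol(temp_grid))
```

The key point carried over from the module is that the column names of vals must match the covariates the model was fitted with.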

Our next step is to test that this function will work in a workflow. Once we have read in our function so that it is available in our working environment we can then include it in a workflow as we would a normal module.

# Run it with the defaults

work10 <- workflow(occurrence = UKAnophelesPlumbeus,
                   covariate  = UKBioclim,
                   process    = OneHundredBackground,
                   model      = LogisticRegression,
                   output     = PredictNewRasterMap)


# Now I'm going to run it with a different raster

library(raster)

# Get Bioclim data (using the getData function in the raster package,
# which zoon loads) ...

BioclimData <- getData('worldclim', var = 'bio', res = 5)
BioclimData <- BioclimData[[1:19]]

# ... and crop to Australia

cropped <- crop(BioclimData,
                c(109,155,-46,-7))

# Run it with my new raster

work11 <- workflow(occurrence = UKAnophelesPlumbeus,
                   covariate  = UKBioclim,
                   process    = OneHundredBackground,
                   model      = LogisticRegression,
                   output     = PredictNewRasterMap(raster = cropped))


# The prediction map should also be returned as a raster

class(Output(work11))
## [1] "RasterLayer"
## attr(,"package")
## [1] "raster"
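
Because the new raster needs the same covariate layers in the same order as the raster from the covariate module, a defensive check on layer names may be worth adding to a module like this. The sketch below is one possible approach, not something zoon does for you. The helper same_layers() is hypothetical, and character vectors stand in for names(.ras) and names(raster).

```r
# Hypothetical helper, not part of zoon: compare covariate layer names,
# including their order, and stop with a message if they differ
same_layers <- function(expected, supplied) {
  if (!identical(expected, supplied)) {
    stop("layer names differ; expected: ",
         paste(expected, collapse = ", "),
         "; supplied: ", paste(supplied, collapse = ", "))
  }
  invisible(TRUE)
}

# Matching names in the same order pass
ok <- same_layers(c("bio1", "bio2"), c("bio1", "bio2"))

# The same layers in a different order are rejected
wrong_order <- tryCatch(same_layers(c("bio1", "bio2"), c("bio2", "bio1")),
                        error = function(e) conditionMessage(e))
```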

The next stage is to turn this function into a module which is shareable. To do this we need to add metadata to our function using the BuildModule() function.

# Build our module

BuildModule(PredictNewRasterMap,
            type = 'output',
            title = 'Predict to a new raster and map',
            description = paste('This output module predicts the species',
                                'distribution in a new area given a new',
                                'raster'),
            details = paste('The results are printed as a map and a raster is',
                            'returned with the predicted values. It is important',
                            'that the new raster has the same structure as the',
                            'raster provided by the covariate module.',
                            'It must have the same covariate columns in the',
                            'same order.'),
            version = 0.1,
            author = 'Z.O. On',
            email = 'zoon@zoon-zoon.com',
            paras = list(raster = paste('A RasterBrick, RasterLayer or RasterStack in',
                                        'the same format as the raster provided',
                                        'by the covariate module. Predicted values',
                                        'will be estimated for this raster using',
                                        'the results from the model module')),
            dataType = c('presence-only', 'presence/absence', 'abundance',
                         'proportion'))
## Starting checks...
## done
## [1] "PredictNewRasterMap"

Once BuildModule() has created your module it will run it through checks to make sure it has the required features, outputs data in the correct format, etc. Checking can be turned off by setting the argument check = FALSE. Much of how to use BuildModule() is self-explanatory but two parameters are worth mentioning here. The paras argument takes a named list describing the parameters the module takes, in the following structure: list(parameterName = 'Parameter description.', anotherParameter = 'Another description.'). It should not include the defaults (i.e. we do not include .model or .ras). dataType describes the types of occurrence data that this module will work with. Certain modules might only work with presence-only data, for example. In our case, our module will work with any type of data and so we list all the data types in the dataType field.
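
One way to see which arguments need an entry in paras is to inspect the function's formal arguments and drop those starting with a dot. This is a sketch in base R. The helper user_args() and the function ToyModule() are hypothetical and only mirror the signature of PredictNewRasterMap.

```r
# Hypothetical helper, not part of zoon: list the arguments of a module
# function that are not default (dot-prefixed) arguments
user_args <- function(fun) {
  arg_names <- names(formals(fun))
  arg_names[!startsWith(arg_names, ".")]
}

# Toy function with the same signature as PredictNewRasterMap
ToyModule <- function(.model, .ras, raster = .ras) NULL

# The list we intend to pass as `paras`
paras <- list(raster = "A Raster* object to predict to.")

# Arguments that need documenting, and any that `paras` has missed
to_document <- user_args(ToyModule)
undocumented <- setdiff(to_document, names(paras))
```

Here to_document contains only raster, so the single entry in paras covers everything a user can set.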

Once BuildModule() has been run there will be an R file in our working directory that represents our module and can be shared with others. This R script can be used as follows.

# remove the original function from our environment

rm(list = 'PredictNewRasterMap')

# Load the module script

LoadModule('PredictNewRasterMap.R')
## [1] "PredictNewRasterMap"

# Now I model a crop pest from Zimbabwe in its home
# range and in Australia by chaining together
# output modules

work12 <- workflow(occurrence = CWBZimbabwe,
                   covariate = Bioclim(extent = c(28, 38, -24, -16)),
                   process = NoProcess,
                   model = RandomForest,
                   output = Chain(PrintMap,
                                  PredictNewRasterMap(raster = cropped)))
## Occurrence data does not have a "crs" column, zoon will assume it is in the same projection as the covariate data


Once we’re happy with the module, we can upload it to the zoon repository. The repository is currently under development. Visit the development pages for more information.