incidence2 is an R package that implements functions and classes to compute, handle and visualise incidence from linelist data.
The main features of the package include:
- `incidence()` and `build_incidence()`: compute incidence from both linelist and pre-aggregated datasets across a range of date groupings. The object returned by `incidence()` is a subclass of tibble and is compatible with dplyr for data manipulation (see `vignette("handling_incidence_objects")` for more details).
- `plot.incidence2()` and `facet_plot.incidence2()`: quick plots with sensible defaults.
- `regroup()`: regroup incidence from different groups into one global incidence time series.
- `keep_first()` and `keep_last()`: keep the rows corresponding to the first (or last) set of grouped dates (ordered by time) from an `incidence()` object.
- `complete_counts()`: ensure every possible combination of date and groupings is given an explicit count.
- `print.incidence_df()` and `summary.incidence_df()` methods.
- `as.data.frame.incidence_df()` and `as_tibble.incidence_df()` conversion methods.
- Accessor functions `get_count_names()`, `get_dates_name()`, `get_date_index()`, `get_group_names()`, `get_interval()`, `get_timespan()` and `get_n()`.

This example uses the simulated Ebola Virus Disease (EVD) outbreak from the outbreaks package. We will compute incidence for various time steps and illustrate how to easily plot the data.
```r
library(outbreaks)
library(incidence2)

dat <- ebola_sim_clean$linelist
class(dat)
#> [1] "data.frame"
str(dat)
#> 'data.frame': 5829 obs. of 11 variables:
#> $ case_id : chr "d1fafd" "53371b" "f5c3d8" "6c286a" ...
#> $ generation : int 0 1 1 2 2 0 3 3 2 3 ...
#> $ date_of_infection : Date, format: NA "2014-04-09" ...
#> $ date_of_onset : Date, format: "2014-04-07" "2014-04-15" ...
#> $ date_of_hospitalisation: Date, format: "2014-04-17" "2014-04-20" ...
#> $ date_of_outcome : Date, format: "2014-04-19" NA ...
#> $ outcome : Factor w/ 2 levels "Death","Recover": NA NA 2 1 2 NA 2 1 2 1 ...
#> $ gender : Factor w/ 2 levels "f","m": 1 2 1 1 1 1 1 1 2 2 ...
#> $ hospital : Factor w/ 5 levels "Connaught Hospital",..: 2 1 3 NA 3 NA 1 4 3 5 ...
#> $ lon : num -13.2 -13.2 -13.2 -13.2 -13.2 ...
#> $ lat : num 8.47 8.46 8.48 8.46 8.45 ...
```
To compute daily incidence, we pass observation data in the form of a data.frame to `incidence()`, together with the name of a date variable in the data (`date_index`) that is used to index the input. First, compute the daily incidence:
```r
daily <- incidence(dat, date_index = date_of_onset)
daily
#> An incidence object: 367 x 2
#> date range: [2014-04-07] to [2015-04-30]
#> cases: 5829
#> interval: 1 day
#> cumulative: FALSE
#>
#> date_index count
#> <date> <int>
#> 1 2014-04-07 1
#> 2 2014-04-15 1
#> 3 2014-04-21 2
#> 4 2014-04-25 1
#> 5 2014-04-26 1
#> 6 2014-04-27 1
#> 7 2014-05-01 2
#> 8 2014-05-03 1
#> 9 2014-05-04 1
#> 10 2014-05-05 1
#> # … with 357 more rows
summary(daily)
#> date range: [2014-04-07] to [2015-04-30]
#> cases: 5829
#> interval: 1 day
#> cumulative: FALSE
#> timespan: 389 days
plot(daily)
```
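Conceptually, daily incidence is a count of cases per onset date. The base-R sketch below illustrates that arithmetic on a small hypothetical linelist (not the EVD data); the helpers `daily_counts()` and `timespan_days()` are illustrative names, not incidence2 functions, and this is not the package's own implementation. It also shows why the output above has 367 rows but a 389-day timespan: dates without cases get no row, whereas the timespan counts every calendar day from the first to the last date, inclusive (2014-04-07 to 2015-04-30 is 388 days apart, plus one).

```r
# Hypothetical mini linelist (not the EVD data): one row per case.
linelist <- data.frame(
  case_id = c("a", "b", "c", "d", "e"),
  date_of_onset = as.Date(c("2014-04-07", "2014-04-15", "2014-04-21",
                            "2014-04-21", "2014-04-25"))
)

# Daily incidence: number of cases on each observed onset date.
# Dates with no cases do not appear in the result.
daily_counts <- function(dates) {
  tab <- table(dates)
  data.frame(date_index = as.Date(names(tab)), count = as.integer(tab))
}

# Timespan: calendar days covered, counting both end points.
timespan_days <- function(dates) {
  as.integer(diff(range(dates))) + 1L
}

daily_counts(linelist$date_of_onset)
timespan_days(linelist$date_of_onset)
```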
The daily incidence is quite noisy, but we can easily compute incidence over other time intervals:
```r
# 7 day incidence
seven <- incidence(dat, date_index = date_of_onset, interval = 7)
seven
#> An incidence object: 56 x 2
#> date range: [2014-04-07 to 2014-04-13] to [2015-04-27 to 2015-05-03]
#> cases: 5829
#> interval: 7 days
#> cumulative: FALSE
#>
#> date_index count
#> <period> <int>
#> 1 2014-04-07 to 2014-04-13 1
#> 2 2014-04-14 to 2014-04-20 1
#> 3 2014-04-21 to 2014-04-27 5
#> 4 2014-04-28 to 2014-05-04 4
#> 5 2014-05-05 to 2014-05-11 12
#> 6 2014-05-12 to 2014-05-18 17
#> 7 2014-05-19 to 2014-05-25 15
#> 8 2014-05-26 to 2014-06-01 19
#> 9 2014-06-02 to 2014-06-08 23
#> 10 2014-06-09 to 2014-06-15 21
#> # … with 46 more rows
plot(seven, color = "white")
```
Notice how specifying the interval as 7 creates weekly intervals, with the coverage of each interval displayed by date. Below we illustrate how `incidence()` also allows us to create year-weekly groupings, with weeks starting on a Monday by default (following the ISO 8601 date and time standard).
```r
# year-weekly, starting on Monday (ISO week, default)
weekly <- incidence(dat, date_index = date_of_onset, interval = "week")
plot(weekly, color = "white")
```
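`incidence()` handles ISO weeks for us, but the ISO 8601 rule itself is short: a week runs from Monday to Sunday and takes the year and week number of its Thursday, so week 1 is the week containing the first Thursday of the year. The base-R sketch below applies that rule; `iso_yearweek()` is a hypothetical helper for illustration, not part of incidence2, and its `"YYYY-Www"` labels are only meant to resemble the printed output.

```r
# Label each date with its ISO 8601 year-week (weeks start on Monday).
iso_yearweek <- function(dates) {
  # 1970-01-01 was a Thursday, so (days + 3) %% 7 is 0 on Mondays ... 6 on Sundays
  days_since_monday <- (as.integer(dates) + 3L) %% 7L
  # The Thursday of the same Monday-to-Sunday week decides year and week number
  thursday <- dates - days_since_monday + 3L
  iso_year <- as.integer(format(thursday, "%Y"))
  # Which 7-day block of that year the Thursday falls in (1-based)
  iso_week <- (as.integer(format(thursday, "%j")) - 1L) %/% 7L + 1L
  sprintf("%d-W%02d", iso_year, iso_week)
}

iso_yearweek(as.Date(c("2014-04-07", "2014-04-13", "2014-12-29", "2015-04-30")))
```

Note that 2014-12-29 falls in week 1 of 2015, because the Thursday of its week is 2015-01-01.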
`incidence()` also works with larger intervals:
```r
# bi-weekly, based on first day in data
biweekly <- incidence(dat, date_index = date_of_onset, interval = "2 weeks")
plot(biweekly, color = "white")
```
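Fixed-length intervals such as 7 days or 2 weeks can be pictured as consecutive bins anchored at the earliest date in the data (2014-04-07 here). The base-R sketch below assumes that anchoring, as the comment in the code above describes; `bin_start()` is a hypothetical helper that returns the first day of the bin each date falls into, not an incidence2 function.

```r
# Group dates into consecutive n-day intervals starting on the earliest date.
# Returns the first day of the interval that each date falls into.
bin_start <- function(dates, n_days, origin = min(dates)) {
  offset <- as.integer(dates - origin)          # whole days since the origin
  origin + (offset %/% n_days) * n_days         # snap back to the bin start
}

onsets <- as.Date(c("2014-04-07", "2014-04-15", "2014-04-21", "2015-04-30"))
bin_start(onsets, 7)   # 7-day bins
bin_start(onsets, 14)  # 2-week bins
```

With 7-day bins, 2015-04-30 lands in the interval starting 2015-04-27, which is the last interval in the 7-day output shown earlier.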
```r
# monthly
monthly <- incidence(dat, date_index = date_of_onset, interval = "month")
plot(monthly, color = "white")

# quarterly
quarterly <- incidence(dat, date_index = date_of_onset, interval = "quarter")
plot(quarterly, color = "white")

# year
yearly <- incidence(dat, date_index = date_of_onset, interval = "year")
plot(yearly, color = "white", n_breaks = 2)
```
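Months, quarters and years vary in length, so they are more naturally defined by calendar fields than by counting days from a start date. The base-R sketch below labels dates that way; `calendar_bin()` and its label formats are illustrative assumptions and may differ from how incidence2 stores or prints these intervals.

```r
# Label dates by calendar month, quarter, or year.
calendar_bin <- function(dates, interval = c("month", "quarter", "year")) {
  interval <- match.arg(interval)
  switch(interval,
    month   = format(dates, "%Y-%m"),
    quarter = paste0(format(dates, "%Y"), "-", quarters(dates)),  # e.g. "2014-Q2"
    year    = format(dates, "%Y")
  )
}

onsets <- as.Date(c("2014-04-07", "2014-05-01", "2014-07-15", "2015-04-30"))
# Count cases per quarter
table(calendar_bin(onsets, "quarter"))
```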
`incidence()` can also aggregate incidence by specified groups using the `groups` argument. For instance, we can compute incidence by gender and plot it with either the `plot.incidence_df()` function for a single plot or the `facet_plot.incidence_df()` function for a multi-faceted plot across groups:
```r
weekly_grouped <- incidence(dat, date_of_onset, interval = "week", groups = gender)
weekly_grouped
#> An incidence object: 109 x 3
#> date range: [2014-W15] to [2015-W18]
#> cases: 5829
#> interval: 1 (Monday) week
#> cumulative: FALSE
#>
#> date_index gender count
#> <yrwk> <fct> <int>
#> 1 2014-W15 f 1
#> 2 2014-W16 m 1
#> 3 2014-W17 f 4
#> 4 2014-W17 m 1
#> 5 2014-W18 f 4
#> 6 2014-W19 f 9
#> 7 2014-W19 m 3
#> 8 2014-W20 f 7
#> 9 2014-W20 m 10
#> 10 2014-W21 f 8
#> # … with 99 more rows
summary(weekly_grouped)
#> date range: [2014-W15] to [2015-W18]
#> cases: 5829
#> interval: 1 (Monday) week
#> cumulative: FALSE
#> timespan: 392 days
#>
#> 1 grouped variable
#>
#> gender count
#> <fct> <int>
#> 1 f 2934
#> 2 m 2895

# a single plot
plot(weekly_grouped, fill = gender, color = "white")

# a multi-facet plot
facet_plot(weekly_grouped, fill = gender, n_breaks = 5, angle = 45, color = "white")
```
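Grouped incidence is a count per combination of date interval and group, and only combinations that actually occur get a row (for example, 2014-W15 has a row for `f` but none for `m` above). Summing the group counts within each interval recovers one overall series, which is the idea the feature list describes for `regroup()`. The base-R sketch below shows that arithmetic with `aggregate()` on hypothetical data; it is not how incidence2 implements `regroup()`.

```r
# Hypothetical case list, already labelled by ISO week, with a grouping variable.
cases <- data.frame(
  week   = c("2014-W15", "2014-W16", "2014-W17", "2014-W17", "2014-W17"),
  gender = c("f", "m", "f", "f", "m")
)
cases$count <- 1L  # one case per row

# Incidence by week and gender: only observed combinations appear.
by_gender <- aggregate(count ~ week + gender, data = cases, FUN = sum)

# Collapse the groups into one overall weekly series.
overall <- aggregate(count ~ week, data = by_gender, FUN = sum)

by_gender
overall
```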
There is no limit to the number of variables we can group by, which allows us to both facet and fill by different variables:
```r
inci <- incidence(dat, date_of_onset, interval = "week", groups = c(outcome, hospital))
facet_plot(inci, facets = hospital, fill = outcome, nrow = 3, n_breaks = 5, angle = 45)
```
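With several grouping variables, many interval/group combinations have no cases and therefore no row. The feature list at the top describes `complete_counts()` for giving every combination an explicit count. The base-R sketch below illustrates the idea on hypothetical data, building the full grid with `expand.grid()` and filling unmatched combinations with zero after a left join via `merge()`; it is an assumption-laden illustration, not the package's implementation.

```r
# Hypothetical grouped counts in which some week/outcome combinations are missing.
counts <- data.frame(
  week    = c("2014-W15", "2014-W16", "2014-W16"),
  outcome = c("Death", "Death", "Recover"),
  count   = c(2L, 1L, 3L)
)

# Every combination of the observed weeks and outcomes.
grid <- expand.grid(
  week = unique(counts$week),
  outcome = unique(counts$outcome),
  stringsAsFactors = FALSE
)

# Left-join the observed counts onto the grid; missing combinations become 0.
completed <- merge(grid, counts, all.x = TRUE)
completed$count[is.na(completed$count)] <- 0L
completed
```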