library(sfdep)
library(dplyr)
sfdep introduces a new s3 class to represent spatio-temporal data.
The spacetime
class links a flat data set containing
spatio-temporal information with the related geometry. The spacetime
class is informed by the {spacetime}
package by Edzer
Pebesma (2012), and the interface is inspired by the design of
{tidygraph}
.
Traditionally “spatio-temporal data often come in the form of single tables” that can typically be categorized as “time-wide”, “space-wide”, or “long formats.” In long formats, often referred to as “tidy”, a row identifies a unique location and time observation represented by a column dedicated to time and another to locations. This is the typical presentation of panel data.
Space-wide data present each time period across each row and locational information in each column. Whereas a time-wide representation has location data down the rows and each time period is represented as a new column.
These flat formats are not linked to the geographies that they represent in any meaningful way. These flat files typically contain only an identifier of the location, but the spatial representation.
The spacetime
class is developed with particular focus
to lattice data. That is, to create a representation of spatio-temporal
data for a set of regions over a number of different time-periods
e.g. population densities in census tracts for each year.
To represent spatial data in a temporal context Pebesma, 2012 identifies a number of spatio-temporal layouts, two of which are of particular interest. These are the spatio-temporal full grid and sparse grids.
Given a number of spatial features \(n\), and time periods \(m\), a spatio-temporal full grid contains \(n \times m\) rows. Each location has a recorded observation for each of the time periods in \(m\). For example, if there are 10 locations and 20 time periods, there are 20 observations per location meaning there are \(10 \times 20 = 200\) observations. This is efficient only when are there are complete time-series for each location.
When there are missing observations for some locations or time periods and they are entirely omitted from the data set, that is a spatio-temporal sparse grid. In this case \(N \lt m \times n\)
Inspired by the design of the tidygraph package, the spacetime class links a data frame and an sf object based on a shared location identifier column. These are referred to as the data context and the geometry context. The spacetime class allows you switch between different contexts and work with them individually as you see fit.
Typically, if one wants to represent location data over multiple time periods containing information about the geography, an sf object is used which duplicates the geometry at each location for each time period which can be computationally expensive. By linking sf objects to a data frame based on their location ID, we are able to avoid this problem
There are four important aspects to the spacetime class:
data.frame
objectsf
objectThere are two ways to create spacetime objects: 1) with
as_spacetime()
and 2) spacetime()
or
new_spacetime()
. The former takes an sf object that
contains the location IDs, times, and geometry and converts it into a
spacetime object. Whereas the constructor functions require a data frame
and a separate sf object containing the geometry.
Let’s create a sample data set using the guerry
# replicate the guerry dataset 10 times
<- purrr::map_dfr(1:10, ~guerry) |>
x select(code_dept, crime_pers) |>
# create an indicator for time period
mutate(time_period = sort(rep(1:10, 85)),
# add some noise
crime_pers = crime_pers * runif(850, max = 2))
x#> Simple feature collection with 850 features and 3 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: 47680 ymin: 1703258 xmax: 1031401 ymax: 2677441
#> CRS: NA
#> # A tibble: 850 × 4
#> code_dept crime_pers geometry time_…¹
#> * <fct> <dbl> <MULTIPOLYGON> <int>
#> 1 01 3873. (((801150 2092615, 800669 2093190, 800688 20954… 1
#> 2 02 7906. (((729326 2521619, 729320 2521230, 729280 25185… 1
#> 3 03 20004. (((710830 2137350, 711746 2136617, 712430 21352… 1
#> 4 04 11077. (((882701 1920024, 882408 1920733, 881778 19212… 1
#> 5 05 11280. (((886504 1922890, 885733 1922978, 885479 19232… 1
#> 6 07 17253. (((747008 1925789, 746630 1925762, 745723 19251… 1
#> 7 08 55127. (((818893 2514767, 818614 2514515, 817900 25144… 1
#> 8 09 7479. (((509103 1747787, 508820 1747513, 508154 17470… 1
#> 9 10 1862. (((775400 2345600, 775068 2345397, 773587 23451… 1
#> 10 11 15950. (((626230 1810121, 626269 1810496, 627494 18113… 1
#> # … with 840 more rows, and abbreviated variable name ¹time_period
#> # ℹ Use `print(n = ...)` to see more rows
This representation, where there are duplicate geometries for each
location, should be cast into a spacetime object using
as_spacetime()
.
<- as_spacetime(x, "code_dept", "time_period") spt
Alternatively, we have the other scenario, where we have the geometry
and the data as two separate objects. In this case we can use the
spacetime()
constructor. It’s required arguments are
.data
, .geometry
, .loc_col
,
.time_col
. .data
must be a data frame and
.geometry
must be a tibble.
Here we create a data frame df
which contains columns
for the location identifier, the time period, and any other variables of
interest in this case crime_pers
.
<- sf::st_drop_geometry(x)
df <- select(guerry, code_dept)
geo
head(df)
#> # A tibble: 6 × 3
#> code_dept crime_pers time_period
#> <fct> <dbl> <int>
#> 1 01 3873. 1
#> 2 02 7906. 1
#> 3 03 20004. 1
#> 4 04 11077. 1
#> 5 05 11280. 1
#> 6 07 17253. 1
Note that the location identifier column is the same between the two objects—this is a requirement.
<- spacetime(
spt .data = df,
.geometry = geo,
.loc_col = "code_dept",
.time_col = "time_period"
)
spt#> spacetime ────
#> Context:`data`
#> 85 locations `code_dept`
#> 10 time periods `time_period`
#> ── data context ────────────────────────────────────────────────────────────────
#> # A tibble: 850 × 3
#> code_dept crime_pers time_period
#> * <fct> <dbl> <int>
#> 1 01 3873. 1
#> 2 02 7906. 1
#> 3 03 20004. 1
#> 4 04 11077. 1
#> 5 05 11280. 1
#> 6 07 17253. 1
#> 7 08 55127. 1
#> 8 09 7479. 1
#> 9 10 1862. 1
#> 10 11 15950. 1
#> # … with 840 more rows
#> # ℹ Use `print(n = ...)` to see more rows
As an aside, I’d note that
as_spacetime()
uses the sf distinct method which can be a bit computationally intense depending on your geometries. As such I’d recommend usingspacetime()
constructor always.
With the spacetime objects, we can also cast them back into sf
objects using as_sf(x)
.
Spacetime objects have two contexts: the data and geometry contexts.
The data context consists of a data frame object. It
can be manipulated just like any other data frame. You switch between
contexts using activate()
. To switch to the data context
activate “data.”
activate(spt, "data")
#> spacetime ────
#> Context:`data`
#> 85 locations `code_dept`
#> 10 time periods `time_period`
#> ── data context ────────────────────────────────────────────────────────────────
#> # A tibble: 850 × 3
#> code_dept crime_pers time_period
#> * <fct> <dbl> <int>
#> 1 01 3873. 1
#> 2 02 7906. 1
#> 3 03 20004. 1
#> 4 04 11077. 1
#> 5 05 11280. 1
#> 6 07 17253. 1
#> 7 08 55127. 1
#> 8 09 7479. 1
#> 9 10 1862. 1
#> 10 11 15950. 1
#> # … with 840 more rows
#> # ℹ Use `print(n = ...)` to see more rows
The geometry context is an sf object that too can be
used like any other sf object and is activated with
activate(x, "geometry")
.
|>
spt activate("geometry")
#> spacetime ────
#> Context:`geometry`
#> 85 locations `code_dept`
#> 10 time periods `time_period`
#> ── geometry context ────────────────────────────────────────────────────────────
#> Simple feature collection with 85 features and 1 field
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: 47680 ymin: 1703258 xmax: 1031401 ymax: 2677441
#> CRS: NA
#> # A tibble: 85 × 2
#> code_dept geometry
#> * <fct> <MULTIPOLYGON>
#> 1 01 (((801150 2092615, 800669 2093190, 800688 2095430, 800780 2095795,…
#> 2 02 (((729326 2521619, 729320 2521230, 729280 2518544, 728751 2517520,…
#> 3 03 (((710830 2137350, 711746 2136617, 712430 2135212, 712070 2134132,…
#> 4 04 (((882701 1920024, 882408 1920733, 881778 1921200, 881526 1922332,…
#> 5 05 (((886504 1922890, 885733 1922978, 885479 1923276, 883061 1925266,…
#> 6 07 (((747008 1925789, 746630 1925762, 745723 1925138, 744216 1925236,…
#> 7 08 (((818893 2514767, 818614 2514515, 817900 2514467, 817327 2514945,…
#> 8 09 (((509103 1747787, 508820 1747513, 508154 1747093, 505861 1746627,…
#> 9 10 (((775400 2345600, 775068 2345397, 773587 2345177, 772940 2344780,…
#> 10 11 (((626230 1810121, 626269 1810496, 627494 1811321, 627681 1812424,…
#> # … with 75 more rows
#> # ℹ Use `print(n = ...)` to see more rows
Unlike {spacetime}
, sfdep does not make explicit
distinctions between spatio-temporal full and sparse grids. Rather, the
approach is more laissez faire. The design of the spacetime interface is
very flexible and is designed to let the user clean their data with
whatever tools are familiar and to their own specification.
The distinction between sparse and full grids is important when it comes to analyzing data. For example emerging hot spot analysis requires a spatio-temporal full-grid. sfdep utilizes the phrase “spacetime cube” as popularized by ESRI to refer to a spatio-temporal full grid.
A spacetime object is a spacetime cube if every location has a value for every time index. Another way of saying this is that each location contains a regular time-series.
In ESRI terminology, the basic unit of a spacetime cube is a bin. A bin is the unique combination of a location and time index. For each time index, the collection of every location is called a time slice. In every location, the collection of every bin at each time index is referred to as a a bin time-series.
We can test if an object is a spacetime cube with
is_spacetime_cube()
is_spacetime_cube(spt)
#> [1] TRUE
Here we take a sample of 800 of the 850 rows of spt
which makes this a sparse grid.
<- dplyr::slice_sample(spt, n = 800)
sparse_spt
is_spacetime_cube(sparse_spt)
#> ! Number of rows does not equal `n time-periods x n locations`
#> [1] FALSE
If an object is a spare spatio-temporal grid we can make it a full
one using complete_spacetime_cube()
. This works similarly
to [tidyr::complete()
].
complete_spacetime_cube()
ensures that there is a row for
each combination of location and time. New rows will contain missing
values
<- complete_spacetime_cube(sparse_spt)
spt_complete #> ! Vars(s) `crime_pers` is missing 50 value(s).
is_spacetime_cube(spt_complete)
#> [1] TRUE
One of the conditions of being a spactime cube is that the time-series must be regular (only one observation for each time index). Here we can create a sample of our data with replacement to create an irregular time-series at multiple locations.
set.seed(0)
<- dplyr::slice_sample(spt, n = 800, replace = TRUE)
sparse_spt
complete_spacetime_cube(sparse_spt)
#> Error in `complete_spacetime_cube()`:
#> ! Location and time combinations are not unique.
#> ℹ There should only be one observation per time and location combination.
This error is informative. We do not have unique bins in our spacetime data. We can check this.
::count(sparse_spt, time_period, code_dept)
dplyr#> spacetime ────
#> Context:`data`
#> 85 locations `code_dept`
#> 10 time periods `time_period`
#> ── data context ────────────────────────────────────────────────────────────────
#> # A tibble: 517 × 3
#> time_period code_dept n
#> <int> <fct> <int>
#> 1 1 01 1
#> 2 1 03 1
#> 3 1 05 1
#> 4 1 07 1
#> 5 1 09 1
#> 6 1 12 1
#> 7 1 16 1
#> 8 1 17 2
#> 9 1 18 1
#> 10 1 21 2
#> # … with 507 more rows
#> # ℹ Use `print(n = ...)` to see more rows
Spacetime cubes are used for emerging hot spot analysis as below.
emerging_hotspot_analysis(spt, "crime_pers", threshold = 0.05)
#> # A tibble: 85 × 4
#> location tau p_value classification
#> <fct> <dbl> <dbl> <chr>
#> 1 01 0.111 0.721 sporadic hotspot
#> 2 02 0.378 0.152 sporadic hotspot
#> 3 03 0.378 0.152 sporadic hotspot
#> 4 04 -0.0667 0.858 sporadic coldspot
#> 5 05 0.156 0.592 sporadic coldspot
#> 6 07 -0.467 0.0736 sporadic coldspot
#> 7 08 -0.0222 1 sporadic hotspot
#> 8 09 -0.333 0.210 sporadic coldspot
#> 9 10 -0.422 0.107 no pattern detected
#> 10 11 -0.333 0.210 consecutive coldspot
#> # … with 75 more rows
#> # ℹ Use `print(n = ...)` to see more rows