Global Historical Climatology Network (GHCN) provides daily climate
measures from stations across the world. prcp_aus
extracts
daily precipitation and minimum temperature from GHCN for 639 stations
in Australia from 2016 to 2020. This is where these stations locate in
an Australia map:
<- rmapshaper::ms_simplify(ozmaps::abs_ste, keep = 2e-3)
state_map ggplot() +
geom_sf(data = state_map, inherit.aes = FALSE,
color = "grey80", alpha = 0.4, linetype = 3) +
geom_point(data = prcp_aus , aes(x = long, y = lat)) +
theme_void()
This is a lot of stations to look at for one climate variable and they can’t all fit into a glyph map. What we can do is to group stations into clusters and look at the aggregated series in the glyph map. In this vignette, I will introduce how to perform aggregation using a hierarchical data structure in cubble.
First, let’s summarise the daily data into weekly sum for each station:
<- prcp_aus %>%
station_long face_temporal(ts) %>%
mutate(wk = lubridate::week(date)) %>%
group_by(wk) %>%
summarise(prcp = sum(prcp, na.rm = TRUE))
station_long#> # cubble: wk, id [639]: long form
#> # bbox: [113.53, -43.66, 153.64, -10.05]
#> # spatial: lat [dbl], long [dbl], elev [dbl], name [chr], wmo_id [dbl]
#> id wk prcp
#> <chr> <dbl> <dbl>
#> 1 ASN00001006 1 3660
#> 2 ASN00001006 2 1766
#> 3 ASN00001006 3 1748
#> 4 ASN00001006 4 4614
#> 5 ASN00001006 5 1080
#> 6 ASN00001006 6 1868
#> 7 ASN00001006 7 2494
#> 8 ASN00001006 8 1696
#> 9 ASN00001006 9 3124
#> 10 ASN00001006 10 1834
#> # … with 33,814 more rows
First we need to assign each station a cluster number. Here we use a
simple kmean algorithm based on the distance matrix to create 20
clusters. This creates station_nested
as a station level
nested cubble with a cluster column indicating the group each station
belongs to.
station_nested#> # cubble: id [639]: nested form
#> # bbox: [113.53, -43.66, 153.64, -10.05]
#> # temporal: wk [dbl], prcp [dbl]
#> id lat long elev name wmo_id ts cluster
#> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <list> <int>
#> 1 ASN00001006 -15.5 128. 3.8 wyndham aero 95214 <tibble> 3
#> 2 ASN00001007 -13.8 126. 6 troughton island 94102 <tibble> 3
#> 3 ASN00001018 -16.4 126. 546 mount elizabeth 94211 <tibble> 16
#> 4 ASN00001019 -14.3 127. 23 kalumburu 94100 <tibble> 3
#> 5 ASN00001020 -14.1 126. 51 truscott 95101 <tibble> 3
#> 6 ASN00001025 -15.4 126. 385 doongan 94215 <tibble> 3
#> 7 ASN00002012 -18.2 128. 422 halls creek airport 94212 <tibble> 17
#> 8 ASN00002032 -17.0 128. 203 warmun 94213 <tibble> 3
#> 9 ASN00002056 -15.8 129. 44 kununurra aero 94216 <tibble> 3
#> 10 ASN00002064 -16.6 128. 164 argyle aerodrome 94217 <tibble> 3
#> # … with 629 more rows
To create a group level cubble, use switch_key()
with
the new key variable, cluster
:
<- station_nested %>% switch_key(cluster)
cluster_nested %>% head(5)
cluster_nested #> # cubble: cluster [5]: nested form
#> # bbox: [113.53, -35.03, 136.83, -11.04]
#> # temporal: id [chr], wk [dbl], prcp [dbl]
#> cluster .val ts
#> <int> <list> <list>
#> 1 3 <tibble [30 × 6]> <tibble [1,590 × 3]>
#> 2 16 <tibble [13 × 6]> <tibble [689 × 3]>
#> 3 17 <tibble [15 × 6]> <tibble [795 × 3]>
#> 4 12 <tibble [16 × 6]> <tibble [848 × 3]>
#> 5 20 <tibble [61 × 6]> <tibble [3,233 × 3]>
The resulted cluster_nested
now has cluster
as the key and all the station level time invariant variables are nested
inside .val
. Currently, there is no cluster level time
invariant variables and we can add the centroid of each cluster by
get_centroid()
:
<- cluster_nested %>% get_centroid()
cluster_nested
cluster_nested#> # cubble: cluster [20]: nested form
#> # bbox: [113.53, -43.66, 153.64, -10.05]
#> # temporal: id [chr], wk [dbl], prcp [dbl]
#> cluster .val ts hull cent_long cent_lat
#> <int> <list> <list> <list> <dbl> <dbl>
#> 1 3 <tibble [30 × 6]> <tibble [1,590 × 3]> <tibble> 131. -13.7
#> 2 16 <tibble [13 × 6]> <tibble [689 × 3]> <tibble> 122. -19.5
#> 3 17 <tibble [15 × 6]> <tibble [795 × 3]> <tibble> 133. -17.5
#> 4 12 <tibble [16 × 6]> <tibble [848 × 3]> <tibble> 115. -24.2
#> 5 20 <tibble [61 × 6]> <tibble [3,233 × 3]> <tibble> 117. -31.2
#> 6 4 <tibble [18 × 6]> <tibble [954 × 3]> <tibble> 121. -30.6
#> 7 6 <tibble [16 × 6]> <tibble [848 × 3]> <tibble> 130. -27.3
#> 8 13 <tibble [17 × 6]> <tibble [901 × 3]> <tibble> 140. -21.1
#> 9 10 <tibble [20 × 6]> <tibble [1,060 × 3]> <tibble> 137. -29.8
#> 10 19 <tibble [47 × 6]> <tibble [2,491 × 3]> <tibble> 138. -34.2
#> 11 5 <tibble [36 × 6]> <tibble [1,908 × 3]> <tibble> 142. -36.1
#> 12 18 <tibble [10 × 6]> <tibble [530 × 3]> <tibble> 143. -13.6
#> 13 9 <tibble [22 × 6]> <tibble [1,166 × 3]> <tibble> 146. -19.3
#> 14 2 <tibble [28 × 6]> <tibble [1,484 × 3]> <tibble> 149. -24.5
#> 15 14 <tibble [54 × 6]> <tibble [2,845 × 3]> <tibble> 151. -28.4
#> 16 15 <tibble [17 × 6]> <tibble [901 × 3]> <tibble> 144. -29.9
#> 17 1 <tibble [62 × 6]> <tibble [3,286 × 3]> <tibble> 150. -32.1
#> 18 8 <tibble [55 × 6]> <tibble [2,915 × 3]> <tibble> 148. -35.3
#> 19 11 <tibble [53 × 6]> <tibble [2,809 × 3]> <tibble> 145. -37.8
#> 20 7 <tibble [49 × 6]> <tibble [2,571 × 3]> <tibble> 147. -41.8
You can also use face_temporal()
to get the cluster
level long cubble:
<- cluster_nested %>% face_temporal(ts)
cluster_long
cluster_long#> # cubble: wk, cluster [20]: long form
#> # bbox: [113.53, -43.66, 153.64, -10.05]
#> # spatial: id [chr], lat [dbl], long [dbl], elev [dbl], name [chr], wmo_id
#> # [dbl], hull [list], cent_long [dbl], cent_lat [dbl]
#> cluster id wk prcp
#> <int> <chr> <dbl> <dbl>
#> 1 3 ASN00001006 1 3660
#> 2 3 ASN00001006 2 1766
#> 3 3 ASN00001006 3 1748
#> 4 3 ASN00001006 4 4614
#> 5 3 ASN00001006 5 1080
#> 6 3 ASN00001006 6 1868
#> 7 3 ASN00001006 7 2494
#> 8 3 ASN00001006 8 1696
#> 9 3 ASN00001006 9 3124
#> 10 3 ASN00001006 10 1834
#> # … with 33,814 more rows
Now we should have access to both station and cluster level in the nested and long form. Let’s summarise them within a diagram:
With these data, we can make a glyph map to understand the precipitation pattern in Australia:
Or to inspect the station membership of each cluster:
ggplot() +
geom_sf(data = state_map, inherit.aes = FALSE,
color = "grey80", alpha = 0.4, linetype = 3) +
geom_point(data = station_nested, aes(x = long, y = lat), size = 0.5) +
::geom_mark_hull(data = cluster_nested %>% tidyr::unnest(hull),
ggforceexpand = 0, radius = 0,
aes(x = long, y = lat, group = cluster)) +
theme_void()