This vignette provides some details on the cubble class. Here is a diagram that summarises the attributes associated with a nested and long cubble:
Here the attributes can be divided into four categories:
- class, row.names, and names are attributes inherited from a tibble class,
- groups is inherited from the rowwise_df or grouped_df that a nested or long cubble is built upon,
- index, coords, and form are cubble specific attributes, and
- spatial is a special attribute that only the long cubble has.
Now we look into these attributes through two examples, one for a nested cubble and one for a long cubble:
A nested cubble is built on top of a rowwise_df, which comes with the
groups attribute, as well as the usual names, row.names, and class.
Note that the class attribute contains the history of classes that a
rowwise_df is built upon: tbl_df, tbl, and data.frame. This is an
example of the attributes for a nested cubble:
climate_small <- climate_flat %>%
  filter(date %in% as.Date(c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04")))
nested <- climate_small %>%
  as_cubble(key = id, index = date, coords = c(long, lat))
nested
#> # cubble: id [5]: nested form
#> # bbox: [115.97, -32.94, 133.55, -12.42]
#> # temporal: date [date], prcp [dbl], tmax [dbl], tmin [dbl]
#>   id            lat  long  elev name           wmo_id ts
#>   <chr>       <dbl> <dbl> <dbl> <chr>           <dbl> <list>
#> 1 ASN00009021 -31.9  116.  15.4 perth airport   94610 <tibble [4 × 4]>
#> 2 ASN00010311 -31.9  117. 179   york            94623 <tibble [4 × 4]>
#> 3 ASN00010614 -32.9  117. 338   narrogin        94627 <tibble [4 × 4]>
#> 4 ASN00014015 -12.4  131.  30.4 darwin airport  94120 <tibble [4 × 4]>
#> 5 ASN00015131 -17.6  134. 220   elliott         94236 <tibble [4 × 4]>
attributes(nested)
#> $class
#> [1] "cubble_df" "rowwise_df" "tbl_df" "tbl" "data.frame"
#>
#> $row.names
#> [1] 1 2 3 4 5
#>
#> $names
#> [1] "id" "lat" "long" "elev" "name" "wmo_id" "ts"
#>
#> $groups
#> # A tibble: 5 × 2
#>   id                .rows
#>   <chr>       <list<int>>
#> 1 ASN00009021         [1]
#> 2 ASN00010311         [1]
#> 3 ASN00010614         [1]
#> 4 ASN00014015         [1]
#> 5 ASN00015131         [1]
#>
#> $index
#> [1] "date"
#>
#> $coords
#> [1] "long" "lat"
#>
#> $form
#> [1] "nested"
In this example, five stations are recorded at four time points: 1st
to 4th Jan 2020. The key specified is mapped to the groups attribute,
while index and coords are stored as variable names in the cubble. The
groups attribute will be the same as key most of the time, but it is a
useful piece of information to record later in the hierarchical data
structure.
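To see where the groups attribute comes from, the following is a minimal sketch that uses only dplyr and tibble, not cubble itself. The toy data frame stations and its values are made up for illustration, and cubble may do more than this when it builds a nested cubble.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data: one row per station (not the climate data above)
stations <- tibble(
  id = c("a", "b", "c"),
  elev = c(15.4, 179, 338)
)

# rowwise() with a key variable stores that variable in the `groups` attribute
rw <- stations %>% rowwise(id)

class(rw)
names(attr(rw, "groups"))

# Each row sits in its own group, so every `.rows` entry has length one
lengths(attr(rw, "groups")$.rows)
```

The groups tibble has one row per station, which mirrors the [1] entries in the .rows column printed for the nested cubble.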
The rowwise_df class uses the groups attribute to ensure each row is
in its own group, and this structure makes it simpler to calculate on
the list-column. For example, calculating the number of raining days
can be done by:
nested %>%
  mutate(rain = sum(ts$prcp != 0, na.rm = TRUE))
#> # cubble: id [5]: nested form
#> # bbox: [115.97, -32.94, 133.55, -12.42]
#> # temporal: date [date], prcp [dbl], tmax [dbl], tmin [dbl]
#>   id            lat  long  elev name           wmo_id ts                rain
#>   <chr>       <dbl> <dbl> <dbl> <chr>           <dbl> <list>           <int>
#> 1 ASN00009021 -31.9  116.  15.4 perth airport   94610 <tibble [4 × 4]>     1
#> 2 ASN00010311 -31.9  117. 179   york            94623 <tibble [4 × 4]>     0
#> 3 ASN00010614 -32.9  117. 338   narrogin        94627 <tibble [4 × 4]>     0
#> 4 ASN00014015 -12.4  131.  30.4 darwin airport  94120 <tibble [4 × 4]>     2
#> 5 ASN00015131 -17.6  134. 220   elliott         94236 <tibble [4 × 4]>     3
which matches the basic mutate style of calculation, so it is easier
to remember than the purrr style syntax:
climate_small %>%
  tidyr::nest(data = c(date, prcp, tmax, tmin)) %>%
  mutate(rain = purrr::map_dbl(data, ~ sum(.x$prcp != 0, na.rm = TRUE)))
#> # A tibble: 5 × 8
#>   id            lat  long  elev name           wmo_id data              rain
#>   <chr>       <dbl> <dbl> <dbl> <chr>           <dbl> <list>           <dbl>
#> 1 ASN00009021 -31.9  116.  15.4 perth airport   94610 <tibble [4 × 4]>     1
#> 2 ASN00010311 -31.9  117. 179   york            94623 <tibble [4 × 4]>     0
#> 3 ASN00010614 -32.9  117. 338   narrogin        94627 <tibble [4 × 4]>     0
#> 4 ASN00014015 -12.4  131.  30.4 darwin airport  94120 <tibble [4 × 4]>     2
#> 5 ASN00015131 -17.6  134. 220   elliott         94236 <tibble [4 × 4]>     3
A long form cubble is built from the grouped_df class, where
observations from the same site form a group. A long cubble is printed
below along with its attributes:
long <- nested %>% face_temporal()
long
#> # cubble: date, id [5]: long form
#> # bbox: [115.97, -32.94, 133.55, -12.42]
#> # spatial: lat [dbl], long [dbl], elev [dbl], name [chr], wmo_id [dbl]
#>    id          date        prcp  tmax  tmin
#>    <chr>       <date>     <dbl> <dbl> <dbl>
#>  1 ASN00009021 2020-01-01     0  31.9  15.3
#>  2 ASN00009021 2020-01-02     0  24.9  16.4
#>  3 ASN00009021 2020-01-03     6  23.2  13
#>  4 ASN00009021 2020-01-04     0  28.4  12.4
#>  5 ASN00010311 2020-01-01     0  38    16.4
#>  6 ASN00010311 2020-01-02     0  30    15.5
#>  7 ASN00010311 2020-01-03     0  25.2  10.5
#>  8 ASN00010311 2020-01-04     0  26.3   7
#>  9 ASN00010614 2020-01-01     0  34    14
#> 10 ASN00010614 2020-01-02     0  27.5  15.2
#> 11 ASN00010614 2020-01-03     0  21    10.9
#> 12 ASN00010614 2020-01-04     0  24.9   6.5
#> 13 ASN00014015 2020-01-01    58  34.4  23.6
#> 14 ASN00014015 2020-01-02    32  34.5  25.9
#> 15 ASN00014015 2020-01-03     0  34.7  26.3
#> 16 ASN00014015 2020-01-04     0  35    26.6
#> 17 ASN00015131 2020-01-01    18  40.1  24.6
#> 18 ASN00015131 2020-01-02    32  40.6  27.6
#> 19 ASN00015131 2020-01-03    34  41.2  24
#> 20 ASN00015131 2020-01-04     0  40.9  26.8
attributes(long)
#> $class
#> [1] "cubble_df" "grouped_df" "tbl_df" "tbl" "data.frame"
#>
#> $row.names
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
#>
#> $names
#> [1] "id" "date" "prcp" "tmax" "tmin"
#>
#> $groups
#> # A tibble: 5 × 2
#>   id                .rows
#>   <chr>       <list<int>>
#> 1 ASN00009021         [4]
#> 2 ASN00010311         [4]
#> 3 ASN00010614         [4]
#> 4 ASN00014015         [4]
#> 5 ASN00015131         [4]
#>
#> $index
#> [1] "date"
#>
#> $spatial
#> # A tibble: 5 × 6
#>   id            lat  long  elev name           wmo_id
#>   <chr>       <dbl> <dbl> <dbl> <chr>           <dbl>
#> 1 ASN00009021 -31.9  116.  15.4 perth airport   94610
#> 2 ASN00010311 -31.9  117. 179   york            94623
#> 3 ASN00010614 -32.9  117. 338   narrogin        94627
#> 4 ASN00014015 -12.4  131.  30.4 darwin airport  94120
#> 5 ASN00015131 -17.6  134. 220   elliott         94236
#>
#> $coords
#> [1] "long" "lat"
#>
#> $form
#> [1] "long"
The spatial attribute records the time-invariant variables in the
data, so that these variables are not lost when switching back to the
nested form.
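The idea of setting the time-invariant variables aside and joining them back can be illustrated with plain dplyr. This is only a rough sketch of the concept under made-up data (obs, invariant, and long_toy are hypothetical names); it is not how cubble implements its long form or the switch between forms.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data in long form: two stations, two dates each
obs <- tibble(
  id = rep(c("a", "b"), each = 2),
  date = rep(as.Date(c("2020-01-01", "2020-01-02")), times = 2),
  prcp = c(0, 6, 58, 32),
  elev = rep(c(15.4, 30.4), each = 2)
)

# Set the time-invariant column aside, one row per station
invariant <- obs %>% distinct(id, elev)

# Keep only the time-varying columns and group by station,
# which gives a grouped_df with a `groups` attribute
long_toy <- obs %>%
  select(id, date, prcp) %>%
  group_by(id)

# Each station contributes two rows to its group
lengths(attr(long_toy, "groups")$.rows)

# Joining the stored invariant columns back recovers the full data
restored <- long_toy %>%
  ungroup() %>%
  left_join(invariant, by = "id")

all.equal(restored, obs)
```

Without the stored invariant table, elev would be unrecoverable from long_toy alone, which is the role the spatial attribute plays for a long cubble.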
Apart from the usual %@% (from rlang) or attr(DATA, "ATTRIBUTE") to
extract attributes, cubble provides functions named after the
corresponding attributes for easier extraction: index(), key_vars(),
key_data(), coords(), and spatial() can be used to extract the
relevant component in both forms:
key_vars(long)
#> [1] "id"
key_data(long)
#> # A tibble: 5 × 2
#>   id                .rows
#>   <chr>       <list<int>>
#> 1 ASN00009021         [4]
#> 2 ASN00010311         [4]
#> 3 ASN00010614         [4]
#> 4 ASN00014015         [4]
#> 5 ASN00015131         [4]
spatial(long)
#> # A tibble: 5 × 6
#>   id            lat  long  elev name           wmo_id
#>   <chr>       <dbl> <dbl> <dbl> <chr>           <dbl>
#> 1 ASN00009021 -31.9  116.  15.4 perth airport   94610
#> 2 ASN00010311 -31.9  117. 179   york            94623
#> 3 ASN00010614 -32.9  117. 338   narrogin        94627
#> 4 ASN00014015 -12.4  131.  30.4 darwin airport  94120
#> 5 ASN00015131 -17.6  134. 220   elliott         94236
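The relationship between reading an attribute directly and using an accessor can be tried on an ordinary grouped data frame. The sketch below uses only dplyr functions on made-up data (g is a hypothetical object); treating key_vars() and key_data() as playing a role similar to dplyr's group_vars() and group_data() is an assumption suggested by the output above, not something this example verifies.

```r
library(dplyr)
library(tibble)

# Hypothetical toy grouped data frame, standing in for a long cubble
g <- tibble(id = c("a", "a", "b"), prcp = c(0, 6, 58)) %>%
  group_by(id)

# Direct attribute access ...
by_attr <- attr(g, "groups")

# ... and the accessor function give the same grouping tibble
by_accessor <- group_data(g)
all.equal(by_attr, by_accessor)

# The grouping variable names, without the `.rows` column
group_vars(g)
```

An accessor saves typing the attribute name as a string and keeps code working if the internal attribute layout changes.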