1. Details on cubble design

This vignette provides some details on the cubble class. Here is a diagram that summarises the attributes associated with a nested and long cubble:

Here the attributes can be divided into four categories:

Now we look into these attributes through two examples, one for nested cubble and one for long cubble:

Nested cubble

A nested cubble is built on top of a rowwise_df, which comes with the groups attributes, as well as the usual names, row.names, and class. Note that the class attribute contains the history class that a rowwise_df is built upon: tbl_df, tbl and data.frame. This is an example of the attributes for a nested cubble:

climate_small <-  climate_flat %>%  
  filter(date %in% as.Date(c("2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"))) 

nested <- climate_small%>%  
  as_cubble(key = id, index = date, coords = c(long, lat))
nested
#> # cubble:   id [5]: nested form
#> # bbox:     [115.97, -32.94, 133.55, -12.42]
#> # temporal: date [date], prcp [dbl], tmax [dbl], tmin [dbl]
#>   id            lat  long  elev name           wmo_id ts              
#>   <chr>       <dbl> <dbl> <dbl> <chr>           <dbl> <list>          
#> 1 ASN00009021 -31.9  116.  15.4 perth airport   94610 <tibble [4 × 4]>
#> 2 ASN00010311 -31.9  117. 179   york            94623 <tibble [4 × 4]>
#> 3 ASN00010614 -32.9  117. 338   narrogin        94627 <tibble [4 × 4]>
#> 4 ASN00014015 -12.4  131.  30.4 darwin airport  94120 <tibble [4 × 4]>
#> 5 ASN00015131 -17.6  134. 220   elliott         94236 <tibble [4 × 4]>
attributes(nested)
#> $class
#> [1] "cubble_df"  "rowwise_df" "tbl_df"     "tbl"        "data.frame"
#> 
#> $row.names
#> [1] 1 2 3 4 5
#> 
#> $names
#> [1] "id"     "lat"    "long"   "elev"   "name"   "wmo_id" "ts"    
#> 
#> $groups
#> # A tibble: 5 × 2
#>   id                .rows
#>   <chr>       <list<int>>
#> 1 ASN00009021         [1]
#> 2 ASN00010311         [1]
#> 3 ASN00010614         [1]
#> 4 ASN00014015         [1]
#> 5 ASN00015131         [1]
#> 
#> $index
#> [1] "date"
#> 
#> $coords
#> [1] "long" "lat" 
#> 
#> $form
#> [1] "nested"

In this example, four stations are recorded at four time points: 1st to 4th Jan 2020. The key specified is mapped to the groups while index and coords are stored as the variable name in a cubble. This attribute will be the same as key most of the time but a useful piece of information to record later in the hierarchical data structure.

The rowwise_df class uses a group attribute to ensure each row is in its own group and this structure makes it simpler to calculate on the list. For example calculating the number of non-raining day can be done by:

nested %>%  
  mutate(rain = sum(ts$prcp != 0, na.rm = TRUE))
#> # cubble:   id [5]: nested form
#> # bbox:     [115.97, -32.94, 133.55, -12.42]
#> # temporal: date [date], prcp [dbl], tmax [dbl], tmin [dbl]
#>   id            lat  long  elev name           wmo_id ts                rain
#>   <chr>       <dbl> <dbl> <dbl> <chr>           <dbl> <list>           <int>
#> 1 ASN00009021 -31.9  116.  15.4 perth airport   94610 <tibble [4 × 4]>     1
#> 2 ASN00010311 -31.9  117. 179   york            94623 <tibble [4 × 4]>     0
#> 3 ASN00010614 -32.9  117. 338   narrogin        94627 <tibble [4 × 4]>     0
#> 4 ASN00014015 -12.4  131.  30.4 darwin airport  94120 <tibble [4 × 4]>     2
#> 5 ASN00015131 -17.6  134. 220   elliott         94236 <tibble [4 × 4]>     3

which matches the basic mutate style of calculation, so it is easier to remember than the purrr style syntax:

climate_small %>%  
  tidyr::nest(c(date, prcp, tmax, tmin)) %>%  
  mutate(rain = purrr::map_dbl(data, ~sum(.x$prcp != 0, na.rm= TRUE)))
#> # A tibble: 5 × 8
#>   id            lat  long  elev name           wmo_id data              rain
#>   <chr>       <dbl> <dbl> <dbl> <chr>           <dbl> <list>           <dbl>
#> 1 ASN00009021 -31.9  116.  15.4 perth airport   94610 <tibble [4 × 4]>     1
#> 2 ASN00010311 -31.9  117. 179   york            94623 <tibble [4 × 4]>     0
#> 3 ASN00010614 -32.9  117. 338   narrogin        94627 <tibble [4 × 4]>     0
#> 4 ASN00014015 -12.4  131.  30.4 darwin airport  94120 <tibble [4 × 4]>     2
#> 5 ASN00015131 -17.6  134. 220   elliott         94236 <tibble [4 × 4]>     3

Long cubble

A long form cubble is built from the grouped_df class where observations from the same site forms a group. Below prints a long cubble along with its attributes:

long <- nested %>%  face_temporal()
long
#> # cubble:  date, id [5]: long form
#> # bbox:    [115.97, -32.94, 133.55, -12.42]
#> # spatial: lat [dbl], long [dbl], elev [dbl], name [chr], wmo_id [dbl]
#>    id          date        prcp  tmax  tmin
#>    <chr>       <date>     <dbl> <dbl> <dbl>
#>  1 ASN00009021 2020-01-01     0  31.9  15.3
#>  2 ASN00009021 2020-01-02     0  24.9  16.4
#>  3 ASN00009021 2020-01-03     6  23.2  13  
#>  4 ASN00009021 2020-01-04     0  28.4  12.4
#>  5 ASN00010311 2020-01-01     0  38    16.4
#>  6 ASN00010311 2020-01-02     0  30    15.5
#>  7 ASN00010311 2020-01-03     0  25.2  10.5
#>  8 ASN00010311 2020-01-04     0  26.3   7  
#>  9 ASN00010614 2020-01-01     0  34    14  
#> 10 ASN00010614 2020-01-02     0  27.5  15.2
#> 11 ASN00010614 2020-01-03     0  21    10.9
#> 12 ASN00010614 2020-01-04     0  24.9   6.5
#> 13 ASN00014015 2020-01-01    58  34.4  23.6
#> 14 ASN00014015 2020-01-02    32  34.5  25.9
#> 15 ASN00014015 2020-01-03     0  34.7  26.3
#> 16 ASN00014015 2020-01-04     0  35    26.6
#> 17 ASN00015131 2020-01-01    18  40.1  24.6
#> 18 ASN00015131 2020-01-02    32  40.6  27.6
#> 19 ASN00015131 2020-01-03    34  41.2  24  
#> 20 ASN00015131 2020-01-04     0  40.9  26.8
attributes(long)
#> $class
#> [1] "cubble_df"  "grouped_df" "tbl_df"     "tbl"        "data.frame"
#> 
#> $row.names
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
#> 
#> $names
#> [1] "id"   "date" "prcp" "tmax" "tmin"
#> 
#> $groups
#> # A tibble: 5 × 2
#>   id                .rows
#>   <chr>       <list<int>>
#> 1 ASN00009021         [4]
#> 2 ASN00010311         [4]
#> 3 ASN00010614         [4]
#> 4 ASN00014015         [4]
#> 5 ASN00015131         [4]
#> 
#> $index
#> [1] "date"
#> 
#> $spatial
#> # A tibble: 5 × 6
#>   id            lat  long  elev name           wmo_id
#>   <chr>       <dbl> <dbl> <dbl> <chr>           <dbl>
#> 1 ASN00009021 -31.9  116.  15.4 perth airport   94610
#> 2 ASN00010311 -31.9  117. 179   york            94623
#> 3 ASN00010614 -32.9  117. 338   narrogin        94627
#> 4 ASN00014015 -12.4  131.  30.4 darwin airport  94120
#> 5 ASN00015131 -17.6  134. 220   elliott         94236
#> 
#> $coords
#> [1] "long" "lat" 
#> 
#> $form
#> [1] "long"

The spatial attribute records the time-invariant variables in the data, so that when switch back to the nested form, these variables won’t get lost.

Access cubble attributes

Apart from the usual %@% or attr(DATA, "ATTRIBUTE") to extract class attributes, cubble provides functions with the corresponding attribute name for easier extraction: index(), key_vars(), key_data(), coords(), and spatial() can be used to extract the relevant component in both forms:

key_vars(long)
#> [1] "id"
key_data(long)
#> # A tibble: 5 × 2
#>   id                .rows
#>   <chr>       <list<int>>
#> 1 ASN00009021         [4]
#> 2 ASN00010311         [4]
#> 3 ASN00010614         [4]
#> 4 ASN00014015         [4]
#> 5 ASN00015131         [4]
spatial(long)
#> # A tibble: 5 × 6
#>   id            lat  long  elev name           wmo_id
#>   <chr>       <dbl> <dbl> <dbl> <chr>           <dbl>
#> 1 ASN00009021 -31.9  116.  15.4 perth airport   94610
#> 2 ASN00010311 -31.9  117. 179   york            94623
#> 3 ASN00010614 -32.9  117. 338   narrogin        94627
#> 4 ASN00014015 -12.4  131.  30.4 darwin airport  94120
#> 5 ASN00015131 -17.6  134. 220   elliott         94236