Pruning and Sorting Tables

Gabriel Becker and Adrian Waddell

2022-05-20

Introduction

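Often we want to filter or reorder subsections of a table in ways that take the table structure into account. For example, we may want to drop subtables that contain no observations, or order the strata within each race by their counts.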

library(rtables)
library(dplyr)

A Table In Need of Attention

rawtable <- basic_table() %>%
    split_cols_by("ARM") %>%
    split_cols_by("SEX") %>%
    split_rows_by("RACE") %>%
    summarize_row_groups() %>%
    split_rows_by("STRATA1") %>%
    summarize_row_groups() %>%
    analyze("AGE") %>%
    build_table(DM)
rawtable
                                                                 A: Drug X                                              B: Placebo                                           C: Combination                   
                                                F            M           U      UNDIFFERENTIATED       F            M           U      UNDIFFERENTIATED       F            M           U      UNDIFFERENTIATED
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
ASIAN                                       44 (62.9%)   35 (68.6%)   0 (NA%)       0 (NA%)        37 (66.1%)   31 (62.0%)   0 (NA%)       0 (NA%)        40 (65.6%)   44 (64.7%)   0 (NA%)       0 (NA%)     
  A                                         15 (21.4%)   12 (23.5%)   0 (NA%)       0 (NA%)        14 (25.0%)   6 (12.0%)    0 (NA%)       0 (NA%)        15 (24.6%)   16 (23.5%)   0 (NA%)       0 (NA%)     
    Mean                                      30.40        34.42        NA             NA            35.43        30.33        NA             NA            37.40        36.25        NA             NA       
  B                                         16 (22.9%)   8 (15.7%)    0 (NA%)       0 (NA%)        13 (23.2%)   16 (32.0%)   0 (NA%)       0 (NA%)        10 (16.4%)   12 (17.6%)   0 (NA%)       0 (NA%)     
    Mean                                      33.75        34.88        NA             NA            32.46        30.94        NA             NA            33.30        35.92        NA             NA       
  C                                         13 (18.6%)   15 (29.4%)   0 (NA%)       0 (NA%)        10 (17.9%)   9 (18.0%)    0 (NA%)       0 (NA%)        15 (24.6%)   16 (23.5%)   0 (NA%)       0 (NA%)     
    Mean                                      36.92        35.60        NA             NA            34.00        31.89        NA             NA            33.47        31.38        NA             NA       
BLACK OR AFRICAN AMERICAN                   18 (25.7%)   10 (19.6%)   0 (NA%)       0 (NA%)        12 (21.4%)   12 (24.0%)   0 (NA%)       0 (NA%)        13 (21.3%)   14 (20.6%)   0 (NA%)       0 (NA%)     
  A                                          5 (7.1%)     1 (2.0%)    0 (NA%)       0 (NA%)         5 (8.9%)     2 (4.0%)    0 (NA%)       0 (NA%)         4 (6.6%)     4 (5.9%)    0 (NA%)       0 (NA%)     
    Mean                                      31.20        33.00        NA             NA            28.00        30.00        NA             NA            30.75        36.50        NA             NA       
  B                                         7 (10.0%)     3 (5.9%)    0 (NA%)       0 (NA%)         3 (5.4%)     3 (6.0%)    0 (NA%)       0 (NA%)         6 (9.8%)     6 (8.8%)    0 (NA%)       0 (NA%)     
    Mean                                      36.14        34.33        NA             NA            29.67        32.00        NA             NA            36.33        31.00        NA             NA       
  C                                          6 (8.6%)    6 (11.8%)    0 (NA%)       0 (NA%)         4 (7.1%)    7 (14.0%)    0 (NA%)       0 (NA%)         3 (4.9%)     4 (5.9%)    0 (NA%)       0 (NA%)     
    Mean                                      31.33        39.67        NA             NA            34.50        34.00        NA             NA            33.00        36.50        NA             NA       
WHITE                                       8 (11.4%)    6 (11.8%)    0 (NA%)       0 (NA%)        7 (12.5%)    7 (14.0%)    0 (NA%)       0 (NA%)        8 (13.1%)    10 (14.7%)   0 (NA%)       0 (NA%)     
  A                                          2 (2.9%)     1 (2.0%)    0 (NA%)       0 (NA%)         3 (5.4%)     3 (6.0%)    0 (NA%)       0 (NA%)         1 (1.6%)     5 (7.4%)    0 (NA%)       0 (NA%)     
    Mean                                      34.00        45.00        NA             NA            29.33        33.33        NA             NA            35.00        32.80        NA             NA       
  B                                          4 (5.7%)     3 (5.9%)    0 (NA%)       0 (NA%)         1 (1.8%)     4 (8.0%)    0 (NA%)       0 (NA%)         3 (4.9%)     1 (1.5%)    0 (NA%)       0 (NA%)     
    Mean                                      37.00        43.67        NA             NA            48.00        36.75        NA             NA            34.33        36.00        NA             NA       
  C                                          2 (2.9%)     2 (3.9%)    0 (NA%)       0 (NA%)         3 (5.4%)     0 (0.0%)    0 (NA%)       0 (NA%)         4 (6.6%)     4 (5.9%)    0 (NA%)       0 (NA%)     
    Mean                                      35.50        44.00        NA             NA            44.67          NA         NA             NA            38.50        35.00        NA             NA       
AMERICAN INDIAN OR ALASKA NATIVE             0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
  A                                          0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
    Mean                                        NA           NA         NA             NA              NA           NA         NA             NA              NA           NA         NA             NA       
  B                                          0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
    Mean                                        NA           NA         NA             NA              NA           NA         NA             NA              NA           NA         NA             NA       
  C                                          0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
    Mean                                        NA           NA         NA             NA              NA           NA         NA             NA              NA           NA         NA             NA       
MULTIPLE                                     0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
  A                                          0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
    Mean                                        NA           NA         NA             NA              NA           NA         NA             NA              NA           NA         NA             NA       
  B                                          0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
    Mean                                        NA           NA         NA             NA              NA           NA         NA             NA              NA           NA         NA             NA       
  C                                          0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
    Mean                                        NA           NA         NA             NA              NA           NA         NA             NA              NA           NA         NA             NA       
NATIVE HAWAIIAN OR OTHER PACIFIC ISLANDER    0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
  A                                          0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
    Mean                                        NA           NA         NA             NA              NA           NA         NA             NA              NA           NA         NA             NA       
  B                                          0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
    Mean                                        NA           NA         NA             NA              NA           NA         NA             NA              NA           NA         NA             NA       
  C                                          0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
    Mean                                        NA           NA         NA             NA              NA           NA         NA             NA              NA           NA         NA             NA       
OTHER                                        0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
  A                                          0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
    Mean                                        NA           NA         NA             NA              NA           NA         NA             NA              NA           NA         NA             NA       
  B                                          0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
    Mean                                        NA           NA         NA             NA              NA           NA         NA             NA              NA           NA         NA             NA       
  C                                          0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
    Mean                                        NA           NA         NA             NA              NA           NA         NA             NA              NA           NA         NA             NA       
UNKNOWN                                      0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
  A                                          0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
    Mean                                        NA           NA         NA             NA              NA           NA         NA             NA              NA           NA         NA             NA       
  B                                          0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
    Mean                                        NA           NA         NA             NA              NA           NA         NA             NA              NA           NA         NA             NA       
  C                                          0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)         0 (0.0%)     0 (0.0%)    0 (NA%)       0 (NA%)     
    Mean                                        NA           NA         NA             NA              NA           NA         NA             NA              NA           NA         NA             NA       

Trimming

Rows

Trimming represents a convenience wrapper around simple, direct subsetting of the rows of a TableTree.

We use the trim_rows() function and pass it our table and a criteria function. All rows for which the criteria function returns TRUE are removed; all others are retained.

NOTE: each row is kept or removed completely independently, with no awareness of the surrounding structure. This means, for example, that a subtree could have all its analysis rows removed and not be removed itself. For structure-aware filtering of a table, we use pruning, described in the next section.

A trimming function accepts a TableRow object and returns TRUE if the row should be removed.

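The default trimming function removes rows in which every value is either 0 or NA. As the output below shows, this includes rows that mix the two, such as the empty race rows whose cells read 0 (0.0%) and 0 (NA%); rows with at least one nonzero, non-NA value are kept.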

trim_rows(rawtable)
                                                 A: Drug X                                              B: Placebo                                           C: Combination                   
                                F            M           U      UNDIFFERENTIATED       F            M           U      UNDIFFERENTIATED       F            M           U      UNDIFFERENTIATED
——————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————————
ASIAN                       44 (62.9%)   35 (68.6%)   0 (NA%)       0 (NA%)        37 (66.1%)   31 (62.0%)   0 (NA%)       0 (NA%)        40 (65.6%)   44 (64.7%)   0 (NA%)       0 (NA%)     
  A                         15 (21.4%)   12 (23.5%)   0 (NA%)       0 (NA%)        14 (25.0%)   6 (12.0%)    0 (NA%)       0 (NA%)        15 (24.6%)   16 (23.5%)   0 (NA%)       0 (NA%)     
    Mean                      30.40        34.42        NA             NA            35.43        30.33        NA             NA            37.40        36.25        NA             NA       
  B                         16 (22.9%)   8 (15.7%)    0 (NA%)       0 (NA%)        13 (23.2%)   16 (32.0%)   0 (NA%)       0 (NA%)        10 (16.4%)   12 (17.6%)   0 (NA%)       0 (NA%)     
    Mean                      33.75        34.88        NA             NA            32.46        30.94        NA             NA            33.30        35.92        NA             NA       
  C                         13 (18.6%)   15 (29.4%)   0 (NA%)       0 (NA%)        10 (17.9%)   9 (18.0%)    0 (NA%)       0 (NA%)        15 (24.6%)   16 (23.5%)   0 (NA%)       0 (NA%)     
    Mean                      36.92        35.60        NA             NA            34.00        31.89        NA             NA            33.47        31.38        NA             NA       
BLACK OR AFRICAN AMERICAN   18 (25.7%)   10 (19.6%)   0 (NA%)       0 (NA%)        12 (21.4%)   12 (24.0%)   0 (NA%)       0 (NA%)        13 (21.3%)   14 (20.6%)   0 (NA%)       0 (NA%)     
  A                          5 (7.1%)     1 (2.0%)    0 (NA%)       0 (NA%)         5 (8.9%)     2 (4.0%)    0 (NA%)       0 (NA%)         4 (6.6%)     4 (5.9%)    0 (NA%)       0 (NA%)     
    Mean                      31.20        33.00        NA             NA            28.00        30.00        NA             NA            30.75        36.50        NA             NA       
  B                         7 (10.0%)     3 (5.9%)    0 (NA%)       0 (NA%)         3 (5.4%)     3 (6.0%)    0 (NA%)       0 (NA%)         6 (9.8%)     6 (8.8%)    0 (NA%)       0 (NA%)     
    Mean                      36.14        34.33        NA             NA            29.67        32.00        NA             NA            36.33        31.00        NA             NA       
  C                          6 (8.6%)    6 (11.8%)    0 (NA%)       0 (NA%)         4 (7.1%)    7 (14.0%)    0 (NA%)       0 (NA%)         3 (4.9%)     4 (5.9%)    0 (NA%)       0 (NA%)     
    Mean                      31.33        39.67        NA             NA            34.50        34.00        NA             NA            33.00        36.50        NA             NA       
WHITE                       8 (11.4%)    6 (11.8%)    0 (NA%)       0 (NA%)        7 (12.5%)    7 (14.0%)    0 (NA%)       0 (NA%)        8 (13.1%)    10 (14.7%)   0 (NA%)       0 (NA%)     
  A                          2 (2.9%)     1 (2.0%)    0 (NA%)       0 (NA%)         3 (5.4%)     3 (6.0%)    0 (NA%)       0 (NA%)         1 (1.6%)     5 (7.4%)    0 (NA%)       0 (NA%)     
    Mean                      34.00        45.00        NA             NA            29.33        33.33        NA             NA            35.00        32.80        NA             NA       
  B                          4 (5.7%)     3 (5.9%)    0 (NA%)       0 (NA%)         1 (1.8%)     4 (8.0%)    0 (NA%)       0 (NA%)         3 (4.9%)     1 (1.5%)    0 (NA%)       0 (NA%)     
    Mean                      37.00        43.67        NA             NA            48.00        36.75        NA             NA            34.33        36.00        NA             NA       
  C                          2 (2.9%)     2 (3.9%)    0 (NA%)       0 (NA%)         3 (5.4%)     0 (0.0%)    0 (NA%)       0 (NA%)         4 (6.6%)     4 (5.9%)    0 (NA%)       0 (NA%)     
    Mean                      35.50        44.00        NA             NA            44.67          NA         NA             NA            38.50        35.00        NA             NA       
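To make the default criterion concrete, the following is a minimal base-R sketch of an "all values are 0 or NA" check. It is not rtables' own implementation: the helper `is_all_zero_or_na()` is hypothetical, and it takes a plain list with one numeric vector per column (count and proportion), standing in for a row's cell values. The example rows reuse the first four columns of the ASIAN and AMERICAN INDIAN OR ALASKA NATIVE rows above.

```r
# Hypothetical helper: TRUE when every value in a row is 0 or NA/NaN.
# `vals` is a list with one element per column, e.g. c(count, proportion).
is_all_zero_or_na <- function(vals) {
  v <- unlist(vals, use.names = FALSE)
  if (length(v) == 0) return(FALSE)   # in this sketch, rows with no values are kept
  all(is.na(v) | v == 0)              # NA | anything-NA counts as TRUE here
}

# Empty race: cells "0 (0.0%)" and "0 (NA%)", i.e. a mix of 0 and NaN
empty_row <- list(c(0, 0), c(0, 0), c(0, NaN), c(0, NaN))
# A Mean row with no data in any column
na_row <- list(NA_real_, NA_real_, NA_real_, NA_real_)
# ASIAN row: two populated columns, two empty ones
asian_row <- list(c(44, 0.629), c(35, 0.686), c(0, NaN), c(0, NaN))

is_all_zero_or_na(empty_row)  # TRUE  -> would be trimmed
is_all_zero_or_na(na_row)     # TRUE  -> would be trimmed
is_all_zero_or_na(asian_row)  # FALSE -> kept
```

Because trim_rows() accepts any function of a row, a check like this could serve as the basis of a custom criteria function once the row's cell values are extracted, for example with rtables' row_values() accessor.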

Trimming Columns

There are currently no special utilities for trimming columns, but we can remove the empty columns (those with a column count of 0) using straightforward column subsetting:

coltrimmed <- rawtable[,col_counts(rawtable) > 0]
Note: method with signature 'VTableTree#missing#ANY' chosen for function '[',
 target signature 'TableTree#missing#logical'.
 "VTableTree#ANY#logical" would also be valid
head(coltrimmed)
                  A: Drug X                B: Placebo              C: Combination     
               F            M            F            M            F            M     
——————————————————————————————————————————————————————————————————————————————————————
ASIAN      44 (62.9%)   35 (68.6%)   37 (66.1%)   31 (62.0%)   40 (65.6%)   44 (64.7%)
  A        15 (21.4%)   12 (23.5%)   14 (25.0%)   6 (12.0%)    15 (24.6%)   16 (23.5%)
    Mean     30.40        34.42        35.43        30.33        37.40        36.25   
  B        16 (22.9%)   8 (15.7%)    13 (23.2%)   16 (32.0%)   10 (16.4%)   12 (17.6%)
    Mean     33.75        34.88        32.46        30.94        33.30        35.92   
  C        13 (18.6%)   15 (29.4%)   10 (17.9%)   9 (18.0%)    15 (24.6%)   16 (23.5%)
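The column subset above is ordinary logical indexing: col_counts() gives one count per column, and comparing it with 0 yields a keep/drop vector. The base-R sketch below mimics this with a stand-in vector `counts`, whose values we derived from the percentages in the table (for example, 44 ASIAN females at 62.9% implies 70 females in A: Drug X); it is not output copied from rtables.

```r
# Column counts per ARM x SEX column, derived from the percentages in the table above
counts <- c(70, 51, 0, 0, 56, 50, 0, 0, 61, 68, 0, 0)
names(counts) <- paste(rep(c("A: Drug X", "B: Placebo", "C: Combination"), each = 4),
                       rep(c("F", "M", "U", "UNDIFFERENTIATED"), times = 3),
                       sep = " / ")

keep <- counts > 0    # logical vector: one TRUE/FALSE per column
names(counts)[keep]   # the six F and M columns survive
```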

Pruning

Pruning is similar in outcome to trimming, but more powerful and more complex, as it takes structure into account.

Pruning is applied recursively: at each structural unit (subtable, row) the pruning function is applied both at that level and to all of its children (up to a user-specifiable maximum depth).

The default pruning function, for example, determines whether a subtree is empty by:

  1. Removing all children which contain a single content row which contains all zeros or all NAs
  2. Removing rows which contain either all zeros or all NAs
  3. Removing the full subtree if no unpruned children remain

pruned <- prune_table(coltrimmed)
pruned
                                   A: Drug X                B: Placebo              C: Combination     
                                F            M            F            M            F            M     
———————————————————————————————————————————————————————————————————————————————————————————————————————
ASIAN                       44 (62.9%)   35 (68.6%)   37 (66.1%)   31 (62.0%)   40 (65.6%)   44 (64.7%)
  A                         15 (21.4%)   12 (23.5%)   14 (25.0%)   6 (12.0%)    15 (24.6%)   16 (23.5%)
    Mean                      30.40        34.42        35.43        30.33        37.40        36.25   
  B                         16 (22.9%)   8 (15.7%)    13 (23.2%)   16 (32.0%)   10 (16.4%)   12 (17.6%)
    Mean                      33.75        34.88        32.46        30.94        33.30        35.92   
  C                         13 (18.6%)   15 (29.4%)   10 (17.9%)   9 (18.0%)    15 (24.6%)   16 (23.5%)
    Mean                      36.92        35.60        34.00        31.89        33.47        31.38   
BLACK OR AFRICAN AMERICAN   18 (25.7%)   10 (19.6%)   12 (21.4%)   12 (24.0%)   13 (21.3%)   14 (20.6%)
  A                          5 (7.1%)     1 (2.0%)     5 (8.9%)     2 (4.0%)     4 (6.6%)     4 (5.9%) 
    Mean                      31.20        33.00        28.00        30.00        30.75        36.50   
  B                         7 (10.0%)     3 (5.9%)     3 (5.4%)     3 (6.0%)     6 (9.8%)     6 (8.8%) 
    Mean                      36.14        34.33        29.67        32.00        36.33        31.00   
  C                          6 (8.6%)    6 (11.8%)     4 (7.1%)    7 (14.0%)     3 (4.9%)     4 (5.9%) 
    Mean                      31.33        39.67        34.50        34.00        33.00        36.50   
WHITE                       8 (11.4%)    6 (11.8%)    7 (12.5%)    7 (14.0%)    8 (13.1%)    10 (14.7%)
  A                          2 (2.9%)     1 (2.0%)     3 (5.4%)     3 (6.0%)     1 (1.6%)     5 (7.4%) 
    Mean                      34.00        45.00        29.33        33.33        35.00        32.80   
  B                          4 (5.7%)     3 (5.9%)     1 (1.8%)     4 (8.0%)     3 (4.9%)     1 (1.5%) 
    Mean                      37.00        43.67        48.00        36.75        34.33        36.00   
  C                          2 (2.9%)     2 (3.9%)     3 (5.4%)     0 (0.0%)     4 (6.6%)     4 (5.9%) 
    Mean                      35.50        44.00        44.67          NA         38.50        35.00   
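The recursive strategy can be illustrated on a toy tree. This is a hypothetical base-R model, not rtables' internal representation: each node holds the per-column counts of its content row plus a list of children, and `prune_tree()` drops a node when the pruning function flags it, or when it had children and none of them survive. The counts reuse the ASIAN and WHITE rows from the table; the all-zero OTHER race plays the role of the empty races removed above.

```r
# Hypothetical toy model of a table tree (NOT rtables' internal classes):
# `counts` holds a node's content-row counts, one per column.
node <- function(counts, children = list()) list(counts = counts, children = children)

# Recursively prune: drop flagged nodes, then drop any parent left with no children.
prune_tree <- function(nd, prune_fun) {
  if (prune_fun(nd)) return(NULL)            # flagged: remove the whole subtree
  if (length(nd$children) == 0) return(nd)   # leaf: nothing further to check
  kids <- lapply(nd$children, prune_tree, prune_fun = prune_fun)
  kids <- Filter(Negate(is.null), kids)      # discard pruned children
  if (length(kids) == 0) return(NULL)        # no unpruned children remain
  nd$children <- kids
  nd
}

# Flag a node whose content-row counts are all zero (the root has no counts)
all_zero <- function(nd) length(nd$counts) > 0 && all(nd$counts == 0)

tree <- node(NULL, list(
  ASIAN = node(c(44, 35, 37, 31, 40, 44), list(
    A = node(c(15, 12, 14, 6, 15, 16)),
    B = node(c(16, 8, 13, 16, 10, 12)),
    C = node(c(13, 15, 10, 9, 15, 16)))),
  WHITE = node(c(8, 6, 7, 7, 8, 10), list(
    A = node(c(2, 1, 3, 3, 1, 5)),
    B = node(c(4, 3, 1, 4, 3, 1)),
    C = node(c(2, 2, 3, 0, 4, 4)))),
  OTHER = node(rep(0, 6), list(
    A = node(rep(0, 6)), B = node(rep(0, 6)), C = node(rep(0, 6))))
))

pruned_tree <- prune_tree(tree, all_zero)
names(pruned_tree$children)                 # "ASIAN" "WHITE"
names(pruned_tree$children$WHITE$children)  # "A" "B" "C" (stratum C has a 0 but is not all zero)
```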

We can also use the low_obs_pruner pruning function constructor to create a pruning function that removes subtrees whose content summaries have first entries (one per column) that sum or average to below a specified number. (In the default summaries, the first entry per column is the count.)

pruned2 <- prune_table(coltrimmed, low_obs_pruner(10, "mean"))
pruned2
                  A: Drug X                B: Placebo              C: Combination     
               F            M            F            M            F            M     
——————————————————————————————————————————————————————————————————————————————————————
ASIAN      44 (62.9%)   35 (68.6%)   37 (66.1%)   31 (62.0%)   40 (65.6%)   44 (64.7%)
  A        15 (21.4%)   12 (23.5%)   14 (25.0%)   6 (12.0%)    15 (24.6%)   16 (23.5%)
    Mean     30.40        34.42        35.43        30.33        37.40        36.25   
  B        16 (22.9%)   8 (15.7%)    13 (23.2%)   16 (32.0%)   10 (16.4%)   12 (17.6%)
    Mean     33.75        34.88        32.46        30.94        33.30        35.92   
  C        13 (18.6%)   15 (29.4%)   10 (17.9%)   9 (18.0%)    15 (24.6%)   16 (23.5%)
    Mean     36.92        35.60        34.00        31.89        33.47        31.38   
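The arithmetic behind this result can be checked by hand. The base-R sketch below copies the per-column counts of the BLACK OR AFRICAN AMERICAN and WHITE content rows from the table and applies the rule described above (flag a row whose average count is below 10); `flag_mean()` is a hypothetical stand-in, not low_obs_pruner itself.

```r
# Per-column counts (F/M within each arm), copied from the table above
black <- list(total = c(18, 10, 12, 12, 13, 14),
              A     = c(5, 1, 5, 2, 4, 4),
              B     = c(7, 3, 3, 3, 6, 6),
              C     = c(6, 6, 4, 7, 3, 4))
white_total <- c(8, 6, 7, 7, 8, 10)

# Hypothetical stand-in for a "mean" threshold of 10
flag_mean <- function(x, min = 10) mean(x) < min

sapply(black, mean)        # total about 13.17 (kept); strata 3.5, about 4.67, 5
sapply(black, flag_mean)   # total FALSE; strata A, B, C all TRUE, so no child survives
flag_mean(white_total)     # TRUE: mean is about 7.67, so WHITE is flagged outright
```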

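Note that because the pruning is applied recursively, only the ASIAN subtree remains: even though the full BLACK OR AFRICAN AMERICAN subtree encompassed enough observations, none of the strata within it did, and the WHITE subtree fell below the threshold on its own. We can take care of this by setting the stop_depth for pruning to 1; in the call below we also switch from an average to a total ("sum") of at least 10 observations.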

pruned3 <- prune_table(coltrimmed, low_obs_pruner(10, "sum"), stop_depth = 1)
pruned3
                                   A: Drug X                B: Placebo              C: Combination     
                                F            M            F            M            F            M     
———————————————————————————————————————————————————————————————————————————————————————————————————————
ASIAN                       44 (62.9%)   35 (68.6%)   37 (66.1%)   31 (62.0%)   40 (65.6%)   44 (64.7%)
  A                         15 (21.4%)   12 (23.5%)   14 (25.0%)   6 (12.0%)    15 (24.6%)   16 (23.5%)
    Mean                      30.40        34.42        35.43        30.33        37.40        36.25   
  B                         16 (22.9%)   8 (15.7%)    13 (23.2%)   16 (32.0%)   10 (16.4%)   12 (17.6%)
    Mean                      33.75        34.88        32.46        30.94        33.30        35.92   
  C                         13 (18.6%)   15 (29.4%)   10 (17.9%)   9 (18.0%)    15 (24.6%)   16 (23.5%)
    Mean                      36.92        35.60        34.00        31.89        33.47        31.38   
BLACK OR AFRICAN AMERICAN   18 (25.7%)   10 (19.6%)   12 (21.4%)   12 (24.0%)   13 (21.3%)   14 (20.6%)
  A                          5 (7.1%)     1 (2.0%)     5 (8.9%)     2 (4.0%)     4 (6.6%)     4 (5.9%) 
    Mean                      31.20        33.00        28.00        30.00        30.75        36.50   
  B                         7 (10.0%)     3 (5.9%)     3 (5.4%)     3 (6.0%)     6 (9.8%)     6 (8.8%) 
    Mean                      36.14        34.33        29.67        32.00        36.33        31.00   
  C                          6 (8.6%)    6 (11.8%)     4 (7.1%)    7 (14.0%)     3 (4.9%)     4 (5.9%) 
    Mean                      31.33        39.67        34.50        34.00        33.00        36.50   
WHITE                       8 (11.4%)    6 (11.8%)    7 (12.5%)    7 (14.0%)    8 (13.1%)    10 (14.7%)
  A                          2 (2.9%)     1 (2.0%)     3 (5.4%)     3 (6.0%)     1 (1.6%)     5 (7.4%) 
    Mean                      34.00        45.00        29.33        33.33        35.00        32.80   
  B                          4 (5.7%)     3 (5.9%)     1 (1.8%)     4 (8.0%)     3 (4.9%)     1 (1.5%) 
    Mean                      37.00        43.67        48.00        36.75        34.33        36.00   
  C                          2 (2.9%)     2 (3.9%)     3 (5.4%)     0 (0.0%)     4 (6.6%)     4 (5.9%) 
    Mean                      35.50        44.00        44.67          NA         38.50        35.00   

We can also see that pruning on a total of 16 observations, with no stop_depth, removes some but not all of the strata from our third race (WHITE):

pruned4 <- prune_table(coltrimmed, low_obs_pruner(16, "sum"))
pruned4
                                   A: Drug X                B: Placebo              C: Combination     
                                F            M            F            M            F            M     
———————————————————————————————————————————————————————————————————————————————————————————————————————
ASIAN                       44 (62.9%)   35 (68.6%)   37 (66.1%)   31 (62.0%)   40 (65.6%)   44 (64.7%)
  A                         15 (21.4%)   12 (23.5%)   14 (25.0%)   6 (12.0%)    15 (24.6%)   16 (23.5%)
    Mean                      30.40        34.42        35.43        30.33        37.40        36.25   
  B                         16 (22.9%)   8 (15.7%)    13 (23.2%)   16 (32.0%)   10 (16.4%)   12 (17.6%)
    Mean                      33.75        34.88        32.46        30.94        33.30        35.92   
  C                         13 (18.6%)   15 (29.4%)   10 (17.9%)   9 (18.0%)    15 (24.6%)   16 (23.5%)
    Mean                      36.92        35.60        34.00        31.89        33.47        31.38   
BLACK OR AFRICAN AMERICAN   18 (25.7%)   10 (19.6%)   12 (21.4%)   12 (24.0%)   13 (21.3%)   14 (20.6%)
  A                          5 (7.1%)     1 (2.0%)     5 (8.9%)     2 (4.0%)     4 (6.6%)     4 (5.9%) 
    Mean                      31.20        33.00        28.00        30.00        30.75        36.50   
  B                         7 (10.0%)     3 (5.9%)     3 (5.4%)     3 (6.0%)     6 (9.8%)     6 (8.8%) 
    Mean                      36.14        34.33        29.67        32.00        36.33        31.00   
  C                          6 (8.6%)    6 (11.8%)     4 (7.1%)    7 (14.0%)     3 (4.9%)     4 (5.9%) 
    Mean                      31.33        39.67        34.50        34.00        33.00        36.50   
WHITE                       8 (11.4%)    6 (11.8%)    7 (12.5%)    7 (14.0%)    8 (13.1%)    10 (14.7%)
  B                          4 (5.7%)     3 (5.9%)     1 (1.8%)     4 (8.0%)     3 (4.9%)     1 (1.5%) 
    Mean                      37.00        43.67        48.00        36.75        34.33        36.00   
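The cut-off can again be checked by summing the WHITE strata counts copied from the table. The strict "below 16" comparison used here is inferred from the output above, where a stratum with exactly 16 observations is kept; `totals` is just a hand computation, not rtables output.

```r
# WHITE strata counts per column (F/M within each arm), copied from the table above
white <- list(A = c(2, 1, 3, 3, 1, 5),
              B = c(4, 3, 1, 4, 3, 1),
              C = c(2, 2, 3, 0, 4, 4))

totals <- sapply(white, sum)   # A = 15, B = 16, C = 15
totals < 16                    # TRUE for A and C: only B survives a threshold of 16
```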

Sorting

Sorting an rtable is done at a path and recursively, meaning a sort operation occurs at a particular location within the table, and the subtables (children) there are reordered and may in turn have their own children reordered as well.

This is done by giving a score function which accepts a subtree or TableRow and returns a single numeric value. Within the context currently being sorted, the subtrees are then reordered by the value of the score function.
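As a minimal base-R analogue (not rtables code), sorting siblings by a score amounts to computing one number per subtree and reordering by it. Here the hypothetical `score()` is the total count across all columns, which is the idea behind cont_n_allcols, applied to the race content-row counts copied from the table.

```r
# Race content-row counts (F/M within each arm), copied from the table above
races <- list(
  ASIAN = c(44, 35, 37, 31, 40, 44),
  "BLACK OR AFRICAN AMERICAN" = c(18, 10, 12, 12, 13, 14),
  WHITE = c(8, 6, 7, 7, 8, 10)
)

# Hypothetical score: total count across all columns
score <- function(x) sum(x)
scores <- vapply(races, score, numeric(1))   # 231, 79, 46

names(races)[order(scores)]    # increasing: WHITE, BLACK OR AFRICAN AMERICAN, ASIAN
names(races)[order(-scores)]   # decreasing: ASIAN, BLACK OR AFRICAN AMERICAN, WHITE
```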

Unlike pruning, which is applied throughout the table, sorting occurs at particular places in the table, as defined by a path. The path can contain "*" to indicate that, at that portion of the structure, sorting should occur separately within each branch.
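A path containing "*" can be pictured as applying the same reordering independently inside each branch. The base-R sketch below (hypothetical, not rtables code) sorts the strata of each race by their total counts in decreasing order, with the counts copied from the table; in rtables such a request would be written with a path along the lines of c("RACE", "*", "STRATA1"), which we assume here rather than demonstrate.

```r
# Strata counts per column (F/M within each arm), copied from the table above
strata <- list(
  ASIAN = list(A = c(15, 12, 14, 6, 15, 16), B = c(16, 8, 13, 16, 10, 12),
               C = c(13, 15, 10, 9, 15, 16)),
  WHITE = list(A = c(2, 1, 3, 3, 1, 5), B = c(4, 3, 1, 4, 3, 1),
               C = c(2, 2, 3, 0, 4, 4))
)

# Reorder the children of one branch by total count, decreasing.
# order() is stable, so tied strata keep their original relative order.
sort_branch <- function(kids) {
  s <- vapply(kids, sum, numeric(1))
  kids[order(-s)]
}

# Apply separately within each race: strata never move between races
sorted <- lapply(strata, sort_branch)
lapply(sorted, names)   # ASIAN: A, C, B (78, 78, 75); WHITE: B, A, C (16, 15, 15)
```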

Sort the strata by observation counts within just the ASIAN subtable:

sort_at_path(pruned, path = c("RACE", "ASIAN", "STRATA1"), scorefun = cont_n_allcols)
                                   A: Drug X                B: Placebo              C: Combination     
                                F            M            F            M            F            M     
———————————————————————————————————————————————————————————————————————————————————————————————————————
ASIAN                       44 (62.9%)   35 (68.6%)   37 (66.1%)   31 (62.0%)   40 (65.6%)   44 (64.7%)
  A                         15 (21.4%)   12 (23.5%)   14 (25.0%)   6 (12.0%)    15 (24.6%)   16 (23.5%)
    Mean                      30.40        34.42        35.43        30.33        37.40        36.25   
  C                         13 (18.6%)   15 (29.4%)   10 (17.9%)   9 (18.0%)    15 (24.6%)   16 (23.5%)
    Mean                      36.92        35.60        34.00        31.89        33.47        31.38   
  B                         16 (22.9%)   8 (15.7%)    13 (23.2%)   16 (32.0%)   10 (16.4%)   12 (17.6%)
    Mean                      33.75        34.88        32.46        30.94        33.30        35.92   
BLACK OR AFRICAN AMERICAN   18 (25.7%)   10 (19.6%)   12 (21.4%)   12 (24.0%)   13 (21.3%)   14 (20.6%)
  A                          5 (7.1%)     1 (2.0%)     5 (8.9%)     2 (4.0%)     4 (6.6%)     4 (5.9%) 
    Mean                      31.20        33.00        28.00        30.00        30.75        36.50   
  B                         7 (10.0%)     3 (5.9%)     3 (5.4%)     3 (6.0%)     6 (9.8%)     6 (8.8%) 
    Mean                      36.14        34.33        29.67        32.00        36.33        31.00   
  C                          6 (8.6%)    6 (11.8%)     4 (7.1%)    7 (14.0%)     3 (4.9%)     4 (5.9%) 
    Mean                      31.33        39.67        34.50        34.00        33.00        36.50   
WHITE                       8 (11.4%)    6 (11.8%)    7 (12.5%)    7 (14.0%)    8 (13.1%)    10 (14.7%)
  A                          2 (2.9%)     1 (2.0%)     3 (5.4%)     3 (6.0%)     1 (1.6%)     5 (7.4%) 
    Mean                      34.00        45.00        29.33        33.33        35.00        32.80   
  B                          4 (5.7%)     3 (5.9%)     1 (1.8%)     4 (8.0%)     3 (4.9%)     1 (1.5%) 
    Mean                      37.00        43.67        48.00        36.75        34.33        36.00   
  C                          2 (2.9%)     2 (3.9%)     3 (5.4%)     0 (0.0%)     4 (6.6%)     4 (5.9%) 
    Mean                      35.50        44.00        44.67          NA         38.50        35.00   

Sort the ethnicities by number of observations, in increasing order:

ethsort <- sort_at_path(pruned, path = c("RACE"), scorefun = cont_n_allcols, decreasing = FALSE)
ethsort
                                   A: Drug X                B: Placebo              C: Combination     
                                F            M            F            M            F            M     
———————————————————————————————————————————————————————————————————————————————————————————————————————
WHITE                       8 (11.4%)    6 (11.8%)    7 (12.5%)    7 (14.0%)    8 (13.1%)    10 (14.7%)
  A                          2 (2.9%)     1 (2.0%)     3 (5.4%)     3 (6.0%)     1 (1.6%)     5 (7.4%) 
    Mean                      34.00        45.00        29.33        33.33        35.00        32.80   
  B                          4 (5.7%)     3 (5.9%)     1 (1.8%)     4 (8.0%)     3 (4.9%)     1 (1.5%) 
    Mean                      37.00        43.67        48.00        36.75        34.33        36.00   
  C                          2 (2.9%)     2 (3.9%)     3 (5.4%)     0 (0.0%)     4 (6.6%)     4 (5.9%) 
    Mean                      35.50        44.00        44.67          NA         38.50        35.00   
BLACK OR AFRICAN AMERICAN   18 (25.7%)   10 (19.6%)   12 (21.4%)   12 (24.0%)   13 (21.3%)   14 (20.6%)
  A                          5 (7.1%)     1 (2.0%)     5 (8.9%)     2 (4.0%)     4 (6.6%)     4 (5.9%) 
    Mean                      31.20        33.00        28.00        30.00        30.75        36.50   
  B                         7 (10.0%)     3 (5.9%)     3 (5.4%)     3 (6.0%)     6 (9.8%)     6 (8.8%) 
    Mean                      36.14        34.33        29.67        32.00        36.33        31.00   
  C                          6 (8.6%)    6 (11.8%)     4 (7.1%)    7 (14.0%)     3 (4.9%)     4 (5.9%) 
    Mean                      31.33        39.67        34.50        34.00        33.00        36.50   
ASIAN                       44 (62.9%)   35 (68.6%)   37 (66.1%)   31 (62.0%)   40 (65.6%)   44 (64.7%)
  A                         15 (21.4%)   12 (23.5%)   14 (25.0%)   6 (12.0%)    15 (24.6%)   16 (23.5%)
    Mean                      30.40        34.42        35.43        30.33        37.40        36.25   
  B                         16 (22.9%)   8 (15.7%)    13 (23.2%)   16 (32.0%)   10 (16.4%)   12 (17.6%)
    Mean                      33.75        34.88        32.46        30.94        33.30        35.92   
  C                         13 (18.6%)   15 (29.4%)   10 (17.9%)   9 (18.0%)    15 (24.6%)   16 (23.5%)
    Mean                      36.92        35.60        34.00        31.89        33.47        31.38   
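The ordering above can be checked by hand. cont_n_allcols scores each RACE subtable by summing the first element (the count) of every cell in its content (summary) row across all columns. The base-R sketch below is only a stand-in for that computation, not rtables code: it copies the counts from the printed RACE summary rows into a plain named list and sorts the totals in increasing order, as decreasing = FALSE requests.

```r
## Counts copied from the RACE summary rows of the pruned table
## (columns: A F, A M, B F, B M, C F, C M)
race_counts <- list(
    ASIAN = c(44, 35, 37, 31, 40, 44),
    "BLACK OR AFRICAN AMERICAN" = c(18, 10, 12, 12, 13, 14),
    WHITE = c(8, 6, 7, 7, 8, 10)
)

## Stand-in for cont_n_allcols: total count across all columns
scores <- vapply(race_counts, sum, numeric(1))

## Increasing order, matching decreasing = FALSE in the call above
ethnicity_order <- names(sort(scores, decreasing = FALSE))
print(scores)
print(ethnicity_order)
```

WHITE has the smallest total (46) and ASIAN the largest (231), which is why WHITE is now listed first.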

Within each ethnicity separately, sort the strata by the number of females in Arm C (i.e., column position 5):

sort_at_path(pruned, path = c("RACE", "*", "STRATA1"), cont_n_onecol(5))
                                   A: Drug X                B: Placebo              C: Combination     
                                F            M            F            M            F            M     
———————————————————————————————————————————————————————————————————————————————————————————————————————
ASIAN                       44 (62.9%)   35 (68.6%)   37 (66.1%)   31 (62.0%)   40 (65.6%)   44 (64.7%)
  A                         15 (21.4%)   12 (23.5%)   14 (25.0%)   6 (12.0%)    15 (24.6%)   16 (23.5%)
    Mean                      30.40        34.42        35.43        30.33        37.40        36.25   
  C                         13 (18.6%)   15 (29.4%)   10 (17.9%)   9 (18.0%)    15 (24.6%)   16 (23.5%)
    Mean                      36.92        35.60        34.00        31.89        33.47        31.38   
  B                         16 (22.9%)   8 (15.7%)    13 (23.2%)   16 (32.0%)   10 (16.4%)   12 (17.6%)
    Mean                      33.75        34.88        32.46        30.94        33.30        35.92   
BLACK OR AFRICAN AMERICAN   18 (25.7%)   10 (19.6%)   12 (21.4%)   12 (24.0%)   13 (21.3%)   14 (20.6%)
  B                         7 (10.0%)     3 (5.9%)     3 (5.4%)     3 (6.0%)     6 (9.8%)     6 (8.8%) 
    Mean                      36.14        34.33        29.67        32.00        36.33        31.00   
  A                          5 (7.1%)     1 (2.0%)     5 (8.9%)     2 (4.0%)     4 (6.6%)     4 (5.9%) 
    Mean                      31.20        33.00        28.00        30.00        30.75        36.50   
  C                          6 (8.6%)    6 (11.8%)     4 (7.1%)    7 (14.0%)     3 (4.9%)     4 (5.9%) 
    Mean                      31.33        39.67        34.50        34.00        33.00        36.50   
WHITE                       8 (11.4%)    6 (11.8%)    7 (12.5%)    7 (14.0%)    8 (13.1%)    10 (14.7%)
  C                          2 (2.9%)     2 (3.9%)     3 (5.4%)     0 (0.0%)     4 (6.6%)     4 (5.9%) 
    Mean                      35.50        44.00        44.67          NA         38.50        35.00   
  B                          4 (5.7%)     3 (5.9%)     1 (1.8%)     4 (8.0%)     3 (4.9%)     1 (1.5%) 
    Mean                      37.00        43.67        48.00        36.75        34.33        36.00   
  A                          2 (2.9%)     1 (2.0%)     3 (5.4%)     3 (6.0%)     1 (1.6%)     5 (7.4%) 
    Mean                      34.00        45.00        29.33        33.33        35.00        32.80   
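The same result can be reproduced in plain R to see what cont_n_onecol(5) contributes: it scores each stratum by the count in a single column, here column 5 (C: Combination, F). Judging from the output above, sort_at_path orders these numeric scores from largest to smallest by default. The sketch below copies the column-5 counts from the table and sorts within each ethnicity separately; it is an illustration, not a call into rtables. Note that the ASIAN strata A and C tie at 15, so only the position of B is determined by the scores there.

```r
## Column-5 (C: Combination, F) counts of each STRATA1 level,
## copied from the table above
strata_f_armc <- list(
    ASIAN = c(A = 15, B = 10, C = 15),
    "BLACK OR AFRICAN AMERICAN" = c(A = 4, B = 6, C = 3),
    WHITE = c(A = 1, B = 3, C = 4)
)

## Sort the strata within each ethnicity independently, largest count first
strata_order <- lapply(strata_f_armc, function(counts) {
    names(counts)[order(counts, decreasing = TRUE)]
})
str(strata_order)
```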

Sorting Within an Analysis Subtable

When sorting within an analysis subtable (e.g., the subtable generated when your analysis function produces more than one row per group of data), the name of that subtable (generally the name of the variable being analyzed) must appear in the path, even if the variable label is not displayed when the table is printed.

## an analysis function that always returns three fixed rows (a, b and c)
silly_afun <- function(x) {
    in_rows(a = rcell(2),
            b = rcell(3),
            c = rcell(1))
}

sillytbl <- basic_table() %>%
    split_rows_by("cyl") %>%
    analyze("mpg", silly_afun) %>%
    build_table(mtcars)
Split var [cyl] was not character or factor. Converting to factor
sillytbl
      all obs
—————————————
6            
  a      2   
  b      3   
  c      1   
4            
  a      2   
  b      3   
  c      1   
8            
  a      2   
  b      3   
  c      1   
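Although mpg never appears in the printed table, it is still one element of each analysis row's path. The sketch below illustrates why a path that skips it cannot reach rows a, b and c. Both pieces are assumptions made for illustration: path_matches() is a hypothetical helper, not part of rtables, and the row path c("cyl", "6", "mpg", "a") is an assumed layout with the analyzed variable sitting between the split value and the row name. The helper treats "*" as matching any single path element.

```r
## Hypothetical helper (not part of rtables): does `pattern` match the
## leading elements of `path`, with "*" matching any single element?
path_matches <- function(pattern, path) {
    if (length(pattern) > length(path)) return(FALSE)
    lead <- path[seq_along(pattern)]
    all(pattern == "*" | pattern == lead)
}

## Assumed path of row "a" under cyl == 6: the analyzed variable ("mpg")
## sits between the split value and the row name
row_a_path <- c("cyl", "6", "mpg", "a")

path_matches(c("cyl", "*", "mpg"), row_a_path) ## TRUE: reaches the mpg subtable
path_matches(c("cyl", "*", "a"), row_a_path)   ## FALSE: the "mpg" level is skipped
```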

The path required to sort the rows inside our “analysis” of mpg is therefore c("cyl", "*", "mpg"):

## score each row by the mean of its cell values
scorefun <- function(tt) mean(unlist(row_values(tt)))
sort_at_path(sillytbl, c("cyl", "*", "mpg"), scorefun)
      all obs
—————————————
6            
  b      3   
  a      2   
  c      1   
4            
  b      3   
  a      2   
  c      1   
8            
  b      3   
  a      2   
  c      1   
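To see where this order comes from, the scoring logic can be applied to plain lists standing in for what row_values() returns for each analysis row (one list element per column; this table has a single “all obs” column). This is a sketch under that assumption, not rtables code. Each row's score is just its single value, so sorting from largest to smallest gives b (3), a (2), c (1) within every cyl group.

```r
## Stand-ins for row_values() on each analysis row: one element per column
silly_rows <- list(
    a = list(2),
    b = list(3),
    c = list(1)
)

## Same logic as scorefun above, applied to the plain lists
scorefun_sketch <- function(vals) mean(unlist(vals))
scores <- vapply(silly_rows, scorefun_sketch, numeric(1))

## Largest score first, matching the printed order
row_order <- names(scores)[order(scores, decreasing = TRUE)]
print(row_order)
```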

Writing Custom Pruning Criteria and Scoring Functions

Pruning criteria and scoring functions map TableTree or TableRow objects to a boolean value (for pruning criteria) or a sortable scalar value (for scoring functions). To write them, we currently need to interact with the internal structure of these objects more than usual.

Useful Functions and Accessors

content_table: Retrieves a TableTree object’s content table (which contains its summary rows).

tree_children: Retrieves a TableTree object’s children (either subtables, rows, or possibly a mix thereof, though that should not happen in practice).

row_values: Retrieves a TableRow object’s values in the form of a list of length ncol(tt).

vapply(row_values(tt), '[[', i=1, numeric(1)) will retrieve the first element from each cell, provided tt is a TableRow (and the first element is a numeric value).

obj_name: Retrieves the name of an object. Note that this can differ from the label displayed when printing (if there is one). The name is what matches the corresponding element of a path.

obj_label: Retrieves the display label of an object. Note that this can differ from the name that appears in the path.
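The vapply() idiom can be tried without a table by using a plain list that mimics what row_values() returns for a count (percent) row: one element per column, each holding the count followed by the proportion. The list below is a stand-in built from the ASIAN summary row of the pruned table. The two functions are hypothetical examples of the two contracts described earlier: all_counts_below() returns a boolean, as a pruning criterion would, and total_count() returns a sortable scalar, as a scoring function would. Neither is part of rtables, and both operate on the value list rather than on a TableRow.

```r
## Stand-in for row_values(tt) on the ASIAN summary row:
## one element per column, each c(count, proportion)
vals <- list(
    c(44, 0.629), c(35, 0.686), c(37, 0.661),
    c(31, 0.620), c(40, 0.656), c(44, 0.647)
)

## Retrieve the first element (the count) from each cell
counts <- vapply(vals, `[[`, i = 1, numeric(1))

## Hypothetical pruning-style criterion: TRUE (prune) when every
## count is below `min_n`
all_counts_below <- function(vals, min_n = 5) {
    all(vapply(vals, `[[`, i = 1, numeric(1)) < min_n)
}

## Hypothetical scoring-style function: total count across columns
total_count <- function(vals) sum(vapply(vals, `[[`, i = 1, numeric(1)))

counts
all_counts_below(vals)
total_count(vals)
```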

Example Custom Scoring Functions

Sort by a character “score”

For simplicity, we use the name of the table element here, but any logic that returns a single string could be used.

We sort the ethnicities alphabetically (in practice, undoing our earlier sort of the ethnicities by number of observations).

silly_name_scorer <- function(tt) {
    nm <- obj_name(tt)
    print(nm) ## show which element is currently being scored
    nm
}

sort_at_path(ethsort, "RACE", silly_name_scorer)
[1] "WHITE"
[1] "BLACK OR AFRICAN AMERICAN"
[1] "ASIAN"
                                   A: Drug X                B: Placebo              C: Combination     
                                F            M            F            M            F            M     
———————————————————————————————————————————————————————————————————————————————————————————————————————
ASIAN                       44 (62.9%)   35 (68.6%)   37 (66.1%)   31 (62.0%)   40 (65.6%)   44 (64.7%)
  A                         15 (21.4%)   12 (23.5%)   14 (25.0%)   6 (12.0%)    15 (24.6%)   16 (23.5%)
    Mean                      30.40        34.42        35.43        30.33        37.40        36.25   
  B                         16 (22.9%)   8 (15.7%)    13 (23.2%)   16 (32.0%)   10 (16.4%)   12 (17.6%)
    Mean                      33.75        34.88        32.46        30.94        33.30        35.92   
  C                         13 (18.6%)   15 (29.4%)   10 (17.9%)   9 (18.0%)    15 (24.6%)   16 (23.5%)
    Mean                      36.92        35.60        34.00        31.89        33.47        31.38   
BLACK OR AFRICAN AMERICAN   18 (25.7%)   10 (19.6%)   12 (21.4%)   12 (24.0%)   13 (21.3%)   14 (20.6%)
  A                          5 (7.1%)     1 (2.0%)     5 (8.9%)     2 (4.0%)     4 (6.6%)     4 (5.9%) 
    Mean                      31.20        33.00        28.00        30.00        30.75        36.50   
  B                         7 (10.0%)     3 (5.9%)     3 (5.4%)     3 (6.0%)     6 (9.8%)     6 (8.8%) 
    Mean                      36.14        34.33        29.67        32.00        36.33        31.00   
  C                          6 (8.6%)    6 (11.8%)     4 (7.1%)    7 (14.0%)     3 (4.9%)     4 (5.9%) 
    Mean                      31.33        39.67        34.50        34.00        33.00        36.50   
WHITE                       8 (11.4%)    6 (11.8%)    7 (12.5%)    7 (14.0%)    8 (13.1%)    10 (14.7%)
  A                          2 (2.9%)     1 (2.0%)     3 (5.4%)     3 (6.0%)     1 (1.6%)     5 (7.4%) 
    Mean                      34.00        45.00        29.33        33.33        35.00        32.80   
  B                          4 (5.7%)     3 (5.9%)     1 (1.8%)     4 (8.0%)     3 (4.9%)     1 (1.5%) 
    Mean                      37.00        43.67        48.00        36.75        34.33        36.00   
  C                          2 (2.9%)     2 (3.9%)     3 (5.4%)     0 (0.0%)     4 (6.6%)     4 (5.9%) 
    Mean                      35.50        44.00        44.67          NA         38.50        35.00   

NOTE: In general this would be better done with the reorder_split_levels function within the layout, rather than as a post-processing sort step, but other character scorers may not map as easily to layout directives.
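With character scores, the output above corresponds to an increasing, alphabetical ordering. The plain-R sketch below reproduces that ordering from the names the scorer printed, and then applies a different string-valued rule to show that any logic returning a single string can drive the sort. last_word() is a hypothetical scorer invented for this illustration (it keys on the last word of each name); the sketch does not call sort_at_path().

```r
## The names silly_name_scorer() printed, in the order it printed them
race_names <- c("WHITE", "BLACK OR AFRICAN AMERICAN", "ASIAN")

## Sorting by the name itself gives the alphabetical order shown above
by_name <- race_names[order(race_names)]

## Hypothetical alternative scorer: use the last word of each name
last_word <- function(nm) sub(".* ", "", nm)
by_last_word <- race_names[order(last_word(race_names))]

print(by_name)
print(by_last_word)
```

Under the alternative rule, BLACK OR AFRICAN AMERICAN is keyed on “AMERICAN” and therefore moves ahead of ASIAN.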

Sort by the Percent Difference in counts between genders in Arm C

We need the F and M counts for Arm C only (i.e., columns 5 and 6), differenced.

We will sort the strata within each ethnicity by the percent difference in counts between males and females in Arm C. This is not statistically meaningful at all, and is in fact a terrible idea because it reorders the strata seemingly (but not) at random within each race, but it illustrates the various things we need to do inside custom sorting functions.

silly_gender_diffcount <- function(tt) {
    ctable <- content_table(tt) ## get summary table at this location
    crow <- tree_children(ctable)[[1]] ## get first row in summary table
    vals <- row_values(crow)
    ## we need a better API for specifying location in column space, but currently we don't have one
    mcount <- vals[[6]][1] ## count of males in Arm C (column 6)
    fcount <- vals[[5]][1] ## count of females in Arm C (column 5)
    (mcount - fcount) / fcount
}

sort_at_path(pruned, c("RACE", "*", "STRATA1"), silly_gender_diffcount)
                                   A: Drug X                B: Placebo              C: Combination     
                                F            M            F            M            F            M     
———————————————————————————————————————————————————————————————————————————————————————————————————————
ASIAN                       44 (62.9%)   35 (68.6%)   37 (66.1%)   31 (62.0%)   40 (65.6%)   44 (64.7%)
  B                         16 (22.9%)   8 (15.7%)    13 (23.2%)   16 (32.0%)   10 (16.4%)   12 (17.6%)
    Mean                      33.75        34.88        32.46        30.94        33.30        35.92   
  A                         15 (21.4%)   12 (23.5%)   14 (25.0%)   6 (12.0%)    15 (24.6%)   16 (23.5%)
    Mean                      30.40        34.42        35.43        30.33        37.40        36.25   
  C                         13 (18.6%)   15 (29.4%)   10 (17.9%)   9 (18.0%)    15 (24.6%)   16 (23.5%)
    Mean                      36.92        35.60        34.00        31.89        33.47        31.38   
BLACK OR AFRICAN AMERICAN   18 (25.7%)   10 (19.6%)   12 (21.4%)   12 (24.0%)   13 (21.3%)   14 (20.6%)
  C                          6 (8.6%)    6 (11.8%)     4 (7.1%)    7 (14.0%)     3 (4.9%)     4 (5.9%) 
    Mean                      31.33        39.67        34.50        34.00        33.00        36.50   
  A                          5 (7.1%)     1 (2.0%)     5 (8.9%)     2 (4.0%)     4 (6.6%)     4 (5.9%) 
    Mean                      31.20        33.00        28.00        30.00        30.75        36.50   
  B                         7 (10.0%)     3 (5.9%)     3 (5.4%)     3 (6.0%)     6 (9.8%)     6 (8.8%) 
    Mean                      36.14        34.33        29.67        32.00        36.33        31.00   
WHITE                       8 (11.4%)    6 (11.8%)    7 (12.5%)    7 (14.0%)    8 (13.1%)    10 (14.7%)
  A                          2 (2.9%)     1 (2.0%)     3 (5.4%)     3 (6.0%)     1 (1.6%)     5 (7.4%) 
    Mean                      34.00        45.00        29.33        33.33        35.00        32.80   
  C                          2 (2.9%)     2 (3.9%)     3 (5.4%)     0 (0.0%)     4 (6.6%)     4 (5.9%) 
    Mean                      35.50        44.00        44.67          NA         38.50        35.00   
  B                          4 (5.7%)     3 (5.9%)     1 (1.8%)     4 (8.0%)     3 (4.9%)     1 (1.5%) 
    Mean                      37.00        43.67        48.00        36.75        34.33        36.00
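The scores behind this ordering can be checked by hand. The base-R sketch below copies the Arm C female (column 5) and male (column 6) counts for each stratum from the table above and applies the same (M - F) / F formula as silly_gender_diffcount, sorting the largest difference first within each ethnicity. It is an illustration on copied numbers, not a call into rtables. Some strata tie (A and C for ASIAN, A and B for BLACK OR AFRICAN AMERICAN), so only the untied positions are determined by the scores; the table above keeps tied strata in their original relative order.

```r
## Arm C counts of each STRATA1 level, copied from the table above
## (F = column 5, M = column 6)
arm_c <- list(
    ASIAN = list(F = c(A = 15, B = 10, C = 15), M = c(A = 16, B = 12, C = 16)),
    "BLACK OR AFRICAN AMERICAN" = list(F = c(A = 4, B = 6, C = 3), M = c(A = 4, B = 6, C = 4)),
    WHITE = list(F = c(A = 1, B = 3, C = 4), M = c(A = 5, B = 1, C = 4))
)

## Same formula as silly_gender_diffcount: (M - F) / F per stratum
gender_diff <- lapply(arm_c, function(x) (x$M - x$F) / x$F)

## Largest relative difference first, within each ethnicity
strata_order <- lapply(gender_diff, function(s) names(s)[order(s, decreasing = TRUE)])
str(gender_diff)
str(strata_order)
```

For WHITE, for example, stratum A scores (5 - 1) / 1 = 4, C scores 0 and B scores (1 - 3) / 3, which gives the A, C, B order printed above.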