Introduction to the msSPChelpR package - from long dataset to SIR analyses

Marian Eberl

26 October 2020

Introduction

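This vignette explains how to use the functions renumber_time_id(), reshape_wide_tidyr(), pat_status(), calc_futime(), sir_byfutime() and summarize_sir_results() to get from a long dataset to SIR analyses.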

Some functions come in multiple variants built on different frameworks. The variants give the same results but differ in execution time and memory use.

Theory behind SIRs

The theoretical considerations behind the calculation of standardized incidence ratios (SIRs) will be explained in this chapter in the next version of this vignette.
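Until then, the following minimal sketch shows the standard definition: the SIR is the number of observed cases divided by the number of expected cases, where the expected cases are the reference incidence rates applied to the person-years at risk in each stratum. All numbers and column names below are made up for illustration, and the exact Poisson confidence interval is one common choice, not necessarily the method used inside the package.

```r
# Toy strata (hypothetical values, not taken from the package data)
strata <- data.frame(
  age_group         = c("50 - 54", "55 - 59", "60 - 64"),
  pyar              = c(1200, 1500, 900),   # person-years at risk in the cohort
  ref_rate_per_100k = c(150, 220, 310),     # reference incidence rate per 100,000
  observed          = c(4, 6, 5)            # observed second cancers
)

# Expected cases per stratum = reference rate * person-years at risk
strata$expected <- strata$pyar * strata$ref_rate_per_100k / 1e5

observed_total <- sum(strata$observed)
expected_total <- sum(strata$expected)

# SIR = observed / expected
sir <- observed_total / expected_total

# Exact Poisson confidence limits (assumption: one common choice of CI method)
alpha   <- 0.05
sir_lci <- qchisq(alpha / 2, 2 * observed_total) / 2 / expected_total
sir_uci <- qchisq(1 - alpha / 2, 2 * (observed_total + 1)) / 2 / expected_total

c(sir = sir, sir_lci = sir_lci, sir_uci = sir_uci)
```

An SIR above 1 means more cases were observed in the cohort than would be expected from the reference population.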

Examples

SEER lung cancer

Step 1 - Long dataset

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(magrittr)
library(msSPChelpR)
#Load synthetic dataset of patients with cancer to demonstrate package functions
data("us_second_cancer")

#This dataset is in long format, so each tumor is a separate row in the data
us_second_cancer
#> # A tibble: 113,999 x 15
#>    fake_id SEQ_NUM registry   sex   race  datebirth  t_datediag t_site_icd t_dco
#>    <chr>     <int> <chr>      <chr> <chr> <date>     <date>     <chr>      <chr>
#>  1 100004        1 SEER Reg ~ Male  White 1926-01-01 1992-07-15 C50        hist~
#>  2 100004        2 SEER Reg ~ Male  White 1926-01-01 2004-01-15 C54        hist~
#>  3 100004        3 SEER Reg ~ Male  White 1926-01-01 2006-06-15 C34        hist~
#>  4 100004        4 SEER Reg ~ Male  White 1926-01-01 2018-06-15 C14        DCO ~
#>  5 100034        1 SEER Reg ~ Male  White 1979-01-01 2000-06-15 C50        hist~
#>  6 100037        1 SEER Reg ~ Fema~ White 1938-01-01 1996-01-15 C54        hist~
#>  7 100038        1 SEER Reg ~ Male  White 1989-01-01 1991-04-15 C50        hist~
#>  8 100038        2 SEER Reg ~ Male  White 1989-01-01 2000-03-15 C80        hist~
#>  9 100039        1 SEER Reg ~ Fema~ White 1946-01-01 2003-08-15 C50        hist~
#> 10 100039        2 SEER Reg ~ Fema~ White 1946-01-01 2011-04-15 C34        hist~
#> # ... with 113,989 more rows, and 6 more variables: fc_age <int>,
#> #   datedeath <date>, p_alive <chr>, p_dodmin <date>, fc_agegroup <chr>,
#> #   t_yeardiag <chr>

Step 2 - Filter long dataset

#filter for lung cancer
ids <- us_second_cancer %>%
  #detect ids with any lung cancer
  filter(t_site_icd == "C34") %>%
  #extract the ids as a plain character vector
  pull(fake_id)

filtered_usdata <- us_second_cancer %>%
  #filter according to above detected ids with any lung cancer diagnosis
  filter(fake_id %in% ids) %>%
  arrange(fake_id)

filtered_usdata
#> # A tibble: 62,661 x 15
#>    fake_id SEQ_NUM registry   sex   race  datebirth  t_datediag t_site_icd t_dco
#>    <chr>     <int> <chr>      <chr> <chr> <date>     <date>     <chr>      <chr>
#>  1 100004        1 SEER Reg ~ Male  White 1926-01-01 1992-07-15 C50        hist~
#>  2 100004        2 SEER Reg ~ Male  White 1926-01-01 2004-01-15 C54        hist~
#>  3 100004        3 SEER Reg ~ Male  White 1926-01-01 2006-06-15 C34        hist~
#>  4 100004        4 SEER Reg ~ Male  White 1926-01-01 2018-06-15 C14        DCO ~
#>  5 100039        1 SEER Reg ~ Fema~ White 1946-01-01 2003-08-15 C50        hist~
#>  6 100039        2 SEER Reg ~ Fema~ White 1946-01-01 2011-04-15 C34        hist~
#>  7 100039        3 SEER Reg ~ Fema~ White 1946-01-01 2018-01-15 C80        hist~
#>  8 100073        1 SEER Reg ~ Male  White 1960-01-01 1993-11-15 C44        hist~
#>  9 100073        2 SEER Reg ~ Male  White 1960-01-01 2003-12-15 C34        hist~
#> 10 100143        1 SEER Reg ~ Male  White 1944-01-01 1992-03-15 C50        hist~
#> # ... with 62,651 more rows, and 6 more variables: fc_age <int>,
#> #   datedeath <date>, p_alive <chr>, p_dodmin <date>, fc_agegroup <chr>,
#> #   t_yeardiag <chr>
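The point of this two-step filter is that all tumors of a patient are kept as soon as any one of them is a lung cancer, not only the lung cancer rows themselves. A small base R sketch on made-up data (the toy ids and sites are assumptions for illustration) shows the effect:

```r
# Toy long dataset: one row per tumor (hypothetical values)
toy_long <- data.frame(
  fake_id    = c("A", "A", "B", "C", "C"),
  t_site_icd = c("C50", "C34", "C50", "C34", "C80"),
  stringsAsFactors = FALSE
)

# Step 1: ids of patients with at least one lung cancer (C34)
lung_ids <- unique(toy_long$fake_id[toy_long$t_site_icd == "C34"])

# Step 2: keep every tumor of those patients, including non-lung tumors
toy_filtered <- toy_long[toy_long$fake_id %in% lung_ids, ]
toy_filtered
```

Patient B, who never had a lung cancer, is dropped, while the breast cancer of patient A and the C80 tumor of patient C are retained.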

Step 3 - Renumber time_id

renumbered_usdata <- filtered_usdata %>%
  renumber_time_id(new_time_id_var = "t_tumid", 
                   dattype = "seer",
                   case_id_var = "fake_id")

renumbered_usdata %>%
   select(fake_id, sex, t_site_icd, t_datediag, t_tumid)
#> # A tibble: 62,661 x 5
#>    fake_id sex    t_site_icd t_datediag t_tumid
#>    <chr>   <chr>  <chr>      <date>       <int>
#>  1 100004  Male   C50        1992-07-15       1
#>  2 100004  Male   C54        2004-01-15       2
#>  3 100004  Male   C34        2006-06-15       3
#>  4 100004  Male   C14        2018-06-15       4
#>  5 100039  Female C50        2003-08-15       1
#>  6 100039  Female C34        2011-04-15       2
#>  7 100039  Female C80        2018-01-15       3
#>  8 100073  Male   C44        1993-11-15       1
#>  9 100073  Male   C34        2003-12-15       2
#> 10 100143  Male   C50        1992-03-15       1
#> # ... with 62,651 more rows
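The idea behind the new time id can be illustrated with a base R sketch: tumors are ordered by date of diagnosis within each case and numbered consecutively starting at 1. This is a toy illustration on made-up rows, not the implementation of renumber_time_id(), which may use other information depending on dattype.

```r
# Toy long dataset in arbitrary row order (hypothetical values)
toy_long <- data.frame(
  fake_id    = c("B", "A", "A", "B", "A"),
  t_datediag = as.Date(c("2003-08-15", "2006-06-15", "1992-07-15",
                         "2011-04-15", "2004-01-15")),
  stringsAsFactors = FALSE
)

# Sort by case id and date of diagnosis
toy_long <- toy_long[order(toy_long$fake_id, toy_long$t_datediag), ]

# Number the tumors consecutively within each case
toy_long$t_tumid <- ave(seq_along(toy_long$fake_id), toy_long$fake_id,
                        FUN = seq_along)
toy_long
```

A gap-free index per case matters for the next step, because the index becomes the column name suffix in the wide dataset.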

Step 4 - Reshape to wide dataset

usdata_wide <- renumbered_usdata %>%
  reshape_wide_tidyr(case_id_var = "fake_id", time_id_var = "t_tumid", timevar_max = 10)

#now the data is in the wide format as required by many package functions. 
#This means each case is one row, and the tumors of a case are 
#added as new columns to the data using the time_id as column name suffix.
usdata_wide
#> # A tibble: 31,997 x 127
#>    fake_id SEQ_NUM.1 registry.1            sex.1 race.1 datebirth.1 t_datediag.1
#>    <chr>       <int> <chr>                 <chr> <chr>  <date>      <date>      
#>  1 100004          1 SEER Reg 20 - Detroi~ Male  White  1926-01-01  1992-07-15  
#>  2 100039          1 SEER Reg 02 - Connec~ Fema~ White  1946-01-01  2003-08-15  
#>  3 100073          1 SEER Reg 01 - San Fr~ Male  White  1960-01-01  1993-11-15  
#>  4 100143          1 SEER Reg 02 - Connec~ Male  White  1944-01-01  1992-03-15  
#>  5 100182          1 SEER Reg 02 - Connec~ Male  Other  1927-01-01  1991-09-15  
#>  6 100197          1 SEER Reg 02 - Connec~ Fema~ White  1945-01-01  2012-06-15  
#>  7 100208          1 SEER Reg 02 - Connec~ Male  White  1970-01-01  2019-11-15  
#>  8 100230          1 SEER Reg 01 - San Fr~ Male  White  1947-01-01  1992-11-15  
#>  9 100234          1 SEER Reg 01 - San Fr~ Male  White  1988-01-01  2010-02-15  
#> 10 100266          1 SEER Reg 01 - San Fr~ Fema~ White  1956-01-01  2010-07-15  
#> # ... with 31,987 more rows, and 120 more variables: t_site_icd.1 <chr>,
#> #   t_dco.1 <chr>, fc_age.1 <int>, datedeath.1 <date>, p_alive.1 <chr>,
#> #   p_dodmin.1 <date>, fc_agegroup.1 <chr>, t_yeardiag.1 <chr>,
#> #   SEQ_NUM.2 <int>, registry.2 <chr>, sex.2 <chr>, race.2 <chr>,
#> #   datebirth.2 <date>, t_datediag.2 <date>, t_site_icd.2 <chr>, t_dco.2 <chr>,
#> #   fc_age.2 <int>, datedeath.2 <date>, p_alive.2 <chr>, p_dodmin.2 <date>,
#> #   fc_agegroup.2 <chr>, t_yeardiag.2 <chr>, SEQ_NUM.3 <int>, ...
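To make the resulting naming scheme concrete, the same kind of reshape can be sketched with reshape() from base R on a made-up dataset. This only mirrors the layout of the result (one row per case, suffix .1, .2, ... per tumor) and is not the code used by reshape_wide_tidyr().

```r
# Toy long dataset with a consecutive tumor index (hypothetical values)
toy_long <- data.frame(
  fake_id    = c("A", "A", "B"),
  t_tumid    = c(1L, 2L, 1L),
  t_site_icd = c("C50", "C34", "C34"),
  stringsAsFactors = FALSE
)

# Long to wide: one row per fake_id, one column set per t_tumid
toy_wide <- reshape(toy_long,
                    idvar     = "fake_id",
                    timevar   = "t_tumid",
                    direction = "wide")
toy_wide
```

Cases with fewer tumors than the maximum get NA in the unused columns, which is what the next step relies on to detect whether a second cancer exists.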

Step 5 - Recalculate p_spc


usdata_wide <- usdata_wide %>%
  dplyr::mutate(p_spc = dplyr::case_when(is.na(t_site_icd.2)  ~ "No SPC",
                                         !is.na(t_site_icd.2) ~ "SPC developed",
                                         TRUE ~ NA_character_)) %>%
  #create the same information as numeric variable count_spc (1 = SPC developed, 0 = no SPC)
  dplyr::mutate(count_spc = dplyr::case_when(!is.na(t_site_icd.2) ~ 1,
                                             TRUE ~ 0))
usdata_wide %>%
  dplyr::select(fake_id, sex.1, p_spc, count_spc, t_site_icd.1, 
                t_datediag.1, t_site_icd.2, t_datediag.2)
#> # A tibble: 31,997 x 8
#>    fake_id sex.1  p_spc         count_spc t_site_icd.1 t_datediag.1 t_site_icd.2
#>    <chr>   <chr>  <chr>             <dbl> <chr>        <date>       <chr>       
#>  1 100004  Male   SPC developed         1 C50          1992-07-15   C54         
#>  2 100039  Female SPC developed         1 C50          2003-08-15   C34         
#>  3 100073  Male   SPC developed         1 C44          1993-11-15   C34         
#>  4 100143  Male   SPC developed         1 C50          1992-03-15   C34         
#>  5 100182  Male   SPC developed         1 C18          1991-09-15   C34         
#>  6 100197  Female SPC developed         1 C34          2012-06-15   C50         
#>  7 100208  Male   No SPC                0 C34          2019-11-15   <NA>        
#>  8 100230  Male   SPC developed         1 C44          1992-11-15   C34         
#>  9 100234  Male   No SPC                0 C34          2010-02-15   <NA>        
#> 10 100266  Female No SPC                0 C34          2010-07-15   <NA>        
#> # ... with 31,987 more rows, and 1 more variable: t_datediag.2 <date>

Step 6 - Determine patient status at end of FU

usdata_wide <- usdata_wide %>%
  pat_status(., fu_end = "2017-12-31", dattype = "seer",
             status_var = "p_status", life_var = "p_alive.1",
             spc_var = "p_spc", birthdat_var = "datebirth.1",
             lifedat_var = "datedeath.1", fcdat_var = "t_datediag.1",
             spcdat_var = "t_datediag.2", life_stat_alive = "Alive",
             life_stat_dead = "Dead", spc_stat_yes = "SPC developed",
             spc_stat_no = "No SPC", lifedat_fu_end = "2019-12-31",
             use_lifedatmin = FALSE, check = TRUE, 
             as_labelled_factor = TRUE)
#> # A tibble: 10 x 3
#>    p_alive.1 p_status                                                          n
#>    <chr>     <fct>                                                         <int>
#>  1 Alive     Patient alive after FC (with or without following SPC after ~  5940
#>  2 Alive     Patient alive after SPC                                       11316
#>  3 Alive     NA - Patient not born before end of FU                            4
#>  4 Alive     NA - Patient did not develop cancer before end of FU            849
#>  5 Dead      Patient alive after FC (with or without following SPC after ~   863
#>  6 Dead      Patient alive after SPC                                        1360
#>  7 Dead      Patient dead after FC                                          6208
#>  8 Dead      Patient dead after SPC                                         5325
#>  9 Dead      NA - Patient did not develop cancer before end of FU             68
#> 10 Dead      NA - Patient date of death is missing                            64
#> # A tibble: 7 x 2
#>   p_status                                                                   n
#>   <fct>                                                                  <int>
#> 1 Patient alive after FC (with or without following SPC after end of FU)  6803
#> 2 Patient alive after SPC                                                12676
#> 3 Patient dead after FC                                                   6208
#> 4 Patient dead after SPC                                                  5325
#> 5 NA - Patient not born before end of FU                                     4
#> 6 NA - Patient did not develop cancer before end of FU                     917
#> 7 NA - Patient date of death is missing                                     64

usdata_wide %>%
   dplyr::select(fake_id, p_status, p_alive.1, datedeath.1, t_site_icd.1, t_datediag.1, 
                 t_site_icd.2, t_datediag.2)
#> # A tibble: 31,997 x 8
#>    fake_id p_status p_alive.1 datedeath.1 t_site_icd.1 t_datediag.1 t_site_icd.2
#>    <chr>   <fct>    <chr>     <date>      <chr>        <date>       <chr>       
#>  1 100004  Patient~ Alive     NA          C50          1992-07-15   C54         
#>  2 100039  Patient~ Alive     NA          C50          2003-08-15   C34         
#>  3 100073  Patient~ Dead      2005-06-01  C44          1993-11-15   C34         
#>  4 100143  Patient~ Alive     NA          C50          1992-03-15   C34         
#>  5 100182  Patient~ Dead      2007-05-01  C18          1991-09-15   C34         
#>  6 100197  Patient~ Alive     NA          C34          2012-06-15   C50         
#>  7 100208  NA - Pa~ Alive     NA          C34          2019-11-15   <NA>        
#>  8 100230  Patient~ Dead      2008-05-01  C44          1992-11-15   C34         
#>  9 100234  Patient~ Dead      2015-07-01  C34          2010-02-15   <NA>        
#> 10 100266  Patient~ Alive     NA          C34          2010-07-15   <NA>        
#> # ... with 31,987 more rows, and 1 more variable: t_datediag.2 <date>

#alternatively, you can impute the date of death using lifedatmin_var
usdata_wide %>%
  pat_status(., fu_end = "2017-12-31", dattype = "seer",
             status_var = "p_status", life_var = "p_alive.1",
             spc_var = "p_spc", birthdat_var = "datebirth.1",
             lifedat_var = "datedeath.1", fcdat_var = "t_datediag.1",
             spcdat_var = "t_datediag.2", life_stat_alive = "Alive",
             life_stat_dead = "Dead", spc_stat_yes = "SPC developed",
             spc_stat_no = "No SPC", lifedat_fu_end = "2019-12-31",
             use_lifedatmin = TRUE, lifedatmin_var = "p_dodmin.1", 
             check = TRUE, as_labelled_factor = TRUE)
#> # A tibble: 9 x 3
#>   p_alive.1 p_status                                                           n
#>   <chr>     <fct>                                                          <int>
#> 1 Alive     Patient alive after FC (with or without following SPC after e~  5940
#> 2 Alive     Patient alive after SPC                                        11316
#> 3 Alive     NA - Patient not born before end of FU                             4
#> 4 Alive     NA - Patient did not develop cancer before end of FU             849
#> 5 Dead      Patient alive after FC (with or without following SPC after e~   867
#> 6 Dead      Patient alive after SPC                                         1361
#> 7 Dead      Patient dead after FC                                           6230
#> 8 Dead      Patient dead after SPC                                          5362
#> 9 Dead      NA - Patient did not develop cancer before end of FU              68
#> # A tibble: 6 x 2
#>   p_status                                                                   n
#>   <fct>                                                                  <int>
#> 1 Patient alive after FC (with or without following SPC after end of FU)  6807
#> 2 Patient alive after SPC                                                12677
#> 3 Patient dead after FC                                                   6230
#> 4 Patient dead after SPC                                                  5362
#> 5 NA - Patient not born before end of FU                                     4
#> 6 NA - Patient did not develop cancer before end of FU                     917
#> # A tibble: 31,997 x 130
#>    fake_id SEQ_NUM.1 registry.1            sex.1 race.1 datebirth.1 t_datediag.1
#>    <chr>       <int> <chr>                 <chr> <chr>  <date>      <date>      
#>  1 100004          1 SEER Reg 20 - Detroi~ Male  White  1926-01-01  1992-07-15  
#>  2 100039          1 SEER Reg 02 - Connec~ Fema~ White  1946-01-01  2003-08-15  
#>  3 100073          1 SEER Reg 01 - San Fr~ Male  White  1960-01-01  1993-11-15  
#>  4 100143          1 SEER Reg 02 - Connec~ Male  White  1944-01-01  1992-03-15  
#>  5 100182          1 SEER Reg 02 - Connec~ Male  Other  1927-01-01  1991-09-15  
#>  6 100197          1 SEER Reg 02 - Connec~ Fema~ White  1945-01-01  2012-06-15  
#>  7 100208          1 SEER Reg 02 - Connec~ Male  White  1970-01-01  2019-11-15  
#>  8 100230          1 SEER Reg 01 - San Fr~ Male  White  1947-01-01  1992-11-15  
#>  9 100234          1 SEER Reg 01 - San Fr~ Male  White  1988-01-01  2010-02-15  
#> 10 100266          1 SEER Reg 01 - San Fr~ Fema~ White  1956-01-01  2010-07-15  
#> # ... with 31,987 more rows, and 123 more variables: t_site_icd.1 <chr>,
#> #   t_dco.1 <chr>, fc_age.1 <int>, datedeath.1 <date>, p_alive.1 <chr>,
#> #   p_dodmin.1 <date>, fc_agegroup.1 <chr>, t_yeardiag.1 <chr>,
#> #   SEQ_NUM.2 <int>, registry.2 <chr>, sex.2 <chr>, race.2 <chr>,
#> #   datebirth.2 <date>, t_datediag.2 <date>, t_site_icd.2 <chr>, t_dco.2 <chr>,
#> #   fc_age.2 <int>, datedeath.2 <date>, p_alive.2 <chr>, p_dodmin.2 <date>,
#> #   fc_agegroup.2 <chr>, t_yeardiag.2 <chr>, SEQ_NUM.3 <int>, ...
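The status categories in the tables above depend on where the dates of first cancer (FC), second primary cancer (SPC) and death fall relative to the end of follow-up. The following base R sketch is a strongly simplified stand-in for that logic. The helper toy_status() and its labels are hypothetical and leave out checks that pat_status() performs, such as missing dates of death or patients not yet born.

```r
# Hypothetical helper for illustration only, not part of msSPChelpR
toy_status <- function(fc_date, spc_date, death_date, fu_end) {
  fu_end <- as.Date(fu_end)
  # events only count if they happen on or before the end of follow-up
  spc_in_fu  <- !is.na(spc_date) & spc_date <= fu_end
  dead_in_fu <- !is.na(death_date) & death_date <= fu_end
  ifelse(fc_date > fu_end, "no cancer before end of FU",
    ifelse(dead_in_fu,
      ifelse(spc_in_fu, "dead after SPC", "dead after FC"),
      ifelse(spc_in_fu, "alive after SPC", "alive after FC")))
}

# Dates of five patients as printed above
fc    <- as.Date(c("1992-07-15", "1993-11-15", "2019-11-15", "2010-02-15", "2010-07-15"))
spc   <- as.Date(c("2004-01-15", "2003-12-15", NA, NA, NA))
death <- as.Date(c(NA, "2005-06-01", NA, "2015-07-01", NA))

status <- toy_status(fc, spc, death, fu_end = "2017-12-31")
status
```

Under this simplification, a patient who dies after the end of follow-up still counts as alive at the end of follow-up, which would explain why the first table lists patients with p_alive.1 equal to "Dead" in the "alive" status groups.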

Step 6b - Remove patients irrelevant to analysis depending on status

usdata_wide <- usdata_wide %>%
  dplyr::filter(!p_status %in% c("NA - Patient not born before end of FU",
                                 "NA - Patient did not develop cancer before end of FU",
                                 "NA - Patient date of death is missing"))

usdata_wide %>%
  dplyr::count(p_status)
#> # A tibble: 4 x 2
#>   p_status                                                                   n
#>   <fct>                                                                  <int>
#> 1 Patient alive after FC (with or without following SPC after end of FU)  6803
#> 2 Patient alive after SPC                                                12676
#> 3 Patient dead after FC                                                   6208
#> 4 Patient dead after SPC                                                  5325

Step 7 - Calculate FU time

usdata_wide <- usdata_wide %>%
   calc_futime(., futime_var_new = "p_futimeyrs", fu_end = "2017-12-31",
               dattype = "seer", time_unit = "years", 
               lifedat_var = "datedeath.1", 
               fcdat_var = "t_datediag.1", spcdat_var = "t_datediag.2")
#> # A tibble: 4 x 5
#>   p_status                       mean_futime min_futime max_futime median_futime
#>   <fct>                                <dbl>      <dbl>      <dbl>         <dbl>
#> 1 Patient alive after FC (with ~        9.58     0.0438       27.0          8.29
#> 2 Patient alive after SPC               8.69     0            26.9          7.50
#> 3 Patient dead after FC                 8.54     0            25.8          7.47
#> 4 Patient dead after SPC                6.33     0            26.5          5.08

usdata_wide %>%
   dplyr::select(fake_id, p_status, p_futimeyrs, p_alive.1, datedeath.1, t_datediag.1, t_datediag.2)
#> # A tibble: 31,012 x 7
#>    fake_id p_status  p_futimeyrs p_alive.1 datedeath.1 t_datediag.1 t_datediag.2
#>    <chr>   <fct>           <dbl> <chr>     <date>      <date>       <date>      
#>  1 100004  Patient ~       11.5  Alive     NA          1992-07-15   2004-01-15  
#>  2 100039  Patient ~        7.67 Alive     NA          2003-08-15   2011-04-15  
#>  3 100073  Patient ~       10.1  Dead      2005-06-01  1993-11-15   2003-12-15  
#>  4 100143  Patient ~        3.33 Alive     NA          1992-03-15   1995-07-15  
#>  5 100182  Patient ~        7.08 Dead      2007-05-01  1991-09-15   1998-10-15  
#>  6 100197  Patient ~        4.83 Alive     NA          2012-06-15   2017-04-15  
#>  7 100230  Patient ~       11.0  Dead      2008-05-01  1992-11-15   2003-11-15  
#>  8 100234  Patient ~        5.37 Dead      2015-07-01  2010-02-15   NA          
#>  9 100266  Patient ~        7.46 Alive     NA          2010-07-15   NA          
#> 10 100274  Patient ~        2.38 Dead      2006-06-01  2004-01-15   NA          
#> # ... with 31,002 more rows
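The printed follow-up times are consistent with measuring the time from first cancer diagnosis to whichever comes first: diagnosis of the second cancer, death, or the end of follow-up. The base R sketch below recomputes three of the rows above under that reading. Dividing by 365.25 days per year is an assumption that happens to reproduce the printed values; the package may convert time units differently.

```r
# Three patients with the dates printed above
toy_wide <- data.frame(
  fake_id      = c("100004", "100234", "100266"),
  t_datediag.1 = as.Date(c("1992-07-15", "2010-02-15", "2010-07-15")),
  t_datediag.2 = as.Date(c("2004-01-15", NA, NA)),
  datedeath.1  = as.Date(c(NA, "2015-07-01", NA)),
  stringsAsFactors = FALSE
)
fu_end <- as.Date("2017-12-31")

# Earliest of SPC diagnosis, death and end of FU (as day counts, ignoring NA)
end_of_fu <- pmin(as.numeric(toy_wide$t_datediag.2),
                  as.numeric(toy_wide$datedeath.1),
                  as.numeric(fu_end),
                  na.rm = TRUE)

# Follow-up time in years (assumption: 365.25 days per year)
toy_wide$p_futimeyrs <- (end_of_fu - as.numeric(toy_wide$t_datediag.1)) / 365.25
round(toy_wide$p_futimeyrs, 2)
```

Rounded to two decimals, the results agree with the p_futimeyrs values shown for these three patients (11.5, 5.37 and 7.46).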

Step 8 - Calculate SIR

sircalc_results <- usdata_wide %>%
  sir_byfutime(
    dattype = "seer",
    ybreak_vars = c("race.1", "t_dco.1"),
    xbreak_var = "none",
    futime_breaks = c(0, 1/12, 2/12, 1, 5, 10, Inf),
    count_var = "count_spc",
    refrates_df = us_refrates_icd2,
    calc_total_row = TRUE,
    calc_total_fu = TRUE,
    region_var = "registry.1",
    age_var = "fc_agegroup.1",
    sex_var = "sex.1",
    year_var = "t_yeardiag.1",
    race_var = "race.1",
    site_var = "t_site_icd.1", #using grouping by second cancer incidence
    futime_var = "p_futimeyrs",
    alpha = 0.05)
#> [INFO Cases 0 PYARs] There are conflicts where strata with 0 follow-up time have data in observed.
#> i 40 strata are affected.
#>  - This might be caused by cases where SPC occured at the same day as first cancer.
#>  - You can check this by excluding all cases from wide_df, where date of first diagnosis is equal.
#> ! Check attribute `problems_not_empty` of results to see what strata are affected.
#>  
#> [INFO Unexpected Cases] There are observed cases in the results file that do not occur in the refrates_df.
#> i 2682 strata are affected.
#> A possible explanation can be:
#>  - DCO cases or
#>  - diagnosis of second cancer occured in different time period than first cancer
#> ! Check attribute `notes_refcases` of results to see what strata are affected.
#> 

sircalc_results %>% print(n = 100)
#> # A tidytable: 421,296 x 22
#>     age    region sex   race  year  yvar_name yvar_label fu_time t_site observed
#>     <chr>  <chr>  <chr> <chr> <chr> <chr>     <chr>      <chr>   <chr>     <dbl>
#>   1 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    to 1 m~ C14           0
#>   2 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    to 1 m~ C18           0
#>   3 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    to 1 m~ C34           0
#>   4 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    to 1 m~ C44           0
#>   5 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    to 1 m~ C50           0
#>   6 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    to 1 m~ C54           0
#>   7 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    to 1 m~ C64           0
#>   8 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    to 1 m~ C80           0
#>   9 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    0.0833~ C14           0
#>  10 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    0.0833~ C18           0
#>  11 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    0.0833~ C34           0
#>  12 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    0.0833~ C44           0
#>  13 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    0.0833~ C50           0
#>  14 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    0.0833~ C54           0
#>  15 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    0.0833~ C64           0
#>  16 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    0.0833~ C80           0
#>  17 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    0.167-~ C14           0
#>  18 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    0.167-~ C18           0
#>  19 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    0.167-~ C34           0
#>  20 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    0.167-~ C44           0
#>  21 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    0.167-~ C50           0
#>  22 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    0.167-~ C54           0
#>  23 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    0.167-~ C64           0
#>  24 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    0.167-~ C80           0
#>  25 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    1-5 ye~ C14           0
#>  26 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    1-5 ye~ C18           0
#>  27 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    1-5 ye~ C34           0
#>  28 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    1-5 ye~ C44           0
#>  29 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    1-5 ye~ C50           0
#>  30 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    1-5 ye~ C54           0
#>  31 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    1-5 ye~ C64           0
#>  32 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    1-5 ye~ C80           0
#>  33 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    5-10 y~ C14           0
#>  34 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    5-10 y~ C18           0
#>  35 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    5-10 y~ C34           1
#>  36 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    5-10 y~ C44           0
#>  37 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    5-10 y~ C50           0
#>  38 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    5-10 y~ C54           0
#>  39 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    5-10 y~ C64           0
#>  40 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    5-10 y~ C80           0
#>  41 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    Total ~ C14           0
#>  42 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    Total ~ C18           0
#>  43 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    Total ~ C34           1
#>  44 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    Total ~ C44           0
#>  45 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    Total ~ C50           0
#>  46 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    Total ~ C54           0
#>  47 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    Total ~ C64           0
#>  48 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall    Total ~ C80           0
#>  49 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      to 1 m~ C14           0
#>  50 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      to 1 m~ C18           0
#>  51 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      to 1 m~ C34           0
#>  52 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      to 1 m~ C44           0
#>  53 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      to 1 m~ C50           0
#>  54 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      to 1 m~ C54           0
#>  55 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      to 1 m~ C64           0
#>  56 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      to 1 m~ C80           0
#>  57 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      0.0833~ C14           0
#>  58 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      0.0833~ C18           0
#>  59 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      0.0833~ C34           0
#>  60 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      0.0833~ C44           0
#>  61 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      0.0833~ C50           0
#>  62 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      0.0833~ C54           0
#>  63 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      0.0833~ C64           0
#>  64 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      0.0833~ C80           0
#>  65 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      0.167-~ C14           0
#>  66 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      0.167-~ C18           0
#>  67 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      0.167-~ C34           0
#>  68 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      0.167-~ C44           0
#>  69 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      0.167-~ C50           0
#>  70 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      0.167-~ C54           0
#>  71 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      0.167-~ C64           0
#>  72 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      0.167-~ C80           0
#>  73 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      1-5 ye~ C14           0
#>  74 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      1-5 ye~ C18           0
#>  75 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      1-5 ye~ C34           0
#>  76 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      1-5 ye~ C44           0
#>  77 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      1-5 ye~ C50           0
#>  78 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      1-5 ye~ C54           0
#>  79 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      1-5 ye~ C64           0
#>  80 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      1-5 ye~ C80           0
#>  81 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      5-10 y~ C14           0
#>  82 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      5-10 y~ C18           0
#>  83 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      5-10 y~ C34           1
#>  84 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      5-10 y~ C44           0
#>  85 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      5-10 y~ C50           0
#>  86 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      5-10 y~ C54           0
#>  87 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      5-10 y~ C64           0
#>  88 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      5-10 y~ C80           0
#>  89 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      Total ~ C14           0
#>  90 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      Total ~ C18           0
#>  91 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      Total ~ C34           1
#>  92 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      Total ~ C44           0
#>  93 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      Total ~ C50           0
#>  94 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      Total ~ C54           0
#>  95 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      Total ~ C64           0
#>  96 00 - ~ SEER ~ Fema~ Black 1990~ race.1    Black      Total ~ C80           0
#>  97 00 - ~ SEER ~ Fema~ Black 1990~ t_dco.1   histology  to 1 m~ C14           0
#>  98 00 - ~ SEER ~ Fema~ Black 1990~ t_dco.1   histology  to 1 m~ C18           0
#>  99 00 - ~ SEER ~ Fema~ Black 1990~ t_dco.1   histology  to 1 m~ C34           0
#> 100 00 - ~ SEER ~ Fema~ Black 1990~ t_dco.1   histology  to 1 m~ C44           0
#> # ... with 421,196 more rows, and 12 more variables: expected <dbl>, sir <dbl>,
#> #   sir_lci <dbl>, sir_uci <dbl>, pyar <dbl>, n_base <dbl>,
#> #   ref_inc_cases <dbl>, ref_population_pyar <dbl>, ref_inc_crude_rate <dbl>,
#> #   fu_time_sort <int>, yvar_sort <int>, warning <chr>
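The futime_breaks argument splits follow-up into intervals, so each patient contributes person-years at risk to every interval that was at least partly lived through. A generic way to split one follow-up time across such intervals is sketched below in base R. The helper split_futime() is hypothetical and only illustrates the principle, not the internals of sir_byfutime().

```r
# Same breaks as in the call above (in years)
futime_breaks <- c(0, 1/12, 2/12, 1, 5, 10, Inf)

# Hypothetical helper: time spent in each interval for one follow-up time
split_futime <- function(futime, breaks) {
  lower <- head(breaks, -1)   # interval start points
  upper <- tail(breaks, -1)   # interval end points
  pmax(0, pmin(futime, upper) - lower)
}

# A patient followed for 7.46 years
pyar <- split_futime(7.46, futime_breaks)
round(pyar, 3)
```

The contributions add up to the total follow-up time, and intervals that start after the end of follow-up receive zero person-years.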

Step 9 - Summarize SIR results

#The summarize function is versatile. Here, for example, is the summary by follow-up time interval.

sircalc_results %>%
  #summarize results across region, age, year, race and t_site
  summarize_sir_results(.,
                        summarize_groups = c("region", "age", "year", "race"),
                        summarize_site = TRUE,
                        output = "long",  output_information = "minimal",
                        add_total_row = "only",  add_total_fu = "no",
                        collapse_ci = FALSE,  shorten_total_cols = TRUE,
                        fubreak_var_name = "fu_time", ybreak_var_name = "yvar_name",
                        xbreak_var_name = "none", site_var_name = "t_site",
                        alpha = 0.05
                        ) %>%
  dplyr::select(-region, -age, -year, -race, -sex, -yvar_name)
#> Warning: The results file `sir_df` contains observed cases in i_observed that do not occur in the refrates_df (ref_inc_cases).
#> Therefore calculation of the variables n_base and ref_population_pyar is ambiguous.
#> We take the first value of each variable. Expect small inconsistencies in the calculation of n_base, ref_population_pyar and ref_inc_crude_rate across strata.
#> ! If you want to know more, please check the `warnings` column of `sir_df`.
#> # A tidytable: 7 x 8
#>   yvar_label fu_time          fu_time_sort t_site observed expected   sir sir_ci
#>   <chr>      <chr>                   <int> <chr>     <dbl>    <dbl> <dbl> <chr> 
#> 1 Overall    to 1 month                  1 Total       327     20.6 15.9  14.23~
#> 2 Overall    0.0833-0.167 ye~            2 Total        80     20.4  3.92 3.11 ~
#> 3 Overall    0.167-1 years               3 Total       724    196.   3.69 3.43 ~
#> 4 Overall    1-5 years                   4 Total      2998    760.   3.95 3.81 ~
#> 5 Overall    5-10 years                  5 Total      3089    605.   5.1  4.92 ~
#> 6 Overall    10+ years                   6 Total      4241    500.   8.49 8.23 ~
#> 7 Overall    Total 0 to Inf ~            7 Total     11459   2102.   5.45 5.35 ~
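When strata are summarized, observed and expected cases are summed first and the ratio is taken afterwards; SIRs are not averaged. The base R sketch below checks this against the table above, using the observed and expected values of rows 1 to 6 as printed. Because the printed expected values are rounded, the recomputed SIRs agree with the table only approximately.

```r
# Rows 1 to 6 of the summary table above (expected values as displayed, i.e. rounded)
by_futime <- data.frame(
  fu_time_sort = 1:6,
  observed     = c(327, 80, 724, 2998, 3089, 4241),
  expected     = c(20.6, 20.4, 196, 760, 605, 500)
)

# SIR per follow-up interval
by_futime$sir <- by_futime$observed / by_futime$expected

# Pooled SIR over all intervals: sum first, then divide
total_observed <- sum(by_futime$observed)
total_expected <- sum(by_futime$expected)
pooled_sir     <- total_observed / total_expected

round(pooled_sir, 2)
```

The sums reproduce the total row of the table (11459 observed and about 2102 expected cases).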

Built with

sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=C                    LC_CTYPE=German_Germany.1252   
#> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
#> [5] LC_TIME=German_Germany.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] msSPChelpR_0.9.0 magrittr_2.0.3   dplyr_1.0.9     
#> 
#> loaded via a namespace (and not attached):
#>  [1] pillar_1.7.0      bslib_0.3.1       compiler_4.1.3    jquerylib_0.1.4  
#>  [5] prettyunits_1.1.1 progress_1.2.2    forcats_0.5.1     tools_4.1.3      
#>  [9] digest_0.6.29     jsonlite_1.8.0    lubridate_1.8.0   evaluate_0.15    
#> [13] lifecycle_1.0.1   tibble_3.1.7      pkgconfig_2.0.3   rlang_1.0.2      
#> [17] cli_3.3.0         DBI_1.1.2         rstudioapi_0.13   yaml_2.3.5       
#> [21] haven_2.5.0       xfun_0.31         fastmap_1.1.0     stringr_1.4.0    
#> [25] knitr_1.39        hms_1.1.1         generics_0.1.2    vctrs_0.4.1      
#> [29] sass_0.4.1        sjlabelled_1.2.0  tidyselect_1.1.2  data.table_1.14.2
#> [33] snakecase_0.11.0  glue_1.6.2        R6_2.5.1          fansi_1.0.3      
#> [37] rmarkdown_2.14    purrr_0.3.4       tidyr_1.2.0       ellipsis_0.3.2   
#> [41] htmltools_0.5.2   insight_0.17.1    assertthat_0.2.1  tidytable_0.8.0  
#> [45] utf8_1.2.2        stringi_1.7.6     crayon_1.5.1