This vignette explains how to use the following functions:

* calc_futime() to calculate follow-up time from the index event until the next event, death or the end-of-follow-up date
* pat_status() to determine patient status at the end of follow-up
* renumber_time_id() to calculate a consecutive index of events per case ID
* reshape_long() to transpose a dataset in wide format to long format
* reshape_wide() to transpose a dataset in long format to wide format (the wide format is required for many package functions)
* sir_byfutime() to calculate standardized incidence ratios (SIRs) with custom grouping variables, stratified by follow-up time
* summarize_sir_results() to summarize the detailed SIR results produced by sir_byfutime()
* vital_status() to determine the vital status, i.e. whether a patient is alive or dead at the end of follow-up

For some functions there are multiple variants of the same function that use different frameworks (for example renumber_time_id() and renumber_time_id_dt(), or reshape_wide(), reshape_wide_dt() and reshape_wide_tidyr()). They give the same results but differ in execution time and memory use.
It is recommended to run the following steps in the given order to obtain accurate follow-up time calculations:

1. Filter all cases in the long version of the dataset that are relevant for your analysis. Make sure that:
   * for all case_ids the index event (e.g. First Cancer FC) is still included and is the one remaining row in the dataset with the smallest time_id (TUMID3 variable for ZfKD data, and SEQ_NUM for SEER data)
   * case_ids might or might not get a countable incident event (e.g. Second Primary Cancer SPC). If it is to be counted, this event should be the second entry per case_id (second smallest time_id)
   * count_var indicates whether the countable incident event (SPC) has occurred or not, coded 0 for non-occurrence (or a not counted event) and 1 for a counted incident event.
2. Renumber filtered long dataset: In the filtered long dataset, run the helper function msSPChelpR::renumber_time_id_dt() (or the non-data.table variant msSPChelpR::renumber_time_id()). It renumbers all events per case_id and (if step 1 is fulfilled) assigns each index event time_var_new = 1 and each second (possibly countable incident) event time_var_new = 2. Any SIR-related function will only count the second event if, in addition to time_var_new = 2, count_var = 1 is also true for this row.
3. Reshape dataset: Run msSPChelpR::reshape_wide_dt() or the non-data.table variant msSPChelpR::reshape_wide(), so that the dataset is transposed to wide format (1 row per case_id, creating variables such as count_var.2).
4. Set flag for Second Primary Cancer diagnosis: After filtering and reshaping, it is essential to set p_spc again. This variable will be used in later steps of the analysis.
5. Determine patient status at a defined end of follow-up using the msSPChelpR::pat_status() function. This end-of-follow-up date:
   * must be in "YYYY-MM-DD" format; it is always defined via the fu_end = parameter
   * must precede the end of data collection. E.g. if the last incident events in the dataset you are using were collected at the end of 2014, your fu_end must be fu_end = "2014-12-15" or earlier.
6. Based on the newly calculated patient status, you might want to exclude cases for which the patient status cannot be determined.
7. Calculate follow-up time using the msSPChelpR::calc_futime() function and the same fu_end as used for pat_status(). By default, all functions of the msSPChelpR package require follow-up times as numeric years.

In order to calculate SIRs using the package functions, the following data structure is needed:

* Wide format data wide_df with one row per patient who has experienced the index event (i.e. was diagnosed with a first primary cancer FC)
* wide_df needs to contain the following variables (columns) per patient (row):
   * region_var - variable in df that contains information on the region where the case was incident.
   * age_var - variable in df that contains information on the age group.
   * sex_var - variable in df that contains information on biological sex.
   * year_var - variable in df that contains information on the year or year period when the case was incident.
   * site_var - variable in df that contains information on the diagnosis of the case (count event). Cases are usually the second cancers. Diagnoses can use any coding system (e.g. ICD), but the coding system must be consistent between the dataset and the reference data.
   * futime_var - variable in df that contains the follow-up time per person (in years) between the date of first cancer and whichever of death, date of the event (case) or end-of-FU date comes first. If you have not calculated the FU time yet, you can use the workflow described in the previous chapter.

If your data has the required structure, you can calculate and summarize SIRs with the following two steps:
1. Calculate SIRs using the msSPChelpR::sir_byfutime() function. This calculation usually requires a reference dataset that defines the population standard rates. refrates_df must use the same category coding of age, sex, region, year and cancer site as age_var, sex_var, region_var, year_var and site_var.
2. Summarize the stratified SIR results produced by the previous step using the msSPChelpR::summarize_sir_results() function.

A future version of this vignette will explain in this chapter the theoretical considerations behind how SIRs are calculated.
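Until then, a minimal sketch of the usual definition may help. It assumes that the SIR is the ratio of observed (O) to expected (E) cases, where E sums, over all strata, the person-years at risk multiplied by the reference incidence rate of the matching stratum, and it uses exact (chi-square based) Poisson confidence limits. Whether sir_byfutime() computes its confidence interval in exactly this way is an assumption; the function sir_sketch() and the toy numbers are hypothetical.

```r
# Hypothetical sketch of an SIR with exact Poisson confidence limits;
# not taken from the msSPChelpR source code.
sir_sketch <- function(observed, pyar, ref_rate, alpha = 0.05) {
  o <- sum(observed)                 # total observed cases
  e <- sum(pyar * ref_rate)          # expected cases under the reference rates
  # exact (chi-square based) Poisson limits for o, scaled by e
  lci <- if (o == 0) 0 else qchisq(alpha / 2, 2 * o) / 2 / e
  uci <- qchisq(1 - alpha / 2, 2 * (o + 1)) / 2 / e
  data.frame(observed = o, expected = e, sir = o / e,
             sir_lci = lci, sir_uci = uci)
}

# two hypothetical strata: observed cases, person-years, reference rate per person-year
res <- sir_sketch(observed = c(3, 5),
                  pyar     = c(1000, 2500),
                  ref_rate = c(0.001, 0.0012))
res
```

Here E = 1000 x 0.001 + 2500 x 0.0012 = 4 and O = 8, so the SIR is 2, i.e. twice as many cases as expected from the reference rates.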
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(magrittr)
library(msSPChelpR)
#Load synthetic dataset of patients with cancer to demonstrate package functions
data("us_second_cancer")
#This dataset is in long format, so each tumor is a separate row in the data
us_second_cancer
#> # A tibble: 113,999 x 15
#> fake_id SEQ_NUM registry sex race datebirth t_datediag t_site_icd t_dco
#> <chr> <int> <chr> <chr> <chr> <date> <date> <chr> <chr>
#> 1 100004 1 SEER Reg ~ Male White 1926-01-01 1992-07-15 C50 hist~
#> 2 100004 2 SEER Reg ~ Male White 1926-01-01 2004-01-15 C54 hist~
#> 3 100004 3 SEER Reg ~ Male White 1926-01-01 2006-06-15 C34 hist~
#> 4 100004 4 SEER Reg ~ Male White 1926-01-01 2018-06-15 C14 DCO ~
#> 5 100034 1 SEER Reg ~ Male White 1979-01-01 2000-06-15 C50 hist~
#> 6 100037 1 SEER Reg ~ Fema~ White 1938-01-01 1996-01-15 C54 hist~
#> 7 100038 1 SEER Reg ~ Male White 1989-01-01 1991-04-15 C50 hist~
#> 8 100038 2 SEER Reg ~ Male White 1989-01-01 2000-03-15 C80 hist~
#> 9 100039 1 SEER Reg ~ Fema~ White 1946-01-01 2003-08-15 C50 hist~
#> 10 100039 2 SEER Reg ~ Fema~ White 1946-01-01 2011-04-15 C34 hist~
#> # ... with 113,989 more rows, and 6 more variables: fc_age <int>,
#> # datedeath <date>, p_alive <chr>, p_dodmin <date>, fc_agegroup <chr>,
#> # t_yeardiag <chr>
#filter for patients with any lung cancer diagnosis
ids <- us_second_cancer %>%
  #detect ids with any lung cancer
  filter(t_site_icd == "C34") %>%
  pull(fake_id) %>%
  unique()

filtered_usdata <- us_second_cancer %>%
  #filter according to above detected ids with any lung cancer diagnosis
  filter(fake_id %in% ids) %>%
  arrange(fake_id)

filtered_usdata
#> # A tibble: 62,661 x 15
#> fake_id SEQ_NUM registry sex race datebirth t_datediag t_site_icd t_dco
#> <chr> <int> <chr> <chr> <chr> <date> <date> <chr> <chr>
#> 1 100004 1 SEER Reg ~ Male White 1926-01-01 1992-07-15 C50 hist~
#> 2 100004 2 SEER Reg ~ Male White 1926-01-01 2004-01-15 C54 hist~
#> 3 100004 3 SEER Reg ~ Male White 1926-01-01 2006-06-15 C34 hist~
#> 4 100004 4 SEER Reg ~ Male White 1926-01-01 2018-06-15 C14 DCO ~
#> 5 100039 1 SEER Reg ~ Fema~ White 1946-01-01 2003-08-15 C50 hist~
#> 6 100039 2 SEER Reg ~ Fema~ White 1946-01-01 2011-04-15 C34 hist~
#> 7 100039 3 SEER Reg ~ Fema~ White 1946-01-01 2018-01-15 C80 hist~
#> 8 100073 1 SEER Reg ~ Male White 1960-01-01 1993-11-15 C44 hist~
#> 9 100073 2 SEER Reg ~ Male White 1960-01-01 2003-12-15 C34 hist~
#> 10 100143 1 SEER Reg ~ Male White 1944-01-01 1992-03-15 C50 hist~
#> # ... with 62,651 more rows, and 6 more variables: fc_age <int>,
#> # datedeath <date>, p_alive <chr>, p_dodmin <date>, fc_agegroup <chr>,
#> # t_yeardiag <chr>
#renumber time_id per case
renumbered_usdata <- filtered_usdata %>%
  renumber_time_id(new_time_id_var = "t_tumid",
                   dattype = "seer",
                   case_id_var = "fake_id")

renumbered_usdata %>%
  select(fake_id, sex, t_site_icd, t_datediag, t_tumid)
#> # A tibble: 62,661 x 5
#> fake_id sex t_site_icd t_datediag t_tumid
#> <chr> <chr> <chr> <date> <int>
#> 1 100004 Male C50 1992-07-15 1
#> 2 100004 Male C54 2004-01-15 2
#> 3 100004 Male C34 2006-06-15 3
#> 4 100004 Male C14 2018-06-15 4
#> 5 100039 Female C50 2003-08-15 1
#> 6 100039 Female C34 2011-04-15 2
#> 7 100039 Female C80 2018-01-15 3
#> 8 100073 Male C44 1993-11-15 1
#> 9 100073 Male C34 2003-12-15 2
#> 10 100143 Male C50 1992-03-15 1
#> # ... with 62,651 more rows
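After filtering, the original sequence numbers can have gaps, for example when an event that is not relevant for the analysis was removed. The idea behind renumbering can be sketched in base R as follows, assuming events are ordered by SEQ_NUM within each case. This illustrates the concept only; it is not the implementation of renumber_time_id(), and the toy data are hypothetical.

```r
# Toy long-format data (hypothetical): SEQ_NUM has gaps after filtering
long_df <- data.frame(
  fake_id    = c("100039", "100004", "100004", "100039", "100004"),
  SEQ_NUM    = c(3L, 4L, 1L, 1L, 3L),
  t_site_icd = c("C34", "C14", "C50", "C50", "C34"),
  stringsAsFactors = FALSE
)

# order rows by case ID, then by the original sequence number
long_df <- long_df[order(long_df$fake_id, long_df$SEQ_NUM), ]

# consecutive index 1, 2, 3, ... within each case ID
long_df$t_tumid <- ave(seq_len(nrow(long_df)), long_df$fake_id, FUN = seq_along)
long_df
```

Case 100004 keeps SEQ_NUM 1, 3 and 4, which become t_tumid 1, 2 and 3, so the index event is always 1 and the next event is always 2.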
usdata_wide <- renumbered_usdata %>%
  reshape_wide_tidyr(case_id_var = "fake_id", time_id_var = "t_tumid", timevar_max = 10)

#now the data is in the wide format as required by many package functions.
#This means each case is a row, and several tumors per case ID are
#added as new columns to the data, using the time_id as column name suffix.
usdata_wide
#> # A tibble: 31,997 x 127
#> fake_id SEQ_NUM.1 registry.1 sex.1 race.1 datebirth.1 t_datediag.1
#> <chr> <int> <chr> <chr> <chr> <date> <date>
#> 1 100004 1 SEER Reg 20 - Detroi~ Male White 1926-01-01 1992-07-15
#> 2 100039 1 SEER Reg 02 - Connec~ Fema~ White 1946-01-01 2003-08-15
#> 3 100073 1 SEER Reg 01 - San Fr~ Male White 1960-01-01 1993-11-15
#> 4 100143 1 SEER Reg 02 - Connec~ Male White 1944-01-01 1992-03-15
#> 5 100182 1 SEER Reg 02 - Connec~ Male Other 1927-01-01 1991-09-15
#> 6 100197 1 SEER Reg 02 - Connec~ Fema~ White 1945-01-01 2012-06-15
#> 7 100208 1 SEER Reg 02 - Connec~ Male White 1970-01-01 2019-11-15
#> 8 100230 1 SEER Reg 01 - San Fr~ Male White 1947-01-01 1992-11-15
#> 9 100234 1 SEER Reg 01 - San Fr~ Male White 1988-01-01 2010-02-15
#> 10 100266 1 SEER Reg 01 - San Fr~ Fema~ White 1956-01-01 2010-07-15
#> # ... with 31,987 more rows, and 120 more variables: t_site_icd.1 <chr>,
#> # t_dco.1 <chr>, fc_age.1 <int>, datedeath.1 <date>, p_alive.1 <chr>,
#> # p_dodmin.1 <date>, fc_agegroup.1 <chr>, t_yeardiag.1 <chr>,
#> # SEQ_NUM.2 <int>, registry.2 <chr>, sex.2 <chr>, race.2 <chr>,
#> # datebirth.2 <date>, t_datediag.2 <date>, t_site_icd.2 <chr>, t_dco.2 <chr>,
#> # fc_age.2 <int>, datedeath.2 <date>, p_alive.2 <chr>, p_dodmin.2 <date>,
#> # fc_agegroup.2 <chr>, t_yeardiag.2 <chr>, SEQ_NUM.3 <int>, ...
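The resulting layout can be illustrated with base R's stats::reshape(), which also appends the time index to the column names using a "." separator. This is only a sketch of the wide structure with hypothetical toy data; reshape_wide_tidyr() and its variants may differ in details such as column order or the handling of timevar_max.

```r
# Toy long-format data (hypothetical), already renumbered per case
long_df <- data.frame(
  fake_id    = c("A", "A", "A", "B", "C", "C"),
  t_tumid    = c(1L, 2L, 3L, 1L, 1L, 2L),
  t_site_icd = c("C50", "C34", "C14", "C34", "C44", "C34"),
  stringsAsFactors = FALSE
)

# one row per case; t_site_icd.1, t_site_icd.2, ... hold the tumors in order
wide_df <- reshape(long_df, idvar = "fake_id", timevar = "t_tumid",
                   direction = "wide", sep = ".")
wide_df
```

Case B has no second tumor, so its t_site_icd.2 is NA; this is the same pattern that the p_spc flag below relies on.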
#set p_spc flag
usdata_wide <- usdata_wide %>%
  dplyr::mutate(p_spc = dplyr::case_when(is.na(t_site_icd.2) ~ "No SPC",
                                         !is.na(t_site_icd.2) ~ "SPC developed",
                                         TRUE ~ NA_character_)) %>%
  #create the same information as numeric variable count_spc
  dplyr::mutate(count_spc = dplyr::case_when(is.na(t_site_icd.2) ~ 1,
                                             TRUE ~ 0))

usdata_wide %>%
  dplyr::select(fake_id, sex.1, p_spc, count_spc, t_site_icd.1,
                t_datediag.1, t_site_icd.2, t_datediag.2)
#> # A tibble: 31,997 x 8
#> fake_id sex.1 p_spc count_spc t_site_icd.1 t_datediag.1 t_site_icd.2
#> <chr> <chr> <chr> <dbl> <chr> <date> <chr>
#> 1 100004 Male SPC developed 0 C50 1992-07-15 C54
#> 2 100039 Female SPC developed 0 C50 2003-08-15 C34
#> 3 100073 Male SPC developed 0 C44 1993-11-15 C34
#> 4 100143 Male SPC developed 0 C50 1992-03-15 C34
#> 5 100182 Male SPC developed 0 C18 1991-09-15 C34
#> 6 100197 Female SPC developed 0 C34 2012-06-15 C50
#> 7 100208 Male No SPC 1 C34 2019-11-15 <NA>
#> 8 100230 Male SPC developed 0 C44 1992-11-15 C34
#> 9 100234 Male No SPC 1 C34 2010-02-15 <NA>
#> 10 100266 Female No SPC 1 C34 2010-07-15 <NA>
#> # ... with 31,987 more rows, and 1 more variable: t_datediag.2 <date>
usdata_wide <- usdata_wide %>%
  pat_status(., fu_end = "2017-12-31", dattype = "seer",
             status_var = "p_status", life_var = "p_alive.1",
             spc_var = "p_spc", birthdat_var = "datebirth.1",
             lifedat_var = "datedeath.1", fcdat_var = "t_datediag.1",
             spcdat_var = "t_datediag.2", life_stat_alive = "Alive",
             life_stat_dead = "Dead", spc_stat_yes = "SPC developed",
             spc_stat_no = "No SPC", lifedat_fu_end = "2019-12-31",
             use_lifedatmin = FALSE, check = TRUE,
             as_labelled_factor = TRUE)
#> # A tibble: 10 x 3
#> p_alive.1 p_status n
#> <chr> <fct> <int>
#> 1 Alive Patient alive after FC (with or without following SPC after ~ 5940
#> 2 Alive Patient alive after SPC 11316
#> 3 Alive NA - Patient not born before end of FU 4
#> 4 Alive NA - Patient did not develop cancer before end of FU 849
#> 5 Dead Patient alive after FC (with or without following SPC after ~ 863
#> 6 Dead Patient alive after SPC 1360
#> 7 Dead Patient dead after FC 6208
#> 8 Dead Patient dead after SPC 5325
#> 9 Dead NA - Patient did not develop cancer before end of FU 68
#> 10 Dead NA - Patient date of death is missing 64
#> # A tibble: 7 x 2
#> p_status n
#> <fct> <int>
#> 1 Patient alive after FC (with or without following SPC after end of FU) 6803
#> 2 Patient alive after SPC 12676
#> 3 Patient dead after FC 6208
#> 4 Patient dead after SPC 5325
#> 5 NA - Patient not born before end of FU 4
#> 6 NA - Patient did not develop cancer before end of FU 917
#> 7 NA - Patient date of death is missing 64
usdata_wide %>%
  dplyr::select(fake_id, p_status, p_alive.1, datedeath.1, t_site_icd.1, t_datediag.1,
                t_site_icd.2, t_datediag.2)
#> # A tibble: 31,997 x 8
#> fake_id p_status p_alive.1 datedeath.1 t_site_icd.1 t_datediag.1 t_site_icd.2
#> <chr> <fct> <chr> <date> <chr> <date> <chr>
#> 1 100004 Patient~ Alive NA C50 1992-07-15 C54
#> 2 100039 Patient~ Alive NA C50 2003-08-15 C34
#> 3 100073 Patient~ Dead 2005-06-01 C44 1993-11-15 C34
#> 4 100143 Patient~ Alive NA C50 1992-03-15 C34
#> 5 100182 Patient~ Dead 2007-05-01 C18 1991-09-15 C34
#> 6 100197 Patient~ Alive NA C34 2012-06-15 C50
#> 7 100208 NA - Pa~ Alive NA C34 2019-11-15 <NA>
#> 8 100230 Patient~ Dead 2008-05-01 C44 1992-11-15 C34
#> 9 100234 Patient~ Dead 2015-07-01 C34 2010-02-15 <NA>
#> 10 100266 Patient~ Alive NA C34 2010-07-15 <NA>
#> # ... with 31,987 more rows, and 1 more variable: t_datediag.2 <date>
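The status categories above can be read as a comparison of the key dates with fu_end. The function below is a simplified, hypothetical sketch of such a classification, not the pat_status() implementation. It assumes that an SPC or a death only counts if it occurs on or before fu_end. That assumption would explain why some patients recorded as "Dead" are classified as alive after FC or SPC above, because their death falls after the end of follow-up. Further checks that pat_status() may perform are ignored, and the toy records are loosely modelled on the example patients shown above.

```r
# Hypothetical, simplified status classification; not the pat_status() source.
classify_status <- function(fu_end, birth, fc_date, spc_date, alive, death_date) {
  fu_end <- as.Date(fu_end)
  # an SPC counts only if it was diagnosed on or before the end of follow-up
  spc_in_fu  <- !is.na(spc_date) & spc_date <= fu_end
  # a death counts only if it occurred on or before the end of follow-up
  dead_in_fu <- !alive & !is.na(death_date) & death_date <= fu_end
  status <- ifelse(spc_in_fu,
                   ifelse(dead_in_fu, "Patient dead after SPC", "Patient alive after SPC"),
                   ifelse(dead_in_fu, "Patient dead after FC",
                          "Patient alive after FC (with or without following SPC after end of FU)"))
  # cases whose status cannot be determined (later assignments take precedence)
  status[!alive & is.na(death_date)] <- "NA - Patient date of death is missing"
  status[fc_date > fu_end] <- "NA - Patient did not develop cancer before end of FU"
  status[birth > fu_end]   <- "NA - Patient not born before end of FU"
  status
}

# Toy records (hypothetical)
toy <- data.frame(
  birth      = as.Date(c("1926-01-01", "1960-01-01", "1970-01-01", "1988-01-01", "1956-01-01")),
  fc_date    = as.Date(c("1992-07-15", "1993-11-15", "2019-11-15", "2010-02-15", "2010-07-15")),
  spc_date   = as.Date(c("2004-01-15", "2003-12-15", NA, NA, NA)),
  alive      = c(TRUE, FALSE, TRUE, FALSE, TRUE),
  death_date = as.Date(c(NA, "2005-06-01", NA, "2015-07-01", NA))
)

status <- with(toy, classify_status("2017-12-31", birth, fc_date, spc_date, alive, death_date))
status
```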
#alternatively, you can impute the date of death using lifedatmin_var
usdata_wide %>%
  pat_status(., fu_end = "2017-12-31", dattype = "seer",
             status_var = "p_status", life_var = "p_alive.1",
             spc_var = "p_spc", birthdat_var = "datebirth.1",
             lifedat_var = "datedeath.1", fcdat_var = "t_datediag.1",
             spcdat_var = "t_datediag.2", life_stat_alive = "Alive",
             life_stat_dead = "Dead", spc_stat_yes = "SPC developed",
             spc_stat_no = "No SPC", lifedat_fu_end = "2019-12-31",
             use_lifedatmin = TRUE, lifedatmin_var = "p_dodmin.1",
             check = TRUE, as_labelled_factor = TRUE)
#> # A tibble: 9 x 3
#> p_alive.1 p_status n
#> <chr> <fct> <int>
#> 1 Alive Patient alive after FC (with or without following SPC after e~ 5940
#> 2 Alive Patient alive after SPC 11316
#> 3 Alive NA - Patient not born before end of FU 4
#> 4 Alive NA - Patient did not develop cancer before end of FU 849
#> 5 Dead Patient alive after FC (with or without following SPC after e~ 867
#> 6 Dead Patient alive after SPC 1361
#> 7 Dead Patient dead after FC 6230
#> 8 Dead Patient dead after SPC 5362
#> 9 Dead NA - Patient did not develop cancer before end of FU 68
#> # A tibble: 6 x 2
#> p_status n
#> <fct> <int>
#> 1 Patient alive after FC (with or without following SPC after end of FU) 6807
#> 2 Patient alive after SPC 12677
#> 3 Patient dead after FC 6230
#> 4 Patient dead after SPC 5362
#> 5 NA - Patient not born before end of FU 4
#> 6 NA - Patient did not develop cancer before end of FU 917
#> # A tibble: 31,997 x 130
#> fake_id SEQ_NUM.1 registry.1 sex.1 race.1 datebirth.1 t_datediag.1
#> <chr> <int> <chr> <chr> <chr> <date> <date>
#> 1 100004 1 SEER Reg 20 - Detroi~ Male White 1926-01-01 1992-07-15
#> 2 100039 1 SEER Reg 02 - Connec~ Fema~ White 1946-01-01 2003-08-15
#> 3 100073 1 SEER Reg 01 - San Fr~ Male White 1960-01-01 1993-11-15
#> 4 100143 1 SEER Reg 02 - Connec~ Male White 1944-01-01 1992-03-15
#> 5 100182 1 SEER Reg 02 - Connec~ Male Other 1927-01-01 1991-09-15
#> 6 100197 1 SEER Reg 02 - Connec~ Fema~ White 1945-01-01 2012-06-15
#> 7 100208 1 SEER Reg 02 - Connec~ Male White 1970-01-01 2019-11-15
#> 8 100230 1 SEER Reg 01 - San Fr~ Male White 1947-01-01 1992-11-15
#> 9 100234 1 SEER Reg 01 - San Fr~ Male White 1988-01-01 2010-02-15
#> 10 100266 1 SEER Reg 01 - San Fr~ Fema~ White 1956-01-01 2010-07-15
#> # ... with 31,987 more rows, and 123 more variables: t_site_icd.1 <chr>,
#> # t_dco.1 <chr>, fc_age.1 <int>, datedeath.1 <date>, p_alive.1 <chr>,
#> # p_dodmin.1 <date>, fc_agegroup.1 <chr>, t_yeardiag.1 <chr>,
#> # SEQ_NUM.2 <int>, registry.2 <chr>, sex.2 <chr>, race.2 <chr>,
#> # datebirth.2 <date>, t_datediag.2 <date>, t_site_icd.2 <chr>, t_dco.2 <chr>,
#> # fc_age.2 <int>, datedeath.2 <date>, p_alive.2 <chr>, p_dodmin.2 <date>,
#> # fc_agegroup.2 <chr>, t_yeardiag.2 <chr>, SEQ_NUM.3 <int>, ...
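With use_lifedatmin = TRUE, the category "NA - Patient date of death is missing" no longer appears in the output above. This suggests that a missing date of death is replaced by the minimum date of death given in lifedatmin_var (here p_dodmin.1). Below is a hypothetical sketch of such an imputation. It assumes the imputation only applies to deceased patients without a recorded date of death, and the function name impute_death_date() is made up for illustration.

```r
# Hypothetical sketch: fill a missing date of death of deceased patients
# with a minimum date of death (e.g. a variable like p_dodmin.1)
impute_death_date <- function(alive, death_date, death_date_min) {
  missing_death <- !alive & is.na(death_date)
  death_date[missing_death] <- death_date_min[missing_death]
  death_date
}

imputed <- impute_death_date(
  alive          = c(FALSE, FALSE, TRUE),
  death_date     = as.Date(c(NA, "2010-05-01", NA)),
  death_date_min = as.Date(c("2012-03-01", "2010-01-01", NA))
)
imputed
```

Only the first patient is imputed; a recorded date of death is kept, and living patients keep an empty date.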
usdata_wide <- usdata_wide %>%
  dplyr::filter(!p_status %in% c("NA - Patient not born before end of FU",
                                 "NA - Patient did not develop cancer before end of FU",
                                 "NA - Patient date of death is missing"))

usdata_wide %>%
  dplyr::count(p_status)
#> # A tibble: 4 x 2
#> p_status n
#> <fct> <int>
#> 1 Patient alive after FC (with or without following SPC after end of FU) 6803
#> 2 Patient alive after SPC 12676
#> 3 Patient dead after FC 6208
#> 4 Patient dead after SPC 5325
usdata_wide <- usdata_wide %>%
  calc_futime(., futime_var_new = "p_futimeyrs", fu_end = "2017-12-31",
              dattype = "seer", time_unit = "years",
              lifedat_var = "datedeath.1",
              fcdat_var = "t_datediag.1", spcdat_var = "t_datediag.2")
#> # A tibble: 4 x 5
#> p_status mean_futime min_futime max_futime median_futime
#> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 Patient alive after FC (with ~ 9.58 0.0438 27.0 8.29
#> 2 Patient alive after SPC 8.69 0 26.9 7.50
#> 3 Patient dead after FC 8.54 0 25.8 7.47
#> 4 Patient dead after SPC 6.33 0 26.5 5.08
usdata_wide %>%
  dplyr::select(fake_id, p_status, p_futimeyrs, p_alive.1, datedeath.1, t_datediag.1, t_datediag.2)
#> # A tibble: 31,012 x 7
#> fake_id p_status p_futimeyrs p_alive.1 datedeath.1 t_datediag.1 t_datediag.2
#> <chr> <fct> <dbl> <chr> <date> <date> <date>
#> 1 100004 Patient ~ 11.5 Alive NA 1992-07-15 2004-01-15
#> 2 100039 Patient ~ 7.67 Alive NA 2003-08-15 2011-04-15
#> 3 100073 Patient ~ 10.1 Dead 2005-06-01 1993-11-15 2003-12-15
#> 4 100143 Patient ~ 3.33 Alive NA 1992-03-15 1995-07-15
#> 5 100182 Patient ~ 7.08 Dead 2007-05-01 1991-09-15 1998-10-15
#> 6 100197 Patient ~ 4.83 Alive NA 2012-06-15 2017-04-15
#> 7 100230 Patient ~ 11.0 Dead 2008-05-01 1992-11-15 2003-11-15
#> 8 100234 Patient ~ 5.37 Dead 2015-07-01 2010-02-15 NA
#> 9 100266 Patient ~ 7.46 Alive NA 2010-07-15 NA
#> 10 100274 Patient ~ 2.38 Dead 2006-06-01 2004-01-15 NA
#> # ... with 31,002 more rows
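The follow-up time described in the workflow steps can be sketched as the time from the first cancer diagnosis to the earliest of SPC diagnosis, death or fu_end. The sketch assumes that years are counted as days divided by 365.25; calc_futime() may convert to years differently, and calc_futime_sketch() is a hypothetical name. For the four patients used here (100004, 100073, 100234 and 100266 from the output above), the results match the printed p_futimeyrs values to the displayed precision.

```r
# Hypothetical sketch of follow-up time in years; not the calc_futime() source.
calc_futime_sketch <- function(fc_date, spc_date, death_date, fu_end) {
  fu_end <- as.Date(fu_end)
  # follow-up ends at the earliest of SPC diagnosis, death or end of follow-up
  end_date <- pmin(spc_date, death_date, fu_end, na.rm = TRUE)
  as.numeric(difftime(end_date, fc_date, units = "days")) / 365.25
}

futime <- calc_futime_sketch(
  fc_date    = as.Date(c("1992-07-15", "1993-11-15", "2010-02-15", "2010-07-15")),
  spc_date   = as.Date(c("2004-01-15", "2003-12-15", NA, NA)),
  death_date = as.Date(c(NA, "2005-06-01", "2015-07-01", NA)),
  fu_end     = "2017-12-31"
)
round(futime, 2)
```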
sircalc_results <- usdata_wide %>%
  sir_byfutime(
    dattype = "seer",
    ybreak_vars = c("race.1", "t_dco.1"),
    xbreak_var = "none",
    futime_breaks = c(0, 1/12, 2/12, 1, 5, 10, Inf),
    count_var = "count_spc",
    refrates_df = us_refrates_icd2,
    calc_total_row = TRUE,
    calc_total_fu = TRUE,
    region_var = "registry.1",
    age_var = "fc_agegroup.1",
    sex_var = "sex.1",
    year_var = "t_yeardiag.1",
    race_var = "race.1",
    site_var = "t_site_icd.1", #using grouping by second cancer incidence
    futime_var = "p_futimeyrs",
    alpha = 0.05)
#> [INFO Cases 0 PYARs] There are conflicts where strata with 0 follow-up time have data in observed.
#> i 40 strata are affected.
#> - This might be caused by cases where SPC occured at the same day as first cancer.
#> - You can check this by excluding all cases from wide_df, where date of first diagnosis is equal.
#> ! Check attribute `problems_not_empty` of results to see what strata are affected.
#>
#> [INFO Unexpected Cases] There are observed cases in the results file that do not occur in the refrates_df.
#> i 2682 strata are affected.
#> A possible explanation can be:
#> - DCO cases or
#> - diagnosis of second cancer occured in different time period than first cancer
#> ! Check attribute `notes_refcases` of results to see what strata are affected.
#>
sircalc_results %>% print(n = 100)
#> # A tidytable: 421,296 x 22
#> age region sex race year yvar_name yvar_label fu_time t_site observed
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall to 1 m~ C14 0
#> 2 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall to 1 m~ C18 0
#> 3 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall to 1 m~ C34 0
#> 4 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall to 1 m~ C44 0
#> 5 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall to 1 m~ C50 0
#> 6 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall to 1 m~ C54 0
#> 7 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall to 1 m~ C64 0
#> 8 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall to 1 m~ C80 0
#> 9 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 0.0833~ C14 0
#> 10 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 0.0833~ C18 0
#> 11 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 0.0833~ C34 0
#> 12 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 0.0833~ C44 0
#> 13 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 0.0833~ C50 0
#> 14 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 0.0833~ C54 0
#> 15 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 0.0833~ C64 0
#> 16 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 0.0833~ C80 0
#> 17 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 0.167-~ C14 0
#> 18 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 0.167-~ C18 0
#> 19 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 0.167-~ C34 0
#> 20 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 0.167-~ C44 0
#> 21 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 0.167-~ C50 0
#> 22 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 0.167-~ C54 0
#> 23 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 0.167-~ C64 0
#> 24 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 0.167-~ C80 0
#> 25 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 1-5 ye~ C14 0
#> 26 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 1-5 ye~ C18 0
#> 27 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 1-5 ye~ C34 0
#> 28 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 1-5 ye~ C44 0
#> 29 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 1-5 ye~ C50 0
#> 30 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 1-5 ye~ C54 0
#> 31 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 1-5 ye~ C64 0
#> 32 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 1-5 ye~ C80 0
#> 33 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 5-10 y~ C14 0
#> 34 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 5-10 y~ C18 0
#> 35 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 5-10 y~ C34 1
#> 36 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 5-10 y~ C44 0
#> 37 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 5-10 y~ C50 0
#> 38 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 5-10 y~ C54 0
#> 39 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 5-10 y~ C64 0
#> 40 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall 5-10 y~ C80 0
#> 41 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall Total ~ C14 0
#> 42 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall Total ~ C18 0
#> 43 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall Total ~ C34 1
#> 44 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall Total ~ C44 0
#> 45 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall Total ~ C50 0
#> 46 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall Total ~ C54 0
#> 47 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall Total ~ C64 0
#> 48 00 - ~ SEER ~ Fema~ Black 1990~ total_var Overall Total ~ C80 0
#> 49 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black to 1 m~ C14 0
#> 50 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black to 1 m~ C18 0
#> 51 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black to 1 m~ C34 0
#> 52 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black to 1 m~ C44 0
#> 53 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black to 1 m~ C50 0
#> 54 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black to 1 m~ C54 0
#> 55 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black to 1 m~ C64 0
#> 56 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black to 1 m~ C80 0
#> 57 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 0.0833~ C14 0
#> 58 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 0.0833~ C18 0
#> 59 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 0.0833~ C34 0
#> 60 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 0.0833~ C44 0
#> 61 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 0.0833~ C50 0
#> 62 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 0.0833~ C54 0
#> 63 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 0.0833~ C64 0
#> 64 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 0.0833~ C80 0
#> 65 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 0.167-~ C14 0
#> 66 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 0.167-~ C18 0
#> 67 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 0.167-~ C34 0
#> 68 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 0.167-~ C44 0
#> 69 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 0.167-~ C50 0
#> 70 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 0.167-~ C54 0
#> 71 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 0.167-~ C64 0
#> 72 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 0.167-~ C80 0
#> 73 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 1-5 ye~ C14 0
#> 74 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 1-5 ye~ C18 0
#> 75 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 1-5 ye~ C34 0
#> 76 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 1-5 ye~ C44 0
#> 77 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 1-5 ye~ C50 0
#> 78 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 1-5 ye~ C54 0
#> 79 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 1-5 ye~ C64 0
#> 80 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 1-5 ye~ C80 0
#> 81 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 5-10 y~ C14 0
#> 82 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 5-10 y~ C18 0
#> 83 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 5-10 y~ C34 1
#> 84 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 5-10 y~ C44 0
#> 85 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 5-10 y~ C50 0
#> 86 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 5-10 y~ C54 0
#> 87 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 5-10 y~ C64 0
#> 88 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black 5-10 y~ C80 0
#> 89 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black Total ~ C14 0
#> 90 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black Total ~ C18 0
#> 91 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black Total ~ C34 1
#> 92 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black Total ~ C44 0
#> 93 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black Total ~ C50 0
#> 94 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black Total ~ C54 0
#> 95 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black Total ~ C64 0
#> 96 00 - ~ SEER ~ Fema~ Black 1990~ race.1 Black Total ~ C80 0
#> 97 00 - ~ SEER ~ Fema~ Black 1990~ t_dco.1 histology to 1 m~ C14 0
#> 98 00 - ~ SEER ~ Fema~ Black 1990~ t_dco.1 histology to 1 m~ C18 0
#> 99 00 - ~ SEER ~ Fema~ Black 1990~ t_dco.1 histology to 1 m~ C34 0
#> 100 00 - ~ SEER ~ Fema~ Black 1990~ t_dco.1 histology to 1 m~ C44 0
#> # ... with 421,196 more rows, and 12 more variables: expected <dbl>, sir <dbl>,
#> # sir_lci <dbl>, sir_uci <dbl>, pyar <dbl>, n_base <dbl>,
#> # ref_inc_cases <dbl>, ref_population_pyar <dbl>, ref_inc_crude_rate <dbl>,
#> # fu_time_sort <int>, yvar_sort <int>, warning <chr>
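sir_byfutime() stratifies the results by the follow-up bands given in futime_breaks. A common approach, assumed here and not confirmed from the package source, is to split each person's follow-up time across these bands, so that a person contributes person-years at risk (PYAR) to every band they pass through. The helper split_pyar() below is a hypothetical base-R sketch of that split.

```r
# Follow-up bands as used in the sir_byfutime() call above (in years)
futime_breaks <- c(0, 1/12, 2/12, 1, 5, 10, Inf)

# Hypothetical helper: person-years each person spends in each band
split_pyar <- function(futime, breaks) {
  lower <- head(breaks, -1)
  upper <- tail(breaks, -1)
  pyar <- sapply(seq_along(lower), function(i) {
    # overlap of the follow-up interval [0, futime] with band i
    pmax(0, pmin(futime, upper[i]) - lower[i])
  })
  pyar <- matrix(pyar, nrow = length(futime))  # keep a matrix for a single person
  colnames(pyar) <- paste(round(lower, 3), round(upper, 3), sep = "-")
  pyar
}

pyar <- split_pyar(futime = c(5.37, 0.05, 12), breaks = futime_breaks)
pyar
colSums(pyar)  # PYAR per follow-up band
```

Each row sums to that person's total follow-up time. Summing the columns over all persons gives the PYAR per follow-up band, which then enters the expected counts of that band.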
#The summarize function is versatile. Here, for example, a summary by follow-up time
sircalc_results %>%
  #summarize results across region, age, year, race and t_site
  summarize_sir_results(.,
                        summarize_groups = c("region", "age", "year", "race"),
                        summarize_site = TRUE,
                        output = "long", output_information = "minimal",
                        add_total_row = "only", add_total_fu = "no",
                        collapse_ci = FALSE, shorten_total_cols = TRUE,
                        fubreak_var_name = "fu_time", ybreak_var_name = "yvar_name",
                        xbreak_var_name = "none", site_var_name = "t_site",
                        alpha = 0.05
  ) %>%
  dplyr::select(-region, -age, -year, -race, -sex, -yvar_name)
#> Warning: The results file `sir_df` contains observed cases in i_observed that do not occur in the refrates_df (ref_inc_cases).
#> Therefore calculation of the variables n_base and ref_population_pyar is ambiguous.
#> We take the first value of each variable. Expect small inconsistencies in the calculation of n_base, ref_population_pyar and ref_inc_crude_rate across strata.
#> ! If you want to know more, please check the `warnings` column of `sir_df`.
#> # A tidytable: 7 x 8
#> yvar_label fu_time fu_time_sort t_site observed expected sir sir_ci
#> <chr> <chr> <int> <chr> <dbl> <dbl> <dbl> <chr>
#> 1 Overall to 1 month 1 Total 327 20.6 15.9 14.23~
#> 2 Overall 0.0833-0.167 ye~ 2 Total 80 20.4 3.92 3.11 ~
#> 3 Overall 0.167-1 years 3 Total 724 196. 3.69 3.43 ~
#> 4 Overall 1-5 years 4 Total 2998 760. 3.95 3.81 ~
#> 5 Overall 5-10 years 5 Total 3089 605. 5.1 4.92 ~
#> 6 Overall 10+ years 6 Total 4241 500. 8.49 8.23 ~
#> 7 Overall Total 0 to Inf ~ 7 Total 11459 2102. 5.45 5.35 ~
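When strata are collapsed, observed and expected counts are summed first and the SIR is then recomputed from these sums. In the summary above, for example, the total row gives 11459 / 2102 ≈ 5.45. A hypothetical sketch of this aggregation step with toy numbers (not from the vignette data):

```r
# Hypothetical stratified results (toy numbers)
strata <- data.frame(
  fu_time  = c("1-5 years", "1-5 years", "5-10 years", "5-10 years"),
  sex      = c("Female", "Male", "Female", "Male"),
  observed = c(4, 6, 3, 9),
  expected = c(1.5, 2.5, 1.0, 2.0)
)

# sum observed and expected over sex within each follow-up band
summary_df <- aggregate(cbind(observed, expected) ~ fu_time, data = strata, FUN = sum)
# recompute the SIR from the summed counts (not the mean of stratum SIRs)
summary_df$sir <- summary_df$observed / summary_df$expected
summary_df
```

For "1-5 years" this gives 10 / 4 = 2.5, whereas averaging the two stratum SIRs (4 / 1.5 and 6 / 2.5) would give about 2.53.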
sessionInfo()
#> R version 4.1.3 (2022-03-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19044)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=C LC_CTYPE=German_Germany.1252
#> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
#> [5] LC_TIME=German_Germany.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] msSPChelpR_0.9.0 magrittr_2.0.3 dplyr_1.0.9
#>
#> loaded via a namespace (and not attached):
#> [1] pillar_1.7.0 bslib_0.3.1 compiler_4.1.3 jquerylib_0.1.4
#> [5] prettyunits_1.1.1 progress_1.2.2 forcats_0.5.1 tools_4.1.3
#> [9] digest_0.6.29 jsonlite_1.8.0 lubridate_1.8.0 evaluate_0.15
#> [13] lifecycle_1.0.1 tibble_3.1.7 pkgconfig_2.0.3 rlang_1.0.2
#> [17] cli_3.3.0 DBI_1.1.2 rstudioapi_0.13 yaml_2.3.5
#> [21] haven_2.5.0 xfun_0.31 fastmap_1.1.0 stringr_1.4.0
#> [25] knitr_1.39 hms_1.1.1 generics_0.1.2 vctrs_0.4.1
#> [29] sass_0.4.1 sjlabelled_1.2.0 tidyselect_1.1.2 data.table_1.14.2
#> [33] snakecase_0.11.0 glue_1.6.2 R6_2.5.1 fansi_1.0.3
#> [37] rmarkdown_2.14 purrr_0.3.4 tidyr_1.2.0 ellipsis_0.3.2
#> [41] htmltools_0.5.2 insight_0.17.1 assertthat_0.2.1 tidytable_0.8.0
#> [45] utf8_1.2.2 stringi_1.7.6 crayon_1.5.1