README

casebase is an R package for fitting flexible and fully parametric hazard regression models to survival data with single event type or multiple competing causes via logistic and multinomial regression. Our formulation allows for arbitrary functional forms of time and its interactions with other predictors for time-dependent hazards and hazard ratios. From the fitted hazard model, we provide functions to readily calculate and plot cumulative incidence and survival curves for a given covariate profile. This approach accommodates any log-linear hazard function of prognostic time, treatment, and covariates, and readily allows for non-proportionality. We also provide a plot method for visualizing incidence density via population time plots.

Installation

install.packages("casebase")

# install.packages("devtools")
devtools::install_github("sahirbhatnagar/casebase")

Vignettes

useR! 2019 Toulouse - Presentation

Quickstart

This is a basic example which shows you some of the main functionalities of the casebase package. We use data from the estrogen plus progestin trial from the Women’s Health Initiative (included in the casebase package). This randomized clinical trial investigated the effect of estrogen plus progestin (estPro) on coronary heart disease (CHD) risk in 16,608 postmenopausal women who were 50 to 79 years of age at base line. Participants were randomly assigned to receive estPro or placebo. The primary efficacy outcome of the trial was CHD (nonfatal myocardial infarction or death due to CHD).

library(casebase)
#> See example usage at http://sahirbhatnagar.com/casebase/
library(visreg)
library(splines)
data("eprchd")

Population Time Plots

We first visualize the data with a population time plot. For each treatment arm, we plot the observed person time in gray, and the case series as colored dots. It gives us a good visual representation of the incidence density:

plot(popTime(eprchd, exposure = "treatment"))

Fit a Smooth Hazard Model

eprchd <- transform(eprchd, 
                    treatment = factor(treatment, levels = c("placebo","estPro")))

fit <- fitSmoothHazard(status ~ treatment*ns(time, df = 3),
                       data = eprchd,
                       time = "time")
summary(fit)
#> Fitting smooth hazards with case-base sampling
#> 
#> Sample size: 16608 
#> Number of events: 324 
#> Number of base moments: 32400 
#> ----
#> 
#> Call:
#> fitSmoothHazard(formula = status ~ treatment * ns(time, df = 3), 
#>     data = eprchd, time = "time")
#> 
#> Deviance Residuals: 
#>     Min       1Q   Median       3Q      Max  
#> -0.2441  -0.1474  -0.1368  -0.1272   3.1398  
#> 
#> Coefficients:
#>                                   Estimate Std. Error z value
#> (Intercept)                       -5.87406    0.30068 -19.536
#> treatmentestPro                    0.65528    0.37780   1.734
#> ns(time, df = 3)1                 -0.37685    0.36055  -1.045
#> ns(time, df = 3)2                  0.82888    0.73435   1.129
#> ns(time, df = 3)3                  1.33620    0.34430   3.881
#> treatmentestPro:ns(time, df = 3)1  0.01662    0.48792   0.034
#> treatmentestPro:ns(time, df = 3)2 -1.42088    0.94245  -1.508
#> treatmentestPro:ns(time, df = 3)3 -1.04687    0.48685  -2.150
#>                                   Pr(>|z|)    
#> (Intercept)                        < 2e-16 ***
#> treatmentestPro                   0.082832 .  
#> ns(time, df = 3)1                 0.295935    
#> ns(time, df = 3)2                 0.259015    
#> ns(time, df = 3)3                 0.000104 ***
#> treatmentestPro:ns(time, df = 3)1 0.972831    
#> treatmentestPro:ns(time, df = 3)2 0.131645    
#> treatmentestPro:ns(time, df = 3)3 0.031533 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 3635.4  on 32723  degrees of freedom
#> Residual deviance: 3615.6  on 32716  degrees of freedom
#> AIC: 3631.6
#> 
#> Number of Fisher Scoring iterations: 7

Since the output object from fitSmoothHazard inherits from the glm class, we see a familiar result when using the function summary.

Time-Dependent Hazard Function

The treatment effect on the hazard is somewhat difficult to interpret because of its interaction with the spline term on time. In these situations, it is often more instructive to visualize the relationship. For example, we can easily plot the hazard function for each treatment arm:

plot(fit, hazard.params = list(xvar = "time", by = "treatment"))
#> Conditions used in construction of plot
#> treatment: placebo / estPro
#> offset: 0

Time-Dependent Hazard Ratio

newtime <- quantile(eprchd$time, 
                    probs = seq(0.01, 0.99, 0.01))

# reference category
newdata <- data.frame(treatment = factor("placebo", 
                                         levels = c("placebo", "estPro")), 
                      time = newtime)

plot(fit, 
     type = "hr", 
     newdata = newdata,
     var = "treatment",
     increment = 1,
     xvar = "time",
     ci = T,
     rug = T)

Cumulative Incidence Function (CIF)

smooth_risk <- absoluteRisk(object = fit, 
                            newdata = data.frame(treatment = c("placebo", "estPro")))

plot(smooth_risk, id.names = c("placebo", "estPro"))

Class structure

The casebase package uses the following hierarchy of classes for the output of fitSmoothHazard:

The class singleEventCB is an S3 class, and we also keep track of the classes appearing below. The class CompRisk is an S4 class that inherits from vglm.

Credit

Citation

citation('casebase')
#> 
#> To cite casebase in publications use:
#> 
#> Bhatnagar S, Turgeon M, Islam J, Saarela O, Hanley J (2020).
#> _casebase: Fitting Flexible Smooth-in-Time Hazards and Risk
#> Functions via Logistic and Multinomial Regression_. R package
#> version 0.9.0, <URL:
#> https://CRAN.R-project.org/package=casebase>.
#> 
#>   Hanley, James A., and Olli S. Miettinen. Fitting
#>   smooth-in-time prognostic risk functions via logistic
#>   regression. International Journal of Biostatistics 5.1
#>   (2009): 1125-1125.
#> 
#>   Saarela, Olli. A case-base sampling method for estimating
#>   recurrent event intensities. Lifetime data analysis 22.4
#>   (2016): 589-605.
#> 
#> If competing risks analyis is used, please also cite
#> 
#>   Saarela, Olli, and Elja Arjas. Non-parametric Bayesian
#>   Hazard Regression for Chronic Disease Risk Assessment.
#>   Scandinavian Journal of Statistics 42.2 (2015): 609-626.
#> 
#> To see these entries in BibTeX format, use 'print(<citation>,
#> bibtex=TRUE)', 'toBibtex(.)', or set
#> 'options(citation.bibtex.max=999)'.

Contact

casebase