Confounding; Logistic regression; Cox proportional hazards model; Linear regression
The ‘chest’ package systematically calculates and compares effect estimates from models with different combinations of variables. It calculates the change in the effect estimate as variables are added to the model sequentially in a step-wise fashion. Effect estimates here can be regression coefficients, odds ratios, or hazard ratios, depending on the modelling method. At each step, only the one variable that causes the largest change among the remaining variables is added to the model. The final results from many models are summarized in one graph and one data frame. This approach can be used for assessing confounding effects in epidemiological studies and biomedical research, including clinical trials.
You can install the released version of chest from CRAN with:
install.packages("chest")
library(chest)
library(ggplot2)
names(diab_df)
#> [1] "Endpoint" "mid" "Diabetes" "Age" "Sex" "BMI"
#> [7] "Married" "Smoke" "CVD" "Cancer" "Education" "Income"
#> [13] "t0" "t1"
A data frame ‘diab_df’ is used to examine the association between Diabetes and mortality Endpoint. The purpose of using this data set is to demonstrate the use of the functions in this package rather than to answer any research question.
results <- chest_speedglm(
  crude = "Endpoint ~ Diabetes",
  xlist = c("Age", "Sex", "Married", "Smoke", "Cancer", "CVD", "Education", "Income"),
  data = diab_df)
results$data
#> variables est lb ub Change p n
#> 1 Crude 2.305786 1.809862 2.937600 NA 1.367426e-11 2048
#> 2 + Age 3.297099 2.421755 4.488837 42.99238373 3.498331e-14 2048
#> 3 + Income 2.902046 2.125534 3.962238 -11.98183756 2.001413e-11 2048
#> 4 + CVD 2.797882 2.044220 3.829405 -3.58931332 1.316760e-10 2048
#> 5 + Smoke 2.900600 2.113218 3.981361 3.67128106 4.388114e-11 2048
#> 6 + Sex 2.872333 2.092484 3.942824 -0.97453539 6.649405e-11 2048
#> 7 + Married 2.894543 2.107594 3.975330 0.77324730 5.185994e-11 2048
#> 8 + Cancer 2.881019 2.097209 3.957769 -0.46723321 6.520345e-11 2048
#> 9 + Education 2.878757 2.093150 3.959219 -0.07852074 7.880738e-11 2048
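The Change column is consistent with a percentage change relative to the estimate in the previous row. The following is a minimal check of that reading, using only base R and the first four estimates copied from the table; the formula is an assumption inferred from the printed numbers, not taken from the package source.

```r
# Estimates copied from the output table above (rounded to six decimals).
est <- c(Crude = 2.305786, Age = 3.297099, Income = 2.902046, CVD = 2.797882)

# Assumption: Change = percentage change relative to the previous row.
change <- diff(est) / head(est, -1) * 100
round(change, 2)
```

Under this assumption the computed values agree with the printed Change column up to rounding of the copied estimates.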
We can also use ‘chest_plot’ to show the results above as a ‘ggplot’ object.
results <- chest_speedglm(
  crude = "Endpoint ~ Diabetes",
  xlist = c("Age", "Sex", "Married", "Smoke", "Cancer", "CVD", "Education", "Income"),
  data = diab_df)
chest_plot(results)
We can adjust the position of the values.
results <- chest_speedglm(
  crude = "Endpoint ~ Diabetes",
  xlist = c("Age", "Sex", "Married", "Smoke", "Cancer", "CVD", "Education", "Income"),
  data = diab_df)
p <- chest_plot(results, nudge_y = 0, value_position = 5)
p + scale_x_continuous(breaks = c(0.5, 1:4), limits = c(0.5, 8))
Or remove the values.
results <- chest_speedglm(
  crude = "Endpoint ~ Diabetes",
  xlist = c("Age", "Sex", "Married", "Smoke", "Cancer", "CVD", "Education", "Income"),
  data = diab_df)
chest_plot(results, no_values = TRUE)
Users can also use ‘chest_forest’, which calls the ‘forestplot’ package.
results <- chest_speedglm(
  crude = "Endpoint ~ Diabetes",
  xlist = c("Age", "Sex", "Married", "Smoke", "Cancer", "CVD", "Education", "Income"),
  data = diab_df)
chest_forest(results)
All odds ratios are for the association between Diabetes and mortality Endpoint after each of the other factors is added to the model sequentially.
Step 1: Start with the model speedglm(Endpoint ~ Diabetes). The odds ratio for Diabetes is presented in the row marked Crude.
Step 2: Each of the 8 variables was added separately to the above model, and chest_speedglm compared the odds ratios from those eight models to identify the one that created the largest change. The variable Age was selected and added to the model.
Step 3: Repeat Step 2 with the remaining 7 variables. The variable Income was selected and added.
Steps 4 to 9 repeated the same procedure until all variables were added. We can see that after adding Age and Income, adding the other variables had little impact on the odds ratio estimates, and the estimates remained above 1, on the right-hand side of the no-effect line. In this case, ‘chest’ shows one table with the results after fitting 37 models in total: 1 crude model plus 36 (8 + 7 + 6 + 5 + 4 + 3 + 2 + 1) models.
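The steps above can be sketched in base R. This is an illustration on simulated data under stated assumptions, not the package's implementation: the data frame d, the helper or_for, and the choice of the largest absolute percentage change as the selection rule are all hypothetical.

```r
# Simulated data: binary exposure x, three candidate covariates, binary outcome y.
set.seed(1)
n <- 500
d <- data.frame(
  x  = rbinom(n, 1, 0.3),
  c1 = rnorm(n),
  c2 = rnorm(n),
  c3 = rbinom(n, 1, 0.5)
)
d$y <- rbinom(n, 1, plogis(-1 + 0.8 * d$x + 0.5 * d$c1))

# Hypothetical helper: odds ratio for x after adjusting for `covars`.
or_for <- function(covars) {
  f <- reformulate(c("x", covars), response = "y")
  unname(exp(coef(glm(f, family = binomial, data = d))["x"]))
}

remaining <- c("c1", "c2", "c3")
selected  <- character(0)
current   <- or_for(selected)          # Step 1: crude model
steps     <- data.frame(variables = "Crude", est = current, change = NA_real_)
n_models  <- 1

while (length(remaining) > 0) {
  # Fit one model per remaining variable, each added to the current model.
  cand <- vapply(remaining, function(v) or_for(c(selected, v)), numeric(1))
  n_models <- n_models + length(remaining)
  # Assumed rule: keep the variable with the largest absolute percentage change.
  pct  <- (cand - current) / current * 100
  best <- which.max(abs(pct))
  selected <- c(selected, remaining[best])
  steps <- rbind(steps, data.frame(variables = paste("+", remaining[best]),
                                   est = cand[[best]], change = pct[[best]]))
  current   <- cand[[best]]
  remaining <- remaining[-best]
}

steps
n_models        # 1 crude + (3 + 2 + 1) candidate models
1 + sum(8:1)    # the same count for the eight variables in the text
```

With three candidate variables the loop fits 7 models; with the eight variables in the example above, the same arithmetic gives the 37 models mentioned in the text.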
vlist <- c("Age", "Sex", "Married", "Smoke", "Cancer", "CVD", "Education", "Income")
results <- chest_speedglm(
  crude = "Endpoint ~ Diabetes",
  xlist = vlist, data = diab_df)
results$data
#> variables est lb ub Change p n
#> 1 Crude 2.305786 1.809862 2.937600 NA 1.367426e-11 2048
#> 2 + Age 3.297099 2.421755 4.488837 42.99238373 3.498331e-14 2048
#> 3 + Income 2.902046 2.125534 3.962238 -11.98183756 2.001413e-11 2048
#> 4 + CVD 2.797882 2.044220 3.829405 -3.58931332 1.316760e-10 2048
#> 5 + Smoke 2.900600 2.113218 3.981361 3.67128106 4.388114e-11 2048
#> 6 + Sex 2.872333 2.092484 3.942824 -0.97453539 6.649405e-11 2048
#> 7 + Married 2.894543 2.107594 3.975330 0.77324730 5.185994e-11 2048
#> 8 + Cancer 2.881019 2.097209 3.957769 -0.46723321 6.520345e-11 2048
#> 9 + Education 2.878757 2.093150 3.959219 -0.07852074 7.880738e-11 2048
chest_plot(results)
diab_df$Age_Sex <- diab_df$Age * diab_df$Sex
diab_df$Age2 <- diab_df$Age^2
vlist_1 <- c("Age", "Sex", "Age2", "Age_Sex", "Married", "Cancer", "CVD", "Education", "Income")
results <- chest_speedglm(crude = "Endpoint ~ Diabetes", xlist = vlist_1, data = diab_df)
chest_plot(results)
chest_forest(results)
‘chest_glm’ is slower than ‘chest_speedglm’. We can use indicate = TRUE to monitor the progress. If it is too slow, you may want to try ‘chest_speedglm’.
vlist <- c("Age", "Sex", "Married", "Smoke", "Education")
results <- chest_glm(crude = "Endpoint ~ Diabetes", xlist = vlist,
  data = diab_df, indicate = TRUE)
results <- chest_cox(crude = "Surv(t0, t1, Endpoint) ~ Diabetes",
  xlist = vlist, data = diab_df)
chest_plot(results)
chest_forest(results)
results <- chest_clogit(crude = "Endpoint ~ Diabetes + strata(mid)",
  xlist = vlist, data = diab_df)
vlist <- c("Age", "Sex", "Married", "Cancer", "CVD", "Education", "Income")
results <- chest_lm(crude = "BMI ~ Diabetes", xlist = vlist, data = diab_df)
chest_plot(results)