The purpose of this vignette is to provide users with a step-by-step guide for performing and interpreting the results of a time varying mediation analysis with two treatment groups and a binary outcome. Please note, that this package has been built considering the structure of a panel data, where each subject/participant has repeated responses (collected over time) for the outcome and mediator variables. However, we do not address dynamic treatment regimens in this package. Therefore, we assume the scenario where the treatment (exposure) is time-invariant (i.e., does not change over time).
For illustration, we rely on an example dataset, simulated based on
the Wisconsin Smokers’ Health Study 2 data (Baker et. al., 2016) which
includes 1086 individuals assigned to one of three treatment conditions.
One-third of participants received only a nicotine patch
;
another one-third received varenicline
, and the final third
of participants received a
combination nicotine replacement therapy (NRT)
which
included nicotine patch + nicotine mini-lozenge. The outcome of interest
is daily smoking status
, a binary variable derived from the
number of cigarettes smoked by the participants each day. If
participants smoked any cigarettes that day, then
daily smoking status = 1
, otherwise it equals 0. In
addition, mediator variables were measured by asking participants if
they felt a negative mood in the last fifteen minutes
,
whether they wanted to smoke in the last 15 minutes
, and
how tired they felt of trying to quit smoking (i.e.,
cessation fatigue
), all recorded on a 7-point Likert scale.
Both the outcome and mediator variables were assessed two times per day
for the first two weeks post-quit day (rendering 30 time points of
response since assessments start on day 0 post target quit day), and
every other day (2x per day) for weeks 3 and 4 (rendering 14 time points
of response).
A traditional approach to analyzing this type of data would be to use mediation analysis in which the effects are assumed to not vary as a function of time. First, a single (i.e., time-invariant) direct effect would be estimated by regressing the outcome on the treatment condition and mediator. Next, a time-invariant indirect effect would be computed by multiplying the effect of treatment condition on the mediator by the effect of the mediator on the outcome. However, this method potentially misses important information about the dynamic effect that a mediator may have over time. Specifically, we hypothesize that mood changes across and within days and thus, its mediating effect on one’s success of quitting smoking is likely to vary over time. We therefore propose a time varying mediation analysis which estimates the mediation effect as a function that varies over time.
The tvmb
function, like tvma
, was developed
for estimating the mediation effect of two treatment (exposure)
groups.
To use the time varying mediation analysis package in R, you must
first install the package and load it. Before that, make sure you have
R version 4.0.3
or greater. There are two ways to install
the package from the CRAN (Comprehensive R Archive Network) repository,
by using install.packages
or the devtools
function.
install.packages("tvmediation", dependencies = TRUE)
library(tvmediation)
The equivalent code using devtools
is:
::install_cran("tvmediation", dependencies = TRUE)
devtoolslibrary(tvmediation)
If you do not have devtools
installed and loaded, you
can do so using the following code:
install.packages("devtools", dependencies = TRUE)
library(devtools)
Alternatively, if you want to install the package directly from the GitHub repository to access new or revised functions in development, use the following code:
::install_github("dcoffman/tvmediation", dependencies = TRUE)
devtoolslibrary(tvmediation)
tvmb
functionOnce installed, you can type ?tvmediation
in the console
to view the package documentation, as well as links to the important
functions and data included in the package. The time-varying mediation
analysis for the binary outcome and 2 exposure groups, relies on 2 user
functions tvmb
and LongToWide
as well as a
number of internal functions of the tvmediation
package.
The tvmb
function has four required and five optional
arguments.
treatment
A binary vector indicating the treatment
groupt.seq
A numeric vector of the time sequence of the
measuresmediator
The matrix of mediator values in wide
formatoutcome
The matrix of outcome values in wide
formatThe optional arguments are:
span
Numeric value of the span to be used for LOESS
regression. Default = 0.75.plot
TRUE or FALSE for plotting the mediation effect.
The default value is “FALSE”.CI
“none” or “boot” method of deriving confidence
intervals (CIs). The default value is “boot”.replicates
Number of replicates for bootstrapping CIs.
The default value is 1000.verbose
TRUE or FALSE for printing results to screen.
The default value is “FALSE”.Note that, unlike the tvma
and tvma_3trt
functions, the time points at which mediation effects are estimated
cannot be different than the actual recorded response time points
t.seq
. Thus, tvma
and tvma_3trt
have the provision of a different t.est
than
t.seq
, but tvmb
does not have that
provision.
The dataset we will use for our illustration is named
smoker
and is included in the package.
To load the simulated dataset smoker.rda
, type:
data(smoker)
The smoker
data frame is organized in long
format with SubjectID
repeating over multiple rows
for each participant. The tvmb
function requires that the
data be in wide format to estimate the time varying
coefficients. The tvmediation
package includes a useful
function LongToWide
to help users properly format their
data for analysis.
LongToWide
has three required arguments and a fourth
optional argument.
subject.id
specifies the column of subject
identifiers.time.sequences
specifies the column of measurement
times.outcome
specifies the column for the variable (either
the outcome or the mediator) that will be transposed.verbose
is an optional argument that if “TRUE” prints
the output of LongToWide
to the console. The default value
is “FALSE”.The output of LongToWide
is a matrix of data in wide
format where columns represent the subjects and rows represent the time
sequence. Thus, each cell contains the j-th subject’s response at the
i-th time point.
The tvmb
function requires two matrices, one for the
mediator, and one for the outcome. Thus, we use the
LongToWide
function twice as illustrated below:
<- LongToWide(smoker$SubjectID, smoker$timeseq, smoker$NegMoodLst15min)
mediator <- LongToWide(smoker$SubjectID, smoker$timeseq, smoker$smoke_status) outcome
class(mediator)
## [1] "matrix" "array"
1:16, 1:10] mediator[
## 1 2 3 4 5 6 7 8 9 10
## 0 7 1 1 1 1 NA 4 1 1 3
## 0.5 2 1 1 6 1 NA NA 1 1 1
## 1 1 1 2 NA NA 1 3 1 1 1
## 1.5 1 1 1 1 1 1 2 1 NA 1
## 2 1 1 1 1 NA 1 1 1 1 1
## 2.5 3 1 1 1 1 3 1 1 1 1
## 3 1 6 NA 1 1 1 1 2 NA 1
## 3.5 1 1 2 6 3 7 1 6 1 1
## 4 1 1 1 1 1 1 1 1 2 4
## 4.5 1 1 1 5 1 5 2 NA 1 3
## 5 4 1 2 1 NA 5 2 1 1 1
## 5.5 1 1 2 1 1 2 NA 1 1 2
## 6 2 1 1 4 1 7 1 NA 2 3
## 6.5 1 1 1 1 1 1 1 5 3 1
## 7 1 2 1 NA 1 1 NA 1 1 4
## 7.5 2 2 1 1 1 NA 1 1 1 6
class(outcome)
## [1] "matrix" "array"
1:16, 1:10] outcome[
## 1 2 3 4 5 6 7 8 9 10
## 0 0 0 0 0 0 NA 0 1 1 0
## 0.5 1 0 0 1 1 NA NA 1 1 1
## 1 0 1 0 NA NA 0 0 0 0 0
## 1.5 1 1 1 0 0 0 0 0 NA 0
## 2 0 0 0 0 NA 1 0 0 0 0
## 2.5 0 0 0 0 0 1 1 0 0 1
## 3 0 0 NA 0 0 0 0 0 NA 0
## 3.5 0 1 0 0 0 1 1 1 1 1
## 4 0 0 0 0 0 1 0 1 0 0
## 4.5 1 1 0 0 0 1 1 NA 1 0
## 5 1 0 0 0 NA 0 0 0 0 0
## 5.5 1 0 0 1 0 1 NA 0 0 0
## 6 0 0 0 1 0 0 0 NA 0 1
## 6.5 0 0 0 1 0 1 1 1 0 1
## 7 0 0 0 NA 0 0 NA 0 0 0
## 7.5 0 1 0 0 1 NA 0 1 0 0
If your data are already in wide format, there is no need to use the
LongToWide
function and you can simply subset your dataset.
However, mediator
and outcome
must be of class
matrix
; hence make sure you convert the class of the
subsetted mediator
and outcome
objects to
matrix
before proceeding. This can be done using the R
function as.matrix
.
The tvmb
function requires two more variables that we
have not yet created:
treatment
A binary numeric vector indicating the
treatment groupt.seq
A numeric vector of the time sequence of the
measuresIf there are two treatment groups, create a treatment
variable with the following assignment: 1
for the
comparator group and 0
for the placebo group. If there are
three treatment groups, two dummy variables clearly indicating
treatment1
, treatment2
and
placebo
need to be created. An example is given as follows.
In the smoker.rda
dataset, three columns indicating the
assignment of the three treatments are present.
Creating two dummy variables to indicate whether a participant was
given varenicline
or combination NRT
.
NRT1
indicates whether a subject received
varenicline
or not. NRT2
indicates whether a
subject received combination NRT
or not. Each subject’s
response for the treatment group (e.g., varenicline
) was
converted to a numeric value, and 1 was subtracted to yield a vector of
zeros and ones.
# Step 1: Since each subject has multiple rows of data, extract the unique response of each subject to receiving varenicline. The data is still in dataframe format.
<- unique(smoker[ , c("SubjectID","varenicline")])
trtv 1:10,] trtv[
## SubjectID varenicline
## 1 1 No
## 2 2 Yes
## 3 3 Yes
## 4 4 No
## 5 5 Yes
## 6 6 Yes
## 7 7 Yes
## 8 8 Yes
## 9 9 Yes
## 10 10 Yes
# Step 2: `2` to those subjects who received varenicline and `1` to the rest. The data is now in vector format.
<- as.numeric(trtv[ , 2])
trtv2 1:10] trtv2[
## [1] 2 1 1 2 1 1 1 1 1 1
# Step 3: subtract 1 from these numeric responses and procure a vector of zeros and ones
<- trtv2 -1
NRT1 1:10] NRT1[
## [1] 1 0 0 1 0 0 0 0 0 0
# Step 1: Since each subject has multiple rows of data, extract the unique response of each subject to receiving combination NRT. The data is still in dataframe format.
<- unique(smoker[ , c("SubjectID","comboNRT")])
trtc 1:10,] trtc[
## SubjectID comboNRT
## 1 1 Yes
## 2 2 No
## 3 3 Yes
## 4 4 Yes
## 5 5 No
## 6 6 Yes
## 7 7 Yes
## 8 8 Yes
## 9 9 Yes
## 10 10 Yes
# Step 2: `2` to those subjects who received combination NRT and `1` to the rest. The data is now in vector format.
<- as.numeric(trtc[ , 2])
trtc2 1:10] trtc2[
## [1] 1 2 1 1 2 1 1 1 1 1
# Step 3: subtract 1 from these numeric responses and procure a vector of zeros and ones
<- trtc2 -1
NRT2 1:10] NRT2[
## [1] 0 1 0 0 1 0 0 0 0 0
This steps can be alternatively collated into a single step and written as follows:
<- as.numeric(unique(smoker[ ,c("SubjectID","varenicline")])[,2])-1
NRT1 <- as.numeric(unique(smoker[ ,c("SubjectID","comboNRT")])[,2])-1
NRT2 1:10] NRT1[
## [1] 1 0 0 1 0 0 0 0 0 0
1:10] NRT2[
## [1] 0 1 0 0 1 0 0 0 0 0
Depending on our exposure of interest, i.e. whether we want to
compare the mediation effects observed among participants with
varenicline
vs. placebo (nicotine patch)
, or
combination NRT
vs. placebo
, or
varenicline
vs. combination NRT
, a final
variable was derived based on these two variables.
For example: The following codes are to derive the final variable to
compare the mediation effects observed among participants with
varenicline
vs. placebo (nicotine patch)
.
<- rep(NA, 1086) # replace 1086 with the number of subjects in your study #
treatment <- ifelse(NRT1==1 & NRT2==0, 1, ifelse(NRT1==0 & NRT2==0, 0, NA))
treatment 1:10] treatment[
## [1] 1 NA 0 1 NA 0 0 0 0 0
The current version of the tvmb
function only supports
two treatment options. Therefore, in order to estimate the mediation
effects observed among participants with combination NRT
vs. placebo
, the above code would be changed to,
# treatment <- rep(NA, 1086)
# treatment <- ifelse(NRT1==0 & NRT2==1, 1, ifelse(NRT1==0 & NRT2==0, 0, NA))
In case the user is interested in the mediation effects observed
among participants with varenicline
vs. not varenicline
or combination NRT
vs. not combination NRT1
where the comparator group is a
collective of the remaining two treatment groups, refer to the
tvma
function vignette for guidance.
To generate t.seq
we found the unique instance of each
time point and then sorted from smallest to largest. There are 44 unique
time points in the dataset where 0
after the decimal
indicates the morning measurement and 5
after the decimal
indicates the evening measurement recorded for that day.
<- sort(unique(smoker$timeseq))
t.seq t.seq
## [1] 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
## [16] 7.5 8.0 8.5 9.0 9.5 10.0 10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5
## [31] 16.0 16.5 18.0 18.5 20.0 20.5 22.0 22.5 24.0 24.5 26.0 26.5 28.0 28.5
We are now ready to perform the time varying mediation analysis.
tvmb
functionAs discussed earlier, the tvmb
function has four
required and five optional arguments.
treatment
A binary vector indicating the treatment
groupt.seq
A numeric vector of the time sequence of the
measuresmediator
The matrix of mediator values in wide
formatoutcome
The matrix of outcome values in wide
formatThe optional arguments are:
span
Numeric value of the span to be used for LOESS
regression. Default = 0.75.plot
TRUE or FALSE for plotting the mediation effect.
The default value is “FALSE”.CI
“none” or “boot” method of deriving CIs. The default
value is “boot”.replicates
Number of replicates for bootstrapping CIs.
The default value is 1000.verbose
TRUE or FALSE for printing results to screen.
The default value is “FALSE”.We will call the function with additional optional arguments
plot=TRUE
and replicates = 250
. We decreased
the number of bootstrap replicates so that this vignette compiles faster
but we suggest using at least 500 replicates in an actual analysis. The
remaining optional arguments are left to their respective default
values.
<- tvmb(treatment, t.seq, mediator, outcome,
results_tvmb plot = TRUE, replicates = 250)
The tvmb
function returns a list of results
including:
alpha_hat
estimated time-varying treatment effect on
mediatorCI.lower.a
CI lower limit for coefficient
alpha_hat
CI.upper.a
CI upper limit for coefficient
alpha_hat
gamma_hat
estimated time-varying direct effect of the
treatment on the outcomeCI.lower.g
CI lower limit for coefficient
gamma_hat
CI.upper.g
CI upper limit for coefficient
gamma_hat
beta_hat
estimated time-varying effect of the mediator
on the outcomeCI.lower.b
CI lower limit for coefficient
beta_hat
CI.upper.b
CI upper limit for coefficient
beta_hat
tau_hat
estimated time-varying total effect of the
treatment the on outcomeCI.lower.t
CI lower limit for coefficient
tau_hat
CI.upper.t
CI upper limit for coefficient
tau_hat
medEffect
time varying mediation effect of treatment on
the outcomeOptional returns based on argument CI = "boot"
include:
CI.low
CI lower limit of the time varying mediation
effect medEffect
CI.upper
CI upper limit of the time varying mediation
effect medEffect
The above estimates are compiled in a single dataframe which can be
accessed using nameOfStoredResultsObjb$Estimates
. The
following line of code displays only the estimates at the first 6
time-points.
head(results_tvmb$Estimates)
## timeseq alpha_hat CI.lower.a CI.upper.a gamma_hat CI.lower.g CI.upper.g
## 1 0.0 0.008718590 -0.06531668 0.08059226 NA NA NA
## 2 0.5 0.008250417 -0.06451624 0.07845270 0.04866106 -0.01938197 0.1223228
## 3 1.0 0.007877553 -0.06364690 0.07626370 0.04836989 -0.01910193 0.1208442
## 4 1.5 0.007516115 -0.06277413 0.07407281 0.04827494 -0.01872713 0.1193267
## 5 2.0 0.007082222 -0.06196342 0.07192760 0.04818366 -0.01834896 0.1178082
## 6 2.5 0.006491990 -0.06128023 0.06987563 0.04790350 -0.01805879 0.1163265
## beta_hat CI.lower.b CI.upper.b tau_hat CI.lower.t CI.upper.t
## 1 NA NA NA NA NA NA
## 2 0.004917916 -0.02485494 0.02914817 0.04598454 -0.02238590 0.1161415
## 3 0.004618872 -0.02462724 0.02832955 0.04460882 -0.02294446 0.1136707
## 4 0.004286458 -0.02436371 0.02748596 0.04337601 -0.02343261 0.1111367
## 5 0.003952272 -0.02409867 0.02664141 0.04214605 -0.02391859 0.1085999
## 6 0.003647913 -0.02386643 0.02581995 0.04077888 -0.02447067 0.1061206
## medEffect CI.lower CI.upper
## 1 NA NA NA
## 2 0.0014449752 -0.003230485 0.006416670
## 3 0.0012969309 -0.003367194 0.006177579
## 4 0.0011557029 -0.003501394 0.005929395
## 5 0.0010149943 -0.003635608 0.005680825
## 6 0.0008685085 -0.003772360 0.005440575
At each time point of interest timeseq = t.seq
, the
effects of the treatment on the mediator, the treatment on the outcome
(adjusted and not adjusted for mediator), and the mediator on the
outcome are estimated along with the respective 95% CIs. The CIs are
computed via a non-parametric bootstrap method (Efron and Tibshirani,
1986), drawing samples of size 1086 from the original sample with
replacement, estimating the sample mean and then applying the percentile
method to compute the 95% CIs. Note that the CIs for the
alpha
, gamma
, beta
and
tau
coefficients
(alpha_hat, gamma_hat, beta_hat, tau_hat)
are computed
regardless of the value of CI
argument in the function. In
the above results, medEffect
is the estimated mediation
effect of varenicline
compared to
nicotine patch only
, that varies over timeseq
.
For CI = "boot"
(which is the default option unless the
user chooses otherwise) the 95% CI (CI.low
,
CI.upper
) of medEffect
is estimated via a
similar bootstrapping and percentile technique described earlier for the
coefficients.
If plot = TRUE
argument is passed, the results will also
include the following figures:
plot1_a
plot for alpha_hat
with 95% CIs
across timeseq
plot2_g
plot for gamma_hat
with 95% CIs
across timeseq
plot3_b
plot for beta_hat
with 95% CIs
across timeseq
plot4_t
plot for tau_hat
with 95% CIs
across timeseq
MedEff
plot for medEffect
across
timeseq
MedEff_CI
plot for medEffect
with 95% CIs
across timeseq
bootstrap
plot for estimated medEffect
(s)
from bootstrapped samples across timeseq
We recommend using the plots to interpret your findings as it may be
difficult to derive meaning from the numerical values alone. To display
the plots, use nameOfStoredResultsObj$
followed by the name
of the plot to access the required plot accordingly. For example:
$plot1_a results_tvmb
In the above plot, the time-varying effect of
varenicline
on subjects’
negative mood in the last fifteen minutes
is not
statistically significant. The estimated 95% CI covers 0 (no
effect).
$plot2_g results_tvmb
The above plot shows the time-varying direct effect of
varenicline
on subjects’ daily smoking status
which is not statistically significant. The estimated 95% CI covers 0
(with the possible exception of days 5 and 6).
$plot3_b results_tvmb
In the above plot, the time-varying effect of subjects’
negative mood in the last fifteen minutes
on subjects’
daily smoking status
is not statistically significant as
indicated by the estimated 95% CI covering 0.
$plot4_t results_tvmb
The above plot shows the time-varying total effect of
varenicline
on subjects’ daily smoking status
which is not statistically significant. The estimated 95% CI covers 0
(no effect).
$MedEff results_tvmb
In the above plot, the time-varying effect of
varenicline
on daily smoking status
that is
mediated by the negative mood in the last fifteen minutes
(compared to nicotine patch only
) decreases in the first 2
weeks hovering mostly around 0 (no effect), then decreases sharply
around day 14 before increasing again until day 21 and decreasing
thereafter. However, the CIs include zero thus, we can conclude that
this effect is not statistically significant (see the below plot
MedEff_CI
for the CIs).
$MedEff_CI results_tvmb
$bootstrap results_tvmb
The above plot shows the estimated mediation effect from bootstrapping the original sample 250 times with 1086 sample size.
The tvmb
function computes bootstrap CIs by default.
Therefore, the user may decide not to bootstrap CIs for the mediation
effect by specifying CI = "none"
. However, if by mistake
the user specifies CI = "none"
and
replicates = 500
simultaneously, the function will not
display an error, but will simply execute without computing the CIs for
mediation effect. Note that the CIs for the effects of the treatment on
the mediator and the mediator on the outcome, and for the direct and
total effects are computed even if the user passes the argument
CI = "none"
.
The tvmediation
package provides a set of functions for
estimating mediation effects that vary over time for both binary and
continuous time-varying outcomes. Currently, the package only allows for
a time-invariant treatment. The mediator and outcome are assumed to be
time-varying, such as intensive longitudinal measurements obtained via
Ecological Momentary Assessment or via wearable and mobile devices. The
development of this tool has widespread application for use in human
behavior research, clinical trials, addiction research, and others by
allowing specification of mediation effects that vary as a function of
time.
Baker TB, Piper ME, Stein JH, et al. Effects of Nicotine Patch vs Varenicline vs Combination Nicotine Replacement Therapy on Smoking Cessation at 26 Weeks: A Randomized Clinical Trial. JAMA. 2016;315(4):371.
B. Efron, R. Tibshirani. Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy. Statistical Science. 1986;1(1):54-75.