CausalQueries
is a package that lets you declare binary
causal models, update beliefs about causal types given data and
calculate arbitrary estimands. Model definition makes use of
dagitty
functionality. Updating is implemented in
stan
.
See here for a
guide to using CausalQueries
along with many examples of
causal models
To install the latest stable release of
CausalQueries
:
install.packages("CausalQueries")
To install the latest development release :
install.packages("remotes")
remotes::install_github("macartan/CausalQueries")
Causal models are defined by:
X=1
and for which
Y=1
if and only if X=1
. The set of causal
types grows rapidly with the number of nodes and the number of nodes
pointing into any given node. In this setting imposing functional forms
is the same as placing restrictions on causal types: such
restrictions reduce complexity but require substantive assumptions. An
example of a restriction might be “Y
is monotonic in
X
.”A wrinkle:
Our goal is to form beliefs over parameters but also over more substantive estimands:
With a causal model in hand and data available about some or all
of the nodes, it is possible to make use of a generic stan
model that generates posteriors over the parameter vector.
Given updated (or prior) beliefs about parameters it is possible
to calculate causal estimands of inference from a causal model. For
example “What is the probably that X
was the cause of
Y
given X=1
, Y=1
and
Z=1
.”
Here is an example of a model in which X
causes
M
and M
causes Y
. There is, in
addition, unobservable confounding between X
and
Y
. This is an example of a model in which you might use
information on M
to figure out whether X
caused Y
.
The DAG is defined using dagitty
syntax like this:
model <- make_model("X -> M -> Y")
To add the confounding we have to allow an additional parameter that
allows a possibly different assignment probability for X
given a causal type for Y
.
model <- set_confound(model, list(X = "Y[X=1] == 1"))
We then set priors thus:
model <- set_priors(model, distribution = "jeffreys")
You can plot the dag, making use of functions in the
dagitty
package.
plot(model)
You can draw data from the model, like this:
data <- make_data(model, n = 10)
Updating is done like this:
updated_model <- update_model(model, data)
Finally you can calculate an estimand of interest like this:
CoE <- query_distribution(
model = updated_model,
using = "posteriors",
query = "Y[X=0] == 0",
subset = "X==1 & Y==1"
)
This uses the posterior distribution and the model to assess the
“causes of effects” estimand: the probability that X=1
was
the cause of Y=1
in those cases in which X=1
and Y=1
. The approach is to imagine a set of “do”
operations on the model, that control the level of X
and to
inquire about the level of Y
given these operations, and
then to assess how likely is is that Y
would be 0 if
X
were fixed at 0 within a set that naturally take on
particular values of X
and Y
. By the same
token this posterior can be calculated conditional on observations of
M
, allowing an assessment of how data on mediators alters
inference about the causes of effects.
The approach used in CausalQueries
is a generalization
of the biqq
models described in “Mixing Methods: A Bayesian
Approach” (Humphreys and Jacobs, 2015,
https://doi.org/10.1017/S0003055415000453). The conceptual extension
makes use of work on probabilistic causal models described in Pearl’s
Causality (Pearl, 2009,
https://doi.org/10.1017/CBO9780511803161). The approach to generating a
generic stan
function that can take data from arbitrary
models was developed in key contributions by Jasper Cooper
(http://jasper-cooper.com/) and Georgiy Syunyaev
(http://gsyunyaev.com/). Lily Medina (https://lilymedina.github.io/) did
the magical work of pulling it all together and developing approaches to
characterizing confounding and defining estimands. Julio Solis has done
wonders to simplify the specification of priors.