learn_DAG()
functionlibrary(BCDAG)
This is the second of a series of three vignettes for the R package BCDAG
. In this vignette we focus on function learn_DAG()
, which implements a Markov Chain Monte Carlo (MCMC) algorithm to sample from the joint posterior of DAG structures and DAG-parameters under the Gaussian assumption.
The underlying Bayesian Gaussian DAG-model can be summarized as follows: \[\begin{eqnarray} X_1, \dots, X_q \,|\,\boldsymbol L, \boldsymbol D, \mathcal{D} &\sim& \mathcal{N}_q\left(\boldsymbol 0, (\boldsymbol{L}\boldsymbol{D}^{-1}\boldsymbol{L}^\top)^{-1}\right)\\ (\boldsymbol L, \boldsymbol D)\,|\,\mathcal{D} &\sim& \text{DAG-Wishart}(\boldsymbol{a}_{c}^{\mathcal{D}}, \boldsymbol U) \\ p(\mathcal{D}) &\propto& w^{|\mathcal{S}_\mathcal{D}|}(1-w)^{\frac{q(q-1)}{2} - {|\mathcal{S}_\mathcal{D}|}} \end{eqnarray}\]
In particular \(\mathcal{D}=(V,E)\) denotes a DAG structure with set of nodes \(V=\{1,\dots,q\}\) and set of edges \(E\subseteq V \times V\). Moreover, \((\boldsymbol L, \boldsymbol D)\) are model parameters providing the decomposition of the precision (inverse-covariance) matrix \(\boldsymbol{\Omega} = \boldsymbol{L}\boldsymbol{D}^{-1}\boldsymbol{L}^\top\); specifically, \(\boldsymbol{L}\) is a \((q, q)\) matrix of coefficients such that for each \((u, v)\)-element \(\boldsymbol{L}_{uv}\) with \(u \ne v\), \(\boldsymbol{L}_{uv} \ne 0\) if and only if \((u, v) \in E\), while \(\boldsymbol{L}_{uu} = 1\) for each \(u = 1,\dots, q\); also, \(\boldsymbol{D}\) is a \((q, q)\) diagonal matrix with \((u, u)\)-element \(\boldsymbol{D}_{uu}\).
The latter decomposition follows from the equivalent Structural Equation Model (SEM) representation of a Gaussian DAG-model; see also Castelletti & Mascaro (2021).
Conditionally to \(\mathcal{D}\), a prior to \((\boldsymbol{L}, \boldsymbol{D})\) is assigned through a compatible DAG-Wishart distribution with rate hyperparameter \(\boldsymbol{U}\), a \((q,q)\) s.p.d. matrix, and shape hyperparameter \(\boldsymbol{a}^{\mathcal {D}}_{c}\), a \((q,1)\) vector; see also Cao et al. (2019) and Peluso & Consonni (2020).
Finally, a prior on DAG \(\mathcal{D}\) is assigned through a Binomial distribution on the number of edges in the graph; in \(p(\mathcal{D})\), \(w \in (0,1)\) is a prior probability of edge inclusion, while \(|\mathcal{S_{\mathcal{D}}}|\) denotes the number of edges in \(\mathcal{D}\); see again Castelletti & Mascaro (2021) for further details.
Target of the MCMC scheme is therefore the joint posterior of \((\boldsymbol{L},\boldsymbol{D},\mathcal{D})\), \[\begin{equation} p(\boldsymbol L, \boldsymbol D, \mathcal{D}\,|\, \boldsymbol X) \propto p(\boldsymbol{X}\,|\,\boldsymbol L, \boldsymbol D, \mathcal{D})p(\boldsymbol{L},\boldsymbol{D}\,|\,\mathcal{D}) \,p(\mathcal{D}), \end{equation}\] where \(\boldsymbol{X}\) denotes a \((n,q)\) data matrix as obtained through \(n\) i.i.d. draws from the Gaussian DAG-model and \(p(\boldsymbol{X}\,|\,\boldsymbol L, \boldsymbol D, \mathcal{D})\) is the likelihood function. See also Castelletti & Mascaro (2022+) for full details.
We first randomly generate a DAG \(\mathcal{D}\), the DAG parameters \((\boldsymbol{L},\boldsymbol{D})\) and then \(n=1000\) i.i.d. observations from a Gaussian DAG-model as follows:
set.seed(1)
<- 8
q <- 0.2
w <- rDAG(q,w)
DAG <- q
a <- diag(1,q)
U <- rDAGWishart(n=1, DAG, a, U)
outDL <- outDL$L; D <- outDL$D
L <- L %*% solve(D) %*% t(L)
Omega <- solve(Omega)
Sigma <- 1000
n <- mvtnorm::rmvnorm(n = n, sigma = Sigma) X
See also our vignette about data generation from Gaussian DAG-models.
learn_DAG()
Function learn_DAG()
implements an MCMC algorithm to sample from the joint posterior of DAGs and DAG parameters. This is based on a Partial Analytic Structure (PAS) algorithm (Godsill, 2012) which, at each iteration:
fast = TRUE
);We implement it as follows:
<- learn_DAG(S = 5000, burn = 1000, data = X,
out
a, U, w, fast = FALSE, save.memory = FALSE, collapse = FALSE)
Inputs of learn_DAG()
correspond to three different sets of arguments:
S
, burn
and data
are standard inputs required by any MCMC algorithm. In particular, S
defines the desired length of the chain, which is obtained by discarding the first burn
observations (the total number of sampled observations is therefore S + burn
); data
is instead the \((n,q)\) matrix \(\boldsymbol{X}\);a
, U
and w
are hyperparameters of the priors on DAGs (w
) and DAG parameters (a
, U
); see also Equation [REF]. The same appear in functions rDAG()
and rDAGWishart()
which were introduced in our vignette [ADD REF TO THE VIGNETTE].fast
, save.memory
and collapse
are boolean arguments which allow to: implement an approximate proposal distribution within the MCMC scheme (fast = TRUE
); change the array structure of the stored MCMC output into strings in order to save memory (save.memory = TRUE
); implement an MCMC for DAG structure learning only, without sampling from the posterior of parameters (collapse = TRUE
). See also A note on fast = TRUE
and Castelletti & Mascaro (2022+) for full details.The output of learn_DAG()
is an object of class bcdag
:
class(out)
#> [1] "bcdag"
bcdag
objects include the output of the MCMC algorithm together with a collection of meta-data representing the input arguments of learn_DAG()
; these are stored in the attributes of the object::
str(out)
#> List of 3
#> $ Graphs: num [1:8, 1:8, 1:5000] 0 0 0 0 1 0 0 1 0 0 ...
#> $ L : num [1:8, 1:8, 1:5000] 1 0 0 0 -0.0194 ...
#> $ D : num [1:8, 1:8, 1:5000] 0.164 0 0 0 0 ...
#> - attr(*, "class")= chr "bcdag"
#> - attr(*, "type")= chr "complete"
#> - attr(*, "input")=List of 10
#> ..$ S : num 5000
#> ..$ burn : num 1000
#> ..$ data : num [1:1000, 1:8] -0.299 -0.351 -0.631 0.219 -0.351 ...
#> ..$ a : num 8
#> ..$ U : num [1:8, 1:8] 1 0 0 0 0 0 0 0 0 1 ...
#> ..$ w : num 0.2
#> ..$ fast : logi FALSE
#> ..$ save.memory: logi FALSE
#> ..$ collapse : logi FALSE
#> ..$ verbose : logi TRUE
Attribute type
refers to the output of learn_DAG()
, whose structure depends on the choice of boolean arguments save.memory
and collapse
.
When both are set equal to FALSE
, as in the previous example, the output of learn_DAG()
is a complete bcdag
object, collecting three \((q,q,S)\) arrays with the DAG structures (in the form of \(q \times q\) adjacency matrices) and the DAG parameters \(\boldsymbol{L}\) and \(\boldsymbol{D}\) (both as \(q \times q\) matrices) sampled by the MCMC:
$Graphs[,,1]
out#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0 0 0 0 0 0 0 0
#> [2,] 0 0 0 0 0 1 0 0
#> [3,] 0 0 0 0 0 0 0 1
#> [4,] 0 0 0 0 0 0 0 0
#> [5,] 1 0 0 0 0 0 0 0
#> [6,] 0 0 0 0 0 0 0 0
#> [7,] 0 1 0 1 0 0 0 0
#> [8,] 1 0 0 0 0 0 0 0
round(out$L[,,1],2)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1.00 0.00 0 0.00 0 0.00 0 0.00
#> [2,] 0.00 1.00 0 0.00 0 0.07 0 0.00
#> [3,] 0.00 0.00 1 0.00 0 0.00 0 0.94
#> [4,] 0.00 0.00 0 1.00 0 0.00 0 0.00
#> [5,] -0.02 0.00 0 0.00 1 0.00 0 0.00
#> [6,] 0.00 0.00 0 0.00 0 1.00 0 0.00
#> [7,] 0.00 -0.02 0 -1.97 0 0.00 1 0.00
#> [8,] 0.40 0.00 0 0.00 0 0.00 0 1.00
round(out$D[,,1],2)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0.16 0.00 0.00 0.00 0.00 0.0 0.00 0.00
#> [2,] 0.00 0.57 0.00 0.00 0.00 0.0 0.00 0.00
#> [3,] 0.00 0.00 0.74 0.00 0.00 0.0 0.00 0.00
#> [4,] 0.00 0.00 0.00 0.94 0.00 0.0 0.00 0.00
#> [5,] 0.00 0.00 0.00 0.00 77.94 0.0 0.00 0.00
#> [6,] 0.00 0.00 0.00 0.00 0.00 0.8 0.00 0.00
#> [7,] 0.00 0.00 0.00 0.00 0.00 0.0 0.31 0.00
#> [8,] 0.00 0.00 0.00 0.00 0.00 0.0 0.00 0.96
When collapse = TRUE
and save.memory = FALSE
the output of learn_DAG()
is a collapsed bcdag
object, consisting of a \((q,q,S)\) array with the adjacency matrices of the DAGs sampled by the MCMC:
<- learn_DAG(S = 5000, burn = 1000, data = X,
collapsed_out
a, U, w, fast = FALSE, save.memory = FALSE, collapse = TRUE)
names(collapsed_out)
#> [1] "Graphs"
class(collapsed_out)
#> [1] "bcdag"
attributes(collapsed_out)$type
#> [1] "collapsed"
$Graphs[,,1]
collapsed_out#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0 0 0 0 0 0 0 0
#> [2,] 0 0 0 0 0 0 0 0
#> [3,] 0 0 0 0 0 0 0 0
#> [4,] 0 0 0 0 0 0 1 0
#> [5,] 1 0 0 0 0 0 0 0
#> [6,] 0 0 0 0 0 0 0 0
#> [7,] 0 0 0 0 0 0 0 1
#> [8,] 1 0 1 0 0 0 0 0
When save.memory = TRUE
and collapse = FALSE
, the output is a compressed bcdag
object, collecting samples from the joint posterior on DAGs and DAG parameters in the form of a vector of strings:
<- learn_DAG(S = 5000, burn = 1000, data = X,
compressed_out
a, U, w, fast = FALSE, save.memory = TRUE, collapse = FALSE)
names(compressed_out)
#> [1] "Graphs" "L" "D"
class(compressed_out)
#> [1] "bcdag"
attributes(compressed_out)$type
#> [1] "compressed"
In such a case, we can access to the MCMC draws as:
$Graphs[1]
compressed_out#> [1] "0;0;0;0;1;0;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;1;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0"
$L[1]
compressed_out#> [1] "1;0;0;0;-0.0207148822282879;0;0;0.389080449183578;0;1;0;0;0;0;0;0;0;0;1;0;0;0;0;0.464492744238912;0;0;0;1;0;0;-1.76444008583527;-0.020197409100651;0;0;0;0;1;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;1"
$D[1]
compressed_out#> [1] "0.154822760388322;0;0;0;0;0;0;0;0;0.544421200660809;0;0;0;0;0;0;0;0;0.395261505201729;0;0;0;0;0;0;0;0;1.04755025995276;0;0;0;0;0;0;0;0;67.9926231564594;0;0;0;0;0;0;0;0;0.831649133274509;0;0;0;0;0;0;0;0;0.303043124888535;0;0;0;0;0;0;0;0;1.76018923453861"
In addition, we implement bd_decode
, an internal function that can be used to visualize the previous objects as matrices:
bd_decode(compressed_out$Graphs[1])
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0 0 0 0 0 0 0 0
#> [2,] 0 0 0 0 0 0 0 0
#> [3,] 0 0 0 0 0 0 0 0
#> [4,] 0 0 0 0 0 0 0 0
#> [5,] 1 0 0 0 0 0 0 0
#> [6,] 0 0 0 0 0 0 0 0
#> [7,] 0 0 0 1 0 0 0 0
#> [8,] 1 0 1 1 0 0 0 0
round(bd_decode(compressed_out$L[1]),2)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1.00 0 0.00 0.00 0 0 0 0
#> [2,] 0.00 1 0.00 0.00 0 0 0 0
#> [3,] 0.00 0 1.00 0.00 0 0 0 0
#> [4,] 0.00 0 0.00 1.00 0 0 0 0
#> [5,] -0.02 0 0.00 0.00 1 0 0 0
#> [6,] 0.00 0 0.00 0.00 0 1 0 0
#> [7,] 0.00 0 0.00 -1.76 0 0 1 0
#> [8,] 0.39 0 0.46 -0.02 0 0 0 1
round(bd_decode(compressed_out$D[1]),2)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0.15 0.00 0.0 0.00 0.00 0.00 0.0 0.00
#> [2,] 0.00 0.54 0.0 0.00 0.00 0.00 0.0 0.00
#> [3,] 0.00 0.00 0.4 0.00 0.00 0.00 0.0 0.00
#> [4,] 0.00 0.00 0.0 1.05 0.00 0.00 0.0 0.00
#> [5,] 0.00 0.00 0.0 0.00 67.99 0.00 0.0 0.00
#> [6,] 0.00 0.00 0.0 0.00 0.00 0.83 0.0 0.00
#> [7,] 0.00 0.00 0.0 0.00 0.00 0.00 0.3 0.00
#> [8,] 0.00 0.00 0.0 0.00 0.00 0.00 0.0 1.76
Finally, if save.memory = TRUE
and collapse = TRUE
, the output of learn_DAG()
is a compressed and collapsed bcdag
object collecting only the sampled DAGs represented as vector of strings:
<- learn_DAG(S = 5000, burn = 1000, data = X,
comprcoll_out
a, U, w, fast = FALSE, save.memory = TRUE, collapse = TRUE)
names(comprcoll_out)
#> [1] "Graphs"
class(comprcoll_out)
#> [1] "bcdag"
attributes(comprcoll_out)$type
#> [1] "compressed and collapsed"
bd_decode(comprcoll_out$Graphs[1])
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0 0 0 0 0 0 0 0
#> [2,] 0 0 0 0 0 1 0 0
#> [3,] 0 0 0 0 0 0 0 0
#> [4,] 1 0 0 0 0 0 0 0
#> [5,] 1 0 0 0 0 0 0 0
#> [6,] 0 0 0 0 0 0 0 0
#> [7,] 0 0 0 1 0 0 0 0
#> [8,] 1 0 1 0 0 0 0 0
fast = TRUE
Step 1. of the MCMC scheme implemented by learn_DAG()
updates DAG \(\mathcal{D}\) by randomly drawing a new candidate DAG \(\mathcal{D}'\) from a proposal distribution and then accepting it with probability given by the Metropolis Hastings (MH) acceptance rate; see also Castelletti & Mascaro (2021). For a given DAG \(\mathcal{D}\), the proposal distribution \(q(\mathcal{D}'\,|\,\mathcal{D})\) is built over the set \(\mathcal{O}_{\mathcal{D}}\) of direct successors DAGs that can be reached from \(\mathcal{D}\) by inserting, deleting or reversing a single edge in \(\mathcal{D}\). A DAG \(\mathcal{D}'\) is then proposed uniformly from the set \(\mathcal{O}_{\mathcal{D}}\) so that \(q(\mathcal{D}'\,|\,\mathcal{D})=1/|\mathcal{O}_{\mathcal{D}}|\). Moreover, the MH rate requires to evaluate the ratio of proposals \(q(\mathcal{D}'\,|\,\mathcal{D})/q(\mathcal{D}\,|\,\mathcal{D}') = |\mathcal{O}_{\mathcal{D}'}|/|\mathcal{O}_{\mathcal{D}}|\), and accordingly the construction of both \(\mathcal{O}_{\mathcal{D}}\) and \(\mathcal{O}_{\mathcal{D}'}\).
If fast = FALSE
, the proposal ratio is computed exactly; this requires the enumerations of \(\mathcal{O}_\mathcal{D}\) and \(\mathcal{O}_{\mathcal{D}'}\) which may become computationally expensive, especially when \(q\) is large. However, the ratio approaches \(1\) as the number of variables \(q\) increases: option fast = TRUE
implements such an approximation, which therefore avoids the construction of \(\mathcal{O}_\mathcal{D}\) and \(\mathcal{O}_{\mathcal{D}'}\). A comparison between fast = FALSE
and fast = TRUE
in the execution of learn_DAG()
produces the following results in terms of computational time:
# No approximation
<- system.time(out_nofast <- learn_DAG(S = 5000, burn = 1000, data = X,
time_nofast
a, U, w, fast = FALSE, save.memory = FALSE, collapse = FALSE))
# Approximation
<- system.time(out_fast <- learn_DAG(S = 5000, burn = 1000, data = X,
time_fast
a, U, w, fast = TRUE, save.memory = FALSE, collapse = FALSE))
time_nofast#> utente sistema trascorso
#> 16.44 0.00 16.44
time_fast#> utente sistema trascorso
#> 7.34 0.01 7.36
Finally, the corresponding estimated posterior probabilities of edge inclusion are the following:
round(get_edgeprobs(out_nofast), 2)
#> 1 2 3 4 5 6 7 8
#> 1 0.00 0.06 0.07 0.00 0 0.00 0.00 0.00
#> 2 0.05 0.00 0.02 0.03 0 0.21 0.04 0.00
#> 3 0.09 0.02 0.00 0.00 0 0.01 0.02 0.52
#> 4 0.00 0.06 0.00 0.00 0 0.01 0.52 0.00
#> 5 1.00 0.00 0.00 0.00 0 0.00 0.01 0.00
#> 6 0.03 0.26 0.00 0.00 0 0.00 0.02 0.00
#> 7 0.03 0.01 0.00 0.48 0 0.03 0.00 0.02
#> 8 1.00 0.00 0.48 0.01 0 0.00 0.03 0.00
round(get_edgeprobs(out_fast), 2)
#> 1 2 3 4 5 6 7 8
#> 1 0.00 0.04 0.01 0.02 0.00 0.04 0.01 0.00
#> 2 0.05 0.00 0.00 0.03 0.00 0.20 0.01 0.00
#> 3 0.05 0.00 0.00 0.00 0.01 0.01 0.06 0.55
#> 4 0.02 0.03 0.00 0.00 0.00 0.02 0.55 0.01
#> 5 1.00 0.00 0.01 0.00 0.00 0.00 0.01 0.00
#> 6 0.06 0.24 0.01 0.00 0.01 0.00 0.03 0.01
#> 7 0.00 0.02 0.03 0.45 0.00 0.01 0.00 0.00
#> 8 1.00 0.00 0.45 0.02 0.00 0.04 0.00 0.00
Ben-David E, Li T, Massam H, Rajaratnam B (2015). “High dimensional Bayesian inference for Gaussian directed acyclic graph models.” arXiv pre-print.
Cao X, Khare K, Ghosh M (2019). “Posterior graph selection and estimation consistency for high-dimensional Bayesian DAG models.” The Annals of Statistics, 47(1), 319–348.
Castelletti F, Consonni G (2021). “Bayesian causal inference in probit graphical models” Bayesian Analysis, In press.
Castelletti F, Mascaro A (2021). “Structural learning and estimation of joint causal effects among network-dependent variables.” Statistical Methods & Applications, 30, 1289–1314.
Castelletti F, Mascaro A (2022). “BCDAG: An R package for Bayesian structural and Causal learning of Gaussian DAGs.” arXiv pre-print.
Godsill, SJ (2012). “On the relationship between Markov chain Monte Carlo methods for model uncertainty.” Journal of computational and graphical statistics, 10(2), 230-248.
Peluso S, Consonni G (2020). “Compatible priors for model selection of high-dimensional Gaussian DAGs.” Electronic Journal of Statistics, 14(2), 4110–4132.