On a Facebook methods group, there was a question about testing hypotheses in networks. In the comments, it was suggested that BGGM could be used to test the hypothesis. And it turns out that BGGM really shines for testing expectations (see for example Rodriguez et al. 2020).
In this vignette, I demonstrate three ways to go about testing the same hypothesis, which is essentially testing for a difference in the sum of partial correlations between groups.
# need the developmental version
if (!requireNamespace("remotes")) {
install.packages("remotes")
}
# install from github
remotes::install_github("donaldRwilliams/BGGM")
library(BGGM)
For demonstrative purposes, I use the bfi
data and test the hypotheses in males and females
# data
Y <- bfi
# males
Y_males <- subset(Y, gender == 1, select = -c(education, gender))[,1:5]
# females
Y_females <- subset(Y, gender == 2, select = -c(education, gender))[,1:5]
The first approach is rather straightforward, with the caveat that the method needs to be implemented by the user. Note that I could certainly implement this in BGGM, assuming there is enough interest. Please make a feature request here.
The hypothesis was that a sum of relations was larger in one group, for example,
\[ \begin{align} \mathcal{H}_0: (\rho^{male}_{A1--A2}\; + \; \rho^{male}_{A1--A3}) = (\rho^{female}_{A1--A2}\; + \; \rho^{female}_{A1--A3}) \\ \mathcal{H}_1: (\rho^{male}_{A1--A2}\; + \; \rho^{male}_{A1--A3}) > (\rho^{female}_{A1--A2}\; + \; \rho^{female}_{A1--A3}) \end{align} \] Note that the hypothesis is related to the sum of relations, which is readily tested in BGGM.
The first step is to estimate the model for each group
# fit female
fit_female <- estimate(Y_females, seed = 2)
# fit males
fit_male <- estimate(Y_males, seed = 1)
For an example, I used the default which is to assume the data is Gaussian. This can be changed with type =
either binary
, ordinal
, or mixed
.
The next step is to extract the posterior samples for each relation
post_male <- posterior_samples(fit_male)[,c("A1--A2", "A1--A3")]
post_female <- posterior_samples(fit_female)[,c("A1--A2", "A1--A3")]
Note that the column names reflect the upper-triangular elements of the partial correlation matrix. Hence, the first name (e.g.,A1
) must be located before the second name (e.g., A2
) in the data matrix. This can be understood in reference to the column numbers: 1--2
is correct whereas 2--1
will result in an error.
The next step is to sum the relations and compute the difference
# sum males
sum_male <- rowSums(post_male)
# sum females
sum_female <- rowSums(post_female)
# difference
diff <- sum_male - sum_female
which can then be plotted
# three column
par(mfrow=c(1,3))
# male sum
hist(sum_male)
# female sum
hist(sum_female)
# difference
hist(diff)
Next compute the posterior probability the sum is larger in males than females
# posterior prob
mean(sum_male > sum_female)
#> 0.737
and then the credible interval for the difference
quantile(diff, probs = c(0.025, 0.975))
#> 2.5% 97.5%
#> -0.06498586 0.12481253
The next approach is based on a posterior predictive check. The hypothesis is essentially the same as above, but for the predictive distribution, that is,
\[ \begin{align} \mathcal{H}_0: (\rho^{male^{yrep}}_{A1--A2}\; + \; \rho^{male^{yrep}}_{A1--A3}) = (\rho^{female^{yrep}}_{A1--A2}\; + \; \rho^{female^{yrep}}_{A1--A3}) \\ \mathcal{H}_1: (\rho^{male^{yrep}}_{A1--A2}\; + \; \rho^{male^{yrep}}_{A1--A3}) > (\rho^{female^{yrep}}_{A1--A2}\; + \; \rho^{female^{yrep}}_{A1--A3}) \end{align} \] where the only difference is \(yrep\). See more details here.
The first step is to define a function to compute the difference in sums
# colnames
cn <- colnames(Y_males)
# function
f <- function(Yg1, Yg2){
# data
Yg1 <- na.omit(Yg1)
Yg2 <- na.omit(Yg2)
# estimate partials
fit1 <- pcor_mat(estimate(Yg1, analytic = TRUE))
fit2 <- pcor_mat(estimate(Yg2, analytic = TRUE))
# names (not needed)
colnames(fit1) <- cn
rownames(fit1) <- cn
colnames(fit2) <- cn
rownames(fit2) <- cn
# take sum
sum1 <- fit1["A1", "A2"] + fit1["A1", "A3"]
sum2 <- fit2["A1", "A2"] + fit2["A1", "A3"]
# difference
sum1 - sum2
}
Note that the function takes two data matrices and then returns a single value. Also, the default in BGGM does not require a custom function (only needs the data from each group).
The next step is to compute the observed difference and then perform the check.
# observed
obs <- f(Y_males, Y_females)
# check
ppc <- ggm_compare_ppc(Y_males, Y_females,
iter = 250,
FUN = f,
custom_obs = obs)
# print
ppc
#> BGGM: Bayesian Gaussian Graphical Models
#> ---
#> Test: Global Predictive Check
#> Posterior Samples: 250
#> Group 1: 896
#> Group 2: 1813
#> Nodes: 5
#> Relations: 10
#> ---
#> Call:
#> ggm_compare_ppc(Y_males, Y_females, iter = 250, FUN = f, custom_obs = obs)
#> ---
#> Custom:
#>
#> contrast custom.obs p.value
#> Yg1 vs Yg2 0.029 0.264
#> ---
Note this requires the user to determine \(\alpha\).
The check can also be plotted
plot(ppc)
where the red is the critical region.
The above approaches cannot provide evidence that the sum is equal. In other words, just because there was not a difference, this does not provide evidence for equality. The Bayes factor methods allow for formally assessing the equality model, that is,
\[ \begin{align} \mathcal{H}_1&: (\rho^{male}_{A1--A2}\; + \; \rho^{male}_{A1--A3}) > (\rho^{female}_{A1--A2}\; + \; \rho^{female}_{A1--A3}) \\ \mathcal{H}_2&: (\rho^{male}_{A1--A2}\; + \; \rho^{male}_{A1--A3}) = (\rho^{female}_{A1--A2}\; + \; \rho^{female}_{A1--A3}) \\ \mathcal{H}_3&: \text{not} \; \mathcal{H}_1 \; \text{or} \; \mathcal{H}_2 \end{align} \]
where \(\mathcal{H}_3\) is the complement and can be understood as neither the first or second hypothesis.
The hypothesis is easily translated to R
code
hyp <- c("g1_A1--A2 + g1_A1--A3 > g2_A1--A2 + g2_A1--A3;
g1_A1--A2 + g1_A1--A3 = g2_A1--A2 + g2_A1--A3")
Note the g1
indicates the group and ;
separates the hypotheses. I again assume the data is Gaussian (although this can be changed to type = "ordinal"
or type = "mixed"
; see here)
test <- ggm_compare_confirm(Y_males, Y_females,
hypothesis = hyp)
# print
test
#> BGGM: Bayesian Gaussian Graphical Models
#> Type: continuous
#> ---
#> Posterior Samples: 25000
#> Group 1: 896
#> Group 2: 1813
#> Variables (p): 5
#> Relations: 10
#> Delta: 15
#> ---
#> Call:
#> ggm_compare_confirm(Y_males, Y_females, hypothesis = hyp)
#> ---
#> Hypotheses:
#>
#> H1: g1_A1--A2+g1_A1--A3>g2_A1--A2+g2_A1--A3
#> H2: g1_A1--A2+g1_A1--A3=g2_A1--A2+g2_A1--A3
#> H3: complement
#> ---
#> Posterior prob:
#>
#> p(H1|data) = 0.13
#> p(H2|data) = 0.825
#> p(H3|data) = 0.046
#> ---
#> Bayes factor matrix:
#> H1 H2 H3
#> H1 1.000 0.158 2.853
#> H2 6.349 1.000 18.113
#> H3 0.351 0.055 1.000
#> ---
#> note: equal hypothesis prior probabilities
Note the posterior hypothesis probability for the equality model is 0.825. The Bayes factor matrix then divides those values, for example, \(BF_{21}\) indicates the data were about 6 times more likely under \(\mathcal{H}_2\) than \(\mathcal{H}_1\).
The hypothesis can be plotted
plot(test)
It is also important to check the robustness. Here the width of the prior distribution is decreased
test <- ggm_compare_confirm(Y_males, Y_females,
hypothesis = hyp,
prior_sd = 0.15)
# print
test$out_hyp_prob
#> 0.18523406 0.74906147 0.06570447
which results in a probability of 0.75 for \(\mathcal{H}_2\) (\(BF_{21} = 4.04\)).
Three approaches for testing the same hypothesis were demonstrated in this vignette. This highlights that any hypothesis can be tested in BGGM and in several ways.