In this vignette, we will perform a small simulation study to test whether the model fitting engines implemented in the thurstonianIRT package are able to recover known parameter values from a simulated data set. This also extends the automated unit tests implemented in the package to check the correctness of the software. For a more extensive simulation study using thurstonianIRT, see Bürkner, Schulte, and Holling (2019).
First, we will load the packages required in this vignette.
library(thurstonianIRT)
library(dplyr)
library(tidyr)
Next, we will set up the simulation condition.
npersons <- 500
ntraits <- 5
nitems_per_block <- 3
nblocks_per_trait <- 9
nblocks <- ntraits * nblocks_per_trait / nitems_per_block
nitems <- ntraits * nblocks_per_trait
ncomparisons <- (nitems_per_block * (nitems_per_block - 1)) / 2 * nblocks
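With these settings, the design comprises 15 blocks, 45 items, and 45 pairwise comparisons. As a quick sanity check, we can print the derived quantities:
# derived design quantities
c(nblocks = nblocks, nitems = nitems, ncomparisons = ncomparisons)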
Now, we will choose a set of true parameter values.
set.seed(123)
lambda <- runif(nitems, 0.65, 0.96)
signs <- c(rep(1, ceiling(nitems / 2)), rep(-1, floor(nitems / 2)))
lambda <- lambda * signs[sample(seq_len(nitems))]
gamma <- runif(ncomparisons, -1, 1)
Phi <- diag(5)
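The signs vector flips roughly half of the factor loadings to be negative (here, 22 of the 45), which mimics questionnaires that contain both positively and negatively keyed items. A quick check of the sign distribution:
# roughly half of the loadings should now be negative
table(sign(lambda))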
Finally, we put all of this together and simulate a data set based on the condition and true parameter values.
sdata <- sim_TIRT_data(
  npersons = npersons,
  ntraits = ntraits,
  nitems_per_block = nitems_per_block,
  nblocks_per_trait = nblocks_per_trait,
  gamma = gamma,
  lambda = lambda,
  Phi = Phi
)
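The returned object is a data frame of pairwise comparison responses; the true trait scores are attached as the eta attribute, which we will make use of below. We can take a quick look at the simulated data:
head(sdata)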
We know that the chosen simulation condition and parameter values are well behaved (this may not be the case in all situations; see Bürkner, Schulte, & Holling, 2019). Accordingly, the model fitting engines should show good parameter recovery given that they are correctly implemented. We will now go ahead and fit the model with all three engines.
fit_stan <- fit_TIRT_stan(sdata, chains = 1, iter = 1000, warmup = 500)
fit_lavaan <- fit_TIRT_lavaan(sdata)
fit_mplus <- fit_TIRT_mplus(sdata)
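Note that fit_TIRT_mplus requires a working installation of the commercial Mplus software; the other two engines run entirely within R. Each fitted object can be inspected directly, for example:
# print a summary of the Stan-based model fit
print(fit_stan)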
We can now predict the trait scores of all persons and compare them to the true scores, which are stored in the simulated data set.
eta <- as_tibble(as.data.frame(attributes(sdata)$eta))
names(eta) <- paste0("trait", 1:ncol(eta))
true_scores <- eta %>%
  mutate(id = 1:n()) %>%
  gather(key = "trait", value = "truth", -id)
true_summaries <- true_scores %>%
  group_by(trait) %>%
  summarise(true_mean = mean(truth), true_sd = sd(truth))
pred <- predict(fit_stan) %>%
  bind_rows(predict(fit_lavaan), predict(fit_mplus), .id = "source") %>%
  mutate(
    source = as.character(factor(
      source, levels = 1:3, labels = c("stan", "lavaan", "mplus")
    )),
    trait = tolower(trait)
  ) %>%
  inner_join(true_scores, by = c("id", "trait"))
pred <- pred %>%
  inner_join(
    pred %>%
      group_by(trait, source) %>%
      summarise(cor_est_truth = cor(estimate, truth)),
    by = c("trait", "source")
  ) %>%
  mutate(
    sign = sign(cor_est_truth),
    estimate = ifelse(sign %in% -1, -estimate, estimate)
  ) %>%
  inner_join(true_summaries, by = "trait") %>%
  group_by(trait, source) %>%
  mutate(
    est_mean = mean(estimate),
    est_sd = sd(estimate)
  ) %>%
  ungroup() %>%
  mutate(
    ztruth = (truth - true_mean) / true_sd,
    zestimate = (estimate - est_mean) / est_sd
  )
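The first inner_join and the subsequent mutate correct for the sign indeterminacy of estimated factor scores: if an engine returns a trait on a flipped scale, its estimates correlate negatively with the truth and are therefore multiplied by -1. As a quick check, all estimate-truth correlations should be positive after this correction:
# after the sign correction, all correlations with the truth are positive
pred %>%
  group_by(trait, source) %>%
  summarise(cor_est_truth = cor(estimate, truth)) %>%
  summarise(all_positive = all(cor_est_truth > 0))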
Various measures of predictive accuracy can be computed, for instance the reliability, that is, the squared correlation between estimated and true trait scores. For the present simulation condition, we would expect the reliability to be roughly between 0.8 and 0.85 for all traits. By looking at the results below, we can verify that this is indeed the case.
res <- pred %>%
  group_by(trait, source) %>%
  summarise(rel = cor(estimate, truth)^2)
res
# A tibble: 15 x 3
# Groups: trait [5]
trait source rel
<chr> <chr> <dbl>
1 trait1 lavaan 0.833
2 trait1 mplus 0.848
3 trait1 stan 0.849
4 trait2 lavaan 0.823
5 trait2 mplus 0.839
6 trait2 stan 0.842
7 trait3 lavaan 0.799
8 trait3 mplus 0.831
9 trait3 stan 0.830
10 trait4 lavaan 0.803
11 trait4 mplus 0.832
12 trait4 stan 0.831
13 trait5 lavaan 0.821
14 trait5 mplus 0.830
15 trait5 stan 0.837
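For an overall comparison of the engines, we can additionally average the reliabilities across traits; in the results above, all three engines land at roughly 0.82 to 0.84.
# average reliability per engine across traits
res %>%
  group_by(source) %>%
  summarise(mean_rel = mean(rel))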
Lastly, we can also compute and investigate the correlations of the trait estimates across engines to check whether they really estimate equivalent trait scores.
cor_matrix <- pred %>%
  mutate(
    # ensure correct ordering of traits
    SC = paste0(source, "_", trait),
    SC = factor(SC, levels = unique(SC))
  ) %>%
  select(id, SC, estimate) %>%
  spread(key = "SC", value = "estimate") %>%
  bind_cols(eta, .) %>%
  select(-id) %>%
  cor()
We would expect the correlations of the estimates of the same trait across engines to be very high, that is, at least higher than 0.95. We can verify that this is indeed the case, as exemplified for trait1 below.
<- paste0(c("stan", "lavaan", "mplus"), "_trait1")
trait1 round(cor_matrix[trait1, trait1], 2)
stan_trait1 lavaan_trait1 mplus_trait1
stan_trait1 1.00 0.99 1.00
lavaan_trait1 0.99 1.00 0.99
mplus_trait1 1.00 0.99 1.00
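Since the true scores from eta were included in the correlation matrix as well, we can also read off how strongly each engine's estimates correlate with the truth, again exemplified for trait1:
# correlations of each engine's trait1 estimates with the true scores
round(cor_matrix["trait1", trait1], 2)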
Taken together, we have seen how to set up and perform a simple simulation study to test the parameter recovery of Thurstonian IRT models and, at the same time, test the correctness of the thurstonianIRT software.
Bürkner, P. C., Schulte, N., & Holling, H. (2019). On the statistical and practical limitations of Thurstonian IRT models. Educational and Psychological Measurement, 79(5), 827–854.