tidySEM
offers a user-friendly, tidy workflow for generating syntax for SEM models. The workflow is top-down, meaning that syntax is generated based on conceptual model elements. In many cases, the generated syntax will suffice - but it is always customizable. The workflow also tries to intelligently guess which variables go together, but these defaults can be overridden.
The workflow underlying syntax generation in tidySEM
is as follows:
data
object short, informative names, that are easily machine readabletidy_sem
object by running model <- tidy_sem(data)
measurement(model)
dictionary
, data
, and syntax
elements in the tidy_sem
object by calling dictionary(model)
, get_data(model)
, or syntax(model)
dictionary
, data
, and syntax
elements in the tidy_sem
object dictionary(model) <- ...
, get_data(model) <- ...
, and syntax(model) <- ...
tidy_sem
object to lavaan
syntax using as_lavaan(model)
and using that as input for the lavaan
functions sem
, lavaan
, or cfa
tidy_sem
object to OpenMx
using as_ram(model)
, and using that as input for mxRun
or `run_mx``tidy_sem
object to Mplus
using as_mplus(model)
, and using that as input for MplusAutomation::mplusObject()
estimate_lavaan(model)
, estimate_mx(model)
, or estimate_mplus(model)
All elements of the tidy_sem
object are “tidy” data, i.e., tabular data.frames
, and can be modified using the familiar suite of functions in the ‘tidyverse’. Thus, the data, dictionary, and syntax are all represented as data.frame
s.
As an example, let’s make a graph for a classic lavaan
tutorial example for CFA. First, we check the data names:
<- HolzingerSwineford1939
df names(df)
#> [1] "id" "sex" "ageyr" "agemo" "school" "grade" "x1" "x2"
#> [9] "x3" "x4" "x5" "x6" "x7" "x8" "x9"
These names are not informative, as the items named x..
are indicators of three different latent variables. We will rename them accordingly:
names(df)[grepl("^x", names(df))] <- c("vis_1", "vis_2", "vis_3", "tex_1", "tex_2", "tex_3", "spe_1", "spe_2", "spe_3")
In general, it is good practice to name variables using the following information:
tidySEM
expects to be separated from the remainder of the variable name by a splitting character (e.g., scale_item
)Roughly speaking, elements of the variable name should be ordered from “slow-changing” to “fast-changing”; i.e.; there are only a few scales, with possibly several measurement occasions or respondents, and many items.
A dictionary indicates which variables in the data belong to, for example, the same scale. When the data have informative names, it is possible to construct a data dictionary automatically:
<- tidy_sem(df)
model
model#> A tidy_sem object
#> [0;30m[0;32mv [0m $dictionary[0m
#> [0;30m[0;32mv [0m $data[0m
#> [0;30m[0;37mo [0m $syntax[0m
We can automatically add basic syntax to the sem_syntax
object, by passing it to a syntax-generating function like measurement()
, which adds a measurement model for any scales in the object:
|>
model measurement() -> model
model
#> A tidy_sem object
#> [0;30m[0;32mv [0m $dictionary[0m
#> [0;30m[0;32mv [0m $data[0m
#> [0;30m[0;32mv [0m $syntax[0m
The resulting model can be evaluated as ‘lavaan’ syntax, ‘OpenMx’ syntax, or ‘Mplus’ syntax, using the as_lavaan
, as_ram
, and as_mplus
functions. For example, using lavaan:
|>
model estimate_lavaan()
#> lavaan 0.6-9 ended normally after 35 iterations
#>
#> Estimator ML
#> Optimization method NLMINB
#> Number of model parameters 30
#>
#> Number of observations 301
#>
#> Model Test User Model:
#>
#> Test statistic 85.306
#> Degrees of freedom 24
#> P-value (Chi-square) 0.000
The same model can be estimated in ‘Mplus’ through the R-package MplusAutomation
. This requires ‘Mplus’ to be installed.
|>
model estimate_mx()
The same model can be estimated in ‘Mplus’ through the R-package MplusAutomation
. This requires ‘Mplus’ to be installed.
library(MplusAutomation)
|>
model estimate_mplus()
The dictionary and syntax can be examined using dictionary(model)
and syntax(model)
:
dictionary(model)
#> name scale type label
#> 1 id <NA> observed id
#> 2 sex <NA> observed sex
#> 3 ageyr <NA> observed ageyr
#> 4 agemo <NA> observed agemo
#> 5 school <NA> observed school
#> 6 grade <NA> observed grade
#> 7 vis_1 vis indicator vis_1
#> 8 vis_2 vis indicator vis_2
#> 9 vis_3 vis indicator vis_3
#> 10 tex_1 tex indicator tex_1
#> 11 tex_2 tex indicator tex_2
#> 12 tex_3 tex indicator tex_3
#> 13 spe_1 spe indicator spe_1
#> 14 spe_2 spe indicator spe_2
#> 15 spe_3 spe indicator spe_3
#> 16 vis <NA> latent vis
#> 17 tex <NA> latent tex
#> 18 spe <NA> latent spe
syntax(model)
#> lhs op rhs block group free label ustart plabel
#> 1 vis =~ vis_1 1 1 0 1 .p1.
#> 2 vis =~ vis_2 1 1 1 NA .p2.
#> 3 vis =~ vis_3 1 1 1 NA .p3.
#> 4 tex =~ tex_1 1 1 0 1 .p4.
#> 5 tex =~ tex_2 1 1 1 NA .p5.
#> 6 tex =~ tex_3 1 1 1 NA .p6.
#> 7 spe =~ spe_1 1 1 0 1 .p7.
#> 8 spe =~ spe_2 1 1 1 NA .p8.
#> 9 spe =~ spe_3 1 1 1 NA .p9.
#> 10 vis_1 ~~ vis_1 1 1 1 NA .p10.
#> 11 vis_2 ~~ vis_2 1 1 1 NA .p11.
#> 12 vis_3 ~~ vis_3 1 1 1 NA .p12.
#> 13 tex_1 ~~ tex_1 1 1 1 NA .p13.
#> 14 tex_2 ~~ tex_2 1 1 1 NA .p14.
#> 15 tex_3 ~~ tex_3 1 1 1 NA .p15.
#> 16 spe_1 ~~ spe_1 1 1 1 NA .p16.
#> 17 spe_2 ~~ spe_2 1 1 1 NA .p17.
#> 18 spe_3 ~~ spe_3 1 1 1 NA .p18.
#> 19 vis ~~ vis 1 1 1 NA .p19.
#> 20 tex ~~ tex 1 1 1 NA .p20.
#> 21 spe ~~ spe 1 1 1 NA .p21.
#> 22 vis ~~ tex 1 1 1 NA .p22.
#> 23 vis ~~ spe 1 1 1 NA .p23.
#> 24 tex ~~ spe 1 1 1 NA .p24.
#> 25 vis_1 ~1 1 1 1 NA .p25.
#> 26 vis_2 ~1 1 1 1 NA .p26.
#> 27 vis_3 ~1 1 1 1 NA .p27.
#> 28 tex_1 ~1 1 1 1 NA .p28.
#> 29 tex_2 ~1 1 1 1 NA .p29.
#> 30 tex_3 ~1 1 1 1 NA .p30.
#> 31 spe_1 ~1 1 1 1 NA .p31.
#> 32 spe_2 ~1 1 1 1 NA .p32.
#> 33 spe_3 ~1 1 1 1 NA .p33.
#> 34 vis ~1 1 1 0 0 .p34.
#> 35 tex ~1 1 1 0 0 .p35.
#> 36 spe ~1 1 1 0 0 .p36.
At this stage, we may want to modify the basic syntax slightly. The functions dictionary(model) <- ...
and syntax(model) <- ...
can be used to modify the dictionary and syntax:
dictionary(model) |>
mutate(label = ifelse(label == "vis", "Visual", label))
#> name scale type label
#> 1 id <NA> observed id
#> 2 sex <NA> observed sex
#> 3 ageyr <NA> observed ageyr
#> 4 agemo <NA> observed agemo
#> 5 school <NA> observed school
#> 6 grade <NA> observed grade
#> 7 vis_1 vis indicator vis_1
#> 8 vis_2 vis indicator vis_2
#> 9 vis_3 vis indicator vis_3
#> 10 tex_1 tex indicator tex_1
#> 11 tex_2 tex indicator tex_2
#> 12 tex_3 tex indicator tex_3
#> 13 spe_1 spe indicator spe_1
#> 14 spe_2 spe indicator spe_2
#> 15 spe_3 spe indicator spe_3
#> 16 vis <NA> latent Visual
#> 17 tex <NA> latent tex
#> 18 spe <NA> latent spe
For example, imagine we want to change the model, so that all items of the “spe” subscale load on the “tex” latent variable. We would first replace the latent variable “spe” with “tex”, and secondly remove all mention of the “spe” latent variable:
syntax(model) |>
mutate(lhs = ifelse(lhs == "spe" & op == "=~", "tex", lhs)) |>
filter(!(lhs == "spe" | rhs == "spe")) -> syntax(model)
The modified model could then be run:
estimate_lavaan(model)
#> lavaan 0.6-9 ended normally after 28 iterations
#>
#> Estimator ML
#> Optimization method NLMINB
#> Number of model parameters 27
#>
#> Number of observations 301
#>
#> Model Test User Model:
#>
#> Test statistic 330.994
#> Degrees of freedom 27
#> P-value (Chi-square) 0.000
In addition to the way of editing the data.frame
with model syntax described in Step 6, it is also possible to add (or modify) paths by adding lavaan
syntax. For example, imagine that - instead of having “vis” and “tex” correlate, we want to add a regression path between them:
|>
model add_paths("vis ~ tex") |>
estimate_lavaan() |>
summary(estimates = TRUE)
#> lavaan 0.6-9 ended normally after 27 iterations
#>
#> Estimator ML
#> Optimization method NLMINB
#> Number of model parameters 27
#>
#> Number of observations 301
#>
#> Model Test User Model:
#>
#> Test statistic 330.994
#> Degrees of freedom 27
#> P-value (Chi-square) 0.000
#>
#> Parameter Estimates:
#>
#> Standard errors Standard
#> Information Expected
#> Information saturated (h1) model Structured
#>
#> Latent Variables:
#> Estimate Std.Err z-value P(>|z|)
#> vis =~
#> vis_1 1.000
#> vis_2 0.558 0.104 5.361 0.000
#> vis_3 0.711 0.116 6.143 0.000
#> tex =~
#> tex_1 1.000
#> tex_2 1.448 0.099 14.657 0.000
#> tex_3 1.234 0.084 14.696 0.000
#> spe_1 1.000
#> spe_2 0.319 0.083 3.860 0.000
#> spe_3 0.437 0.082 5.326 0.000
#>
#> Regressions:
#> Estimate Std.Err z-value P(>|z|)
#> vis ~
#> tex 0.591 0.092 6.408 0.000
#>
#> Intercepts:
#> Estimate Std.Err z-value P(>|z|)
#> .vis_1 4.936 0.067 73.473 0.000
#> .vis_2 6.088 0.068 89.855 0.000
#> .vis_3 2.250 0.065 34.579 0.000
#> .tex_1 3.061 0.060 51.280 0.000
#> .tex_2 4.341 0.074 58.452 0.000
#> .tex_3 2.186 0.063 34.667 0.000
#> .spe_1 4.186 0.080 52.359 0.000
#> .spe_2 5.527 0.058 94.854 0.000
#> .spe_3 5.374 0.058 92.546 0.000
#> .vis 0.000
#> tex 0.000
#>
#> Variances:
#> Estimate Std.Err z-value P(>|z|)
#> .vis_1 0.538 0.126 4.280 0.000
#> .vis_2 1.127 0.102 11.006 0.000
#> .vis_3 0.860 0.094 9.151 0.000
#> .tex_1 0.512 0.050 10.281 0.000
#> .tex_2 0.485 0.065 7.447 0.000
#> .tex_3 0.343 0.047 7.335 0.000
#> .spe_1 1.364 0.118 11.587 0.000
#> .spe_2 0.965 0.079 12.160 0.000
#> .spe_3 0.908 0.075 12.055 0.000
#> .vis 0.625 0.134 4.673 0.000
#> tex 0.560 0.073 7.638 0.000
This function accepts both quoted (character) and unquoted arguments. So, for example, if we want to add a cross-loading from “spe_1” on “vis”, in addition to the regression path before, we could use the following code:
|>
model add_paths("vis ~ tex", vis =~ spe_1) |>
estimate_lavaan()
#> Warning in lav_object_post_check(object): lavaan WARNING: some estimated ov
#> variances are negative
#> lavaan 0.6-9 ended normally after 64 iterations
#>
#> Estimator ML
#> Optimization method NLMINB
#> Number of model parameters 27
#>
#> Number of observations 301
#>
#> Model Test User Model:
#>
#> Test statistic 407.730
#> Degrees of freedom 27
#> P-value (Chi-square) 0.000