Penalized Factor Analysis: grid search

Introduction

Aim. This vignette shows how to fit a penalized factor analysis model with the scad and mcp penalties using the routines in the penfa package. The employed penalty is aimed at encouraging sparsity in the factor loading matrix. Since the scad and mcp penalties cannot be used with the automatic tuning procedure (see Geminiani et al., 2021 for details), the optimal tuning parameter will be found through grid searches.

Data. For illustration purposes, we use the cross-cultural data set ccdata containing the standardized ratings to 12 items concerning organizational citizenship behavior. Employees from different countries were asked to rate their attitudes towards helping other employees and giving suggestions for improved work conditions. The items are thought to measure two latent factors: help, defined by the first seven items (h1 to h7), and voice, represented by the last five items (v1 to v5). See ?ccdata for details.

This data set is a standardized version of the one in the ccpsyc package, and only considers employees from Lebanon and Taiwan (i.e., "LEB", "TAIW"). This vignette is meant as a demo of the capabilities of penfa; please refer to Fischer et al. (2019) and Fischer and Karl (2019) for a description and analysis of these data.

Let us load and inspect ccdata.

library(penfa)
data(ccdata)

summary(ccdata)
##    country                h1                 h2                h3                 h4         
##  Length:767         Min.   :-2.62004   Min.   :-2.9034   Min.   :-2.63082   Min.   :-3.0441  
##  Class :character   1st Qu.:-0.69516   1st Qu.:-0.2163   1st Qu.:-0.70356   1st Qu.:-0.2720  
##  Mode  :character   Median :-0.05354   Median : 0.4554   Median :-0.06114   Median : 0.4211  
##                     Mean   : 0.00000   Mean   : 0.0000   Mean   : 0.00000   Mean   : 0.0000  
##                     3rd Qu.: 0.58809   3rd Qu.: 0.4554   3rd Qu.: 0.58128   3rd Qu.: 0.4211  
##                     Max.   : 1.22971   Max.   : 1.1272   Max.   : 1.22370   Max.   : 1.1141  
##        h5                h6                h7                v1                  v2           
##  Min.   :-2.9105   Min.   :-2.9541   Min.   :-2.8364   Min.   :-2.627694   Min.   :-2.674430  
##  1st Qu.:-0.8662   1st Qu.:-0.9092   1st Qu.:-0.7860   1st Qu.:-0.660770   1st Qu.:-0.671219  
##  Median : 0.4966   Median : 0.4541   Median :-0.1025   Median :-0.005129   Median :-0.003482  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.000000   Mean   : 0.000000  
##  3rd Qu.: 0.4966   3rd Qu.: 0.4541   3rd Qu.: 0.5810   3rd Qu.: 0.650512   3rd Qu.: 0.664255  
##  Max.   : 1.1781   Max.   : 1.1358   Max.   : 1.2645   Max.   : 1.306154   Max.   : 1.331992  
##        v3                 v4                 v5          
##  Min.   :-2.65214   Min.   :-2.65722   Min.   :-2.51971  
##  1st Qu.:-0.68800   1st Qu.:-0.68041   1st Qu.:-0.61127  
##  Median :-0.03329   Median :-0.02148   Median : 0.02488  
##  Mean   : 0.00000   Mean   : 0.00000   Mean   : 0.00000  
##  3rd Qu.: 0.62142   3rd Qu.: 0.63746   3rd Qu.: 0.66103  
##  Max.   : 1.27613   Max.   : 1.29639   Max.   : 1.29718

Model specification

Before fitting the model, we need to write a model syntax describing the relationships between the items and the latent factors. To facilitate its formulation, the rules for the syntax specification broadly follow the ones required by lavaan. The syntax must be enclosed in single quotes ' '.

syntax = 'help  =~   h1 + h2 + h3 + h4 + h5 + h6 + h7 + 0*v1 + v2 + v3 + v4 + v5
          voice =~ 0*h1 + h2 + h3 + h4 + h5 + h6 + h7 +   v1 + v2 + v3 + v4 + v5'

The factors help and voice appear on the left-hand side, whereas the observed variables on the left-hand side. Following the rationale in Geminiani et al. (2021), we only specify the minimum number of identification constraints. We are setting the scales of the factors by fixing their factor variances to 1. This can be done in one of two ways: 1) by adding 'help ~~ 1*help' and 'voice ~~ 1*voice' to the syntax above; or 2) by setting the argument std.lv = TRUE in the fitting function (see below). To avoid rotational freedom, we fix one loading per factor to zero. Parameters can be easily fixed to user-defined values through the pre-multiplication mechanism. By default, unique variances are automatically added to the model, and the factors are allowed to correlate. These specifications can be modified by altering the syntax (see ?penfa for details on how to write the model syntax).

Model fitting

The core of the package is given by the penfa function, a short form for PENalized Factor Analysis, that implements the framework discussed in Geminiani et al. (2021). If users decide for either the scad or mcp penalties, they need to use strategy = "fixed" since the automatic procedure is not feasible.

Scad

We start off with the scad. In the function call, we now specify pen.shrink = "scad", and we provide through the eta argument the fixed value of the tuning parameter to be employed during optimization (here, for instance, 0.05). The name given to the starting value (here, the factor loading matrix "lambda") reflects the parameter matrix to be penalized. All of its elements are penalized, which means here that the penalization is applied to all factor loadings (except the ones fixed for identification). The scad penalty relies on an additional shape parameter, which is set by default to 3.7 (Fan and Li 2001). This value can be conveniently modified through the a.scad argument. See ?penfaOptions for additional details on the possible options.

scad.fit <- penfa(## factor model
                  model  = syntax,
                  data   = ccdata,
                  std.lv = TRUE,
                  ## penalization
                  pen.shrink = "scad",
                  eta = list(shrink = c("lambda" = 0.05), diff = c("none" = 0)),
                  ## fixed tuning
                  strategy = "fixed")
## 
## Largest absolute gradient value: 27.97150845
## Fisher information matrix is positive definite
## Eigenvalue range: [164.74, 382028.4]
## Trust region iterations: 24 
## Factor solution: admissible 
## Computing VCOV ... done.
## Effective degrees of freedom: 27.94395

Grid search

In order to find the optimal value of the tuning parameter, a grid search needs to be conducted, and the optimal model is the one with the lowest GBIC (Generalized Bayesian Information Criterion). For demo purposes, we use a grid of 51 values evenly spaced between 0 and 0.15. However, for accurate analyses, please consider finer grids (of at least 200 elements) with an upper bound reasonable for the data at hand.

# Grid of values for tuning parameter
eta.grid <- seq(0, 0.15, length.out = 51)

# Return GBIC from a converged and admissible penfa model with fixed tuning
penfa.fixedTun <- function(eta, penalty, ...){
  
  fitted <- penfa(model = syntax, data = ccdata, 
                  std.lv = TRUE, pen.shrink = penalty, 
                  eta = list(shrink = c("lambda" = eta), diff = c("none" = 0)),
                  strategy = "fixed", verbose = FALSE, ...)
  
  if(all(fitted@Vcov$solution) & fitted@Vcov$admissibility)
    return(BIC(fitted))
}

# additional penfaOptions can be passed
GBIC.scad <- sapply(eta.grid, penfa.fixedTun, penalty = "scad")

The optimal tuning parameter is the one generating the penalized model with the lowest GBIC.

optimtun.scad <- eta.grid[[which.min(GBIC.scad)]]

# To plot GBIC across tuning values
# p <- plotly::plot_ly(x = eta.grid, y = GBIC.scad, type = 'scatter', mode = 'lines')
# plotly::layout(p, xaxis = list(showline = TRUE),
#               title = list(text = "GBIC values across tuning parameters"))

We can ultimately fit the model with the optimal tuning parameter (here, 0.114). The summary method details information on the model characteristics, the optimization and penalization procedures as well as the parameter estimates with associated standard errors and confidence intervals. The Type column distinguishes between the fixed parameters set to specific values for identification, the free parameters that have been estimated through ordinary maximum likelihood, and the penalized (pen) parameters. The standard errors here have been computed as the square root of the inverse of the penalized Fisher information matrix (Geminiani et al., 2021). The last columns report 95% confidence intervals (CI) for the model parameters. Standard errors and CI of the penalized parameters shrunken to zero are not displayed. A different significance level can be specified through the level argument in the summary call.

scad.fit <- penfa(## factor model 
                  model = syntax, 
                  data = ccdata, 
                  std.lv = TRUE, 
                  ## penalization
                  pen.shrink = "scad", 
                  # optimal tuning
                  eta = list(shrink = c("lambda" = optimtun.scad), diff = c("none" = 0)),
                  strategy = "fixed", 
                  verbose = FALSE)
summary(scad.fit)
## penfa 0.1.1 reached convergence
## 
##   Number of observations                                    767
##   Number of groups                                            1
##   Number of observed variables                               12
##   Number of latent factors                                    2
##                                                                
##   Estimator                                                PMLE
##   Optimization method                              trust-region
##   Information                                            fisher
##   Strategy                                                fixed
##   Number of iterations                                       56
##   Number of parameters:                                        
##     Free                                                     13
##     Penalized                                                22
##   Effective degrees of freedom                           26.424
##   GIC                                                 17224.457
##   GBIC                                                17347.131
##                                                                
##   Penalty function:                                            
##     Sparsity                                               scad
##                                                                
##   Additional tuning parameter                                  
##     scad                                                    3.7
##                                                                
##   Optimal tuning parameter:                                    
##     Sparsity                                                   
##      - Factor loadings                                    0.114
##                                                                
## 
## Parameter Estimates:
## 
## Latent Variables:
##                     Type    Estimate  Std.Err     2.5%    97.5%
##   help =~                                                      
##     h1               pen       0.779    0.030    0.720    0.839
##     h2               pen       0.875    0.029    0.818    0.931
##     h3               pen       0.788    0.030    0.729    0.847
##     h4               pen       0.915    0.031    0.854    0.976
##     h5               pen       0.831    0.034    0.765    0.897
##     h6               pen       0.812    0.036    0.740    0.883
##     h7               pen       0.507    0.050    0.409    0.604
##     v1             fixed       0.000             0.000    0.000
##     v2               pen       0.000                           
##     v3               pen       0.000                           
##     v4               pen       0.000                           
##     v5               pen      -0.000                           
##   voice =~                                                     
##     h1             fixed       0.000             0.000    0.000
##     h2               pen      -0.002                           
##     h3               pen       0.000                           
##     h4               pen      -0.020                           
##     h5               pen       0.039                           
##     h6               pen       0.082    0.026    0.031    0.132
##     h7               pen       0.370    0.049    0.274    0.465
##     v1               pen       0.862    0.029    0.806    0.918
##     v2               pen       0.881    0.028    0.826    0.937
##     v3               pen       0.852    0.029    0.795    0.909
##     v4               pen       0.853    0.029    0.796    0.910
##     v5               pen       0.814    0.030    0.756    0.872
## 
## Covariances:
##                     Type    Estimate  Std.Err     2.5%    97.5%
##   help ~~                                                      
##     voice           free       0.879    0.011    0.858    0.901
## 
## Variances:
##                     Type    Estimate  Std.Err     2.5%    97.5%
##    .h1              free       0.384    0.021    0.343    0.426
##    .h2              free       0.231    0.014    0.203    0.258
##    .h3              free       0.371    0.021    0.330    0.411
##    .h4              free       0.186    0.012    0.162    0.210
##    .h5              free       0.235    0.014    0.208    0.263
##    .h6              free       0.201    0.012    0.177    0.225
##    .h7              free       0.265    0.015    0.236    0.294
##    .v1              free       0.244    0.015    0.214    0.273
##    .v2              free       0.208    0.013    0.181    0.234
##    .v3              free       0.261    0.016    0.230    0.292
##    .v4              free       0.260    0.016    0.229    0.291
##    .v5              free       0.326    0.019    0.289    0.363
##     help           fixed       1.000             1.000    1.000
##     voice          fixed       1.000             1.000    1.000

The penalty matrix can be inspected and plotted as shown in “plotting-penalty-matrix”.

Implied moments

The implied moments (here, the covariance matrix) can be found via the fitted method.

implied <- fitted(scad.fit)
implied
## $cov
##    h1    h2    h3    h4    h5    h6    h7    v1    v2    v3    v4    v5   
## h1 0.992                                                                  
## h2 0.680 0.993                                                            
## h3 0.614 0.688 0.991                                                      
## h4 0.699 0.783 0.707 0.992                                                
## h5 0.674 0.755 0.682 0.776 0.984                                          
## h6 0.688 0.771 0.696 0.792 0.765 0.983                                    
## h7 0.648 0.726 0.655 0.745 0.723 0.742 0.988                              
## v1 0.591 0.661 0.597 0.676 0.663 0.686 0.703 0.987                        
## v2 0.604 0.676 0.611 0.692 0.678 0.701 0.719 0.760 0.985                  
## v3 0.584 0.654 0.590 0.669 0.656 0.678 0.695 0.735 0.751 0.987            
## v4 0.584 0.654 0.591 0.669 0.656 0.678 0.695 0.735 0.752 0.727 0.987      
## v5 0.558 0.624 0.564 0.639 0.626 0.647 0.663 0.702 0.717 0.693 0.694 0.989

Factor scores

Lastly, the factor scores can be calculated via the penfaPredict function.

fscores <- penfaPredict(scad.fit)
head(fscores)
##             help      voice
## [1,] -0.43671234  0.1423077
## [2,] -0.08680255 -0.3588021
## [3,]  0.56857657  0.6977846
## [4,]  0.10895357  0.6192328
## [5,]  0.37877758  0.8663068
## [6,] -0.63062039 -0.4843055

Mcp

We can fit a penalized factor model with the mcp penalty in a way similar to the scad. By default the shape parameter of the mcp is set to 3. This value can be conveniently modified through the a.mcp argument. See ?penfaOptions for additional details on the possible options.

GBIC.mcp <- sapply(eta.grid, penfa.fixedTun, penalty = "mcp")

optimtun.mcp <- eta.grid[[which.min(GBIC.mcp)]]
optimtun.mcp
## [1] 0.141

The tuning value equal to 0.141 generated the model with the lowest GBIC.

mcp.fit <- penfa(## factor model 
                 model = syntax, 
                 data = ccdata, 
                 std.lv = TRUE, 
                 ## penalization
                 pen.shrink = "mcp", 
                 # optimal tuning
                 eta = list(shrink = c("lambda" = optimtun.mcp), diff = c("none" = 0)),
                 strategy = "fixed", 
                 verbose = FALSE)
summary(mcp.fit)
## penfa 0.1.1 reached convergence
## 
##   Number of observations                                    767
##   Number of groups                                            1
##   Number of observed variables                               12
##   Number of latent factors                                    2
##                                                                
##   Estimator                                                PMLE
##   Optimization method                              trust-region
##   Information                                            fisher
##   Strategy                                                fixed
##   Number of iterations                                       40
##   Number of parameters:                                        
##     Free                                                     13
##     Penalized                                                22
##   Effective degrees of freedom                           26.428
##   GIC                                                 17224.179
##   GBIC                                                17346.872
##                                                                
##   Penalty function:                                            
##     Sparsity                                                mcp
##                                                                
##   Additional tuning parameter                                  
##     mcp                                                       3
##                                                                
##   Optimal tuning parameter:                                    
##     Sparsity                                                   
##      - Factor loadings                                    0.141
##                                                                
## 
## Parameter Estimates:
## 
## Latent Variables:
##                     Type    Estimate  Std.Err     2.5%    97.5%
##   help =~                                                      
##     h1               pen       0.779    0.030    0.720    0.839
##     h2               pen       0.874    0.029    0.818    0.931
##     h3               pen       0.788    0.030    0.729    0.847
##     h4               pen       0.914    0.030    0.854    0.974
##     h5               pen       0.828    0.033    0.762    0.893
##     h6               pen       0.807    0.037    0.735    0.879
##     h7               pen       0.506    0.050    0.408    0.604
##     v1             fixed       0.000             0.000    0.000
##     v2               pen       0.000                           
##     v3               pen       0.000                           
##     v4               pen       0.000                           
##     v5               pen      -0.000                           
##   voice =~                                                     
##     h1             fixed       0.000             0.000    0.000
##     h2               pen      -0.001                           
##     h3               pen       0.000                           
##     h4               pen      -0.018                           
##     h5               pen       0.042                           
##     h6               pen       0.087    0.027    0.035    0.139
##     h7               pen       0.371    0.049    0.275    0.467
##     v1               pen       0.862    0.029    0.806    0.919
##     v2               pen       0.882    0.028    0.826    0.937
##     v3               pen       0.852    0.029    0.795    0.909
##     v4               pen       0.853    0.029    0.796    0.910
##     v5               pen       0.814    0.030    0.756    0.872
## 
## Covariances:
##                     Type    Estimate  Std.Err     2.5%    97.5%
##   help ~~                                                      
##     voice           free       0.879    0.011    0.857    0.900
## 
## Variances:
##                     Type    Estimate  Std.Err     2.5%    97.5%
##    .h1              free       0.384    0.021    0.343    0.426
##    .h2              free       0.231    0.014    0.203    0.258
##    .h3              free       0.370    0.021    0.330    0.411
##    .h4              free       0.186    0.012    0.162    0.210
##    .h5              free       0.236    0.014    0.208    0.263
##    .h6              free       0.201    0.012    0.177    0.226
##    .h7              free       0.265    0.015    0.236    0.294
##    .v1              free       0.244    0.015    0.214    0.273
##    .v2              free       0.208    0.013    0.181    0.234
##    .v3              free       0.261    0.016    0.230    0.292
##    .v4              free       0.260    0.016    0.229    0.291
##    .v5              free       0.326    0.019    0.289    0.363
##     help           fixed       1.000             1.000    1.000
##     voice          fixed       1.000             1.000    1.000

Conclusion

Implementing the above approach in the multiple-group case would imply carrying out grid searches in three dimensions to find the optimal tuning parameter vector. This is clearly not advisable, as it would introduce further complications and possibly new computational problems and instabilities. For this reason, we suggest users to rely on the automatic tuning parameter selection procedure, whose applicability is demonstrated in the vignettes for single (vignette("automatic-tuning-selection")) and multiple-group (“multiple-group-analysis”) penalized models.

R Session

sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] penfa_0.1.1
## 
## loaded via a namespace (and not attached):
##  [1] VineCopula_2.4.2    magic_1.5-9         sass_0.4.0         
##  [4] VGAM_1.1-5          sfsmisc_1.1-11      jsonlite_1.7.2     
##  [7] splines_4.1.0       tmvnsim_1.0-2       distr_2.8.0        
## [10] bslib_0.2.5.1       assertthat_0.2.1    stats4_4.1.0       
## [13] yaml_2.2.1          numDeriv_2016.8-1.1 pillar_1.6.1       
## [16] lattice_0.20-44     startupmsg_0.9.6    glue_1.4.2         
## [19] digest_0.6.27       colorspace_2.0-2    htmltools_0.5.1.1  
## [22] Matrix_1.3-3        survey_4.0          psych_2.1.6        
## [25] pcaPP_1.9-74        pkgconfig_2.0.3     ismev_1.42         
## [28] purrr_0.3.4         GJRM_0.2-4          mvtnorm_1.1-2      
## [31] scales_1.1.1        copula_1.0-1        tibble_3.1.2       
## [34] ADGofTest_0.3       gmp_0.6-2           mgcv_1.8-36        
## [37] generics_0.1.0      ggplot2_3.3.5       ellipsis_0.3.2     
## [40] distrEx_2.8.0       Rmpfr_0.8-4         mnormt_2.0.2       
## [43] survival_3.2-11     magrittr_2.0.1      crayon_1.4.1       
## [46] evaluate_0.14       fansi_0.5.0         nlme_3.1-152       
## [49] MASS_7.3-54         gsl_2.1-6           tools_4.1.0        
## [52] mitools_2.4         lifecycle_1.0.0     pspline_1.0-18     
## [55] matrixStats_0.59.0  stringr_1.4.0       trust_0.1-8        
## [58] munsell_0.5.0       stabledist_0.7-1    scam_1.2-11        
## [61] gamlss.dist_5.3-2   compiler_4.1.0      jquerylib_0.1.4    
## [64] evd_2.3-3           rlang_0.4.11        grid_4.1.0         
## [67] trustOptim_0.8.6.2  rmarkdown_2.9       gtable_0.3.0       
## [70] abind_1.4-5         DBI_1.1.1           R6_2.5.0           
## [73] knitr_1.33          dplyr_1.0.7         utf8_1.2.1         
## [76] stringi_1.6.2       matrixcalc_1.0-4    parallel_4.1.0     
## [79] Rcpp_1.0.6          vctrs_0.3.8         tidyselect_1.1.1   
## [82] xfun_0.24

References

Fan, J., & Li, R. 2001. “Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties.” Journal of the American Statistical Association 96(456), 1348–60. https://doi.org/10.1198/016214501753382273
Fischer, R., Ferreira, M. C., Van Meurs, N. et al. (2019). “Does Organizational Formalization Facilitate Voice and Helping Organizational Citizenship Behaviors? It Depends on (National) Uncertainty Norms.” Journal of International Business Studies, 50(1), 125-134. https://doi.org/10.1057/s41267-017-0132-6
Fischer, R., & Karl, J. A. (2019). “A Primer to (Cross-Cultural) Multi-Group Invariance Testing Possibilities in R.” Frontiers in psychology, 10, 1507. https://doi.org/10.3389/fpsyg.2019.01507
Geminiani, E. (2020). “A Penalized Likelihood-Based Framework for Single and Multiple-Group Factor Analysis Models.” PhD thesis, University of Bologna. http://amsdottorato.unibo.it/9355/
Geminiani, E., Marra, G., & Moustaki, I. (2021). “Single- and Multiple-Group Penalized Factor Analysis: A Trust-Region Algorithm Approach with Integrated Automatic Multiple Tuning Parameter Selection.” Psychometrika, 86(1), 65-95. https://doi.org/10.1007/s11336-021-09751-8