Illustration of mixsqp applied to a small data set, and a large one

Youngseok Kim, Peter Carbonetto and Matthew Stephens

2020-05-14

In this vignette, we illustrate the use of the sequential quadratic programming (SQP) algorithm implemented in mixsqp, and we compare its runtime and accuracy against those of an interior-point (IP) solver implemented in the commercial software MOSEK (called via the "KWDual" function in the REBayes package).

If you do not have the Rmosek and REBayes packages installed on your computer, you may skip over the steps in this vignette that use them.

Environment set-up

Load the mixsqp package.

library(mixsqp)

Next, initialize the sequence of pseudorandom numbers.

set.seed(1)

Generate a small data set

We begin with a small example to show how mixsqp works.

L <- simulatemixdata(1000,20)$L
dim(L)
# [1] 1000   20

This call to simulatemixdata created an \(n \times m\) conditional likelihood matrix for a mixture of zero-centered normals, with \(n = 1000\) and \(m = 20\). By default, simulatemixdata normalizes the rows of the likelihood matrix so that the maximum entry in each row is 1.
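This row normalization can be reproduced in a couple of lines of base R. The sketch below applies it to a small random matrix rather than the likelihood matrix itself:

```r
# Sketch: row-normalize a small random matrix the same way
# simulatemixdata normalizes the likelihood matrix, by dividing
# each row by its largest entry. (Dividing a matrix by a vector
# whose length equals the number of rows recycles the vector
# down the columns, i.e., row-wise.)
set.seed(1)
A <- matrix(rexp(12), 3, 4)
A <- A / apply(A, 1, max)
apply(A, 1, max)  # each row maximum is now exactly 1
```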

Fit mixture model

Now we fit the mixture model using the SQP algorithm:

fit.sqp <- mixsqp(L)
# Running mix-SQP algorithm 0.3-43 on 1000 x 20 matrix
# convergence tol. (SQP):     1.0e-08
# conv. tol. (active-set):    1.0e-10
# zero threshold (solution):  1.0e-08
# zero thresh. (search dir.): 1.0e-14
# l.s. sufficient decrease:   1.0e-02
# step size reduction factor: 7.5e-01
# minimum step size:          1.0e-08
# max. iter (SQP):            1000
# max. iter (active-set):     20
# number of EM iterations:    10
# Computing SVD of 1000 x 20 matrix.
# Matrix is not low-rank; falling back to full matrix.
# iter        objective max(rdual) nnz stepsize max.diff nqp nls
#    1 +6.825854400e-01  -- EM --   20 1.00e+00 3.43e-02  --  --
#    2 +6.608901094e-01  -- EM --   20 1.00e+00 1.12e-02  --  --
#    3 +6.501637569e-01  -- EM --   20 1.00e+00 8.83e-03  --  --
#    4 +6.441429345e-01  -- EM --   20 1.00e+00 7.64e-03  --  --
#    5 +6.405379612e-01  -- EM --   20 1.00e+00 6.44e-03  --  --
#    6 +6.382623445e-01  -- EM --   20 1.00e+00 5.36e-03  --  --
#    7 +6.367520429e-01  -- EM --   20 1.00e+00 4.46e-03  --  --
#    8 +6.357009493e-01  -- EM --   20 1.00e+00 3.75e-03  --  --
#    9 +6.349366492e-01  -- EM --   20 1.00e+00 3.18e-03  --  --
#   10 +6.343584376e-01  -- EM --   20 1.00e+00 2.73e-03  --  --
#    1 +6.339053898e-01 +1.856e-02  20  ------   ------   --  --
#    2 +6.281996199e-01 +1.384e-03   4 1.00e+00 4.36e-01  20   1
#    3 +6.281978243e-01 +8.849e-07   4 1.00e+00 3.56e-03   2   1
#    4 +6.281978243e-01 -1.816e-08   4 1.00e+00 6.59e-06   2   1
# Optimization took 0.01 seconds.
# Convergence criteria met---optimal solution found.

In this example, the SQP algorithm converged to a solution in a small number of iterations.

By default, mixsqp outputs information on its progress. It begins by summarizing the optimization problem and the algorithm settings used. (Since we did not change these settings in the mixsqp call, all the settings shown here are the default settings.)

After that, at each iteration it outputs information about the current solution, such as the value of the objective ("objective") and the number of nonzero entries in the solution ("nnz").

The “max(rdual)” column shows the quantity used to assess convergence. It reports the maximum value of the “dual residual”; the SQP solver terminates when the maximum dual residual is less than conv.tol, which by default is \(10^{-8}\). In this example, we see that the dual residual shrinks rapidly toward zero.
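The tolerance can be tightened (or loosened) via the control argument to mixsqp. The sketch below re-simulates the same small data set and re-runs the solver with a stricter tolerance; the setting names convtol.sqp and verbose are assumptions based on the settings printed above, as documented by mixsqp_control_default():

```r
library(mixsqp)
set.seed(1)
L <- simulatemixdata(1000, 20)$L

# Re-run the solver with a tighter convergence tolerance and with
# the progress output suppressed. The control setting names
# (convtol.sqp, verbose) are assumed from mixsqp_control_default().
fit.tight <- mixsqp(L, control = list(convtol.sqp = 1e-10, verbose = FALSE))
```

A tighter tolerance typically costs only a few extra SQP iterations here, since the dual residual shrinks rapidly near the solution.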

Another useful indicator of convergence is the “max.diff” column—it reports the maximum difference between the solution estimates at two successive iterations. We normally expect these differences to shrink as we approach the solution, which is precisely what we see in this example.

This information is also provided in the return value, which we can use, for example, to create a plot of the objective value at each iteration of the SQP algorithm:

numiter <- nrow(fit.sqp$progress)
plot(1:numiter,fit.sqp$progress$objective,type = "b",
     pch = 20,lwd = 2,xlab = "SQP iteration",
     ylab = "objective",xaxp = c(1,numiter,numiter - 1))
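The same progress table can also be used to visualize the rate of convergence. The sketch below re-fits the small data set and plots, on a log scale, the gap between each iterate's objective and the best value attained; the verbose control setting used to silence the progress output is an assumption based on mixsqp_control_default():

```r
library(mixsqp)
set.seed(1)
L <- simulatemixdata(1000, 20)$L
fit.sqp <- mixsqp(L, control = list(verbose = FALSE))

# Gap between each iterate's objective and the best value attained;
# the small offset avoids taking log(0) at the final iterate.
gap <- fit.sqp$progress$objective - min(fit.sqp$progress$objective) + 1e-15
plot(seq_along(gap), gap, type = "b", pch = 20, log = "y",
     xlab = "SQP iteration", ylab = "distance to best objective")
```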

To assess the accuracy of the SQP solution, we can compare against the solution computed by the IP algorithm. (If you do not have the REBayes package installed, you can skip this step.)

fit.ip <- mixkwdual(L)

If you run the IP algorithm, you should see that the IP and SQP solutions achieve nearly the same objective value.

cat(sprintf("Objective at SQP solution: %0.16f\n",fit.sqp$value))
cat(sprintf("Objective at IP solution:  %0.16f\n",fit.ip$value))
cat(sprintf("Difference in objectives:  %0.4e\n",fit.sqp$value - fit.ip$value))

Comparing the SQP and IP solvers on a large data set

We observed above that the SQP and IP methods achieve nearly the same solution quality. Here, we explore the computational properties of the two algorithms on a larger data set.

As before, we compute the \(n \times m\) likelihood matrix for a mixture of zero-centered normals. This time, we use a finer grid of \(m = 100\) normal densities, as well as many more samples.

L <- simulatemixdata(1e5,100)$L
dim(L)
# [1] 100000    100

Now we fit the model using the SQP algorithm:

timing <- system.time(fit.sqp <- mixsqp(L))
cat(sprintf("Computation took %0.2f seconds\n",timing["elapsed"]))
# Running mix-SQP algorithm 0.3-43 on 100000 x 100 matrix
# convergence tol. (SQP):     1.0e-08
# conv. tol. (active-set):    1.0e-10
# zero threshold (solution):  1.0e-08
# zero thresh. (search dir.): 1.0e-14
# l.s. sufficient decrease:   1.0e-02
# step size reduction factor: 7.5e-01
# minimum step size:          1.0e-08
# max. iter (SQP):            1000
# max. iter (active-set):     20
# number of EM iterations:    10
# Computing SVD of 100000 x 100 matrix.
# SVD computation took 4.22 seconds.
# Rank of matrix is estimated to be 24.
# iter        objective max(rdual) nnz stepsize max.diff nqp nls
#    1 +6.740166246e-01  -- EM --  100 1.00e+00 8.05e-03  --  --
#    2 +6.466234849e-01  -- EM --  100 1.00e+00 2.22e-03  --  --
#    3 +6.355043253e-01  -- EM --  100 1.00e+00 1.58e-03  --  --
#    4 +6.297181394e-01  -- EM --  100 1.00e+00 1.45e-03  --  --
#    5 +6.263178791e-01  -- EM --  100 1.00e+00 1.27e-03  --  --
#    6 +6.241590079e-01  -- EM --  100 1.00e+00 1.09e-03  --  --
#    7 +6.227025320e-01  -- EM --  100 1.00e+00 9.48e-04  --  --
#    8 +6.216680102e-01  -- EM --  100 1.00e+00 8.28e-04  --  --
#    9 +6.208998038e-01  -- EM --  100 1.00e+00 7.31e-04  --  --
#   10 +6.203070944e-01  -- EM --   99 1.00e+00 6.52e-04  --  --
#    1 +6.198345926e-01 +2.488e-02  97  ------   ------   --  --
#    2 +6.194468092e-01 +2.246e-02  77 1.00e+00 2.13e-03  20   1
#    3 +6.191221306e-01 +1.990e-02  57 1.00e+00 3.57e-02  20   1
#    4 +6.188455856e-01 +1.821e-02  37 1.00e+00 4.83e-02  20   1
#    5 +6.186035927e-01 +1.679e-02  17 1.00e+00 9.26e-02  20   1
#    6 +6.152593867e-01 +2.970e-03   4 1.00e+00 4.68e-01  20   1
#    7 +6.150717940e-01 +1.434e-03   5 1.00e+00 5.15e-01  20   1
#    8 +6.150588226e-01 +1.183e-03   6 1.00e+00 1.43e-01  20   1
#    9 +6.150586782e-01 +2.123e-05   7 1.00e+00 8.14e-03  20   1
#   10 +6.150586782e-01 -8.121e-08   7 1.00e+00 1.48e-05  20   1
# Optimization took 3.22 seconds.
# Convergence criteria met---optimal solution found.
# Computation took 9.04 seconds

If you have the REBayes package, you can also run the IP method:

timing <- system.time(fit.ip  <- mixkwdual(L))
cat(sprintf("Computation took %0.2f seconds\n",timing["elapsed"]))

If you run the IP algorithm, you should find that the SQP algorithm was considerably faster than the IP solver, and it converged to a solution with nearly the same objective value as the IP solution.

cat(sprintf("Objective at SQP solution: %0.16f\n",fit.sqp$value))
cat(sprintf("Objective at IP solution:  %0.16f\n",fit.ip$value))
cat(sprintf("Difference in objectives:  %0.4e\n",fit.sqp$value - fit.ip$value))

Session information

This next code chunk gives information about the computing environment used to generate the results contained in this vignette, including the version of R and the packages used.

sessionInfo()
# R version 3.6.2 (2019-12-12)
# Platform: x86_64-apple-darwin15.6.0 (64-bit)
# Running under: macOS Catalina 10.15.4
# 
# Matrix products: default
# BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
# LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
# 
# locale:
# [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
# 
# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     
# 
# other attached packages:
# [1] mixsqp_0.3-43
# 
# loaded via a namespace (and not attached):
#  [1] Rcpp_1.0.3      lattice_0.20-38 Rmosek_9.0.96   digest_0.6.23  
#  [5] grid_3.6.2      magrittr_1.5    evaluate_0.14   rlang_0.4.5    
#  [9] stringi_1.4.3   irlba_2.3.3     Matrix_1.2-18   rmarkdown_2.0  
# [13] tools_3.6.2     stringr_1.4.0   REBayes_1.8     xfun_0.11      
# [17] yaml_2.2.0      compiler_3.6.2  htmltools_0.4.0 knitr_1.26