The ‘lmvar’ package fits a Gaussian linear model. It differs from a classical linear model in that the variance is not constant. Instead, the variance has its own model, comparable to the model for the expected value. The classical linear model is provided by the function ‘lm’ in the package ‘stats’.
Working with the package is a lot like working with ‘lm’. A fit with ‘lmvar’ results in an ‘lmvar’ object, which is a list. Accessor functions are provided to extract the list members, such as the fitted coefficients \(\beta\) and the log-likelihood. Various utility functions such as residuals
to calculate residuals, AIC
to calculate the AIC, fitted
to obtain expected values, standard deviations and confidence intervals, etc., are also provided by the package.
The package is intended for people who run a classical linear model and want to see what happens if the restriction of a constant variance is dropped. Questions in this context are: does the allowance of heteroscedasticity result in a better fit, lower values for the AIC or BIC, smaller prediction errors, etc.?
The package fits the following model. A vector \(Y\) of observations (also called ‘responses’) of length \(n\) is a stochastic vector. It is distributed according to a multivariate Gaussian distribution:
\[\begin{equation} Y \sim \mathcal{N}_n( \mu, \Sigma), \end{equation}\] where \(\mu\) is the vector of expected values and \(\Sigma\) the covariance matrix (also called the ‘variance-covariance matrix’). Just like in the standard linear model, the covariance matrix is taken to be a \(n \times n\) diagonal matrix but contrary to the standard linear model, the diagonal entries need not be all the same: \[\begin{equation} \Sigma_{ij} = \begin{cases} 0 & i \neq j\\ \sigma_i^2 & i=j \end{cases} \end{equation}\]
As in the classical linear model, the vector of expectation values \(\mu\) is given by \[\begin{equation} \mu = X_\mu \beta_\mu \end{equation}\] where \(X_\mu\) is the ‘model matrix’ or ‘design matrix’ for \(\mu\) and \(\beta_\mu\) the parameter vector for \(\mu\). \(X_\mu\) is a \(n \times k_\mu\) matrix and \(\beta_\mu\) a vector of length \(k_\mu\).
Let \(\sigma\) denotes the vector \((\sigma_1, \dots, \sigma_n)\). The model for \(\sigma\) is \[\begin{equation} \log \sigma = X_\sigma \beta_\sigma \end{equation}\] where \(\log \sigma\) stands for the vector \((\log\sigma_1, \dots, \log\sigma_n)\), \(X_\sigma\) is the ‘model matrix’ or ‘design matrix’ for \(\sigma\) and \(\beta_\sigma\) the parameter vector for \(\sigma\). The logarithm is taken to be the ‘natural logarithm’ with base \(e\). The dimensions of \(X_\sigma\) are \(n \times k_\sigma\) and \(\beta_\sigma\) is a vector of length \(k_\sigma\).
The vector of observations \(Y\) and the matrices \(X_\mu\) and \(X_\sigma\) are specified by the user. They must contain real values. The fit returns the maximum-likelihood estimators for \(\beta_\mu\) and \(\beta_\sigma\).
The model for both \(\mu\) and \(\sigma\) contains an intercept term by default. That means that the first column of both matrices is a column in which each matrix-element equals 1. The package will add this column to the user-suppplied matrices to ensure that the intercept term is present. There is no need for a user to include such a column in a user-supplied model-matrix. Intercept terms can be suppressed with the arguments intercept_mu = FALSE
and intercept_sigma = FALSE
of the function lmvar
.
After adding the intercept columns (if not suppressed), the package will check whether the resulting matrices are full rank. If not, columns will be removed from each matrix until it is full rank.
The addition of an intercept column and, possibly, the removal of columns to obtain a full-rank matrix, imply that the actual matrices used in the fit can be different from the user-specified matrices. The matrices that are actually used in the fit are returned as members of the lmvar
object.
Carrying out the fit boils down to solving a set of non-linear equations. By default this is carried out in the background by the function maxNR
from the package ‘maxLik’ but it is possible to use another function from the same package.
More mathematical details about the model can be found in the vignette ‘Math’ which comes with this package. It can be viewed with vignette("Math")
or vignette("Math", package="lmvar")
.
The main function in the package is lmvar
. It carries out a fit and returns an lmvar
object.
The user must specify a vector of observations and two model-matrices when calling lmvar
. They must meet a number of conditions, which are described in detail in the on-line help for lmvar
, to be viewed with the command ?lmvar
. The most important conditions to keep in mind are:
NaN
etc., are not allowed.lmvar
(although one can suppress it), is called (Intercept)
for \(X_\mu\) and (Intercept_s)
for \(X_\sigma\).With each column in \(X_\mu\) corresponds an element in \(\beta_\mu\). The name of that element is the corresponding column name. The same is true for \(X_\sigma\) and \(\beta_\sigma\).
It can happen that lmvar
fails to solve the maximum-likelihood equations and exits with warnings. See here for the options you have in this case.
An lmvar
object is a list whose members are intended to be extracted with the supplied accessor and utility functions. The only members for which no such functions have been implemented are:
y
: the user-supplied vector of observationsX_mu
: the actual, full-rank matrix \(X_\mu\) including the intercept term (if not suppressed) that is used in the fitX_sigma
: the actual, full-rank matrix \(X_\sigma\) including the intercept term (if not suppressed) that is used in the fitslvr_log
: log-output from the function maxNR
. Only added to the lmvar
object when requested.Once lmvar
has run and an lmvar
-object has been created, one can obtain \(\beta_\mu\) and \(\beta_\sigma\) with the function coef
. The function fitted
allows one to obtain \(\mu\) and \(\sigma\). We refer to the package documentation (in particular the package index which can be viewed with help(package = "lmvar")
) for a list of all available functions and function details.
We demonstrate the package with the help of the dataframe cats
which can be found in the MASS
package.
# As example we use the dataset 'cats' from the library 'MASS'.
require(lmvar); require(MASS)
# A plot of the heart weight versus the body weight in the data set
plot(cats$Bwt, cats$Hwt, xlab = "Body weight", ylab = "Heart weight")
We want to regress the cats heart weight ‘Hwt’ onto the body weight ‘Bwt’.
# Create the model matrix. It only contains the body weight. An intercept term will be added by 'lmvar'.
X = model.matrix(~ Bwt - 1, cats)
# Carry out the fit with the same model matrix for mu (the expected heart weight) and for log sigma (the standard deviation)
fit = lmvar(cats$Hwt, X_mu = X, X_sigma = X)
We have now created the object fit
, which is our object of class lmvar
. To obtain a first impression of the fit, we look at the summary
## Call:
## lmvar(y = cats$Hwt, X_mu = X, X_sigma = X)
##
## Number of observations: 144
## Degrees of freedom : 4
##
## Z-scores:
## Min 1Q Median 3Q Max
## -2.4159 -0.7261 -0.0655 0.6703 2.5998
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.012179 0.679820 -0.0179 0.985707
## Bwt 3.904310 0.258964 15.0766 < 2.2e-16 ***
## (Intercept_s) -0.522274 0.337044 -1.5496 0.121244
## Bwt_s 0.315837 0.121843 2.5922 0.009537 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Standard deviations:
## Min 1Q Median 3Q Max
## 1.1156 1.2265 1.3916 1.5422 2.0330
##
## Comparison to model with constant variance (i.e. classical linear model)
## Log likelihood-ratio: 4.069774
## Additional degrees of freedom: 1
## p-value for difference in deviance: 0.00433
The first line shows the call that created fit
. Then, we are told something about the distribution of the standard scores (also called the z-scores).
Next, the summary shows the matrix with the coefficients \(\beta_\mu\) and \(\beta_\sigma\). The coefficients \(\beta_\mu\) are (Intercept)
, and Bwt
. The coefficients \(\beta_\sigma\) are (Intercept_s)
, and Bwt_s
. They are called this way to distinguish them from the coefficients for \(\beta_\mu\). In cases where there is no risk of confusion, the true names of the coefficients \(\beta_\sigma\) will be used, which are (Intercept_s)
and Bwt
.
The matrix with coefficients shows that Bwt
and Bwt_s
are statistically significant at the 5% level, but the intercept terms are not.
The next piece of information gives an impression of the distribution of the standard deviations \(\sigma\).
Finally the model is compared to a classical linear model with the same model matrix \(X_\mu\) but a standard deviation that is the same for all observations. The summary shows the difference in log-likelihood between the two models and the difference in degrees of freedom. Twice the difference in log-likelihood is the difference in deviance, for which a p-value is calculated. The p-value of 0.00433, indicates that the lmvar
fit is a better fit than the classical linear model at the 5% confidence level. I.e., it makes sense to let the standard deviation vary instead of keeping it fixed.
Another useful, high-level check of the fit is to look at a number of diagnostic plots with the command plot (fit)
.
The number of observations in the fit and the log-likelihood are
## [1] 144
## 'log Lik.' -252.991 (df=4)
The rank of the matrix \(X_\mu\) used in the fit is
## [1] 2
The value 2 is correct: the user-supplied matrix had 1 column and lmvar
added an intercept column. The two columns are linearly independent so no column had to be removed.
We run the regression again, but this time without intercept terms
The rank of \(X_\mu\) as used in the fit is now equal to 1:
## [1] 1
The summary no longer shows the intercept terms:
## Call:
## lmvar(y = cats$Hwt, X_mu = X, X_sigma = X, intercept_mu = FALSE,
## intercept_sigma = FALSE)
##
## Number of observations: 144
## Degrees of freedom : 2
##
## Z-scores:
## Min 1Q Median 3Q Max
## -2.2212 -0.6966 -0.0686 0.7168 3.1238
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## Bwt 3.903044 0.044278 88.1489 < 2.2e-16 ***
## Bwt_s 0.134489 0.021302 6.3135 2.728e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Standard deviations:
## Min 1Q Median 3Q Max
## 1.3086 1.3625 1.4378 1.5021 1.6896
The summary overview has left out the comparison with the classical linear model as well. The classical linear model and the model in fit
are no longer nested models because of the absence on an intercept term in \(X_\sigma\).
Let’s see whether the z-scores appear to be correlated with the body weight
sigma = fitted(fit, mu = FALSE)
plot(cats$Bwt, residuals(fit) / sigma, xlab = "Body weight", ylab = "z-score")
abline(0, 0, col = "red")
Let’s also plot the average heart weights versus the body weight, with a 95% confidence interval as error bar.
mu = fitted(fit, sigma = FALSE)
plot(cats$Bwt, mu, xlab = "Body weight", ylab = "Average heart weight", ylim = c(7, 16))
intervals = fitted(fit, interval = "confidence", level = 0.95)
lwr = intervals[, "mu_lwr"]
upr = intervals[, "mu_upr"]
segments(cats$Bwt, lwr, cats$Bwt, upr)
We carry out a classical linear fit, without an intercept term in the model matrix
and we compare AIC and BIC values
## [1] 512.7867
## [1] 518.3905
## [1] 518.7263
## [1] 524.3301
Both AIC and BIC favor the fit with lmvar
over the fit with lm
.
We plot again the expected heart weight versus the body weight, but now we add the predictions of the classical linear fit as red points
plot(cats$Bwt, mu, xlab = "Body weight", ylab = "Average heart weight", ylim = c(7, 16))
segments(cats$Bwt, lwr, cats$Bwt, upr)
points(cats$Bwt, fitted(fit_lm), col = "red")
Te plot shows that the predicted values from the classical linear model are nearly the same as the predictions from lmvar
. We can also plot the fitted standard deviations with their 95% confidence interval:
plot(cats$Bwt, sigma, xlab = "Body weight", ylab = "St dev of heart weight", ylim = c(1, 2))
lwr = intervals[, "sigma_lwr"]
upr = intervals[, "sigma_upr"]
segments(cats$Bwt, lwr, cats$Bwt, upr)
sigma_lm = summary(fit_lm)$sigma
abline(sigma_lm, 0, col = "red")
The red line is the standard deviation that is returned from the classical linear model.
The coefficients \(\beta_\mu\) and \(\beta_\sigma\) that were displayed in the summary-overview, are obtained by
## Bwt Bwt_s
## 3.9030442 0.1344892
The suffix ’_s’ is added to show that ‘Bwt_s’ is the coefficient \(\beta_\sigma\). If we ask for \(\beta_\sigma\) only, the real name of the coefficient is used
## Bwt
## 0.1344892
We conclude this demonstration with the covariance matrix for the combined coefficients \((\beta_\mu, \beta_\sigma)\)
## Bwt Bwt_s
## Bwt 0.00196053 0.0000000000
## Bwt_s 0.00000000 0.0004537699
Hopefully this demonstration has given an idea of how to work with the package. The documentation of the individual functions contains further examples.
We refer to the package index for a list of all available functions. The index can be viewed with help(package="lmvar")
. Various generic functions from the ‘stats’ package work for an ‘lmvar’ object as well. Examples are BIC
and confint
.
The function plot.lmvar
returns a number of diagnostic plots for an ‘lmvar’ object, similar to the function plot.lm
for an ‘lm’ object.
The function plot_qq
produces a QQ-plot for one or two fits. They can be of class ‘lm’ or class ‘lmvar’. It is also possible that one id of class ‘lm’ and the other of class ‘lmvar’.
The function plot_qdis
produces a plot of the distribution of quantiles. Like plot_qq
, one can specify two plots, each of class ‘lm’ or class ‘lmvar’.
The package provides the function fwbw
for a selection of the independent model-variables (also called the ‘predictors’ or ‘covariates’). It searches for an optimal subset of variables by means of a forward / backward-stepping algorithm. The function works on a ‘lmvar’ object but also on a ‘lm’ object. See the function documentation of fwbw
for details.
The package provides the functions cv.lm
and cv.lmvar
for cross-validation of prediction errors or any other function of choice. The former function works on an ‘lm’ object, the latter on a ‘lmvar’ object. See the function documentation for details.
The package ‘lmvar’is concerned mostly with objects of class ’lmvar’. However, it also contains the function lmvar_no_fit
which creates an object of class ‘lmvar_no_fit’. This class is like the class ‘lmvar’ but misses any information that is the result of a model fit.
The class ‘lmvar’ is an extension of the class ‘lmvar_no_fit’. This means that whenever an object of class ‘lmvar_no_fit’ is required, an object of class ‘lmvar’ can be used as well. An example is the function nobs.lmvar_no_fit
supplied by this package. It takes as input an object of class ‘lmvar_no_fit’. Therefore it will also accept an object of class ‘lmvar’.
It can happen that the function lmvar
exits with a warning that it had trouble carrying out the model fit. The warning will be like ‘Last step could not find a value above the current’ or ‘Log-likelihood appears not to be at a maximum!’. This is means that the iterative fitting procedure did not converge in a satisfactory manner. In our experience, this is most likely to happen when the model for \(\sigma\) has many degrees of freedom, i.e., the model matrix \(X_\sigma\) has many columns.
What are your options if this happens? You can try the following strategies.
Check whether the model for \(\sigma\) contains factor levels which occur only a few times, or which occur almost always. Such levels are columns in \(X_\sigma\) where nearly all elements are equal to 0 and a few equal to 1, or nearly all elements are equal to 1 and a few equal to zero. Remove these columns from \(X_\sigma\) and run lmvar
again.
Run lmvar
with the argument control = list(remove_df_sigma_post = TRUE)
. With this option, lmvar
tries to remove degrees of freedom from the model for \(\sigma\) that prevent the fit to converge. See the vignette ‘Math’ for a mathematical background of this option.
Create the model by starting with a model with only 2 degrees of freedom and gradually adding degrees of freedom, avoiding ones that prevent the fit to converge. This can be done by first running lmvar_no_fit
(with the same arguments as lmvar
). The resulting object is then input for the function fwbw
. This function must be run with argument fw = TRUE
. The function fwbw
requires one to select a function that measures the goodness of fit. Reasonable choices for this function are e.g. AIC
or BIC
. The output of fwbw
contains a model-fit which is hopefully one you can work with.
There are several other package that can fit the same model as ‘lmvar’. Below, we mention the ones we are aware of. The ‘lmvar’ package has been developed as a relatively simple next step for users who typically run a linear regression, but want to gain experience with a heterscedastic model. Depending on one’s taste and needs, one might need or prefer another package though.
If we can, we demonstrate the alternatives with the same example we have used to demonstrate ‘lmvar’.
The function remlscore
from the package ‘statmod’ (Giner and Smyth 2016) fits our example as follows.
require(statmod)
# Run the regression including intercept terms
intercept = rep(1, nrow(cats))
fit = remlscore( cats$Hwt, cbind(intercept, X), cbind(intercept, X))
The coefficients \(\beta\) for the expected value can be obtained as fit$beta
. The coefficients for the logarithm of the standard deviation as fit$gamma
. By definition, the latter coefficients are twice the value calculated by lmvar
.
The function crch
from the package ‘crch’ (Messner, Mayr, and Zeileis 2016) fits our example as follows.
require(crch)
# Run the regression including intercept terms
fit = crch(Hwt ~ Bwt | Bwt , data = cats)
All coefficients \(\beta\) can be obtained with coef(fit)
.
The function gam
from the package ‘mgcv’ (Wood 2011) fits our example as follows.
require(mgcv)
# Run the regression including intercept terms
fit = gam(list(Hwt ~ Bwt, ~ Bwt) , data = cats, family = gaulss())
All coefficients \(\beta\) can be obtained with coef(fit)
.
The function gamlss
from the package ‘gamlss’ (Rigby and Stasinopoulos 2005) fits our example as follows.
require(gamlss)
# Run the regression including intercept terms
fit = gamlss(Hwt ~ Bwt, ~ Bwt, data = cats)
The coefficients \(\beta\) for the expected value can be obtained as coef(fit)
. The coefficients for the log of the standard deviation are obtained by coef(fit, what = "sigma")
.
Other functions that allow for a model of the dispersion are, e.g., hglm
in the package hglm
(Ronnegard, Shen, and Alam 2010) and geese
in the package geepack
(Hojsgaard, Halekoh, and Yan 2006). These models are more complicated though, and require a level of expertise not required by lmvar
.
We thank Prof. Dr. Eric Cator for his valuable comments and suggestions. Prof. Dr. Achim Zeileis brought the packages ‘crch, ’mgcv’ and ‘gamlss’ to our attention.
This package uses the package maxLik
(Henningsen and Toomet 2011) to find the maximum likelihood. The package matrixcalc
is used to check properties of the Hessian. The package Matrix
is used to support matrices of class ‘Matrix’.
Giner, Goknur, and Gordon K. Smyth. 2016. “Statmod: Probability Calculations for the Inverse Gaussian Distribution.” R Journal 8 (1): 339–51.
Henningsen, Arne, and Ott Toomet. 2011. “MaxLik: A Package for Maximum Likelihood Estimation in R.” Computational Statistics 26 (3): 443–58. https://doi.org/10.1007/s00180-010-0217-1.
Hojsgaard, Soren, Ulrich Halekoh, and Jun Yan. 2006. “The R Package Geepack for Generalized Estimating Equations.” Journal of Statistical Software 15/2: 1–11.
Messner, Jakob W., Georg J. Mayr, and Achim Zeileis. 2016. “Heteroscedastic Censored and Truncated Regression with crch.” The R Journal 8 (1): 173–81. https://journal.r-project.org/archive/2016-1/messner-mayr-zeileis.pdf.
Rigby, R. A., and D. M. Stasinopoulos. 2005. “Generalized Additive Models for Location, Scale and Shape,(with Discussion).” Applied Statistics 54.3: 507–54.
Ronnegard, Lars, Xia Shen, and Moudud Alam. 2010. “Hglm: A Package for Fitting Hierarchical Generalized Linear Models.” The R Journal 2 (2): 20–28. https://journal.r-project.org/archive/2010-2/RJournal_2010-2_Roennegaard~et~al.pdf.
Wood, S. N. 2011. “Fast Stable Restricted Maximum Likelihood and Marginal Likelihood Estimation of Semiparametric Generalized Linear Models.” Journal of the Royal Statistical Society (B) 73 (1): 3–36.