itdr is an R package for estimating a basis of the central and central mean subspaces in regression using integral transformation methods. This vignette demonstrates the use of the functions in the itdr package on the automobile, Recumbent, and PDB datasets.
The itdr R package can be installed in two ways: the release version from CRAN using the install.packages() function, or the development version from GitHub using the install_github() function in the devtools package. After installation, import the itdr package into the working session using the library() function.
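For reference, the installation steps described above can be run as follows. This is a sketch: the GitHub repository path shown is illustrative, not confirmed.

```r
# Release version from CRAN
install.packages("itdr")

# Development version from GitHub via devtools
# (replace "maintainer" with the actual repository owner)
# install.packages("devtools")
devtools::install_github("maintainer/itdr")

# Import the package into the working session
library(itdr)
```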
In this section, we describe the functions in the itdr package that use the Fourier transformation method to estimate sufficient dimension reduction (SDR) subspaces in regression. We demonstrate only the Fourier transformation method; however, estimates under the convolution transformation method can be obtained by passing the argument method="CM" to the itdr() function.
Before estimating the SDR subspaces, we must estimate the dimension (d) of the SDR subspace and the tuning parameters sw2 and st2. Section 2.1 demonstrates the estimation of the dimension (d). Section 2.2.1 explains the estimation of the tuning parameter sw2 for both subspaces, i.e., the central subspace (CS) and the central mean subspace (CMS). Section 2.2.2 explains the estimation of st2 for the central subspace (CS). Finally, Section 2.3 demonstrates the use of the itdr() function to estimate the central subspace.
A bootstrap estimation procedure is used to estimate the unknown dimension (d) of sufficient dimension reduction subspaces; for details, see Zhu and Zeng (2006). The d.boots() function can be used to estimate the dimension (d). Let's estimate the dimension d of the central subspace of the automobile dataset, using the response variable y and the predictor variables x as in Zhu and Zeng (2006). We pass the arguments space="pdf" to the d.boots() function to estimate the CS, xdensity="normal" to assume a normal density for the predictors, and method="FM" to use the Fourier transformation method.
# Load the itdr package
library(itdr)
# Use dataset available in itdr package
data(automobile)
head(automobile)
#> symboling normalized make fuelType aspiration #doors bodyStyle
#> 1 3 NA alfa-romero gas std two convertible
#> 2 3 NA alfa-romero gas std two convertible
#> 3 1 NA alfa-romero gas std two hatchback
#> 4 2 164 audi gas std four sedan
#> 5 2 164 audi gas std four sedan
#> 6 2 NA audi gas std two sedan
#> driveWheel engineLoc wheelBase length width height curbWeight engineType
#> 1 rwd front 88.6 168.8 64.1 48.8 2548 dohc
#> 2 rwd front 88.6 168.8 64.1 48.8 2548 dohc
#> 3 rwd front 94.5 171.2 65.5 52.4 2823 ohcv
#> 4 fwd front 99.8 176.6 66.2 54.3 2337 ohc
#> 5 4wd front 99.4 176.6 66.4 54.3 2824 ohc
#> 6 fwd front 99.8 177.3 66.3 53.1 2507 ohc
#> #Cylinder engineSize fuelSys bore stroke compession norsepower peak_rpm
#> 1 four 130 mpfi 3.47 2.68 9.0 111 5000
#> 2 four 130 mpfi 3.47 2.68 9.0 111 5000
#> 3 six 152 mpfi 2.68 3.47 9.0 154 5000
#> 4 four 109 mpfi 3.19 3.40 10.0 102 5500
#> 5 five 136 mpfi 3.19 3.40 8.0 115 5500
#> 6 five 136 mpfi 3.19 3.40 8.5 110 5500
#> city_rpm highwayMPG price
#> 1 21 27 13495
#> 2 21 27 16500
#> 3 19 26 16500
#> 4 24 30 13950
#> 5 18 22 17450
#> 6 19 25 15250
automobile.na=na.omit(automobile)
# prepare response and predictor variables
auto_y=log(automobile.na[,26])
auto_xx=automobile.na[,c(10,11,12,13,14,17,19,20,21,22,23,24,25)]
auto_x=scale(auto_xx) # Standardize the predictors
# call to the d.boots() function with required arguments
d_est=d.boots(auto_y,auto_x,Plot=TRUE,space="pdf",xdensity = "normal",method="FM")
Here, the estimate of the dimension of the central subspace for ‘automobile’ data is 2, i.e., d_hat=2.
Two tuning parameters need to be estimated when estimating SDR subspaces with the Fourier method: sw2 and st2. The parameter sw2 is required for both the central mean subspace (CMS) and the central subspace (CS), whereas st2 is required only for the central subspace. Section 2.2.1 demonstrates the use of the wx() function to estimate sw2, and Section 2.2.2 describes the use of the wy() function to estimate st2.
Estimation of the tuning parameter sw2
To estimate the tuning parameter sw2, we can use the wx() function with the subspace option space="pdf" for the CS or space="mean" for the CMS. During the estimation process, the other parameters are held fixed. The following R code chunk demonstrates the estimation of sw2 for the central subspace.
auto_d=2 #The estimated value from Section 2.1
auto_sw2=wx(auto_y,auto_x,auto_d,wx_seq=seq(0.05,1,by=0.01),B=500,space="pdf",method="FM")
auto_sw2$wx.hat # bootstrap estimate of sw2; the value varies between runs
#> [1] 0.09
Estimation of the tuning parameter st2
To estimate the tuning parameter st2, we can use the wy() function. Here, the other parameters are held fixed. Notice that we do not need to specify the space argument, because the tuning parameter st2 is required only for the central subspace (CS).
auto_d=2 # Estimated value from Section 2.1
auto_st2=wy(auto_y,auto_x,auto_d,wx=0.1,wy_seq=seq(0.1,1,by=0.1),xdensity="normal",method="FM")
auto_st2$wy.hat # bootstrap estimate of st2; the value varies between runs
#> [1] 0.5
Estimation of the bandwidth (h) of the Gaussian kernel density function
If the distribution function of the predictor variables is unknown, Gaussian kernel density estimation is used to approximate their density function. In that case, the bandwidth parameter must be estimated when xdensity="kernel" is used. The wh() function uses a bootstrap estimator to estimate the bandwidth of the Gaussian kernel density estimate.
h_hat=wh(auto_y,auto_x,auto_d,wx=5,wy=0.1,wh_seq=seq(0.1,2,by=.1),B=50,space = "pdf",method="FM")
# Bandwidth estimator for Gaussian kernel density estimation for the central subspace
h_hat$h.hat # bootstrap estimate of h; the value varies between runs
#> [1] 0.5
We have described the estimation of the tuning parameters for the Fourier method in Sections 2.1-2.2. Now we are ready to estimate the SDR subspaces. Zhu and Zeng (2006) used the Fourier method to estimate the SDR subspaces when the predictors follow a multivariate normal distribution. However, when the predictors follow an elliptical distribution, or more generally when their distribution is unknown, the predictors' distribution function is approximated using Gaussian kernel density estimation (Zeng and Zhu, 2010). The itdr() function can be used to estimate the SDR subspaces under the FM method as follows. Since the default setting of the itdr() function is method="FM", specifying the method is optional.
library(itdr)
data(automobile)
df=cbind(automobile[,c(26,10,11,12,13,14,17,19,20,21,22,23,24,25)])
dff=as.matrix(df)
automobi=dff[complete.cases(dff),]
d=2 # dimension estimated in Section 2.1
wx=.14 # sw2 value based on Section 2.2.1
wy=.9 # st2 value based on Section 2.2.2
wh=1.5 # bandwidth value based on Section 2.2.3
p=13 # number of predictor variables
y=automobi[,1]
x=automobi[,c(2:14)]
xt=scale(x)
# Distribution of the predictors is normal
fit.F_CMS=itdr(y,xt,d,wx,wy,wh,space="pdf",xdensity = "normal",method="FM")
round(fit.F_CMS$eta_hat,2)
#> [,1] [,2]
#> [1,] -0.09 0.01
#> [2,] 0.38 -0.16
#> [3,] -0.08 0.05
#> [4,] -0.11 0.03
#> [5,] -0.70 -0.24
#> [6,] 0.06 0.83
#> [7,] 0.07 -0.14
#> [8,] 0.18 -0.13
#> [9,] -0.17 -0.08
#> [10,] -0.43 -0.26
#> [11,] -0.04 0.06
#> [12,] -0.26 0.29
#> [13,] 0.09 -0.15
# Distribution of the predictors is unknown (kernel method)
fit.F_CMS=itdr(y,xt,d,wx,wy,wh,space="pdf",xdensity = "kernel",method="FM")
round(fit.F_CMS$eta_hat,2)
#> [,1] [,2]
#> [1,] -0.07 -0.34
#> [2,] 0.00 -0.04
#> [3,] -0.14 0.10
#> [4,] 0.01 0.13
#> [5,] 0.24 -0.27
#> [6,] -0.90 0.13
#> [7,] 0.19 0.11
#> [8,] 0.17 -0.09
#> [9,] 0.03 -0.32
#> [10,] 0.20 0.48
#> [11,] -0.06 0.04
#> [12,] -0.05 -0.29
#> [13,] 0.04 0.57
The itdr() function can also be used to estimate the central mean subspace in regression using the iterative Hessian transformation (IHT) method, as demonstrated in the following R chunk on the Recumbent dataset available in the itdr package. Notice that the inputs are the estimation method (method="iht"), the response vector (y), the design matrix of the predictors (x), and the dimension (d) of the CMS. The response and predictors are chosen as in Cook and Li (2002).
library(itdr)
data("Recumbent")
Recumbent.df=na.omit(Recumbent)
y=Recumbent.df$outcome
X1=log(Recumbent.df$ast)
X2=log(Recumbent.df$ck)
X3=log(Recumbent.df$urea)
p=3
x=matrix(c(X1,X2,X3),ncol=p)
d=2
fit.iht_CMS=itdr(y,x,d,method="iht")
fit.iht_CMS$eta_hat
#> [,1] [,2]
#> [1,] 0.3260269 0.95986216
#> [2,] -0.2395713 -0.27884564
#> [3,] -0.9145010 0.03016189
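Given an estimated basis, the reduced (sufficient) predictors are obtained by projecting the predictors onto it. A minimal sketch using the objects from the chunk above:

```r
# p x d basis of the CMS estimated by IHT
eta <- fit.iht_CMS$eta_hat
# n x d matrix of sufficient predictors
suff_pred <- x %*% eta
# e.g., inspect the outcome against the first sufficient predictor
plot(suff_pred[, 1], y, xlab = "first sufficient predictor", ylab = "outcome")
```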
In this section, we demonstrate the functions in the itdr package related to the Fourier transformation approach to inverse dimension reduction. In Section 4.1, we demonstrate the function that estimates the dimension of the central subspace using this approach. The estimation of the CS is described in Section 4.2.
The dimension of the CS can be estimated using the d.test() function, which outputs p-values for three different test statistics: the weighted chi-square test statistic (Weng and Yin, 2018), the scaled test statistic (Bentler and Xie, 2000), and the adjusted test statistic (Bentler and Xie, 2000). Suppose m is the candidate dimension of the CS to be tested (H_0: d=m); then the following R code tests a candidate value m (<p) for the dimension of the CS of the planning database (PDB).
library(itdr)
data(PDB)
colnames(PDB)=NULL
p=15
# select the response vector (y) and predictor variables (X) according to Weng and Yin (2018)
df=PDB[,c(79,73,77,103,112,115,124,130,132,145,149,151,153,155,167,169)]
dff=as.matrix(df)
#remove the NA rows
planingdb=dff[complete.cases(dff),]
y=planingdb[,1] # n-dimensional response vector
x=planingdb[,c(2:(p+1))] # raw design matrix
x=x+0.5
# design matrix after transformations
xt=cbind(x[,1]^(.33),x[,2]^(.33),x[,3]^(.57),x[,4]^(.33),x[,5]^(.4),
x[,6]^(.5),x[,7]^(.33),x[,8]^(.16),x[,9]^(.27),x[,10]^(.5),
x[,11]^(.5),x[,12]^(.33),x[,13]^(.06),x[,14]^(.15),x[,15]^(.1))
m=1
# run the hypothesis tests
d.test(y,x,m)
#> Hypothesis Tests for selecting sufficient dimension (d)
#> Null: d=m vs Alternative: d>m
#>
#> Test W.Ch.Sq Scaled Adjusted
#> p-value 0.96 1 0.9108079
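Because each test is of H_0: d=m against d>m, a common practice is to test m=1,2,... in sequence and take as the estimate of d the first m whose null hypothesis is not rejected. A sketch (the p-values are inspected manually from the printed output):

```r
# Sequentially test candidate dimensions; stop at the first m
# whose null hypothesis d = m is not rejected (e.g., at the 5% level)
for (m in 1:3) {
  cat("Testing H_0: d =", m, "\n")
  print(d.test(y, x, m))
}
```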
After selecting the dimension of the CS as described in Section 4.1, an estimator of the CS can be obtained using the invFM() function. The following R chunk demonstrates the use of invFM() to estimate the CS for the planning database (PDB).
library(itdr)
data(PDB)
colnames(PDB)=NULL
p=15
# select the response vector (y) and predictor variables (X) according to Weng and Yin (2018)
df=PDB[,c(79,73,77,103,112,115,124,130,132,145,149,151,153,155,167,169)]
dff=as.matrix(df)
#remove the NA rows
planingdb=dff[complete.cases(dff),]
y=planingdb[,1] # n-dimensional response vector
x=planingdb[,c(2:(p+1))] # raw design matrix
x=x+0.5
# design matrix after the transformations given in Weng and Yin (2018)
xt=cbind(x[,1]^(.33),x[,2]^(.33),x[,3]^(.57),x[,4]^(.33),x[,5]^(.4),
x[,6]^(.5),x[,7]^(.33),x[,8]^(.16),x[,9]^(.27),x[,10]^(.5),
x[,11]^(.5),x[,12]^(.33),x[,13]^(.06),x[,14]^(.15),x[,15]^(.1))
W=sapply(50,rnorm)
d=1 # estimated dimension of the CS from Section 4.1
betahat <-invFM(xt,y,d,W,F)$beta # estimated basis
betahat
#> [1] -0.240992923 0.127511993 -0.446480946 -0.201679808 -0.130508082
#> [6] 0.225092170 -0.010737950 0.507992150 -0.500482012 0.042780096
#> [11] -0.132329004 -0.109988190 0.229948626 0.004728476 0.157966683
The code for the Fourier transformation and convolution transformation methods is adapted from code provided by Zhu and Zeng (2006). The code for elliptically contoured predictors and for kernel density estimation is essentially a modification of the program provided by Zeng and Zhu (2010). The code for the Fourier transform approach to inverse dimension reduction is adapted from code provided by Weng and Yin (2018).
Bentler, P.M., and Xie, J. (2000). Corrections to Test Statistics in Principal Hessian Directions. Statistics and Probability Letters. 47, 381-389.
Cook, R. D., and Li, B. (2002). Dimension Reduction for Conditional Mean in Regression. The Annals of Statistics, 30, 455-474.
Weng J. and Yin X. (2018). Fourier Transform Approach for Inverse Dimension Reduction Method. Journal of Nonparametric Statistics. 30, 4, 1029-0311.
Zeng P. and Zhu Y. (2010). An Integral Transform Method for Estimating the Central Mean and Central Subspaces. Journal of Multivariate Analysis. 101, 271–290.
Zhu Y. and Zeng P. (2006). Fourier Methods for Estimating the Central Subspace and Central Mean Subspace in Regression. Journal of the American Statistical Association. 101, 1638–1651.