Getting Started with NNS: Classification

Fred Viole

library(NNS)
library(data.table)
require(knitr)
require(rgl)
require(meboot)
require(dtw)

Classification

NNS.reg is a robust regression technique that handles both nonlinear regression on continuous variables and classification tasks in machine learning problems.

We have extended NNS.reg to classification through the ensemble method in NNS.boost.

One major advantage NNS.boost has over tree-based methods is the ability to seamlessly extrapolate beyond the current range of observations.
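The extrapolation point can be illustrated with a minimal sketch on the underlying NNS.reg routine. The toy data (`x`, `y`, `x_new`) are hypothetical, and rpart is used here only as a stand-in for a generic tree-based method; it is not mentioned in the original text. A regression tree predicts the mean of a terminal node, so its prediction can never exceed the largest observed response, whereas the NNS.reg point estimate is not bounded in that way.

```r
library(NNS)
library(rpart)

# Hypothetical toy data: a linear relationship observed on [0, 10]
x <- seq(0, 10, length.out = 100)
y <- 2 * x

# A point outside the observed range of x
x_new <- 12

# Tree-based prediction: a terminal-node mean, so it cannot exceed max(y)
tree_fit  <- rpart(y ~ x, data = data.frame(x = x, y = y))
tree_pred <- predict(tree_fit, newdata = data.frame(x = x_new))

# NNS.reg point estimate at the same out-of-range value
nns_pred <- NNS.reg(x, y, point.est = x_new, plot = FALSE, ncores = 1)$Point.est

c(tree = unname(tree_pred), NNS = nns_pred, max_observed = max(y))
```

Comparing the two estimates against `max_observed` shows whether each method is capped by the training range.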

Splits vs. Partitions

Popular boosting algorithms take a series of weak learning decision tree models and aggregate their outputs. NNS is also a decision tree of sorts: it partitions each regressor with respect to the dependent variable. We can directly control the number of “splits” with the order parameter, NNS.reg(..., order = , ...).
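As a rough sketch of how the order parameter can be explored, the call below fits the iris data twice, once with an explicit `order = 2` (an arbitrary value chosen for illustration) and once with the default, and counts the partition points reported for each regressor in the `$rhs.partitions` output. The variable names `p_fixed` and `p_default` are hypothetical.

```r
library(NNS)

# Partition points with an explicitly chosen order (2 is an arbitrary illustration)
p_fixed <- NNS.reg(iris[, 1:4], iris[, 5], order = 2,
                   residual.plot = FALSE, ncores = 1)$rhs.partitions

# Partition points with the default order
p_default <- NNS.reg(iris[, 1:4], iris[, 5],
                     residual.plot = FALSE, ncores = 1)$rhs.partitions

# Number of non-missing partition points per regressor under each setting
colSums(!is.na(p_fixed))
colSums(!is.na(p_default))
```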

NNS Partitions

We can see how NNS partitions each regressor by calling the $rhs.partitions output. You will notice that the partitions are not equally spaced intervals and that the regressors do not all receive the same number of partitions, which differentiates NNS from other bandwidth or tree-based techniques.

Higher dependence between a regressor and the dependent variable will allow for a larger number of partitions. This is determined internally with the NNS.dep measure.

NNS.reg(iris[,1:4], iris[,5], residual.plot = FALSE, ncores = 1)$rhs.partitions
##     Sepal.Length Sepal.Width Petal.Length Petal.Width
##  1:          4.3         2.0          1.0         0.1
##  2:          5.0         3.0          1.1         1.0
##  3:          6.0         4.0          1.2         2.0
##  4:          7.0         4.4          1.3         2.5
##  5:          7.9          NA          1.4          NA
##  6:           NA          NA          1.5          NA
##  7:           NA          NA          1.6          NA
##  8:           NA          NA          1.7          NA
##  9:           NA          NA          1.9          NA
## 10:           NA          NA          3.3          NA
## 11:           NA          NA          3.5          NA
## 12:           NA          NA          3.6          NA
## 13:           NA          NA          3.7          NA
## 14:           NA          NA          3.8          NA
## 15:           NA          NA          4.0          NA
## 16:           NA          NA          4.1          NA
## 17:           NA          NA          4.2          NA
## 18:           NA          NA          4.4          NA
## 19:           NA          NA          4.5          NA
## 20:           NA          NA          4.6          NA
## 21:           NA          NA          4.7          NA
## 22:           NA          NA          4.8          NA
## 23:           NA          NA          4.9          NA
## 24:           NA          NA          5.1          NA
## 25:           NA          NA          5.2          NA
## 26:           NA          NA          5.3          NA
## 27:           NA          NA          5.4          NA
## 28:           NA          NA          5.6          NA
## 29:           NA          NA          5.7          NA
## 30:           NA          NA          5.8          NA
## 31:           NA          NA          5.9          NA
## 32:           NA          NA          6.0          NA
## 33:           NA          NA          6.1          NA
## 34:           NA          NA          6.3          NA
## 35:           NA          NA          6.4          NA
## 36:           NA          NA          6.7          NA
## 37:           NA          NA          6.9          NA
##     Sepal.Length Sepal.Width Petal.Length Petal.Width
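To get a feel for the dependence measure mentioned above, NNS.dep can be called directly on each regressor against the numerically coded response. This is only a sketch: how NNS.reg applies the measure internally may differ from this pairwise call, and the name `dep` is hypothetical.

```r
library(NNS)

# Numerically coded response (1 = setosa, 2 = versicolor, 3 = virginica)
y <- as.numeric(iris$Species)

# Pairwise dependence between each regressor and the response
dep <- sapply(iris[, 1:4], function(x) NNS.dep(x, y)$Dependence)

# Regressors ordered from most to least dependent
round(sort(dep, decreasing = TRUE), 3)
```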

NNS.boost()

Through resampling of the training set and letting each iterated set of data speak for itself (while paying extra attention to the residuals throughout), we can test various regressor combinations in these dynamic decision trees, keeping only those combinations that add predictive value. From there we simply aggregate the predictions.

NNS.boost will automatically search for an accuracy threshold from the training set, reporting iterations remaining and level obtained in the console. A plot of the frequency of the learning accuracy on the training set is also provided.

Once a threshold is obtained, NNS.boost will test various feature combinations against different splits of the training set and report back the frequency of each regressor used in the final estimate.
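The general idea described above (resample, try feature combinations, keep those that clear an accuracy threshold, aggregate, and report feature frequencies) can be sketched in base R. This is not the NNS.boost implementation: the weak learner is a simple nearest-centroid classifier swapped in for illustration, and `centroid_predict`, `threshold`, `n_learners`, and the out-of-bag check are all assumptions made for this sketch.

```r
# Illustrative sketch only: this is NOT the NNS.boost implementation.
set.seed(123)

test_idx <- sample(nrow(iris), 10)
train    <- iris[-test_idx, ]
test     <- iris[test_idx, ]
features <- names(iris)[1:4]

# Hypothetical weak learner: assign each new row to the nearest class centroid.
# Returns class indices (1, 2, 3) in the order of levels(train_y).
centroid_predict <- function(train_x, train_y, new_x) {
  centroids <- apply(as.matrix(train_x), 2,
                     function(col) tapply(col, train_y, mean))
  new_x <- as.matrix(new_x)
  dists <- sapply(seq_len(nrow(centroids)), function(k) {
    rowSums(sweep(new_x, 2, centroids[k, ])^2)
  })
  dists <- matrix(dists, nrow = nrow(new_x))
  apply(dists, 1, which.min)
}

n_learners <- 50    # hypothetical number of trials
threshold  <- 0.9   # hypothetical accuracy threshold
kept       <- list()

for (i in seq_len(n_learners)) {
  # Resample the training set; rows never drawn form the out-of-bag check set
  boot  <- sample(nrow(train), replace = TRUE)
  oob   <- setdiff(seq_len(nrow(train)), boot)
  # Random feature combination of random size
  feats <- sample(features, sample(length(features), 1))
  pred  <- centroid_predict(train[boot, feats, drop = FALSE],
                            train$Species[boot],
                            train[oob, feats, drop = FALSE])
  acc <- mean(pred == as.numeric(train$Species[oob]))
  # Keep only combinations that clear the threshold
  if (acc >= threshold) {
    kept[[length(kept) + 1]] <- list(feats = feats, boot = boot)
  }
}
stopifnot(length(kept) > 0)

# Aggregate the kept learners by majority vote on the test set
votes <- sapply(kept, function(m) {
  centroid_predict(train[m$boot, m$feats, drop = FALSE],
                   train$Species[m$boot],
                   test[, m$feats, drop = FALSE])
})
votes <- matrix(votes, nrow = nrow(test))
ensemble_pred <- apply(votes, 1,
                       function(v) as.integer(names(which.max(table(v)))))

# Relative frequency of each feature among the kept learners
feature_freq <- table(factor(unlist(lapply(kept, `[[`, "feats")),
                             levels = features))
feature_freq / sum(feature_freq)

# Holdout accuracy of the aggregated prediction
mean(ensemble_pred == as.numeric(test$Species))
```

The feature frequencies play a role loosely analogous to the regressor frequencies that NNS.boost reports, although the quantities are not computed the same way.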

Let’s have a look and see how it works. We use 140 random iris observations as our training set with the 10 holdout observations as our test set. For brevity, we set epochs = 10, learner.trials = 10, folds = 1.

NOTE: The base category of the response variable should be 1, not 0, for classification problems when using NNS.boost(..., type = "CLASS").

set.seed(1234)
test.set = sample(150,10)
 
a = NNS.boost(IVs.train = iris[-test.set, 1:4], 
              DV.train = iris[-test.set, 5],
              IVs.test = iris[test.set, 1:4],
              epochs = 10, learner.trials = 10, 
              status = FALSE, balance = TRUE,
              type = "CLASS", folds = 1)

a$results
##  [1] 1 2 3 3 3 3 3 3 2 3
a$feature.weights
##  Petal.Width Petal.Length  Sepal.Width Sepal.Length 
##    0.3333333    0.3333333    0.1666667    0.1666667
mean( a$results == as.numeric(iris[test.set, 5]) )
## [1] 1

A perfect classification.
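The single accuracy figure can be broken down by class with a confusion matrix. The sketch below copies the predictions from the a$results output above; because the reported accuracy is 1, the actual classes are taken to be identical to the predictions. In a live session, `actual` would instead be `as.numeric(iris[test.set, 5])`. The names `pred`, `actual`, and `conf_mat` are hypothetical.

```r
# Predictions copied from the a$results output shown above
pred <- c(1, 2, 3, 3, 3, 3, 3, 3, 2, 3)

# An accuracy of 1 implies the actual classes equal the predictions;
# in a live session use: actual <- as.numeric(iris[test.set, 5])
actual <- pred

# Map the numeric classes back to species names
species  <- levels(iris$Species)
pred_f   <- factor(species[pred],   levels = species)
actual_f <- factor(species[actual], levels = species)

# Confusion matrix: rows are predictions, columns are actual classes
conf_mat <- table(predicted = pred_f, actual = actual_f)
conf_mat

# Overall accuracy recovered from the diagonal
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
accuracy
```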

Cross-Validation Classification Using NNS.stack()

For a given objective function, the NNS.stack() routine cross-validates the n.best parameter of the multivariate NNS.reg function as well as the threshold parameter of the dimension-reduction version of NNS.reg. NNS.stack can be used for classification via NNS.stack(..., type = "CLASS", ...).

For brevity, we set folds = 1.

NOTE: The base category of the response variable should be 1, not 0, for classification problems when using NNS.stack(..., type = "CLASS").

b = NNS.stack(IVs.train = iris[-test.set, 1:4], 
              DV.train = iris[-test.set, 5],
              IVs.test = iris[test.set, 1:4],
              type = "CLASS", balance = TRUE,
              ncores = 1, folds = 1)

b
## $OBJfn.reg
## [1] 0.9714286
## 
## $NNS.reg.n.best
## [1] 1
## 
## $probability.threshold
## [1] 0.5
## 
## $OBJfn.dim.red
## [1] 0.8928571
## 
## $NNS.dim.red.threshold
## [1] 0.77
## 
## $reg
##  [1] 1 2 3 3 3 3 3 3 2 3
## 
## $dim.red
##  [1] 1 2 3 3 3 3 3 3 2 3
## 
## $stack
##  [1] 1 2 3 3 3 3 3 3 2 3
mean( b$stack == as.numeric(iris[test.set, 5]) )
## [1] 1
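Both notes on the base category apply to responses that arrive coded from 0 or as text labels. A minimal base R sketch of recoding such a response so that its lowest category is 1 follows; the vectors `y0` and `y_text` are hypothetical examples.

```r
# Hypothetical 0-based class labels
y0 <- c(0, 1, 2, 1, 0, 2)

# Shift so that the base category is 1
y1 <- y0 + 1

# Alternative that works for any labelling: factor levels are numbered from 1
y1_alt <- as.numeric(factor(y0))

# Hypothetical text labels, coded alphabetically (ham = 1, spam = 2)
y_text <- c("ham", "spam", "ham")
y_num  <- as.numeric(factor(y_text))

y1
y_num
```

A factor response such as iris[, 5] already satisfies this, since as.numeric() on a factor returns level indices starting at 1.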

Brief Notes on Other Parameters

References

If the user is so motivated, detailed arguments and further examples are provided within the following: