This vignette briefly explains how to use the adam() function and the related auto.adam() from the smooth package. It does not aim to cover all aspects of the function, but focuses on the main ones.
ADAM stands for Augmented Dynamic Adaptive Model. It is a model that underlies ETS, ARIMA and regression, connecting them in a unified framework. The underlying model for ADAM is a Single Source of Error state space model, which is explained in detail separately in an online textbook.
The main philosophy of the adam() function is to be agnostic of the provided data. This means that it will work with ts, msts, zoo, xts, data.frame, numeric and other classes of data. The specification of seasonality in the model is done using a separate parameter, lags, so you are not obliged to transform the existing data into something specific and can use it as is. If you provide a matrix, a data.frame, a data.table, or any other multivariate structure, then the function will use the first column for the response variable and the others for the explanatory ones. One thing that is currently assumed in the function is that the data is measured at a regular frequency. If this is not the case, you will need to introduce missing values manually.
In order to run the experiments in this vignette, we need to load the following packages:
require(greybox)
require(smooth)
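To illustrate the data-agnostic behaviour described above, here is a minimal sketch with a zoo object (assuming the zoo package is installed; the data itself is artificial):
# adam() works with a zoo object directly; seasonality comes from lags, not from the class
zooData <- zoo::zoo(rnorm(140, 100, 10), order.by=as.Date("2024-01-01")+0:139)
adam(zooData, "ANA", lags=c(1,7))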
First and foremost, ADAM implements the ETS model, although in a more flexible way than (Hyndman et al. 2008): it supports different distributions for the error term, which are regulated via the distribution parameter. By default, the additive error model relies on the Normal distribution, while the multiplicative error one assumes the Inverse Gaussian. If you want to reproduce the classical ETS, you need to specify distribution="dnorm". Here is an example of ADAM ETS(MMM) with the Normal distribution on the AirPassengers data:
testModel <- adam(AirPassengers, "MMM", lags=c(1,12), distribution="dnorm",
                  h=12, holdout=TRUE)
summary(testModel)
#>
#> Model estimated using adam() function: ETS(MMM)
#> Response variable: AirPassengers
#> Distribution used in the estimation: Normal
#> Loss function type: likelihood; Loss function value: 468.9192
#> Coefficients:
#> Estimate Std. Error Lower 2.5% Upper 97.5%
#> alpha 0.8451 0.0844 0.6780 1.0000 *
#> beta 0.0205 0.0265 0.0000 0.0727
#> gamma 0.0000 0.0373 0.0000 0.0736
#> level 120.2667 14.4276 91.6885 148.7758 *
#> trend 1.0017 0.0100 0.9819 1.0216 *
#> seasonal_1 0.9132 0.0078 0.8987 0.9365 *
#> seasonal_2 0.8996 0.0082 0.8851 0.9229 *
#> seasonal_3 1.0301 0.0096 1.0156 1.0533 *
#> seasonal_4 0.9866 0.0084 0.9721 1.0098 *
#> seasonal_5 0.9852 0.0073 0.9707 1.0085 *
#> seasonal_6 1.1164 0.0095 1.1019 1.1397 *
#> seasonal_7 1.2328 0.0118 1.2183 1.2561 *
#> seasonal_8 1.2262 0.0109 1.2117 1.2495 *
#> seasonal_9 1.0671 0.0093 1.0527 1.0904 *
#> seasonal_10 0.9279 0.0090 0.9134 0.9512 *
#> seasonal_11 0.8058 0.0078 0.7914 0.8291 *
#>
#> Error standard deviation: 0.0376
#> Sample size: 132
#> Number of estimated parameters: 17
#> Number of degrees of freedom: 115
#> Information criteria:
#> AIC AICc BIC BICc
#> 971.8385 977.2069 1020.8461 1033.9526
plot(forecast(testModel,h=12,interval="prediction"))
You might notice that the summary contains more than what is reported by other smooth functions. It also produces standard errors for the estimated parameters, based on the Fisher Information calculation. Note that this is computationally expensive, so if you have a model with more than 30 variables, the calculation of standard errors might take plenty of time. As for the default print() method, it produces a shorter summary of the model, without the standard errors (similar to what es() does):
testModel
#> Time elapsed: 0.17 seconds
#> Model estimated using adam() function: ETS(MMM)
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 468.9192
#> Persistence vector g:
#> alpha beta gamma
#> 0.8451 0.0205 0.0000
#>
#> Sample size: 132
#> Number of estimated parameters: 17
#> Number of degrees of freedom: 115
#> Information criteria:
#> AIC AICc BIC BICc
#> 971.8385 977.2069 1020.8461 1033.9526
#>
#> Forecast errors:
#> ME: -4.532; MAE: 15.668; RMSE: 21.652
#> sCE: -20.72%; Asymmetry: -12.1%; sMAE: 5.969%; sMSE: 0.68%
#> MASE: 0.651; RMSSE: 0.691; rMAE: 0.206; rRMSE: 0.21
Also, note that prediction intervals in the case of multiplicative error models are approximate. It is advisable to use simulations instead (which is slower, but more accurate):
plot(forecast(testModel,h=18,interval="simulated"))
If you want to do residuals diagnostics, then it is recommended to use the plot() function, something like this (you can select which of the plots to produce):
par(mfcol=c(3,4))
plot(testModel,which=c(1:11))
par(mfcol=c(1,1))
plot(testModel,which=12)
By default, ADAM estimates models by maximising the likelihood function. But there is also a parameter loss, which allows selecting from a list of already implemented loss functions (again, see the documentation for adam() for the full list) or using a function written by the user. Here is how to do the latter on the example of BJsales:
lossFunction <- function(actual, fitted, B){
    return(sum(abs(actual-fitted)^3))
}
testModel <- adam(BJsales, "AAN", silent=FALSE, loss=lossFunction,
                  h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.02 seconds
#> Model estimated using adam() function: ETS(AAN)
#> Distribution assumed in the model: Normal
#> Loss function type: custom; Loss function value: 599.2829
#> Persistence vector g:
#> alpha beta
#> 1.0000 0.2253
#>
#> Sample size: 138
#> Number of estimated parameters: 4
#> Number of degrees of freedom: 134
#> Information criteria are unavailable for the chosen loss & distribution.
#>
#> Forecast errors:
#> ME: 3.02; MAE: 3.133; RMSE: 3.871
#> sCE: 15.942%; Asymmetry: 91.8%; sMAE: 1.378%; sMSE: 0.029%
#> MASE: 2.63; RMSSE: 2.523; rMAE: 1.011; rRMSE: 1.01
Note that the function must have the parameters actual, fitted and B, which correspond to the vector of actual values, the vector of fitted values on each iteration and the vector of the optimised parameters.
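For instance, here is a hedged sketch of a pinball (quantile) loss for the 95th quantile; note that B has to be in the signature even if it is not used inside the function:
lossFunctionPinball <- function(actual, fitted, B){
    # B is unused here, but the signature still requires it
    e <- actual - fitted
    return(sum(0.95 * e[e>=0]) - sum(0.05 * e[e<0]))
}
testModel <- adam(BJsales, "AAN", loss=lossFunctionPinball, h=12, holdout=TRUE)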
The loss and distribution parameters are independent: in the cubic loss example above, we assumed that the error term follows the Normal distribution, but estimated its parameters using a non-conventional loss, because we can. Some of the distributions assume an additional parameter, which can either be estimated or provided by the user. These include Asymmetric Laplace (distribution="dalaplace") with alpha, Generalised Normal and Log-Generalised Normal (distribution=c("dgnorm","dlgnorm")) with shape, and Student's t (distribution="dt") with nu:
testModel <- adam(BJsales, "MMN", silent=FALSE, distribution="dgnorm", shape=3,
                  h=12, holdout=TRUE)
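In a similar fashion, here is a hedged sketch for the Asymmetric Laplace case, where the additional alpha parameter is passed in the same way (the value 0.95 is purely illustrative):
testModel <- adam(BJsales, "AAN", distribution="dalaplace", alpha=0.95,
                  h=12, holdout=TRUE)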
The model selection in ADAM ETS relies on information criteria and works correctly only for loss="likelihood". There are several options for how to select the model; see the description of the function: ?adam(). The default one uses a branch-and-bound algorithm, similar to the one used in es(), but only considers additive trend models (the multiplicative trend ones are less stable and need more attention from a forecaster):
testModel <- adam(AirPassengers, "ZXZ", lags=c(1,12), silent=FALSE,
                  h=12, holdout=TRUE)
#> Forming the pool of models based on... ANN , ANA , MNM , MAM , Estimation progress: 71 %86 %100 %... Done!
testModel
#> Time elapsed: 0.61 seconds
#> Model estimated using adam() function: ETS(MAM)
#> Distribution assumed in the model: Gamma
#> Loss function type: likelihood; Loss function value: 467.2981
#> Persistence vector g:
#> alpha beta gamma
#> 0.7691 0.0053 0.0000
#>
#> Sample size: 132
#> Number of estimated parameters: 17
#> Number of degrees of freedom: 115
#> Information criteria:
#> AIC AICc BIC BICc
#> 968.5961 973.9646 1017.6038 1030.7102
#>
#> Forecast errors:
#> ME: 9.537; MAE: 20.784; RMSE: 26.106
#> sCE: 43.598%; Asymmetry: 64.8%; sMAE: 7.918%; sMSE: 0.989%
#> MASE: 0.863; RMSSE: 0.833; rMAE: 0.273; rRMSE: 0.254
Note that the function produces point forecasts if h>0, but it won't generate prediction intervals. This is why you need to use the forecast() method (as shown in the first example in this vignette).
Similarly to es(), the function supports the combination of models, but it saves all the tested models in the output for potential reuse. Here is how it works:
testModel <- adam(AirPassengers, "CXC", lags=c(1,12),
                  h=12, holdout=TRUE)
testForecast <- forecast(testModel,h=18,interval="semiparametric", level=c(0.9,0.95))
testForecast
#> Point forecast Lower bound (5%) Lower bound (2.5%) Upper bound (95%)
#> Jan 1960 412.6058 389.5461 385.2590 436.2204
#> Feb 1960 407.6846 378.6537 373.3025 437.6114
#> Mar 1960 469.9502 429.7489 422.3990 511.6522
#> Apr 1960 453.0288 410.9541 403.2942 496.8145
#> May 1960 453.8379 410.5820 402.7183 498.9019
#> Jun 1960 515.8291 464.1731 454.8091 569.7606
#> Jul 1960 572.4863 512.5248 501.6851 635.2188
#> Aug 1960 571.1840 509.5790 498.4632 635.7280
#> Sep 1960 498.9227 444.4206 434.5951 556.0628
#> Oct 1960 436.6212 387.9020 379.1317 487.7543
#> Nov 1960 380.9144 337.6242 329.8415 426.3939
#> Dec 1960 429.4806 378.9211 369.8546 482.6981
#> Jan 1961 437.0145 382.4884 372.7527 494.5912
#> Feb 1961 431.6841 374.2163 364.0079 492.5977
#> Mar 1961 497.4800 426.3311 413.7697 573.2358
#> Apr 1961 479.4384 407.0745 394.3622 556.7686
#> May 1961 480.1675 406.1294 393.1512 559.4118
#> Jun 1961 545.6108 459.5264 444.4733 637.9118
#> Upper bound (97.5%)
#> Jan 1960 440.8769
#> Feb 1960 443.5593
#> Mar 1960 520.0014
#> Apr 1960 505.6139
#> May 1960 507.9697
#> Jun 1960 580.6399
#> Jul 1960 647.9038
#> Aug 1960 648.8008
#> Sep 1960 567.6449
#> Oct 1960 498.1318
#> Nov 1960 435.6344
#> Dec 1960 493.5344
#> Jan 1961 506.3581
#> Feb 1961 505.1003
#> Mar 1961 588.8641
#> Apr 1961 572.7870
#> May 1961 575.8558
#> Jun 1961 657.1031
plot(testForecast)
Yes, now we support vectors for the levels in case you want to produce several. In fact, we also support the side parameter for the prediction interval, so you can extract specific quantiles without a hassle:
forecast(testModel,h=18,interval="semiparametric", level=c(0.9,0.95,0.99), side="upper")
#> Point forecast Upper bound (90%) Upper bound (95%) Upper bound (99%)
#> Jan 1960 412.6058 430.8923 436.2204 446.3325
#> Feb 1960 407.6846 430.8198 437.6114 450.5419
#> Mar 1960 469.9502 502.1368 511.6522 529.8219
#> Apr 1960 453.0288 486.7958 496.8145 515.9738
#> May 1960 453.8379 488.5811 498.9019 518.6490
#> Jun 1960 515.8291 557.3860 569.7606 593.4610
#> Jul 1960 572.4863 620.7993 635.2188 662.8621
#> Aug 1960 571.1840 620.8741 635.7280 664.2228
#> Sep 1960 498.9227 542.9053 556.0628 581.3109
#> Oct 1960 436.6212 475.9690 487.7543 510.3805
#> Nov 1960 380.9144 415.9029 426.3939 446.5443
#> Dec 1960 429.4806 470.4024 482.6981 506.3355
#> Jan 1961 437.0145 481.2521 494.5912 520.2715
#> Feb 1961 431.6841 478.4404 492.5977 519.8999
#> Mar 1961 497.4800 555.5623 573.2358 607.3874
#> Apr 1961 479.4384 538.6731 556.7686 591.7922
#> May 1961 480.1675 540.8440 559.4118 595.3748
#> Jun 1961 545.6108 616.2529 637.9118 679.8945
A brand new thing in the function is the possibility of using several frequencies (double / triple / quadruple / ... seasonal models). In order to show how it works, we will generate an artificial time series, inspired by half-hourly electricity demand, using the sim.gum() function:
ordersGUM <- c(1,1,1)
lagsGUM <- c(1,48,336)
initialGUM1 <- -25381.7
initialGUM2 <- c(23955.09, 24248.75, 24848.54, 25012.63, 24634.14, 24548.22, 24544.63, 24572.77,
24498.33, 24250.94, 24545.44, 25005.92, 26164.65, 27038.55, 28262.16, 28619.83,
28892.19, 28575.07, 28837.87, 28695.12, 28623.02, 28679.42, 28682.16, 28683.40,
28647.97, 28374.42, 28261.56, 28199.69, 28341.69, 28314.12, 28252.46, 28491.20,
28647.98, 28761.28, 28560.11, 28059.95, 27719.22, 27530.23, 27315.47, 27028.83,
26933.75, 26961.91, 27372.44, 27362.18, 27271.31, 26365.97, 25570.88, 25058.01)
initialGUM3 <- c(23920.16, 23026.43, 22812.23, 23169.52, 23332.56, 23129.27, 22941.20, 22692.40,
22607.53, 22427.79, 22227.64, 22580.72, 23871.99, 25758.34, 28092.21, 30220.46,
31786.51, 32699.80, 33225.72, 33788.82, 33892.25, 34112.97, 34231.06, 34449.53,
34423.61, 34333.93, 34085.28, 33948.46, 33791.81, 33736.17, 33536.61, 33633.48,
33798.09, 33918.13, 33871.41, 33403.75, 32706.46, 31929.96, 31400.48, 30798.24,
29958.04, 30020.36, 29822.62, 30414.88, 30100.74, 29833.49, 28302.29, 26906.72,
26378.64, 25382.11, 25108.30, 25407.07, 25469.06, 25291.89, 25054.11, 24802.21,
24681.89, 24366.97, 24134.74, 24304.08, 25253.99, 26950.23, 29080.48, 31076.33,
32453.20, 33232.81, 33661.61, 33991.21, 34017.02, 34164.47, 34398.01, 34655.21,
34746.83, 34596.60, 34396.54, 34236.31, 34153.32, 34102.62, 33970.92, 34016.13,
34237.27, 34430.08, 34379.39, 33944.06, 33154.67, 32418.62, 31781.90, 31208.69,
30662.59, 30230.67, 30062.80, 30421.11, 30710.54, 30239.27, 28949.56, 27506.96,
26891.75, 25946.24, 25599.88, 25921.47, 26023.51, 25826.29, 25548.72, 25405.78,
25210.45, 25046.38, 24759.76, 24957.54, 25815.10, 27568.98, 29765.24, 31728.25,
32987.51, 33633.74, 34021.09, 34407.19, 34464.65, 34540.67, 34644.56, 34756.59,
34743.81, 34630.05, 34506.39, 34319.61, 34110.96, 33961.19, 33876.04, 33969.95,
34220.96, 34444.66, 34474.57, 34018.83, 33307.40, 32718.90, 32115.27, 31663.53,
30903.82, 31013.83, 31025.04, 31106.81, 30681.74, 30245.70, 29055.49, 27582.68,
26974.67, 25993.83, 25701.93, 25940.87, 26098.63, 25771.85, 25468.41, 25315.74,
25131.87, 24913.15, 24641.53, 24807.15, 25760.85, 27386.39, 29570.03, 31634.00,
32911.26, 33603.94, 34020.90, 34297.65, 34308.37, 34504.71, 34586.78, 34725.81,
34765.47, 34619.92, 34478.54, 34285.00, 34071.90, 33986.48, 33756.85, 33799.37,
33987.95, 34047.32, 33924.48, 33580.82, 32905.87, 32293.86, 31670.02, 31092.57,
30639.73, 30245.42, 30281.61, 30484.33, 30349.51, 29889.23, 28570.31, 27185.55,
26521.85, 25543.84, 25187.82, 25371.59, 25410.07, 25077.67, 24741.93, 24554.62,
24427.19, 24127.21, 23887.55, 24028.40, 24981.34, 26652.32, 28808.00, 30847.09,
32304.13, 33059.02, 33562.51, 33878.96, 33976.68, 34172.61, 34274.50, 34328.71,
34370.12, 34095.69, 33797.46, 33522.96, 33169.94, 32883.32, 32586.24, 32380.84,
32425.30, 32532.69, 32444.24, 32132.49, 31582.39, 30926.58, 30347.73, 29518.04,
29070.95, 28586.20, 28416.94, 28598.76, 28529.75, 28424.68, 27588.76, 26604.13,
26101.63, 25003.82, 24576.66, 24634.66, 24586.21, 24224.92, 23858.42, 23577.32,
23272.28, 22772.00, 22215.13, 21987.29, 21948.95, 22310.79, 22853.79, 24226.06,
25772.55, 27266.27, 28045.65, 28606.14, 28793.51, 28755.83, 28613.74, 28376.47,
27900.76, 27682.75, 27089.10, 26481.80, 26062.94, 25717.46, 25500.27, 25171.05,
25223.12, 25634.63, 26306.31, 26822.46, 26787.57, 26571.18, 26405.21, 26148.41,
25704.47, 25473.10, 25265.97, 26006.94, 26408.68, 26592.04, 26224.64, 25407.27,
25090.35, 23930.21, 23534.13, 23585.75, 23556.93, 23230.25, 22880.24, 22525.52,
22236.71, 21715.08, 21051.17, 20689.40, 20099.18, 19939.71, 19722.69, 20421.58,
21542.03, 22962.69, 23848.69, 24958.84, 25938.72, 26316.56, 26742.61, 26990.79,
27116.94, 27168.78, 26464.41, 25703.23, 25103.56, 24891.27, 24715.27, 24436.51,
24327.31, 24473.02, 24893.89, 25304.13, 25591.77, 25653.00, 25897.55, 25859.32,
25918.32, 25984.63, 26232.01, 26810.86, 27209.70, 26863.50, 25734.54, 24456.96)
y <- sim.gum(orders=ordersGUM, lags=lagsGUM, nsim=1, frequency=336, obs=3360,
             measurement=rep(1,3), transition=diag(3), persistence=c(0.045,0.162,0.375),
             initial=cbind(initialGUM1,initialGUM2,initialGUM3))$data
We can then apply ADAM to this data:
testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="backcasting",
                  silent=FALSE, h=336, holdout=TRUE)
testModel
#> Time elapsed: 0.56 seconds
#> Model estimated using adam() function: ETS(MMdM)[48, 336]
#> Distribution assumed in the model: Gamma
#> Loss function type: likelihood; Loss function value: 22092.15
#> Persistence vector g:
#> alpha beta gamma1 gamma2
#> 0.9813 0.0000 0.0187 0.0186
#> Damping parameter: 0.9885
#> Sample size: 3024
#> Number of estimated parameters: 6
#> Number of degrees of freedom: 3018
#> Information criteria:
#> AIC AICc BIC BICc
#> 44196.29 44196.32 44232.38 44232.49
#>
#> Forecast errors:
#> ME: 645.686; MAE: 969.856; RMSE: 1182.269
#> sCE: 714.349%; Asymmetry: 71%; sMAE: 3.193%; sMSE: 0.152%
#> MASE: 1.292; RMSSE: 1.143; rMAE: 0.138; rRMSE: 0.137
Note that the more lags you have, the more initial seasonal components the function will need to estimate, which is a difficult task. This is why we used initial="backcasting" in the example above: it speeds up the estimation by reducing the number of parameters to estimate. Still, the optimiser might not get close to the optimal value, so we can help it. First, we can give it more time for the calculation, increasing the number of iterations via maxeval (the default value is 40 iterations for each estimated parameter, e.g. \(40 \times 5 = 200\) in our case):
testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="backcasting",
                  silent=FALSE, h=336, holdout=TRUE, maxeval=10000)
testModel
#> Time elapsed: 3.98 seconds
#> Model estimated using adam() function: ETS(MMdM)[48, 336]
#> Distribution assumed in the model: Gamma
#> Loss function type: likelihood; Loss function value: 19579.77
#> Persistence vector g:
#> alpha beta gamma1 gamma2
#> 0.0363 0.0000 0.1545 0.3243
#> Damping parameter: 0.7131
#> Sample size: 3024
#> Number of estimated parameters: 6
#> Number of degrees of freedom: 3018
#> Information criteria:
#> AIC AICc BIC BICc
#> 39171.54 39171.57 39207.63 39207.74
#>
#> Forecast errors:
#> ME: 77.211; MAE: 144.088; RMSE: 181.259
#> sCE: 85.421%; Asymmetry: 53%; sMAE: 0.474%; sMSE: 0.004%
#> MASE: 0.192; RMSSE: 0.175; rMAE: 0.02; rRMSE: 0.021
This will take more time, but will typically lead to more refined parameters. You can control other parameters of the optimiser as well, such as algorithm, xtol_rel, print_level and others, which are explained in the documentation for the nloptr function from the nloptr package (run nloptr.print.options() for details). Second, we can give a different set of initial parameters to the optimiser; have a look at what the function saves:
testModel$B
and use this as a starting point for the reestimation (e.g. with a different algorithm):
testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="backcasting",
                  silent=FALSE, h=336, holdout=TRUE, B=testModel$B)
testModel
#> Time elapsed: 0.73 seconds
#> Model estimated using adam() function: ETS(MMdM)[48, 336]
#> Distribution assumed in the model: Gamma
#> Loss function type: likelihood; Loss function value: 19579.77
#> Persistence vector g:
#> alpha beta gamma1 gamma2
#> 0.0363 0.0000 0.1545 0.3243
#> Damping parameter: 0.7525
#> Sample size: 3024
#> Number of estimated parameters: 6
#> Number of degrees of freedom: 3018
#> Information criteria:
#> AIC AICc BIC BICc
#> 39171.54 39171.57 39207.63 39207.74
#>
#> Forecast errors:
#> ME: 77.236; MAE: 144.099; RMSE: 181.272
#> sCE: 85.45%; Asymmetry: 53%; sMAE: 0.474%; sMSE: 0.004%
#> MASE: 0.192; RMSSE: 0.175; rMAE: 0.02; rRMSE: 0.021
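For completeness, here is a hedged sketch of how the optimiser options mentioned above could be passed on to nloptr (the specific algorithm and tolerance are illustrative assumptions, not recommendations):
# algorithm, xtol_rel and maxeval are forwarded to the nloptr optimiser
testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="backcasting",
                  silent=TRUE, h=336, holdout=TRUE,
                  algorithm="NLOPT_LN_SBPLX", xtol_rel=1e-8, maxeval=1000)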
If you are ready to wait, you can change the initialisation to initial="optimal", which in our case will take much more time because of the number of estimated parameters (389 for the chosen model). The estimation process in this case might take 20-30 times longer than in the example above.
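A sketch of that call, if you want to try it (expect a long run time):
testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="optimal",
                  silent=TRUE, h=336, holdout=TRUE)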
In addition, you can specify some parts of the initial state vector or some parts of the persistence vector; here is an example:
testModel <- adam(y, "MMdM", lags=c(1,48,336), initial="backcasting",
                  silent=TRUE, h=336, holdout=TRUE, persistence=list(beta=0.1))
testModel
#> Time elapsed: 0.45 seconds
#> Model estimated using adam() function: ETS(MMdM)[48, 336]
#> Distribution assumed in the model: Gamma
#> Loss function type: likelihood; Loss function value: 21881.47
#> Persistence vector g:
#> alpha beta gamma1 gamma2
#> 0.9671 0.1000 0.0318 0.0329
#> Damping parameter: 0.6348
#> Sample size: 3024
#> Number of estimated parameters: 5
#> Number of degrees of freedom: 3019
#> Information criteria:
#> AIC AICc BIC BICc
#> 43772.94 43772.96 43803.01 43803.09
#>
#> Forecast errors:
#> ME: 743.04; MAE: 1026.529; RMSE: 1241.774
#> sCE: 822.056%; Asymmetry: 76.4%; sMAE: 3.38%; sMSE: 0.167%
#> MASE: 1.367; RMSSE: 1.2; rMAE: 0.146; rRMSE: 0.144
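And here is a hedged sketch of fixing a part of the initial state vector instead, in the same spirit as the initial=list(...) mechanism used for ARIMA later in this vignette (the level value is purely illustrative; the remaining states are still estimated):
testModel <- adam(y, "MMdM", lags=c(1,48,336), silent=TRUE, h=336, holdout=TRUE,
                  initial=list(level=mean(y[1:336])))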
The function also handles intermittent data (data with zeroes) and data with missing values. This is partially covered in the vignette on the oes() function. Here is a simple example:
testModel <- adam(rpois(120,0.5), "MNN", silent=FALSE, h=12, holdout=TRUE,
                  occurrence="odds-ratio")
testModel
#> Time elapsed: 0.04 seconds
#> Model estimated using adam() function: iETS(MNN)[O]
#> Occurrence model type: Odds ratio
#> Distribution assumed in the model: Mixture of Bernoulli and Gamma
#> Loss function type: likelihood; Loss function value: 46.015
#> Persistence vector g:
#> alpha
#> 0
#>
#> Sample size: 108
#> Number of estimated parameters: 5
#> Number of degrees of freedom: 103
#> Information criteria:
#> AIC AICc BIC BICc
#> 242.1244 242.3552 255.5351 246.7111
#>
#> Forecast errors:
#> Asymmetry: -70.778%; sMSE: 12.82%; rRMSE: 1.108; sPIS: 1424.024%; sCE: -243.44%
Finally, adam() is faster than the es() function, because its code is more efficient and it uses a different optimisation algorithm with more finely tuned parameters by default. Let's compare:
adamModel <- adam(AirPassengers, "CCC",
                  h=12, holdout=TRUE)
esModel <- es(AirPassengers, "CCC",
              h=12, holdout=TRUE)
"adam:"
#> [1] "adam:"
adamModel
#> Time elapsed: 2.13 seconds
#> Model estimated: ETS(CCC)
#> Loss function type: likelihood
#>
#> Number of models combined: 30
#> Sample size: 132
#> Average number of estimated parameters: 21.8615
#> Average number of degrees of freedom: 110.1385
#>
#> Forecast errors:
#> ME: -1.799; MAE: 14.398; RMSE: 20.565
#> sCE: -8.226%; sMAE: 5.485%; sMSE: 0.614%
#> MASE: 0.598; RMSSE: 0.656; rMAE: 0.189; rRMSE: 0.2
"es():"
#> [1] "es():"
esModel
#> Time elapsed: 4.25 seconds
#> Model estimated: ETS(CCC)
#> Initial values were optimised.
#>
#> Loss function type: likelihood
#> Error standard deviation: 8.2722
#> Sample size: 132
#> Information criteria:
#> (combined values)
#> AIC AICc BIC BICc
#> 962.7227 967.4430 1008.8018 1020.3408
#>
#> Forecast errors:
#> MPE: -1%; sCE: -14.7%; Asymmetry: -8.2%; MAPE: 2.7%
#> MASE: 0.504; sMAE: 4.6%; sMSE: 0.5%; rMAE: 0.16; rRMSE: 0.174
As mentioned above, ADAM does not only contain ETS, it also contains the ARIMA model, which is regulated via the orders parameter. If you want a pure ARIMA, you need to switch off ETS, which is done via model="NNN":
testModel <- adam(BJsales, "NNN", silent=FALSE, orders=c(0,2,2),
                  h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.05 seconds
#> Model estimated using adam() function: ARIMA(0,2,2)
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 240.5944
#> ARMA parameters of the model:
#> MA:
#> theta1[1] theta2[1]
#> -0.7484 -0.0165
#>
#> Sample size: 138
#> Number of estimated parameters: 5
#> Number of degrees of freedom: 133
#> Information criteria:
#> AIC AICc BIC BICc
#> 491.1888 491.6434 505.8251 506.9449
#>
#> Forecast errors:
#> ME: 2.961; MAE: 3.087; RMSE: 3.812
#> sCE: 15.63%; Asymmetry: 90.2%; sMAE: 1.358%; sMSE: 0.028%
#> MASE: 2.591; RMSSE: 2.485; rMAE: 0.996; rRMSE: 0.995
Given that both models are implemented in the same framework, they can be compared using information criteria.
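For example, here is a hedged sketch comparing the ARIMA above with an ETS(A,A,N) on the same data (AICc() is a method from the greybox package loaded earlier):
# Both models are estimated via likelihood in the same framework,
# so their information criteria are directly comparable
etsModel <- adam(BJsales, "AAN", h=12, holdout=TRUE)
c(ARIMA=AICc(testModel), ETS=AICc(etsModel))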
The functionality of ADAM ARIMA is similar to that of the msarima function from the smooth package, although there are several differences. First, changing the distribution parameter allows switching between additive / multiplicative models. For example, distribution="dlnorm" will create an ARIMA equivalent to the one applied to the logarithms of the data:
testModel <- adam(AirPassengers, "NNN", silent=FALSE, lags=c(1,12),
                  orders=list(ar=c(1,1),i=c(1,1),ma=c(2,2)), distribution="dlnorm",
                  h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.77 seconds
#> Model estimated using adam() function: SARIMA(1,1,2)[1](1,1,2)[12]
#> Distribution assumed in the model: Log-Normal
#> Loss function type: likelihood; Loss function value: 497.8476
#> ARMA parameters of the model:
#> AR:
#> phi1[1] phi1[12]
#> -0.2329 -0.0467
#> MA:
#> theta1[1] theta2[1] theta1[12] theta2[12]
#> 0.2091 0.1830 -0.3077 -0.0148
#>
#> Sample size: 132
#> Number of estimated parameters: 33
#> Number of degrees of freedom: 99
#> Information criteria:
#> AIC AICc BIC BICc
#> 1061.695 1084.593 1156.828 1212.731
#>
#> Forecast errors:
#> ME: -24.727; MAE: 24.727; RMSE: 28.993
#> sCE: -113.041%; Asymmetry: -100%; sMAE: 9.42%; sMSE: 1.22%
#> MASE: 1.027; RMSSE: 0.925; rMAE: 0.325; rRMSE: 0.282
Second, if you want a model with intercept / drift, you can do it using the constant parameter:
testModel <- adam(AirPassengers, "NNN", silent=FALSE, lags=c(1,12), constant=TRUE,
                  orders=list(ar=c(1,1),i=c(1,1),ma=c(2,2)), distribution="dnorm",
                  h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.53 seconds
#> Model estimated using adam() function: SARIMA(1,1,2)[1](1,1,2)[12] with drift
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 501.6243
#> ARMA parameters of the model:
#> AR:
#> phi1[1] phi1[12]
#> 0.6168 -0.0477
#> MA:
#> theta1[1] theta2[1] theta1[12] theta2[12]
#> -0.6855 0.2281 -0.0213 0.1886
#>
#> Sample size: 132
#> Number of estimated parameters: 34
#> Number of degrees of freedom: 98
#> Information criteria:
#> AIC AICc BIC BICc
#> 1071.249 1095.785 1169.264 1229.166
#>
#> Forecast errors:
#> ME: -47.026; MAE: 47.026; RMSE: 53.945
#> sCE: -214.983%; Asymmetry: -100%; sMAE: 17.915%; sMSE: 4.224%
#> MASE: 1.953; RMSSE: 1.722; rMAE: 0.619; rRMSE: 0.524
If the model contains non-zero differences, then the constant acts as a drift. Third, you can specify the parameters of ARIMA via the arma parameter in the following manner:
testModel <- adam(AirPassengers, "NNN", silent=FALSE, lags=c(1,12),
                  orders=list(ar=c(1,1),i=c(1,1),ma=c(2,2)), distribution="dnorm",
                  arma=list(ar=c(0.1,0.1), ma=c(-0.96, 0.03, -0.12, 0.03)),
                  h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.27 seconds
#> Model estimated using adam() function: SARIMA(1,1,2)[1](1,1,2)[12]
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 534.9627
#> ARMA parameters of the model:
#> AR:
#> phi1[1] phi1[12]
#> 0.1 0.1
#> MA:
#> theta1[1] theta2[1] theta1[12] theta2[12]
#> -0.96 0.03 -0.12 0.03
#>
#> Sample size: 132
#> Number of estimated parameters: 27
#> Number of degrees of freedom: 105
#> Information criteria:
#> AIC AICc BIC BICc
#> 1123.925 1138.464 1201.761 1237.255
#>
#> Forecast errors:
#> ME: 9.575; MAE: 17.082; RMSE: 19.148
#> sCE: 43.773%; Asymmetry: 61.9%; sMAE: 6.508%; sMSE: 0.532%
#> MASE: 0.709; RMSSE: 0.611; rMAE: 0.225; rRMSE: 0.186
Finally, the initials for the states can also be provided, although getting the correct ones might be a challenging task (you also need to know how many of them to provide; checking testModel$initial might help):
testModel <- adam(AirPassengers, "NNN", silent=FALSE, lags=c(1,12),
                  orders=list(ar=c(1,1),i=c(1,1),ma=c(2,0)), distribution="dnorm",
                  initial=list(arima=AirPassengers[1:24]),
                  h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.58 seconds
#> Model estimated using adam() function: SARIMA(1,1,2)[1](1,1,0)[12]
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 489.1803
#> ARMA parameters of the model:
#> AR:
#> phi1[1] phi1[12]
#> -0.4281 -0.0469
#> MA:
#> theta1[1] theta2[1]
#> 0.2213 0.0414
#>
#> Sample size: 132
#> Number of estimated parameters: 31
#> Number of degrees of freedom: 101
#> Information criteria:
#> AIC AICc BIC BICc
#> 1040.361 1060.201 1129.727 1178.165
#>
#> Forecast errors:
#> ME: -16.429; MAE: 18.213; RMSE: 23.439
#> sCE: -75.107%; Asymmetry: -88.2%; sMAE: 6.938%; sMSE: 0.797%
#> MASE: 0.756; RMSSE: 0.748; rMAE: 0.24; rRMSE: 0.228
If you work with the ADAM ARIMA model, then there is no such thing as "usual" bounds for the parameters, so the function will use bounds="admissible", checking the AR / MA polynomials in order to make sure that the model is stationary and invertible (aka stable).
Similarly to ETS, you can use different distributions and losses for the estimation. Note that the order selection for ARIMA is done in the auto.adam() function, not in adam()! However, if you use orders=list(..., select=TRUE) in adam(), it will call auto.adam() and do the selection.
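A hedged sketch of that shortcut:
# select=TRUE turns the provided orders into maximum orders to select from;
# adam() then delegates the selection to auto.adam()
testModel <- adam(BJsales, "NNN", orders=list(ar=2, i=2, ma=2, select=TRUE),
                  h=12, holdout=TRUE)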
Finally, ARIMA is typically slower than ETS, mainly because its initial states are more difficult to estimate due to the increased complexity of the model. If you want to speed things up, use initial="backcasting" and reduce the number of iterations via the maxeval parameter.
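For example, here is a hedged sketch of a faster setup for the SARIMA used earlier (the maxeval value is an arbitrary illustration):
testModel <- adam(AirPassengers, "NNN", lags=c(1,12),
                  orders=list(ar=c(1,1), i=c(1,1), ma=c(2,2)),
                  initial="backcasting", maxeval=500,
                  h=12, holdout=TRUE)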
Another important feature of ADAM is the introduction of explanatory variables. Unlike es(), adam() expects a matrix for data and can work with a formula. If the latter is not provided, then it will use all explanatory variables. Here is a brief example:
BJData <- cbind(BJsales,BJsales.lead)
testModel <- adam(BJData, "AAN", h=18, silent=FALSE)
If you work with a data.frame or similar structures, then you can use them directly: ADAM will extract the response variable either assuming that it is in the first column or from the provided formula (if you specify one via the formula parameter). Here is an example, where we create a matrix with lags and leads of an explanatory variable:
BJData <- cbind(as.data.frame(BJsales),as.data.frame(xregExpander(BJsales.lead,c(-7:7))))
colnames(BJData)[1] <- "y"
testModel <- adam(BJData, "ANN", h=18, silent=FALSE, holdout=TRUE, formula=y~xLag1+xLag2+xLag3)
testModel
#> Time elapsed: 0.13 seconds
#> Model estimated using adam() function: ETSX(ANN)
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 210.9197
#> Persistence vector g (excluding xreg):
#> alpha
#> 1
#>
#> Sample size: 132
#> Number of estimated parameters: 6
#> Number of degrees of freedom: 126
#> Information criteria:
#> AIC AICc BIC BICc
#> 433.8393 434.5113 451.1361 452.7768
#>
#> Forecast errors:
#> ME: 0.744; MAE: 1.299; RMSE: 1.782
#> sCE: 5.924%; Asymmetry: 44.2%; sMAE: 0.575%; sMSE: 0.006%
#> MASE: 1.065; RMSSE: 1.141; rMAE: 0.58; rRMSE: 0.71
Similarly to es(), there is support for variables selection, but via the regressors parameter instead of xregDo. It uses the stepwise() function from the greybox package on the residuals of the model:
testModel <- adam(BJData, "ANN", h=18, silent=FALSE, holdout=TRUE, regressors="select")
The same functionality is supported with ARIMA, so you can have, for example, ARIMAX(0,1,1), which is equivalent to ETSX(A,N,N):
testModel <- adam(BJData, "NNN", h=18, silent=FALSE, holdout=TRUE, regressors="select", orders=c(0,1,1))
The two models might differ because they have different initialisation in the optimiser and different bounds for the parameters (ARIMA relies on the invertibility condition, while ETS uses the traditional (0,1) bounds by default). It is possible to make them identical if the number of iterations is increased and the initial parameters are the same. Here is an example of what happens when the two models have exactly the same parameters:
<- BJData[,c("y",names(testModel$initial$xreg))];
BJData <- adam(BJData, "NNN", h=18, silent=TRUE, holdout=TRUE, orders=c(0,1,1),
testModel initial=testModel$initial, arma=testModel$arma)
testModel#> Time elapsed: 0.01 seconds
#> Model estimated using adam() function: ARIMAX(0,1,1)
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 80.6624
#> ARMA parameters of the model:
#> MA:
#> theta1[1]
#> 0.2379
#>
#> Sample size: 132
#> Number of estimated parameters: 1
#> Number of degrees of freedom: 131
#> Information criteria:
#> AIC AICc BIC BICc
#> 163.3248 163.3555 166.2076 166.2827
#>
#> Forecast errors:
#> ME: 0.457; MAE: 0.571; RMSE: 0.687
#> sCE: 3.638%; Asymmetry: 82.1%; sMAE: 0.253%; sMSE: 0.001%
#> MASE: 0.468; RMSSE: 0.44; rMAE: 0.255; rRMSE: 0.274
names(testModel$initial)[1] <- names(testModel$initial)[[1]] <- "level"
testModel2 <- adam(BJData, "ANN", h=18, silent=TRUE, holdout=TRUE,
                   initial=testModel$initial, persistence=testModel$arma$ma+1)
testModel2
#> Time elapsed: 0 seconds
#> Model estimated using adam() function: ETSX(ANN)
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 1e+300
#> Persistence vector g (excluding xreg):
#> alpha
#> 1.2379
#>
#> Sample size: 132
#> Number of estimated parameters: 1
#> Number of degrees of freedom: 131
#> Information criteria:
#> AIC AICc BIC BICc
#> 163.3248 163.3555 166.2076 166.2827
#>
#> Forecast errors:
#> ME: 0.457; MAE: 0.571; RMSE: 0.687
#> sCE: 3.638%; Asymmetry: 82.1%; sMAE: 0.253%; sMSE: 0.001%
#> MASE: 0.468; RMSSE: 0.44; rMAE: 0.255; rRMSE: 0.274
Another feature of ADAM is time varying parameters in the SSOE framework, which can be switched on via regressors="adapt":
testModel <- adam(BJData, "ANN", h=18, silent=FALSE, holdout=TRUE, regressors="adapt")
testModel$persistence
#> alpha delta1 delta2 delta3 delta4 delta5
#> 2.818516e-01 7.651576e-01 4.039971e-01 2.623775e-03 2.553495e-04 8.189351e-05
Note that the default number of iterations might not be sufficient in order to get close to the optimum of the function, so setting maxeval to something bigger might help. If you want to explore why the optimisation stopped, you can provide the print_level=41 parameter to the function, and it will print out the report from the optimiser. In the end, the default parameters are tuned in order to give a reasonable solution, but given the complexity of the model, they are not guaranteed to give the best one every time.
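Here is a hedged sketch of such a call (the maxeval value is an arbitrary illustration):
# print_level=41 makes the underlying nloptr optimiser print its report
testModel <- adam(BJData, "ANN", h=18, silent=TRUE, holdout=TRUE,
                  regressors="adapt", print_level=41, maxeval=1000)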
Finally, you can produce a mixture of ETS, ARIMA and regression, by using the respective parameters, like this:
testModel <- adam(BJData, "AAN", h=18, silent=FALSE, holdout=TRUE, orders=c(1,0,1))
summary(testModel)
#>
#> Model estimated using adam() function: ETSX(AAN)+ARIMA(1,0,1)
#> Response variable: y
#> Distribution used in the estimation: Normal
#> Loss function type: likelihood; Loss function value: 62.5577
#> Coefficients:
#> Estimate Std. Error Lower 2.5% Upper 97.5%
#> alpha 0.6034 0.9647 0.0000 1.0000
#> beta 0.0000 0.0248 0.0000 0.0491
#> phi1[1] 0.9564 0.0097 0.9371 0.9756 *
#> theta1[1] -0.3307 0.9157 -0.9664 1.4792
#> level 52.3515 6.0194 40.4324 64.2487 *
#> trend 0.1009 0.0365 0.0286 0.1731 *
#> ARIMAState1 2.4862 2.9451 -3.3455 8.3071
#> xLag3 4.9598 0.1254 4.7115 5.2078 *
#> xLag7 0.8199 0.1304 0.5617 1.0777 *
#> xLag4 3.8937 0.1679 3.5612 4.2257 *
#> xLag6 1.9676 0.1704 1.6302 2.3045 *
#> xLag5 2.7998 0.1799 2.4436 3.1553 *
#>
#> Error standard deviation: 0.4094
#> Sample size: 132
#> Number of estimated parameters: 13
#> Number of degrees of freedom: 119
#> Information criteria:
#> AIC AICc BIC BICc
#> 151.1155 154.2002 188.5919 196.1230
This might be handy when you explore high frequency data, want to add calendar events, apply ETS and add AR/MA errors to it.
While the original adam() function allows selecting ETS components and explanatory variables, it does not allow selecting the most suitable distribution and / or ARIMA components. This is what the auto.adam() function is for.
In order to do the selection of the most appropriate distribution, you need to provide a vector of those that you want to check:
testModel <- auto.adam(BJsales, "XXX", silent=FALSE,
                       distribution=c("dnorm","dlaplace","ds"),
                       h=12, holdout=TRUE)
#> Evaluating models with different distributions... dnorm , dlaplace , ds , Done!
testModel
#> Time elapsed: 0.19 seconds
#> Model estimated using auto.adam() function: ETS(AAdN)
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 238.2715
#> Persistence vector g:
#> alpha beta
#> 0.9534 0.2925
#> Damping parameter: 0.8622
#> Sample size: 138
#> Number of estimated parameters: 6
#> Number of degrees of freedom: 132
#> Information criteria:
#> AIC AICc BIC BICc
#> 488.5431 489.1843 506.1066 507.6863
#>
#> Forecast errors:
#> ME: 2.814; MAE: 2.969; RMSE: 3.655
#> sCE: 14.854%; Asymmetry: 87.8%; sMAE: 1.306%; sMSE: 0.026%
#> MASE: 2.492; RMSSE: 2.382; rMAE: 0.958; rRMSE: 0.954
This process can also be done in parallel, either on an automatically selected number of cores (e.g. parallel=TRUE) or on a number specified by the user (e.g. parallel=4):
testModel <- auto.adam(BJsales, "ZZZ", silent=FALSE, parallel=TRUE,
                       h=12, holdout=TRUE)
If you want to add ARIMA or regression components, you can do it in exactly the same way as for the adam() function. Here is an example of ETS+ARIMA:
testModel <- auto.adam(BJsales, "AAN", orders=list(ar=2,i=2,ma=2), silent=TRUE,
                       distribution=c("dnorm","dlaplace","ds","dgnorm"),
                       h=12, holdout=TRUE)
testModel
#> Time elapsed: 0.45 seconds
#> Model estimated using auto.adam() function: ETS(AAN)+ARIMA(2,2,2)
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 242.0733
#> Persistence vector g:
#> alpha beta
#> 0.0728 0.0001
#>
#> ARMA parameters of the model:
#> AR:
#> phi1[1] phi2[1]
#> -0.5033 -0.2070
#> MA:
#> theta1[1] theta2[1]
#> -0.3293 -0.1526
#>
#> Sample size: 138
#> Number of estimated parameters: 13
#> Number of degrees of freedom: 125
#> Information criteria:
#> AIC AICc BIC BICc
#> 510.1466 513.0821 548.2009 555.4328
#>
#> Forecast errors:
#> ME: 2.774; MAE: 2.936; RMSE: 3.6
#> sCE: 14.644%; Asymmetry: 87.6%; sMAE: 1.292%; sMSE: 0.025%
#> MASE: 2.465; RMSSE: 2.347; rMAE: 0.947; rRMSE: 0.94
However, this way the function will just use ARIMA(2,2,2) and fit it together with ETS. If you want it to select the most appropriate ARIMA orders from the provided ones (e.g. up to AR(2), I(2) and MA(2)), you need to add select=TRUE to the list in orders:
testModel <- auto.adam(BJsales, "XXN", orders=list(ar=2,i=2,ma=2,select=TRUE),
                       distribution="default", silent=FALSE,
                       h=12, holdout=TRUE)
#> Evaluating models with different distributions... default , Selecting ARIMA orders...
#> Selecting differences...
#> Selecting ARMA... |
#> The best ARIMA is selected. Done!
testModel
#> Time elapsed: 0.17 seconds
#> Model estimated using auto.adam() function: ETS(AAdN) with drift
#> Distribution assumed in the model: Normal
#> Loss function type: likelihood; Loss function value: 237.1294
#> Persistence vector g:
#> alpha beta
#> 0.9541 0.2851
#> Damping parameter: 0.8363
#> Sample size: 138
#> Number of estimated parameters: 7
#> Number of degrees of freedom: 131
#> Information criteria:
#> AIC AICc BIC BICc
#> 488.2588 489.1204 508.7496 510.8721
#>
#> Forecast errors:
#> ME: 0.622; MAE: 1.2; RMSE: 1.5
#> sCE: 3.286%; Asymmetry: 48.9%; sMAE: 0.528%; sMSE: 0.004%
#> MASE: 1.007; RMSSE: 0.978; rMAE: 0.387; rRMSE: 0.391
Knowing how to work with adam(), you can use similar principles when dealing with auto.adam(). Just keep in mind that the provided persistence, phi, initial, arma and B won't work, because this contradicts the idea of model selection.
Finally, there is also a mechanism of automatic outliers detection, which extracts residuals from the best model, flags observations that lie outside the in-sample prediction interval of the width level, and then refits auto.adam() with dummy variables for the outliers. Here is how it works:
testModel <- auto.adam(AirPassengers, "PPP", silent=FALSE, outliers="use",
                       distribution="default",
                       h=12, holdout=TRUE)
#> Evaluating models with different distributions... default ,
#> Dealing with outliers...
testModel
#> Time elapsed: 1.86 seconds
#> Model estimated using auto.adam() function: ETSX(MMM)
#> Distribution assumed in the model: Gamma
#> Loss function type: likelihood; Loss function value: 464.5879
#> Persistence vector g (excluding xreg):
#> alpha beta gamma
#> 0.7633 0.0000 0.0000
#>
#> Sample size: 132
#> Number of estimated parameters: 19
#> Number of degrees of freedom: 113
#> Information criteria:
#> AIC AICc BIC BICc
#> 967.1758 973.9615 1021.9491 1038.5157
#>
#> Forecast errors:
#> ME: -8.461; MAE: 15.346; RMSE: 22.431
#> sCE: -38.678%; Asymmetry: -49%; sMAE: 5.846%; sMSE: 0.73%
#> MASE: 0.637; RMSSE: 0.716; rMAE: 0.202; rRMSE: 0.218
If you specify outliers="select"
, the function will create leads and lags 1 of the outliers and then select the most appropriate ones via the regressors
parameter of adam.
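A hedged sketch of that option:
testModel <- auto.adam(AirPassengers, "PPP", silent=TRUE, outliers="select",
                       distribution="default",
                       h=12, holdout=TRUE)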
If you want to know more about ADAM, you are welcome to visit the online textbook (this is a work in progress at the moment).