Have a look at heavy-tailed Lambert W x F or skewed Lambert W x F distributions for context.
One advantage over the Cauchy or Student's t distribution with fixed degrees of freedom is that the tail parameters can be estimated from the data, so you can let the data decide which moments exist. Moreover, the Lambert W x F framework allows you to transform your data and remove skewness / heavy tails. It is important to note, though, that OLS does not require Normality of \(y\) or \(X\) (Normality of the errors only matters for exact finite-sample inference). However, for your EDA it might be worthwhile.
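For intuition, here is a minimal base-R sketch of the heavy-tail transformation behind Lambert W x Gaussian (Tukey's h). A standard Normal \(z\) is mapped to \(y = z \exp(\delta z^2 / 2)\). Its inverse is \(z = \operatorname{sign}(y)\sqrt{W(\delta y^2)/\delta}\), where \(W\) is the principal branch of the Lambert W function. The helpers `lambert_w0()`, `tukey_h()` and `inv_tukey_h()` are hypothetical names written only for this illustration; they are not LambertW package functions. Here \(\delta\) is assumed known, whereas the package estimates it from the data.

```r
# Principal branch W0 of the Lambert W function for x >= 0 (solves w * exp(w) = x),
# using Newton's method started at log1p(x), which lies at or above the root.
lambert_w0 <- function(x, tol = 1e-12, max_iter = 100) {
  stopifnot(all(x >= 0))
  w <- log1p(x)
  for (i in seq_len(max_iter)) {
    ew <- exp(w)
    step <- (w * ew - x) / (ew * (w + 1))
    w <- w - step
    if (all(abs(step) <= tol * (1 + abs(w)))) break
  }
  w
}

# Forward heavy-tail transform (Tukey's h with h = delta >= 0)
tukey_h <- function(z, delta) z * exp(delta / 2 * z^2)

# Inverse transform: recovers z from y for a known delta
inv_tukey_h <- function(y, delta) {
  if (delta == 0) return(y)
  sign(y) * sqrt(lambert_w0(delta * y^2) / delta)
}

set.seed(1)
z <- rnorm(10000)
y <- tukey_h(z, delta = 0.2)           # heavy-tailed sample
z_back <- inv_tukey_h(y, delta = 0.2)  # back-transformed (Gaussianized) sample
max(abs(z_back - z))                   # numerically zero: the transform is undone
```

Since \(|y|^k = |z|^k \exp(k\delta z^2/2)\), the \(k\)-th moment of \(y\) is finite only if \(k\delta < 1\). This is how an estimated \(\delta\) tells you which moments exist.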
Here is an example of Lambert W x Gaussian estimates applied to stock market data.
# daily log-returns (in percent) of the four European stock indices
ret <- diff(log(EuStockMarkets)) * 100
plot(ret)
The summary metrics of the returns are similar to (though not as extreme as) those in the OP's post.
library(LambertW) # also attaches MASS (needed for rlm() below) and ggplot2
library(moments)  # skewness() and kurtosis() used in data_metrics()
## Loading required package: MASS
## Warning: package 'MASS' was built under R version 4.1.2
## Loading required package: ggplot2
## This is 'LambertW' version 0.6.7. See the NEWS file and citation("LambertW").
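# summary statistics for one series; skewness and kurtosis are moment ratios (Gaussian: 0 and 3)
data_metrics <- function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x),
    skewness = skewness(x), kurtosis = kurtosis(x))
}
ret.metrics <- t(apply(ret, 2, data_metrics))  # column-wise: one row per index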
ret.metrics
## mean sd min max skewness kurtosis
## DAX 0.0652 1.030 -9.63 5.08 -0.554 9.28
## SMI 0.0818 0.925 -8.38 4.97 -0.632 8.74
## CAC 0.0437 1.103 -7.58 6.10 -0.177 5.39
## FTSE 0.0432 0.796 -4.14 5.44 0.110 5.64
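For reference, the skewness and kurtosis columns are the moment ratios \(m_3/m_2^{3/2}\) and \(m_4/m_2^2\), where \(m_k\) is the \(k\)-th central sample moment. A Gaussian has skewness \(0\) and kurtosis \(3\); this is kurtosis, not excess kurtosis. Below is a minimal base-R sketch of these estimators. It assumes they match the `skewness()` / `kurtosis()` functions used in `data_metrics()`. The helper names `sample_skewness()` and `sample_kurtosis()` are hypothetical.

```r
# Moment-ratio estimators of skewness and (non-excess) kurtosis
sample_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
sample_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

set.seed(42)
g  <- rnorm(1e5)        # Gaussian sample: skewness near 0, kurtosis near 3
t5 <- rt(1e5, df = 5)   # Student's t with 5 df: population kurtosis 3 + 6 / (5 - 4) = 9
round(c(skew = sample_skewness(g), kurt = sample_kurtosis(g)), 2)
sample_kurtosis(t5)     # clearly above 3, though noisy for such heavy tails
# Both ratios are invariant to changes in location and scale
all.equal(sample_kurtosis(2 * g + 1), sample_kurtosis(g))
```

On this scale, the kurtosis values of roughly 5.4 to 9.3 in the table point to clearly heavier-than-Gaussian tails.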
All four series show clearly non-Normal characteristics: kurtosis well above the Gaussian value of \(3\) and, for DAX and SMI, noticeable negative skewness. Let's Gaussianize each series using a heavy-tailed Lambert W x Gaussian distribution (= Tukey's h) with a method-of-moments estimator (IGMM).
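The idea behind a moment-based estimate of \(\delta\) can be sketched in four steps:

1. Standardize the data.
2. Choose \(\delta\) so that the back-transformed values have kurtosis exactly \(3\).
3. Update location and scale from the back-transformed data.
4. Repeat until the estimates stop changing.

What follows is a simplified, hypothetical base-R re-implementation of that idea. `igmm_sketch()` and its helpers are not LambertW functions, and the package's IGMM surely differs in details such as optimization and convergence checks, so use `Gaussianize()` for real work.

```r
# Helpers: W0 via Newton's method, inverse Tukey's h, moment-ratio kurtosis
lambert_w0 <- function(x, tol = 1e-12, max_iter = 100) {
  w <- log1p(x)
  for (i in seq_len(max_iter)) {
    ew <- exp(w)
    step <- (w * ew - x) / (ew * (w + 1))
    w <- w - step
    if (all(abs(step) <= tol * (1 + abs(w)))) break
  }
  w
}
inv_tukey_h <- function(y, delta) {
  if (delta == 0) return(y)
  sign(y) * sqrt(lambert_w0(delta * y^2) / delta)
}
sample_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

# For standardized data z, pick delta in [0, 2] so that the back-transformed
# data has kurtosis 3 (minimize the squared distance to 3).
kurtosis_matching_delta <- function(z) {
  obj <- function(delta) (sample_kurtosis(inv_tukey_h(z, delta)) - 3)^2
  optimize(obj, interval = c(0, 2))$minimum
}

# Alternate between estimating delta and updating location / scale.
igmm_sketch <- function(y, max_iter = 50, tol = 1e-4) {
  mu <- median(y); sigma <- sd(y); delta <- 0
  for (i in seq_len(max_iter)) {
    z <- (y - mu) / sigma
    delta_new <- kurtosis_matching_delta(z)
    x <- mu + sigma * inv_tukey_h(z, delta_new)  # back-transformed data
    change <- max(abs(c(delta_new - delta, mean(x) - mu, sd(x) - sigma)))
    mu <- mean(x); sigma <- sd(x); delta <- delta_new
    if (change < tol) break
  }
  list(mu = mu, sigma = sigma, delta = delta,
       x = mu + sigma * inv_tukey_h((y - mu) / sigma, delta))
}

# Simulated heavy-tailed data with known mu = 1, sigma = 2, delta = 0.1
set.seed(123)
z <- rnorm(10000)
y <- 1 + 2 * z * exp(0.1 / 2 * z^2)
fit <- igmm_sketch(y)
round(c(mu = fit$mu, sigma = fit$sigma, delta = fit$delta), 3)
sample_kurtosis(fit$x)  # close to 3 by construction
```

Matching kurtosis to \(3\) is exactly the property the Gaussianized metrics table shows, which is why this moment condition is a natural target.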
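library(LambertW)
# type = "h": symmetric heavy tails (Tukey's h); method = "IGMM": iterative generalized method of moments
ret.gauss <- Gaussianize(ret, type = "h", method = "IGMM")
# drop the ".X" suffix from the column names (e.g. "DAX.X" -> "DAX")
colnames(ret.gauss) <- gsub("\\.X", "", colnames(ret.gauss))
plot(ret.gauss)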
The time series plots show much lighter tails and also more stable variation over time (though not constant). Computing the metrics again on the Gaussianized time series yields:
ret.gauss.metrics <- t(apply(ret.gauss, 2, data_metrics))
ret.gauss.metrics
## mean sd min max skewness kurtosis
## DAX 0.0718 0.797 -3.07 2.55 -0.07109 3
## SMI 0.0921 0.720 -2.74 2.42 -0.14035 3
## CAC 0.0463 0.948 -3.70 3.43 -0.04319 3
## FTSE 0.0431 0.681 -2.33 2.70 -0.00439 3
The IGMM algorithm achieved exactly what it set out to do: transform the data to have kurtosis equal to \(3\). Interestingly, all time series now have (slightly) negative skewness, which is in line with most of the financial time series literature. It is important to point out that Gaussianize() operates only marginally, not jointly (analogously to scale()): each column is transformed on its own.
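To make "marginally, not jointly" concrete: a marginal transformation is applied to each column on its own, as `scale()` does, so it never looks at the dependence between columns. Below is a base-R illustration with simulated data. For contrast it also shows a joint transformation: whitening with the inverse Cholesky factor of the covariance matrix. That is a different technique, and not something Gaussianize() does.

```r
set.seed(7)
n  <- 2000
x1 <- rnorm(n)
x2 <- 0.6 * x1 + sqrt(1 - 0.6^2) * rnorm(n)   # correlation about 0.6
X  <- cbind(a = 5 + 2 * x1, b = -1 + 0.5 * x2)

# Marginal: each column is centered and scaled using only its own mean and sd
X_marg  <- scale(X)
X_marg2 <- apply(X, 2, function(col) (col - mean(col)) / sd(col))
all.equal(as.vector(X_marg), as.vector(X_marg2))  # same thing, column by column
cor(X_marg) - cor(X)                              # zeros up to rounding: dependence untouched

# Joint: whitening uses the full covariance matrix, so it removes the correlation
R <- chol(cov(X))                                 # cov(X) = t(R) %*% R
X_white <- scale(X, scale = FALSE) %*% solve(R)
round(cov(X_white), 10)                           # identity matrix
```

Likewise, marginally Gaussianized series are each approximately Normal, but that alone does not make them jointly Gaussian.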
To see the effect of Gaussianization on OLS, consider predicting "FTSE" returns from "DAX" returns and vice versa.
# side by side: original (left) vs Gaussianized (right), DAX on the x-axis in both
layout(matrix(1:2, ncol = 2, byrow = TRUE))
plot(ret[, "DAX"], ret[, "FTSE"])
grid()
plot(ret.gauss[, "DAX"], ret.gauss[, "FTSE"])
grid()
The left scatterplot of the original series shows that the strong outliers did not occur on the same days in the German (DAX) and UK (FTSE) markets. Other than that, it is not clear whether the data cloud in the center supports no correlation or a negative/positive dependency. Since outliers strongly affect variance and correlation estimates, it is worthwhile to look at the dependency with heavy tails removed (right scatterplot). Here the pattern is much clearer, and the positive relation between the DAX and FTSE returns becomes apparent.
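Why do outliers on different days matter? Each non-coinciding extreme return inflates the variance of one series without adding to the covariance, which pulls the Pearson correlation towards zero. Below is a small simulated illustration in base R. The data and the two outliers are made up for the example, and Spearman's rank correlation is shown only as a robust reference point.

```r
set.seed(3)
n <- 500
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)   # true positive dependence

# Two extreme "days": each is extreme in only one of the two series
x_out <- c(x, 15, 0)
y_out <- c(y, 0, -15)

rbind(
  clean    = c(pearson  = cor(x, y),
               spearman = cor(x, y, method = "spearman")),
  outliers = c(pearson  = cor(x_out, y_out),
               spearman = cor(x_out, y_out, method = "spearman"))
)
```

Two points out of 502 noticeably lower the Pearson correlation, while the rank-based measure barely moves.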
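# try these models on your own
# lm() / rlm() expect a data frame, so convert the (ts or matrix) objects explicitly
mod <- lm(FTSE ~ DAX + SMI + CAC, data = as.data.frame(ret))
mod.robust <- rlm(FTSE ~ DAX + SMI + CAC, data = as.data.frame(ret))  # MASS::rlm, Huber M-estimator
mod.gauss <- lm(FTSE ~ DAX + SMI + CAC, data = as.data.frame(ret.gauss))
summary(mod)
summary(mod.robust)
summary(mod.gauss)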
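Why include the robust fit? With heavy-tailed errors, OLS remains unbiased but can be very noisy, while `MASS::rlm()` (a Huber M-estimator by default) downweights large residuals. Below is a small simulated comparison. The data-generating process (intercept 1, slope 2, Student's t errors with 2 degrees of freedom) is an assumption made only for illustration.

```r
library(MASS)  # rlm(); MASS ships with R as a recommended package

set.seed(11)
n <- 1000
# Repeat the experiment 200 times and collect the slope estimates
slopes <- replicate(200, {
  x <- rnorm(n)
  y <- 1 + 2 * x + rt(n, df = 2)  # heavy-tailed errors (infinite variance)
  c(ols = coef(lm(y ~ x))[[2]], robust = coef(rlm(y ~ x))[[2]])
})
apply(slopes, 1, median)  # both centered near the true slope 2
apply(slopes, 1, sd)      # the robust estimates vary much less
```

Gaussianizing the inputs and fitting a robust regression are two different ways to limit the influence of extreme returns. Comparing `mod`, `mod.robust` and `mod.gauss` shows how much that matters for this data.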
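A Granger causality test based on a \(VAR(6)\) model for "DAX" and "CAC" is shown below. I use \(p = 6\) lags, which cover a full trading week, to capture the weekly effect of daily trades. The test rejects "no Granger causality" in the DAX → CAC direction at the 10% level (p-value = 0.09), though not at the 5% level. The reverse direction is not rejected (p-value = 0.5).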
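What the test compares can be sketched by hand for a single equation. First regress CAC returns on their own 6 lags (the restricted model). Then regress them on those lags plus 6 lags of DAX (the unrestricted model), and F-test the 6 extra coefficients. This simplified single-equation version is not what `vars::causality()` computes. That function tests within the full two-equation VAR: its df2 = 3680 is twice the 1840 residual degrees of freedom of one equation here. So the F statistics need not match exactly.

```r
# Daily log-returns in percent, as above (EuStockMarkets ships with R's datasets package)
ret <- diff(log(EuStockMarkets)) * 100
p <- 6

# embed() returns a matrix whose column j holds lag j - 1 of the series
E_cac <- embed(as.numeric(ret[, "CAC"]), p + 1)
E_dax <- embed(as.numeric(ret[, "DAX"]), p + 1)
y        <- E_cac[, 1]   # CAC return today
cac_lags <- E_cac[, -1]  # CAC returns 1..p days ago
dax_lags <- E_dax[, -1]  # DAX returns 1..p days ago

restricted    <- lm(y ~ cac_lags)             # H0: DAX lags add nothing
unrestricted  <- lm(y ~ cac_lags + dax_lags)
granger_ftest <- anova(restricted, unrestricted)
granger_ftest  # F-test with 6 and 1840 degrees of freedom
```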
library(vars)
## Loading required package: strucchange
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: urca
## Loading required package: lmtest
mod.vars <- vars::VAR(ret[, c("DAX", "CAC")], p = 6)
causality(mod.vars, "DAX")$Granger
##
## Granger causality H0: DAX do not Granger-cause CAC
##
## data: VAR object mod.vars
## F-Test = 2, df1 = 6, df2 = 3680, p-value = 0.09
causality(mod.vars, "CAC")$Granger
##
## Granger causality H0: CAC do not Granger-cause DAX
##
## data: VAR object mod.vars
## F-Test = 0.9, df1 = 6, df2 = 3680, p-value = 0.5
However, for the Gaussianized data the answer is different! Here the test cannot reject either null hypothesis: neither "DAX does not Granger-cause CAC" nor "CAC does not Granger-cause DAX" is rejected (p-value = 0.5 in both directions). So once the heavy tails are removed, the weak evidence that the German market drives the French market over the following days disappears.
mod.vars.gauss <- vars::VAR(ret.gauss[, c("DAX", "CAC")], p = 6)
causality(mod.vars.gauss, "DAX")$Granger
##
## Granger causality H0: DAX do not Granger-cause CAC
##
## data: VAR object mod.vars.gauss
## F-Test = 0.9, df1 = 6, df2 = 3680, p-value = 0.5
causality(mod.vars.gauss, "CAC")$Granger
##
## Granger causality H0: CAC do not Granger-cause DAX
##
## data: VAR object mod.vars.gauss
## F-Test = 0.9, df1 = 6, df2 = 3680, p-value = 0.5
It is not clear to me which one is the right answer (if any), but it is an interesting observation. Needless to say, this entire causality analysis is contingent on \(VAR(6)\) being the correct model, which it most likely is not; but I think it serves well for illustration.
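Since the conclusions hinge on the VAR order, one quick check is to let an information criterion choose \(p\). Below is a minimal sketch using base R's `ar()`, which fits a multivariate autoregression (a VAR) by Yule-Walker and picks the order by AIC. This is a swapped-in base-R tool; `vars::VARselect()` offers comparable criteria for `vars` models.

```r
ret <- diff(log(EuStockMarkets)) * 100

# Fit VARs of order 0..10 for DAX and CAC and keep the AIC-best order
fit <- ar(ret[, c("DAX", "CAC")], aic = TRUE, order.max = 10)
fit$order          # AIC-selected lag order
round(fit$aic, 1)  # AIC differences from the best order (0 = selected)
```

If the selected order is far from 6, it is worth rerunning the causality tests with that order before reading much into either result.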