For an open access tutorial paper explaining how to set equivalence bounds, and how to perform and report equivalence tests for ANOVA models, see Campbell and Lakens (2021). These functions are meant to be omnibus tests, and additional testing may be necessary. For example, comparison of the estimated marginal means, in addition to or as an alternative to the omnibus test, may be prudent.
Statistical equivalence testing (or “omnibus non-inferiority testing”, as Campbell and Lakens (2021) call it) for F-tests is a special use case of the cumulative distribution function of the non-central F distribution.
As Campbell and Lakens (2021) state, this type of test answers the question: “Can we reject the hypothesis that the total proportion of variance in outcome Y attributable to X is greater than or equal to the equivalence bound \(\Delta\)?”
\[ H_0: \space 1 > \eta^2_p \geq \Delta \\ H_1: \space 0 \leq \eta^2_p < \Delta \]
In TOSTER we go a tad farther and calculate a more generalizable non-centrality parameter that allows the equivalence test for F-tests to be applied to a variety of designs.
Campbell and Lakens (2021) calculate the p-value as:
\[ p = p_f(F; J-1, N-J, \frac{N \cdot \Delta}{1-\Delta}) \]
However, the paper only outlines how to apply this approach to a one-way ANOVA, with an extension to Welch’s one-way ANOVA; it cannot be applied directly to factorial ANOVA.
To cover other designs, the non-centrality parameter (ncp = \(\lambda\)) can instead be calculated from the equivalence bound and the degrees of freedom:
\[ \lambda_{eq} = \frac{\Delta}{1-\Delta} \cdot(df_1 + df_2 +1) \]
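To see that this is consistent with the one-way formula, note that in a one-way ANOVA with \(J\) groups and \(N\) observations the degrees of freedom are \(df_1 = J-1\) and \(df_2 = N-J\), so the general expression reduces to the non-centrality parameter used by Campbell and Lakens (2021):
\[ df_1 + df_2 + 1 = (J-1) + (N-J) + 1 = N \quad \Rightarrow \quad \lambda_{eq} = \frac{N \cdot \Delta}{1-\Delta} \]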
The p-value for the equivalence test (\(p_{eq}\)) could then be calculated from traditional ANOVA results and the distribution function:
\[ p_{eq} = p_f(F; df_1, df_2, \lambda_{eq}) \]
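As a minimal sketch, the two formulas above can be evaluated directly with base R's pf function. The helper name eq_f_pvalue is hypothetical (it is not part of TOSTER), and the inputs are the InsectSprays summary statistics reported later in this document.

```r
# Hypothetical helper (not a TOSTER function): equivalence p-value for an F-test
eq_f_pvalue <- function(Fstat, df1, df2, eqbound) {
  # Non-centrality parameter implied by the equivalence bound
  lambda_eq <- eqbound / (1 - eqbound) * (df1 + df2 + 1)
  # Lower-tail probability of the non-central F distribution
  p_eq <- pf(Fstat, df1 = df1, df2 = df2, ncp = lambda_eq, lower.tail = TRUE)
  list(lambda_eq = lambda_eq, p_eq = p_eq)
}

# F statistic and degrees of freedom from the one-way InsectSprays ANOVA
res <- eq_f_pvalue(Fstat = 34.70228, df1 = 5, df2 = 66, eqbound = 0.35)
res$lambda_eq
res$p_eq
```

If this sketch matches the package's calculation, res$p_eq should agree with the p.equ value that equ_anova reports for these data.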
Using the InsectSprays data set in R and the base R aov function we can demonstrate how this omnibus equivalence testing can be applied with TOSTER.
From the initial analysis (shown below) we can see a clear “significant” effect of the factor spray (the p-value listed is zero, but it is just very small). However, we may be interested in testing if the effect is practically equivalent. I will arbitrarily set the equivalence bound to a partial eta-squared of 0.35 (\(H_0: \eta^2_p \geq 0.35\)).
library(TOSTER)
# Get Data
data("InsectSprays")
# Build ANOVA
aovtest = aov(count ~ spray,
              data = InsectSprays)
# Display overall results
knitr::kable(broom::tidy(aovtest),
             caption = "Traditional ANOVA Test")
term | df | sumsq | meansq | statistic | p.value |
---|---|---|---|---|---|
spray | 5 | 2668.833 | 533.76667 | 34.70228 | 0 |
Residuals | 66 | 1015.167 | 15.38131 | NA | NA |
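The p-value of 0 in the table is a rounding artifact. As a quick check (a sketch using only base R and the F statistic and degrees of freedom from the table), the upper tail of the central F distribution gives the unrounded value:

```r
# Traditional (upper-tail) p-value for the spray effect, using the table values
p_null <- pf(34.70228, df1 = 5, df2 = 66, lower.tail = FALSE)
p_null  # very small, but not exactly zero
```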
We can then use the information in the table above to perform an equivalence test using the equ_ftest function. This function returns an object of the S3 class htest, and the output will look very similar to that of a t-test. The main difference is that the estimate and confidence interval are for partial eta-squared (\(\eta^2_p\)).
equ_ftest(Fstat = 34.70228,
df1 = 5,
df2 = 66,
eqbound = 0.35)
##
## Equivalence Test from F-test
##
## data: Summary Statistics
## F = 34.702, df1 = 5, df2 = 66, p-value = 1
## 95 percent confidence interval:
## 0.5806263 0.7804439
## sample estimates:
## [1] 0.724439
Based on the results above we would conclude there is a significant effect of “spray” and the differences due to spray are not statistically equivalent. In essence, we reject the traditional null hypothesis of “no effect” but fail to reject the null hypothesis of the equivalence test.
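The sample estimate reported by equ_ftest can be recovered by hand. The sketch below (base R only; the variable names are illustrative) computes partial eta-squared two ways: from the sums of squares in the ANOVA table, and from the F statistic and degrees of freedom alone.

```r
# Values copied from the traditional ANOVA table
ss_effect <- 2668.833
ss_resid  <- 1015.167
Fstat <- 34.70228
df1 <- 5
df2 <- 66

# Partial eta-squared from the sums of squares
pes_ss <- ss_effect / (ss_effect + ss_resid)

# Equivalent form using only the F statistic and degrees of freedom
pes_f <- (Fstat * df1) / (Fstat * df1 + df2)

c(pes_ss = pes_ss, pes_f = pes_f)
```

Both values should agree with the sample estimate of roughly 0.724 shown in the output, which is why summary statistics are enough for this test.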
The equ_ftest function is very useful because all you need is very basic summary statistics. However, if you are doing all your analyses in R then you can use the equ_anova function. This function accepts objects produced from stats::aov, car::Anova and afex::aov_car (or any anova from afex).
equ_anova(aovtest,
eqbound = 0.35)
## effect df1 df2 F.value p.null pes eqbound p.equ
## 1 spray 5 66 34.70228 3.182584e-17 0.724439 0.35 0.9999965
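Because the generalized non-centrality parameter only needs each effect's degrees of freedom, the same calculation can in principle be applied term by term to a factorial design. The sketch below does this by hand with base R on the built-in ToothGrowth data (my choice of example, not from the source) and an arbitrary bound of 0.35. It illustrates the formula and is not a reproduction of the internals of equ_anova.

```r
# Illustrative factorial ANOVA on a built-in data set (assumed example)
fit <- aov(len ~ supp * factor(dose), data = ToothGrowth)
tab <- summary(fit)[[1]]

# The last row holds the residuals; the rows before it are model terms
n_terms <- nrow(tab) - 1
df1  <- tab[["Df"]][seq_len(n_terms)]
df2  <- tab[["Df"]][nrow(tab)]
Fval <- tab[["F value"]][seq_len(n_terms)]

eqbound <- 0.35  # arbitrary bound on partial eta-squared
lambda_eq <- eqbound / (1 - eqbound) * (df1 + df2 + 1)

by_hand <- data.frame(
  effect  = trimws(rownames(tab))[seq_len(n_terms)],
  df1     = df1,
  df2     = df2,
  F.value = Fval,
  pes     = (Fval * df1) / (Fval * df1 + df2),
  p.equ   = pf(Fval, df1, df2, ncp = lambda_eq)  # lower tail, non-central F
)
by_hand
```

Each row is read the same way as the one-way output: a small p.equ would suggest the effect is below the bound. Results for factorial designs should be interpreted with caution.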
Just like the standardized mean differences, TOSTER also has a function to visualize \(\eta^2_p\).
The function, plot_pes, operates in a fashion very similar to equ_ftest. In essence, all you have to do is provide the F-statistic, numerator degrees of freedom, and denominator degrees of freedom. We can also select the type of plot with the type argument. Users have the option of producing a consonance plot (type = "c"), a consonance density plot (type = "cd"), or both (type = c("cd","c")). By default, plot_pes will produce both plots.
plot_pes(Fstat = 34.70228,
df1 = 5,
df2 = 66)
Power for an equivalence F-test can be calculated with the same equations supplied by Campbell and Lakens (2021). I have included these within the power_eq_f function.
power_eq_f(df1 = 2,
df2 = 60,
eqbound = .15)
## Note: equ_anova only validated for one-way ANOVA; use with caution
##
## Power for Non-Inferiority F-test
##
## df1 = 2
## df2 = 60
## eqbound = 0.15
## sig.level = 0.05
## power = 0.8188512
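As a rough sketch of what such a power calculation involves, the block below assumes the true effect is exactly zero, so that the F statistic follows a central F distribution, and asks how often it would fall below the critical value of the equivalence test. This is my reading of the approach under that stated assumption; it is not taken from the power_eq_f source and may differ from it in detail.

```r
# Same inputs as the power_eq_f call above
df1 <- 2
df2 <- 60
eqbound <- 0.15
sig.level <- 0.05

# Non-centrality parameter at the equivalence bound
lambda_eq <- eqbound / (1 - eqbound) * (df1 + df2 + 1)

# The equivalence test rejects H0 when the observed F falls below this value
F_crit <- qf(sig.level, df1, df2, ncp = lambda_eq)

# Assumption: true effect is zero, so F follows a central F distribution
power_sketch <- pf(F_crit, df1, df2)
power_sketch
```

Comparing power_sketch with the power reported above is a simple way to check whether this assumption matches what the function does.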