library("papaja")
#> Lade nötiges Paket: tinylabels
Here, we provide a brief overview of issues to consider when
implementing a new method for apa_print()
, a convenience
function to facilitate reporting of results in accordance with APA
reporting guidelines. If you consider adding a new method, please review
our brief contributing
guidelines and code
of conduct.
If you are reporting the results of a statistical analysis that is
not yet supported by apa_print()
you probably have a good
motivation and possibly prior work to build on. If you are just looking
for a way to contribute to, take a look at the open issues for
inspiration.
apa_print()
is a generic, meaning it can, in principle,
work on any output object with a class that is specific enough to
purposefully extract the results of the analysis. For example, objects
of class htest
, as returned by t.test()
,
cor.test()
, prop.test()
, etc., are named lists
that follow a loose convention about the named objects they contain.
<- t.test(extra ~ group, data = sleep)
t_test_example
class(t_test_example)
#> [1] "htest"
str(t_test_example)
#> List of 10
#> $ statistic : Named num -1.86
#> ..- attr(*, "names")= chr "t"
#> $ parameter : Named num 17.8
#> ..- attr(*, "names")= chr "df"
#> $ p.value : num 0.0794
#> $ conf.int : num [1:2] -3.365 0.205
#> ..- attr(*, "conf.level")= num 0.95
#> $ estimate : Named num [1:2] 0.75 2.33
#> ..- attr(*, "names")= chr [1:2] "mean in group 1" "mean in group 2"
#> $ null.value : Named num 0
#> ..- attr(*, "names")= chr "difference in means between group 1 and group 2"
#> $ stderr : num 0.849
#> $ alternative: chr "two.sided"
#> $ method : chr "Welch Two Sample t-test"
#> $ data.name : chr "extra by group"
#> - attr(*, "class")= chr "htest"
Hence, if we pass an htest
object to
apa_print()
the function expects there to be named elements
in the list, such as statistic
, estimate
, or
p.value
. These expectations are reflected in the workings
of the apa_print.htest()
method. Objects of less specific
classes, such as list
or data.frame
cannot be
supported, because we cannot make any useful assumptions about their
structure.
Objects returned by apa_print()
are of class
apa_results
, a named list with four elements:
#> $estimate
#> NULL
#>
#> $statistic
#> NULL
#>
#> $full_result
#> NULL
#>
#> $table
#> NULL
#>
#> attr(,"class")
#> [1] "apa_results" "list"
To illustrate how apa_results
objects are populated,
let’s look at the output of apa_print.lm()
.
# Data from Dobson (1990), p. 9.
<- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
ctl <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
trt <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
group <- c(ctl, trt)
weight <- lm(weight ~ group)
lm_fit
<- apa_print(lm_fit) lm_fit_apa
The estimate
element of the returned
apa_results
-list itself contains a named list with
estimated parameters—in this case regression coefficients—and
corresponding confidence intervals for the model. The names of the list
correspond to the names of the predictors.
$estimate
lm_fit_apa#> $Intercept
#> [1] "$b = 5.03$, 95\\% CI $[4.57, 5.49]$"
#>
#> $groupTrt
#> [1] "$b = -0.37$, 95\\% CI $[-1.03, 0.28]$"
#>
#> $modelfit
#> $modelfit$r2
#> [1] "$R^2 = .07$, 90\\% CI $[0.00, 0.33]$"
#>
#> $modelfit$r2_adj
#> [1] "$R^2_{adj} = .02$"
#>
#> $modelfit$aic
#> [1] "$\\mathrm{AIC} = 46.18$"
#>
#> $modelfit$bic
#> [1] "$\\mathrm{BIC} = 49.16$"
The estimate
list may contain additional elements, such
as the list modelfit
, that contains quantitative estimates
of the model fit.
The statistic
element of the returned
apa_results
list contains a named list with the same
structure as estimate
. Instead of parameter estimates,
statistic
contains the corresponding inferential test
statistics, such as significance tests or Bayesian model
comparisons.
$statistic
lm_fit_apa#> $Intercept
#> [1] "$t(18) = 22.85$, $p < .001$"
#>
#> $groupTrt
#> [1] "$t(18) = -1.19$, $p = .249$"
#>
#> $modelfit
#> $modelfit$r2
#> [1] "$F(1, 18) = 1.42$, $p = .249$"
Note that the statistics
list misses elements for the
information criteria AIC
and BIC
. Because no
inferential test statistics on the information criteria are available,
it is fine to simply drop those elements.
The full_results
element is a named list that simply
combines the results of estimate
and statistic
for convenience in reporting.
$full_result
lm_fit_apa#> $Intercept
#> [1] "$b = 5.03$, 95\\% CI $[4.57, 5.49]$, $t(18) = 22.85$, $p < .001$"
#>
#> $groupTrt
#> [1] "$b = -0.37$, 95\\% CI $[-1.03, 0.28]$, $t(18) = -1.19$, $p = .249$"
#>
#> $modelfit
#> $modelfit$r2
#> [1] "$R^2 = .07$, 90\\% CI $[0.00, 0.33]$, $F(1, 18) = 1.42$, $p = .249$"
Finally, the table
element contains a
data.frame
of class apa_results_table
that
summarizes the results. In essence this is simply a regular
data.frame
that follows the column-naming
conventions used in broom
but allows for prettier printing of variable labels.
$table
lm_fit_apa#> A data.frame with 6 labelled columns:
#>
#> term estimate conf.int statistic df p.value
#> 1 Intercept 5.03 [4.57, 5.49] 22.85 18 < .001
#> 2 GroupTrt -0.37 [-1.03, 0.28] -1.19 18 .249
#>
#> term : Predictor
#> estimate : $b$
#> conf.int : 95\\% CI
#> statistic: $t$
#> df : $\\mathit{df}$
#> p.value : $p$
For more complex analyses table
may contain a named list
of apa_result_table
s. We use tinylabels to
set variable labels. These variable labels are attributes attached to
each column and contain a typeset label for the respective column.
# library("tinylabels")
letters#> [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
#> [20] "t" "u" "v" "w" "x" "y" "z"
variable_label(letters) <- "Letters of the alphabet"
variable_label(letters)
#> [1] "Letters of the alphabet"
letters#> Variable label : Letters of the alphabet
#> [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
#> [20] "t" "u" "v" "w" "x" "y" "z"
str(letters)
#> 'tiny_labelled' chr [1:26] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" ...
#> - attr(*, "label")= chr "Letters of the alphabet"
$table$statistic
lm_fit_apa#> Variable label : $t$
#> [1] "22.85" "-1.19"
Variable labels are automatically used by apa_table()
and plotting functions from the apa_factorial_plot()
-family
to create sensible default labels. If a label is enveloped in
$
it may contain LaTeX math syntax, which is automatically
converted to R expressions using latex2exp
for plotting.
Any new apa_print()
method should output an object of
this basic structure.
apa_results
do not contain numeric information. Rather
the numeric information has been processed for printing in accordance
with APA guidelines. There are several papaja-functions
to facilitate the typesetting. apa_num()
is a flexible
general purpose function that wraps formatC()
and can be
used to round, set decimal as well as thousands separators, or remove
leading zeros.
<- rnorm(3) * 1e4
x apa_num(x)
#> [1] "-17,402.11" "-3,444.68" "-14,935.10"
apa_num(x, digits = 3, big.mark = ".", decimal.mark = ",")
#> [1] "-17.402,114" "-3.444,676" "-14.935,103"
apa_num(Inf)
#> [1] "$\\infty$"
apa_p()
is a wrapper for apa_num()
that
sets appropriate defaults to report p values in accordance with
APA guidelines.
apa_p(c(0.0001, 0.05, 0.99999))
#> [1] "< .001" ".050" "> .999"
The internal function apa_df()
is geared towards
typesetting degrees of freedom.
apa_df(c(12, 12.485))
#> [1] "12" "12.48"
apa_df(12L)
#> [1] "12"
Finally, apa_interval()
can be used to typeset interval
estimates.
apa_interval(rnorm(2), conf.int = 0.95, interval_type = "CI")
#> [1] "95\\% CI [-0.24, -0.38]"
Again, there are two wrappers that set appropriate defaults to typeset frequentist confidence intervals and Bayesian highest-density intervals.
apa_confint(rnorm(2), conf.int = 0.95)
#> [1] "95\\% CI [-0.63, 0.47]"
apa_hdint(rnorm(2), conf.int = 0.95)
#> [1] "95\\% HDI [0.96, -1.50]"
When creating named lists from terms, these terms names should use
_
as separator, and be valid R names. Adhering to these
conventions ensures that apa_results
can conveniently be
indexed using the $
operator.
To facilitate the generation of list names, papaja
provides the internal function sanitize_terms()
.
<- c("(Intercept)", "Factor A", "Factor B",
mod_terms "Factor A:Factor B", "scale(Factor A)")
sanitize_terms(mod_terms, standardized = TRUE)
#> [1] "Intercept" "Factor_A" "Factor_B"
#> [4] "Factor_A_Factor_B" "z_Factor_A"
While these sanitized terms are well suited to name R objects, they
are not ideal for reporting. To facilitate typesetting term names for
reporting, there is another internal function
beautify_terms()
.
beautify_terms(mod_terms, standardized = TRUE)
#> [1] "Intercept" "Factor A"
#> [3] "Factor B" "Factor A $\\times$ Factor B"
#> [5] "Factor A"
As with lm
objects, it is often the case that the
objects, as returned by the analysis function, may not contain all
information necessary to populate the lists described above. For
example, to obtain inferential statistics it may be necessary to call
summary()
.
<- aov(yield ~ block + N * P * K, npk)
npk_aov
npk_aov#> Call:
#> aov(formula = yield ~ block + N * P * K, data = npk)
#>
#> Terms:
#> block N P K N:P N:K P:K
#> Sum of Squares 343.2950 189.2817 8.4017 95.2017 21.2817 33.1350 0.4817
#> Deg. of Freedom 5 1 1 1 1 1 1
#> Residuals
#> Sum of Squares 185.2867
#> Deg. of Freedom 12
#>
#> Residual standard error: 3.929447
#> 1 out of 13 effects not estimable
#> Estimated effects may be unbalanced
summary(npk_aov)
#> Df Sum Sq Mean Sq F value Pr(>F)
#> block 5 343.3 68.66 4.447 0.01594 *
#> N 1 189.3 189.28 12.259 0.00437 **
#> P 1 8.4 8.40 0.544 0.47490
#> K 1 95.2 95.20 6.166 0.02880 *
#> N:P 1 21.3 21.28 1.378 0.26317
#> N:K 1 33.1 33.14 2.146 0.16865
#> P:K 1 0.5 0.48 0.031 0.86275
#> Residuals 12 185.3 15.44
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This is why there are usually multiple
apa_print()
-methods that are called subsequently to make
the function both flexible and convenient. For convenience,
apa_print.aov()
calls summary()
with its
default arguments and passes the result onto
apa_print.summary.aov()
.
:::apa_print.aov
papaja#> function (x, estimate = getOption("papaja.estimate_anova", "ges"),
#> observed = NULL, intercept = FALSE, mse = TRUE, in_paren = FALSE,
#> ...)
#> {
#> apa_print(summary(x), .x = x, intercept = intercept, estimate = estimate,
#> mse = mse, observed = observed, in_paren = in_paren,
#> ...)
#> }
#> <bytecode: 0x7fc10c899ac0>
#> <environment: namespace:papaja>
This approach also ensures that a variety of object types are supported while minimizing code redundancy.
The internals of apa_print()
heavily rely on broom,
a package to conveniently restructure the output of analysis functions
into tidy data.frame
s. The objects are often processed
using broom::tidy()
, and broom::glance()
if
necessary, before being modified further to create the contents of the
table
element.
Once the results table has been assembled, numeric values have been
typeset, and variable labels have been assigned
glue_apa_results()
can be used to create an
apa_results
object according to the above specifications.
Consider the following example of an lm
-object. First we
tidy()
and glance()
the object to obtain tidy
results. We than typeset all “special” numerical results, that is, all
results that would not be typeset appropriately by applying
apa_num()
with its default settings. Moreover, we combine
the separate columns for lower and upper confidence interval bounds into
one column conf.int
which contains the complete confidence
interval.
<- lm(mpg ~ cyl + wt, mtcars)
lm_fit
# Tidy and typeset output
library("broom")
<- tidy(lm_fit, conf.int = TRUE)
tidy_lm_fit $p.value <- apa_p(tidy_lm_fit$p.value)
tidy_lm_fit$conf.int <- unlist(apa_confint(tidy_lm_fit[, c("conf.low", "conf.high")]))
tidy_lm_fit
str(tidy_lm_fit)
#> tibble [3 × 8] (S3: tbl_df/tbl/data.frame)
#> $ term : chr [1:3] "(Intercept)" "cyl" "wt"
#> $ estimate : num [1:3] 39.69 -1.51 -3.19
#> $ std.error: num [1:3] 1.715 0.415 0.757
#> $ statistic: num [1:3] 23.14 -3.64 -4.22
#> $ p.value : chr [1:3] "< .001" ".001" "< .001"
#> $ conf.low : num [1:3] 36.18 -2.36 -4.74
#> $ conf.high: num [1:3] 43.19 -0.66 -1.64
#> $ conf.int : chr [1:3] "[36.18, 43.19]" "[-2.36, -0.66]" "[-4.74, -1.64]"
<- glance(lm_fit)
glance_lm_fit $r.squared <- apa_num(glance_lm_fit$r.squared, gt1 = FALSE)
glance_lm_fit$p.value <- apa_p(glance_lm_fit$p.value)
glance_lm_fit$df <- apa_df(glance_lm_fit$df)
glance_lm_fit$df.residual <- apa_df(glance_lm_fit$df.residual)
glance_lm_fit
str(glance_lm_fit)
#> tibble [1 × 12] (S3: tbl_df/tbl/data.frame)
#> $ r.squared : chr ".83"
#> $ adj.r.squared: num 0.819
#> $ sigma : num 2.57
#> $ statistic : Named num 70.9
#> ..- attr(*, "names")= chr "value"
#> $ p.value : chr "< .001"
#> $ df : chr "2"
#> $ logLik : num -74
#> $ AIC : num 156
#> $ BIC : num 162
#> $ deviance : num 191
#> $ df.residual : chr "29"
#> $ nobs : int 32
Next, we typeset the remaining numeric columns and assign informative variable labels:
<- apa_num(tidy_lm_fit)
tidy_lm_fit
variable_labels(tidy_lm_fit) <- c(
term = "Term"
estimate = "$b$"
, statistic = paste0("$t(", glance_lm_fit$df.residual, ")")
, p.value = "$p$"
, conf.int = "95% CI"
,
)
<- apa_num(glance_lm_fit)
glance_lm_fit
variable_labels(glance_lm_fit) <- c(
r.squared = "$R^2$"
statistic = "$F$"
, p.value = "$p$"
, AIC = "$\\mathrm{AIC}$"
, )
Now we can use glue_apa_results()
to create the output
object. In doing so, we use the internal function
construct_glue()
to automatically determine the correct
“glue” of the reporting string. Let’s first examine the glue.
:::construct_glue(tidy_lm_fit, "estimate")
papaja#> [1] "$<<svl(estimate)>> = <<estimate>>$, <<svl(conf.int, use_math = TRUE)>> $<<conf.int>>$"
The character string contains a combination of text and
to-be-evaluated R code enveloped in <<
and
>>
. All variable names (e.g. estimate
)
are assumed to be columns of x
(here
tidy_lm_fit
) or any additional object passed to
glue_apa_results()
via ...
. svl()
is a function that returns a column variable label but, by default,
remove the math-environment tags ($
) as these are not
needed here.
<- glue_apa_results(
lm_results x = tidy_lm_fit
est_glue = papaja:::construct_glue(tidy_lm_fit, "estimate")
, stat_glue = papaja:::construct_glue(tidy_lm_fit, "statistic")
, term_names = sanitize_terms(tidy_lm_fit$term)
,
)
lm_results#> $estimate
#> $estimate$Intercept
#> [1] "$b = 39.69$, 95% CI $[36.18, 43.19]$"
#>
#> $estimate$cyl
#> [1] "$b = -1.51$, 95% CI $[-2.36, -0.66]$"
#>
#> $estimate$wt
#> [1] "$b = -3.19$, 95% CI $[-4.74, -1.64]$"
#>
#>
#> $statistic
#> $statistic$Intercept
#> [1] "$t(29) = 23.14$, $p < .001$"
#>
#> $statistic$cyl
#> [1] "$t(29) = -3.64$, $p = .001$"
#>
#> $statistic$wt
#> [1] "$t(29) = -4.22$, $p < .001$"
#>
#>
#> $full_result
#> $full_result$Intercept
#> [1] "$b = 39.69$, 95% CI $[36.18, 43.19]$, $t(29) = 23.14$, $p < .001$"
#>
#> $full_result$cyl
#> [1] "$b = -1.51$, 95% CI $[-2.36, -0.66]$, $t(29) = -3.64$, $p = .001$"
#>
#> $full_result$wt
#> [1] "$b = -3.19$, 95% CI $[-4.74, -1.64]$, $t(29) = -4.22$, $p < .001$"
#>
#>
#> $table
#> term estimate std.error statistic p.value conf.low conf.high
#> 1 Intercept 39.69 1.71 23.14 < .001 36.18 43.19
#> 2 Cyl -1.51 0.41 -3.64 .001 -2.36 -0.66
#> 3 Wt -3.19 0.76 -4.22 < .001 -4.74 -1.64
#> conf.int
#> 1 [36.18, 43.19]
#> 2 [-2.36, -0.66]
#> 3 [-4.74, -1.64]
#>
#> attr(,"class")
#> [1] "apa_results" "list"
If we need to add additional information to this output, we can use
add_glue_to_apa_results()
. This function takes an existing
output and adds new strings to a specific sublist. So, let’s add some
model fit information to the output.
add_glue_to_apa_results(
.x = glance_lm_fit
container = lm_results
, sublist = "modelfit"
, est_glue = c(
, r2 = "$<<svl(r.squared)>> = <<r.squared>>$"
aic = ""
,
)stat_glue = c(
, r2 = papaja:::construct_glue(glance_lm_fit, "statistic")
aic = "$<<svl(AIC)>> = <<AIC>>$"
,
)
)#> $estimate
#> $estimate$Intercept
#> [1] "$b = 39.69$, 95% CI $[36.18, 43.19]$"
#>
#> $estimate$cyl
#> [1] "$b = -1.51$, 95% CI $[-2.36, -0.66]$"
#>
#> $estimate$wt
#> [1] "$b = -3.19$, 95% CI $[-4.74, -1.64]$"
#>
#> $estimate$modelfit
#> r2
#> "$R^2 = .83$"
#>
#>
#> $statistic
#> $statistic$Intercept
#> [1] "$t(29) = 23.14$, $p < .001$"
#>
#> $statistic$cyl
#> [1] "$t(29) = -3.64$, $p = .001$"
#>
#> $statistic$wt
#> [1] "$t(29) = -4.22$, $p < .001$"
#>
#> $statistic$modelfit
#> $statistic$modelfit$r2
#> [1] "$F(2, 29) = 70.91$, $p < .001$"
#>
#> $statistic$modelfit$aic
#> [1] "$\\mathrm{AIC} = 156.01$"
#>
#>
#>
#> $full_result
#> $full_result$Intercept
#> [1] "$b = 39.69$, 95% CI $[36.18, 43.19]$, $t(29) = 23.14$, $p < .001$"
#>
#> $full_result$cyl
#> [1] "$b = -1.51$, 95% CI $[-2.36, -0.66]$, $t(29) = -3.64$, $p = .001$"
#>
#> $full_result$wt
#> [1] "$b = -3.19$, 95% CI $[-4.74, -1.64]$, $t(29) = -4.22$, $p < .001$"
#>
#> $full_result$modelfit
#> $full_result$modelfit$r2
#> [1] "$R^2 = .83$, $F(2, 29) = 70.91$, $p < .001$"
#>
#> $full_result$modelfit$aic
#> [1] "$\\mathrm{AIC} = 156.01$"
#>
#>
#>
#> $table
#> term estimate std.error statistic p.value conf.low conf.high
#> 1 Intercept 39.69 1.71 23.14 < .001 36.18 43.19
#> 2 Cyl -1.51 0.41 -3.64 .001 -2.36 -0.66
#> 3 Wt -3.19 0.76 -4.22 < .001 -4.74 -1.64
#> conf.int
#> 1 [36.18, 43.19]
#> 2 [-2.36, -0.66]
#> 3 [-4.74, -1.64]
#>
#> attr(,"class")
#> [1] "apa_results" "list"
A final issue to consider is that users may pass inappropriate input
to apa_print()
. To ensure that we return correct output or
informative error messages, we need input validation. Currently,
papaja relies on the internal function
validate()
for this.
<- TRUE
in_paren :::validate(in_paren, check_class = "logical", check_length = 1)
papaja#> [1] TRUE
Please use either validate()
or perform input validation
using the assertthat
package.