Tidycomm includes four functions for bivariate explorative data analysis:
crosstab()
for both categorical independent and dependent variablest_test()
for dichotomous categorical independent and continuous dependent variablesunianova()
for polytomous categorical independent and continuous dependent variablescorrelate()
for both continuous independent and dependent variablesWe will again use sample data from the Worlds of Journalism 2012-16 study for demonstration purposes:
WoJ#> # A tibble: 1,200 x 15
#> country reach employment temp_contract autonomy_selecti~ autonomy_emphas~
#> <fct> <fct> <chr> <fct> <dbl> <dbl>
#> 1 Germany Nation~ Full-time Permanent 5 4
#> 2 Germany Nation~ Full-time Permanent 3 4
#> 3 Switzerl~ Region~ Full-time Permanent 4 4
#> 4 Switzerl~ Local Part-time Permanent 4 5
#> 5 Austria Nation~ Part-time Permanent 4 4
#> 6 Switzerl~ Local Freelancer <NA> 4 4
#> 7 Germany Local Full-time Permanent 4 4
#> 8 Denmark Nation~ Full-time Permanent 3 3
#> 9 Switzerl~ Local Full-time Permanent 5 5
#> 10 Denmark Nation~ Full-time Permanent 2 4
#> # ... with 1,190 more rows, and 9 more variables: ethics_1 <dbl>,
#> # ethics_2 <dbl>, ethics_3 <dbl>, ethics_4 <dbl>, work_experience <dbl>,
#> # trust_parliament <dbl>, trust_government <dbl>, trust_parties <dbl>,
#> # trust_politicians <dbl>
crosstab()
outputs a contingency table for one independent (column) variable and one or more dependent (row) variables:
%>%
WoJ crosstab(reach, employment)
#> # A tibble: 3 x 5
#> employment Local Regional National Transnational
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Freelancer 23 36 104 9
#> 2 Full-time 111 287 438 66
#> 3 Part-time 15 32 75 4
Additional options include add_total
(adds a row-wise Total
column if set to TRUE
) and percentages
(outputs column-wise percentages instead of absolute values if set to TRUE
):
%>%
WoJ crosstab(reach, employment, add_total = TRUE, percentages = TRUE)
#> # A tibble: 3 x 6
#> employment Local Regional National Transnational Total
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Freelancer 0.154 0.101 0.169 0.114 0.143
#> 2 Full-time 0.745 0.808 0.710 0.835 0.752
#> 3 Part-time 0.101 0.0901 0.122 0.0506 0.105
Setting chi_square = TRUE
computes a \(\chi^2\) test including Cramer’s \(V\) and outputs the results in a console message:
%>%
WoJ crosstab(reach, employment, chi_square = TRUE)
#> Chi-square = 16.005239, df = 6.000000, p = 0.013726, V = 0.081663
#> # A tibble: 3 x 5
#> employment Local Regional National Transnational
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Freelancer 23 36 104 9
#> 2 Full-time 111 287 438 66
#> 3 Part-time 15 32 75 4
Finally, passing multiple row variables will treat all unique value combinations as a single variable for percentage and Chi-square computations:
%>%
WoJ crosstab(reach, employment, country, percentages = TRUE)
#> # A tibble: 15 x 6
#> employment country Local Regional National Transnational
#> <chr> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 Freelancer Austria 0.0134 0.0113 0.0162 0
#> 2 Freelancer Denmark 0.0537 0.0197 0.112 0.0127
#> 3 Freelancer Germany 0.0470 0.0507 0.00648 0
#> 4 Freelancer Switzerland 0.0403 0.00845 0.00162 0
#> 5 Freelancer UK 0 0.0113 0.0324 0.101
#> 6 Full-time Austria 0.0403 0.180 0.152 0.0127
#> 7 Full-time Denmark 0.168 0.192 0.295 0
#> 8 Full-time Germany 0.268 0.172 0.0616 0
#> 9 Full-time Switzerland 0.168 0.197 0.0875 0.0633
#> 10 Full-time UK 0.101 0.0676 0.113 0.759
#> 11 Part-time Austria 0 0.0225 0.0292 0
#> 12 Part-time Denmark 0.00671 0.0113 0.0178 0
#> 13 Part-time Germany 0 0.00282 0.00648 0
#> 14 Part-time Switzerland 0.0872 0.0479 0.0632 0
#> 15 Part-time UK 0.00671 0.00563 0.00486 0.0506
Use t_test()
to quickly compute t-Tests for a group variable and one or more test variables. Output includes test statistics, descriptive statistics and Cohen’s \(d\) effect size estimates:
%>%
WoJ t_test(temp_contract, autonomy_selection, autonomy_emphasis)
#> # A tibble: 2 x 10
#> Variable M_Permanent SD_Permanent M_Temporary SD_Temporary Delta_M t df
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 autonom~ 3.91 0.755 3.70 0.932 0.212 1.96 998
#> 2 autonom~ 4.12 0.768 3.89 0.870 0.237 2.17 995
#> # ... with 2 more variables: p <dbl>, d <dbl>
Passing no test variables will compute t-Tests for all numerical variables in the data:
%>%
WoJ t_test(temp_contract)
#> # A tibble: 11 x 10
#> Variable M_Permanent SD_Permanent M_Temporary SD_Temporary Delta_M t
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 autonomy_se~ 3.91 0.755 3.70 0.932 0.212 1.96
#> 2 autonomy_em~ 4.12 0.768 3.89 0.870 0.237 2.17
#> 3 ethics_1 1.57 0.850 1.98 0.990 -0.414 -3.41
#> 4 ethics_2 3.24 1.26 3.51 1.23 -0.269 -1.51
#> 5 ethics_3 2.37 1.12 2.28 0.928 0.0862 0.549
#> 6 ethics_4 2.53 1.24 2.57 1.22 -0.0323 -0.185
#> 7 work_experi~ 17.7 10.5 11.3 11.8 6.42 4.29
#> 8 trust_parli~ 3.07 0.797 3.02 0.772 0.0539 0.480
#> 9 trust_gover~ 2.87 0.847 2.64 0.811 0.229 1.92
#> 10 trust_parti~ 2.43 0.724 2.36 0.736 0.0719 0.703
#> 11 trust_polit~ 2.53 0.707 2.40 0.689 0.136 1.37
#> # ... with 3 more variables: df <dbl>, p <dbl>, d <dbl>
If passing a group variable with more than two unique levels, t_test()
will produce a warning
and default to the first two unique values. You can manually define the levels by setting the levels
argument:
%>%
WoJ t_test(employment, autonomy_selection, autonomy_emphasis)
#> Warning: employment has more than 2 levels, defaulting to first two (Full-time
#> and Part-time). Consider filtering your data or setting levels with the levels
#> argument
#> # A tibble: 2 x 10
#> Variable `M_Full-time` `SD_Full-time` `M_Part-time` `SD_Part-time` Delta_M
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 autonomy_se~ 3.90 0.782 3.83 0.633 0.0780
#> 2 autonomy_em~ 4.12 0.781 4.02 0.759 0.102
#> # ... with 4 more variables: t <dbl>, df <dbl>, p <dbl>, d <dbl>
%>%
WoJ t_test(employment, autonomy_selection, autonomy_emphasis, levels = c("Full-time", "Freelancer"))
#> # A tibble: 2 x 10
#> Variable `M_Full-time` `SD_Full-time` M_Freelancer SD_Freelancer Delta_M t
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 autonom~ 3.90 0.782 3.76 0.993 0.139 2.03
#> 2 autonom~ 4.12 0.781 3.90 0.852 0.217 3.29
#> # ... with 3 more variables: df <dbl>, p <dbl>, d <dbl>
Additional options include:
var.equal
: By default, t_test()
will assume equal variances for both groups. Set var.equal = FALSE
to compute t-Tests with the Welch approximation to the degrees of freedom.pooled_sd
: By default, the pooled variance will be used the compute Cohen’s \(d\) effect size estimates (\(s = \sqrt\frac{(n_1 - 1)s^2_1 + (n_2 - 1)s^2_2}{n_1 + n_2 - 2}\)). Set pooled_sd = FALSE
to use the simple variance estimation instead (\(s = \sqrt\frac{(s^2_1 + s^2_2)}{2}\)).paired
: Set paired = TRUE
to compute a paired t-Test instead. It is advisable to specify the case-identifying variable with case_var
when computing paired t-Tests, as this will make sure that data are properly sorted.unianova()
will compute one-way ANOVAs for one group variable and one or more test variables. Output includes test statistics and \(\eta^2\) effect size estimates.
%>%
WoJ unianova(employment, autonomy_selection, autonomy_emphasis)
#> # A tibble: 2 x 6
#> Var F df_num df_denom p eta_squared
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 autonomy_selection 2.42 2 1194 0.0896 0.00403
#> 2 autonomy_emphasis 5.86 2 1192 0.00293 0.00974
Descriptives can be added by setting descriptives = TRUE
. If no test variables are passed, all numerical variables in the data will be used:
%>%
WoJ unianova(employment, descriptives = TRUE)
#> # A tibble: 11 x 12
#> Var F df_num df_denom p eta_squared `M_Full-time` `SD_Full-time`
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 auto~ 2.42 2 1194 8.96e-2 0.00403 3.90 0.782
#> 2 auto~ 5.86 2 1192 2.93e-3 0.00974 4.12 0.781
#> 3 ethi~ 2.17 2 1197 1.15e-1 0.00361 1.62 0.895
#> 4 ethi~ 2.20 2 1197 1.11e-1 0.00367 3.24 1.27
#> 5 ethi~ 5.33 2 1197 4.98e-3 0.00882 2.39 1.13
#> 6 ethi~ 3.45 2 1197 3.20e-2 0.00574 2.58 1.25
#> 7 work~ 4.48 2 1184 1.15e-2 0.00752 17.5 10.7
#> 8 trus~ 1.53 2 1197 2.18e-1 0.00254 3.06 0.798
#> 9 trus~ 12.9 2 1197 2.97e-6 0.0210 2.82 0.835
#> 10 trus~ 0.842 2 1197 4.31e-1 0.00141 2.42 0.720
#> 11 trus~ 0.328 2 1197 7.21e-1 0.000547 2.52 0.704
#> # ... with 4 more variables: M_Part-time <dbl>, SD_Part-time <dbl>,
#> # M_Freelancer <dbl>, SD_Freelancer <dbl>
You can also compute Tukey’s HSD post-hoc tests by setting post_hoc = TRUE
. Results will be added as a tibble
in a list column post_hoc
.
%>%
WoJ unianova(employment, autonomy_selection, autonomy_emphasis, post_hoc = TRUE)
#> # A tibble: 2 x 7
#> Var F df_num df_denom p eta_squared post_hoc
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list>
#> 1 autonomy_selection 2.42 2 1194 0.0896 0.00403 <tibble [3 x 7]>
#> 2 autonomy_emphasis 5.86 2 1192 0.00293 0.00974 <tibble [3 x 7]>
These can then be unnested with tidyr::unnest()
:
%>%
WoJ unianova(employment, autonomy_selection, autonomy_emphasis, post_hoc = TRUE) %>%
::select(Var, post_hoc) %>%
dplyr::unnest(post_hoc)
tidyr#> # A tibble: 6 x 8
#> Var term contrast null.value estimate conf.low conf.high adj.p.value
#> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 autonomy~ emplo~ Part-time~ 0 -0.0780 -0.257 0.101 0.562
#> 2 autonomy~ emplo~ Freelance~ 0 -0.139 -0.296 0.0186 0.0966
#> 3 autonomy~ emplo~ Freelance~ 0 -0.0607 -0.282 0.160 0.796
#> 4 autonomy~ emplo~ Part-time~ 0 -0.102 -0.278 0.0741 0.362
#> 5 autonomy~ emplo~ Freelance~ 0 -0.217 -0.372 -0.0629 0.00284
#> 6 autonomy~ emplo~ Freelance~ 0 -0.115 -0.333 0.102 0.428
correlate()
will compute correlations for all combinations of the passed variables:
%>%
WoJ correlate(work_experience, autonomy_selection, autonomy_emphasis)
#> # A tibble: 3 x 5
#> x y r df p
#> <chr> <chr> <dbl> <int> <dbl>
#> 1 work_experience autonomy_selection 0.161 1182 2.71e- 8
#> 2 work_experience autonomy_emphasis 0.155 1180 8.87e- 8
#> 3 autonomy_selection autonomy_emphasis 0.644 1192 4.83e-141
If no variables passed, correlations for all combinations of numerical variables will be computed:
%>%
WoJ correlate()
#> # A tibble: 55 x 5
#> x y r df p
#> <chr> <chr> <dbl> <int> <dbl>
#> 1 autonomy_selection autonomy_emphasis 0.644 1192 4.83e-141
#> 2 autonomy_selection ethics_1 -0.0766 1195 7.98e- 3
#> 3 autonomy_selection ethics_2 -0.0274 1195 3.43e- 1
#> 4 autonomy_selection ethics_3 -0.0257 1195 3.73e- 1
#> 5 autonomy_selection ethics_4 -0.0781 1195 6.89e- 3
#> 6 autonomy_selection work_experience 0.161 1182 2.71e- 8
#> 7 autonomy_selection trust_parliament -0.00840 1195 7.72e- 1
#> 8 autonomy_selection trust_government 0.0414 1195 1.53e- 1
#> 9 autonomy_selection trust_parties 0.0269 1195 3.52e- 1
#> 10 autonomy_selection trust_politicians 0.0109 1195 7.07e- 1
#> # ... with 45 more rows
By default, Pearson’s product-moment correlations coefficients (\(r\)) will be computed. Set method
to "kendall"
to obtain Kendall’s \(\tau\) or to "spearman"
to obtain Spearman’s \(\rho\) instead.
To obtain a correlation matrix, pass the output of correlate()
to to_correlation_matrix()
:
%>%
WoJ correlate(work_experience, autonomy_selection, autonomy_emphasis) %>%
to_correlation_matrix()
#> # A tibble: 3 x 4
#> r work_experience autonomy_selection autonomy_emphasis
#> <chr> <dbl> <dbl> <dbl>
#> 1 work_experience 1 0.161 0.155
#> 2 autonomy_selection 0.161 1 0.644
#> 3 autonomy_emphasis 0.155 0.644 1