After you have finished with Rasch Analysis, the score is outputted
in the file Data_final.csv
in the column called
rescaled
. This file will only contain the individuals
included in the analysis. Any individual who had too many missing values
(NA
) will not be in this file. It is often advisable to
merge the original data with all individuals with the new scores. Any
individual who did not have a score calculated will have an
NA
in this column.
This merge can be accomplished with the following code. First, open
the library called tidyverse
to access the necessary
functions. Next, read in the Data_final.csv
file and select
only the columns you need: ID
(or whatever the name of the
individual ID column is in your data) and rescaled
. The
code below assumes that the file is in your working directory. You will
have to include the full path to the file if it is not currently in your
working directory. Finally, you can create an object
merged_data
that merges your original data, here
represented with the object original_data
, with the new
score in a column renamed to "DisabilityScore"
with the
following code:
library(tidyverse)
<- read_csv("Data_final.csv") %>%
new_score select(c("ID", "rescaled"))
<- original_data %>%
merged_data left_join(new_score) %>%
rename("DisabilityScore" = "rescaled")
The sample data included in the whomds
package called
df_adults
already has a Rasch score merged with it, in the
column disability_score
.
After calculating the disability scores using Rasch Analysis, you are
now ready to analyze the results of the survey by calculating
descriptive statistics. The whomds
package contains
functions to create tables and figures of descriptive statistics. This
section will go over these functions.
Descriptive statistics functions included in the whomds
package are:
table_weightedpct()
- produces weighted tables of N or
%table_unweightedpctn()
- produces unweighted tables of
N and %table_basicstats()
- computes basic statistics of the
number of members per group per household.The arguments of each of these codes will be described below.
table_weightedpct()
whomds
contains a function called
table_weightedpct()
which calculates weighted results
tables from the survey, disaggregated by specified variables. The
arguments of this function are passed to functions in the package
dplyr
.
Below are the arguments of the function:
df
- the data frame with all the variables of
interestvars_ids
- variable names of the survey cluster
idsvars_strata
- variable names of the survey stratavars_weights
- variable names of the weightsformula_vars
- vector of the column names of variables
you would like to print results for...
- captures expressions for filtering or transmuting
the data. See the description of the argument willfilter
below for more detailsformula_vars_levels
- numeric vector of the factor
levels of the variables in formula_vars
. By default, the
function assumes the variables have two levels: 0 and 1by_vars
- the variables to disaggregate bypct
- a logical variable indicating whether or not to
calculate weighted percentages. Default is TRUE
for
weighted percentages. Set to FALSE
for weighted N.willfilter
- a variable that tells the function whether
or not to filter the data by a particular value.
formula_vars
have response options
of 0 and 1 but you only want to show the values for 1, then you would
say willfilter = TRUE
. Then at the end of your argument
list you write an expression for the filter. In this case, you would say
resp==1
.willfilter = FALSE
, then the function will
assume you want to “transmute” the data, in other words manipulate the
columns in some way, which for us often means to collapse response
options. For example, if your formula_vars
have 5 response
options, but you only want to show results for the sum of options
"Agree"
and "StronglyAgree"
, (after setting
spread_key="resp"
to spread the table by the response
options) you could set willfilter=FALSE
, and then directly
after write the expression for the transmutation, giving it a new column
name–in this case the expression would be
NewColName=Agree+AgreeStrongly
. Also write the names of the
other columns you would like to keep in the final table.willfilter
as its default of
NULL
, then the function will not filter or transmute
data.add_totals
- a logical variable determining whether to
create total rows or columns (as appropriate) that demonstrate the
margin that sums to 100. Keep as the default FALSE
to not
include totals.spread_key
- the variable to spread the table
horizontally by. Keep as the default NULL
to not spread the
table horizontally.spread_value
- the variable to fill the table with
after a horizontal spread. By default this argument is
"prop"
, which is a value created internally by the
function, and generally does not need to be changed.arrange_vars
- the list of variables to arrange the
table by. Keep as default NULL
to leave the arrangement as
is.include_SE
- a logical variable indicating whether to
include the standard errors in the table. Keep as the default
FALSE
to not include standard errors. As of this version of
whomds
, does not work when adding totals
(add_totals
is TRUE
), spreading
(spread_key
is not NULL
) or transmutting
(willfilter
is FALSE
).Here are some examples of how table_weightedpct()
would
be used in practice. Not all arguments are explicitly set in each
example, which means they are kept as their default values.
Let’s say we want to print a table of the percentage of people in
each disability level who gave each response option for a set of
questions about the general environment. We would set the arguments of
table_weightedpct()
like this, and the first few rows of
the table would look like this:
#Remove NAs from column used for argument by_vars
<- df_adults %>%
df_adults_noNA filter(!is.na(disability_cat))
table_weightedpct(
df = df_adults_noNA,
vars_ids = "PSU",
vars_strata = "strata",
vars_weights = "weight",
formula_vars = paste0("EF", 1:12),
formula_vars_levels = 1:5,
by_vars = "disability_cat",
spread_key = NULL,
spread_value = "prop",
arrange_vars = NULL,
willfilter = NULL
)#> # A tibble: 236 × 4
#> # Groups: disability_cat, item [48]
#> disability_cat item resp prop
#> <ord> <ord> <fct> <dbl>
#> 1 No EF1 1 42.1
#> 2 No EF1 2 15.8
#> 3 No EF1 3 15.2
#> 4 No EF1 4 12.4
#> 5 No EF1 5 14.6
#> 6 No EF10 1 44.2
#> 7 No EF10 2 10.5
#> 8 No EF10 3 22.8
#> 9 No EF10 4 11.6
#> 10 No EF10 5 10.9
#> # … with 226 more rows
The outputted table has 4 columns: the variable we disaggregated the
data by (disability_cat
, in other words the disability
level), the item (item
), the response option
(resp
), and the proportion (prop
).
This long table from the above example is great for data analysis,
but not great for reading with the bare eye. If we want to make it
nicer, we convert it to “wide format” by “spreading” by a particular
variable. Perhaps we want to spread by disability_cat
. Our
call to table_weightedpct()
would now look like this, and
the outputted table would be:
table_weightedpct(
df = df_adults_noNA,
vars_ids = "PSU",
vars_strata = "strata",
vars_weights = "weight",
formula_vars = paste0("EF", 1:12),
formula_vars_levels = 1:5,
by_vars = "disability_cat",
spread_key = "disability_cat",
spread_value = "prop",
arrange_vars = NULL,
willfilter = NULL
)#> # A tibble: 60 × 6
#> # Groups: item [12]
#> item resp No Mild Moderate Severe
#> <ord> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 EF1 1 42.1 47.5 43.4 46.6
#> 2 EF1 2 15.8 15.3 17.2 11.9
#> 3 EF1 3 15.2 14.2 14.3 15.0
#> 4 EF1 4 12.4 9.62 11.2 10.4
#> 5 EF1 5 14.6 13.3 13.8 16.1
#> 6 EF10 1 44.2 49.9 49.3 49.7
#> 7 EF10 2 10.5 12.6 10.3 9.84
#> 8 EF10 3 22.8 16.0 18.4 17.2
#> 9 EF10 4 11.6 11.9 14.6 13.3
#> 10 EF10 5 10.9 9.64 7.40 9.87
#> # … with 50 more rows
Now we can see our prop
column has been spread
horizontally for each level of disability_cat
.
Perhaps, though, we are only interested in the proportions of the
most extreme response option of 5. We could now add a filter to our call
to table_weightedpct()
like so:
table_weightedpct(
df = df_adults_noNA,
vars_ids = "PSU",
vars_strata = "strata",
vars_weights = "weight",
formula_vars = paste0("EF", 1:12),
formula_vars_levels = 1:5,
by_vars = "disability_cat",
spread_key = "disability_cat",
spread_value = "prop",
arrange_vars = NULL,
willfilter = TRUE,
== 5
resp
)#> # A tibble: 12 × 6
#> # Groups: item [12]
#> item resp No Mild Moderate Severe
#> <ord> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 EF1 5 14.6 13.3 13.8 16.1
#> 2 EF10 5 10.9 9.64 7.40 9.87
#> 3 EF11 5 1.75 0.974 2.27 1.75
#> 4 EF12 5 4.38 NA NA NA
#> 5 EF2 5 5.55 5.29 5.13 6.25
#> 6 EF3 5 4.85 5.40 3.69 5.16
#> 7 EF4 5 2.73 2.47 3.40 3.35
#> 8 EF5 5 10.4 13.4 8.80 11.2
#> 9 EF6 5 4.08 3.08 1.96 2.56
#> 10 EF7 5 4.29 3.28 2.72 4.68
#> 11 EF8 5 2.28 1.29 1.19 2.41
#> 12 EF9 5 5.20 3.59 4.89 5.26
Now you can see only the proportions for the response option of 5 are given.
With table_weightedpct()
, we can also add more levels of
disaggregation by editing the argument by_vars
. Here we
will produce the same table as in Example 3 above but now disaggregated
by disability level and sex:
table_weightedpct(
df = df_adults_noNA,
vars_ids = "PSU",
vars_strata = "strata",
vars_weights = "weight",
formula_vars = paste0("EF", 1:12),
formula_vars_levels = 1:5,
by_vars = c("disability_cat", "sex"),
spread_key = "disability_cat",
spread_value = "prop",
arrange_vars = NULL,
willfilter = TRUE,
== 5
resp
)#> # A tibble: 24 × 7
#> # Groups: sex, item [24]
#> sex item resp No Mild Moderate Severe
#> <fct> <ord> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 Female EF1 5 13.2 14.7 13.1 15.4
#> 2 Female EF10 5 11.1 9.91 6.41 9.62
#> 3 Female EF11 5 1.15 1.24 2.26 2.57
#> 4 Female EF12 5 3.60 NA NA NA
#> 5 Female EF2 5 7.44 6.21 6.18 6.94
#> 6 Female EF3 5 4.60 6.32 4.02 4.62
#> 7 Female EF4 5 3.19 3.20 3.00 2.34
#> 8 Female EF5 5 8.61 13.1 9.77 10.3
#> 9 Female EF6 5 3.25 3.76 2.29 2.89
#> 10 Female EF7 5 4.44 4.12 2.76 2.34
#> # … with 14 more rows
Perhaps we are still interested not only in response option 5, but
the sum of 4 and 5 together. We can do this by “transmuting” our table.
To do this, we first choose to “spread” by resp
by setting
spread_key="resp"
. This will convert the table to a wide
format as in Example 2, but now each column will represent a response
option. Then we set the transmutation by setting
willfilter=FALSE
, and adding expressions for the
transmutation on the next line. We name all the columns we would like to
keep and give an expression for how to create the new column of the sum
of proportions for response options 4 and 5, here called
problems
:
table_weightedpct(
df = df_adults_noNA,
vars_ids = "PSU",
vars_strata = "strata",
vars_weights = "weight",
formula_vars = paste0("EF", 1:12),
formula_vars_levels = 1:5,
by_vars = c("disability_cat", "sex"),
spread_key = "resp",
spread_value = "prop",
arrange_vars = NULL,
willfilter = FALSE,
problems = `4`+`5`
disability_cat, sex, item,
)#> # A tibble: 96 × 4
#> # Groups: disability_cat, sex, item [96]
#> disability_cat sex item problems
#> <ord> <fct> <ord> <dbl>
#> 1 No Female EF1 27.2
#> 2 No Female EF10 21.6
#> 3 No Female EF11 6.44
#> 4 No Female EF12 6.69
#> 5 No Female EF2 11.6
#> 6 No Female EF3 8.53
#> 7 No Female EF4 5.96
#> 8 No Female EF5 18.5
#> 9 No Female EF6 8.12
#> 10 No Female EF7 11.1
#> # … with 86 more rows
If we would like to modify the table again so that
disability_cat
represents the columns again, we can feed
this table into another function that will perform the pivot The
function to pivot tables is called pivot_wider()
, and it is
in the tidyr
package. To perform a second pivot, write the
code like this:
table_weightedpct(
df = df_adults_noNA,
vars_ids = "PSU",
vars_strata = "strata",
vars_weights = "weight",
formula_vars = paste0("EF", 1:12),
formula_vars_levels = 1:5,
by_vars = c("disability_cat", "sex"),
spread_key = "resp",
spread_value = "prop",
arrange_vars = NULL,
willfilter = FALSE,
problems = `4`+`5`
disability_cat, sex, item, %>%
) pivot_wider(names_from = disability_cat, values_from = problems)
#> # A tibble: 24 × 6
#> # Groups: sex, item [24]
#> sex item No Mild Moderate Severe
#> <fct> <ord> <dbl> <dbl> <dbl> <dbl>
#> 1 Female EF1 27.2 25.0 22.2 25.2
#> 2 Female EF10 21.6 19.7 22.6 20.6
#> 3 Female EF11 6.44 5.19 5.98 8.61
#> 4 Female EF12 6.69 NA NA NA
#> 5 Female EF2 11.6 11.2 11.4 13.8
#> 6 Female EF3 8.53 11.6 7.00 9.36
#> 7 Female EF4 5.96 4.69 6.35 4.37
#> 8 Female EF5 18.5 24.3 17.8 21.8
#> 9 Female EF6 8.12 8.72 5.06 5.33
#> 10 Female EF7 11.1 7.58 9.06 8.69
#> # … with 14 more rows
The names_from
argument of the function
pivot_wider()
tells R
which variable to use as
the columns, and values_from
tells R
what to
fill the columns with. The operator %>%
is commonly
referred to as a “pipe”. It feeds the object before it into the first
argument of the function after it. For example, if you have an object
x
and a function f
, writing
x %>% f()
would be the equivalent as writing
f(x)
. People use “pipes” because they make long sequences
of code easier to read.
table_unweightedpctn()
whomds
contains a function called
table_unweightedpctn()
that produces unweighted tables of N
and %. This is generally used for demographic tables. Its arguments are
as follows:
df
- the data frame with all the variables of
interestvars_demo
- vector with the names of the demographic
variables for which the N and % will be calculatedgroup_by_var
- name of the variable in which the
statistics should be stratified
(e.g. "disability_cat"
)spread_by_group_by_var
- logical determining whether to
spread the table by the variable given in group_by_var
.
Default is FALSE
.group_by_var_sums_to_100
- logical determining whether
percentages sum to 100 along the margin of group_by_var
, if
applicable. Default is FALSE
.add_totals
- a logical variable determining whether to
create total rows or columns (as appropriate) that demonstrate the
margin that sums to 100. Keep as the default FALSE
to not
include totals.Here is an example of how it is used:
table_unweightedpctn(df_adults_noNA,
vars_demo = c("sex", "age_cat", "work_cat", "edu_cat"),
group_by_var = "disability_cat",
spread_by_group_by_var = TRUE)
#> # A tibble: 12 × 10
#> item demo No_pct No_n Mild_pct Mild_n Moderate_pct Moderate_n Severe_pct
#> <chr> <ord> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 sex Fema… 52.5 287 53.5 250 49.9 376 49.1
#> 2 sex Male 47.5 260 46.5 217 50.1 377 50.9
#> 3 age_cat 18-24 6.9 38 7.3 34 8.8 66 7.7
#> 4 age_cat 25-39 19.7 108 16.1 75 19.7 148 16.4
#> 5 age_cat 40-64 32.4 177 33.2 155 29.3 221 32.6
#> 6 age_cat 64-1… 41 224 43.5 203 42.2 318 43.2
#> 7 work_c… N 41.7 228 40.5 189 42 316 42.3
#> 8 work_c… Y 58.3 319 59.5 278 58 437 57.7
#> 9 edu_cat None 21.8 119 16.7 78 21.4 161 19
#> 10 edu_cat Elem… 37.8 207 38.8 181 39.7 299 41.3
#> 11 edu_cat Seco… 25 137 28.7 134 23.2 175 25.1
#> 12 edu_cat Univ… 15.4 84 15.8 74 15.7 118 14.6
#> # … with 1 more variable: Severe_n <dbl>
table_basicstats()
The function table_basicstats()
computes basic
statistics of the number of member per group per household. Its
arguments are:
df
- a data frame of household data where the rows
represent members of the households in the samplehh_id
- string (length 1) indicating the name of the
variable in df
uniquely identifying householdsgroup_by_var
- string (length 1) with name of variable
in df
to group results byHere is an example of how it is used:
table_basicstats(df_adults_noNA, "HHID", "age_cat")
#> # A tibble: 5 × 4
#> age_cat mean_sd median range
#> <chr> <chr> <dbl> <chr>
#> 1 18-24 0.1 (0.3) 0 0 - 1
#> 2 25-39 0.2 (0.4) 0 0 - 1
#> 3 40-64 0.3 (0.5) 0 0 - 1
#> 4 64-100 0.4 (0.5) 0 0 - 1
#> 5 Total 1 (0) 1 1 - 1
Descriptive statistics figure functions included in the
whomds
package are:
fig_poppyramid()
- produces a population pyramid figure
for the samplefig_dist()
- produces a plot of the distribution of a
scorefig_density()
- produces a plot of the density of a
scoreThe arguments of each of these codes will be described below.
fig_poppyramid()
whomds
contains a function called
fig_poppyramid()
that produces a population pyramid figure
for the sample. This function takes as arguments:
df
- the data where each row is a member of the
household from the household rostervar_age
- the name of the column in df
with the persons’ agesvar_sex
- the name of the column in df
with he persons’ sexesx_axis
- a string indicating whether to use absolute
numbers or sample percentage on the x-axis. Choices are "n"
(default) or "pct"
.age_plus
- a numeric value indicating the age that is
the first value of the oldest age group. Default is 100, for the last
age group to be 100+age_by
- a numeric value indicating the width of each
age group, in years. Default is 5.Running this function produces a figure like the one below:
fig_dist()
whomds
contains a function called
fig_dist()
that produces a plot of the distribution of a
score. WHO uses this function to show the distribution of the disability
scores calculated with Rasch Analysis. Its arguments are:
df
- data frame with the score of interestscore
- character variable of score variable name
ranging from 0 to 100; ex. "disability_score"
score_cat
- character variable of score categorization
variable name, ex. "disability_cat"
cutoffs
- a numeric vector of the cut-offs for the
score categorizationx_lab
- a string giving the x-axis label. Default is
"Score"
y_max
- maximum value to use on the y-axis. If left as
the default NULL
, the function will calculate a suitable
maximum automatically.pcent
- logical variable indicating whether to use
percent on the y-axis or frequency. Leave as default FALSE
for frequency and give TRUE
for percent.pal
- a string specifying the type of color palette to
use, passed to the function RColorBrewer::brewer.pal()
.
Default is "Blues"
.binwidth
- a numeric value giving the width of the bins
in the histograph. Default is 5.Running this function produces a figure like the one below.
fig_density()
whomds
contains a function similar to
fig_dist()
called fig_density()
that produces
a plot of the density of a score. WHO uses this function to show the
density distribution of the disability scores calculated with Rasch
Analysis. Its arguments are:
df
- data frame with the score of interestscore
- character variable of score variable name
ranging from 0 to 100; ex. "disability_score"
var_color
- a character variable of the column name to
set color of density lines by. Use this variable if you could like to
print the densities of different groups onto the same plot. Default is
NULL
.var_facet
- a character variable of the column name for
the variable to create a ggplot2::facet_grid()
with, which
will plot densities of different groups in side-by-side plots. Default
is NULL
.cutoffs
- a numeric vector of the cut-offs for the
score categorizationx_lab
- a string giving the x-axis label. Default is
"Score"
pal
- a string specifying either a manual color to use
for the color aesthetic, a character vector explictly specifying the
colors to use for the color scale, or as the name of a palette to pass
to RColorBrewer::brewer.pal()
with the name of the color
palette to use for the color scale. Default is
"Paired"
adjust
- a numeric value to pass to adjust
argument of ggplot2::geom_density()
, which controls
smoothing of the density function. Default is 2.size
- a numeric value to pass to size
argument of ggplot2::geom_density()
, which controls the
thickness of the lines. Default is 1.5.Running this function produces a figure like the one below.
WHO also provides a template for calculating many descriptive
statistics tables for use in survey reports, also written in
R
. If you would like a template for your country, please
contact us (see DESCRIPTION for contact info).