library(splithalfr)
This vignette describes a sum score of answers to the questions of the 23-item Rutgers Alcohol Problem Inventory (RAPI) (White & Labouvie, 1989).
Load the included RAPI dataset and inspect its documentation.
data("ds_rapi", package = "splithalfr")
?ds_rapi
The RAPI dataset is in wide format (i.e. one row per participant, with each observation in a separate column). However, splithalfr requires long format (i.e. one row per observation). Below we reshape the RAPI dataset to long format.
ds_rapi <- reshape(
ds_rapi,
varying = list(paste("V", 1 : 23, sep = "")),
idvar = "twnr",
direction = "long",
timevar = "item",
v.names = "score"
)
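To make the effect of the reshape concrete, here is a minimal sketch on a toy dataset. The data frame `wide` and its values are hypothetical and only mimic the layout of the RAPI data with 3 items instead of 23.

```r
# Hypothetical toy data: 2 participants, 3 items, in wide format
wide <- data.frame(
  twnr = c(1, 2),
  V1 = c(0, 2),
  V2 = c(1, 0),
  V3 = c(3, 1)
)

# Same reshape call as above, adapted to 3 items
long <- reshape(
  wide,
  varying = list(paste("V", 1:3, sep = "")),
  idvar = "twnr",
  direction = "long",
  timevar = "item",
  v.names = "score"
)

# One row per participant-item combination
long[order(long$twnr, long$item), ]
```

The result should have six rows (2 participants times 3 items) and the columns twnr, item, and score.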
The columns used in this example are:
twnr, which identifies participants
item, which identifies items
score, which contains the score of participant i on item j
The scoring function calculates the score of a single participant by summing their scores on each item.
fn_score <- function (ds) {
return (sum(ds$score))
}
Let’s calculate the RAPI score for the participant with twnr 396. NB - This score has also been calculated manually via Excel in the splithalfr repository.
fn_score(subset(ds_rapi, twnr == 396))
To calculate the RAPI score for each participant, we will use R’s native by function and convert the result to a data frame.
scores <- by(
ds_rapi,
ds_rapi$twnr,
fn_score
)
data.frame(
twnr = names(scores),
score = as.vector(scores)
)
To calculate split-half scores for each participant, use the function by_split. The first three arguments of this function are the same as for by. An additional set of arguments allows you to specify how to split the data and how often. In this vignette we will calculate scores of 1000 permuted splits. Since each participant received the same unique sequence of items, we enable match_participants. See the vignette on splitting methods for more ways to split the data.
The by_split function returns a data frame with the following columns:
participant, which identifies participants
replication, which counts replications
score_1 and score_2, which are the scores calculated for each of the split datasets
Calculating the split scores may take a while. By default, by_split uses all available CPU cores, but no progress bar is displayed. Setting ncores = 1 will display a progress bar, but processing will be slower.
split_scores <- by_split(
ds_rapi,
ds_rapi$twnr,
fn_score,
replications = 1000,
match_participants = TRUE
)
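As an illustration of what a single permuted split amounts to, the sketch below splits one participant's items at random into two halves and scores each half. It is written in base R with hypothetical names (`toy`, `fn_sum`, `in_part_1`) and is not how by_split is implemented internally.

```r
# Hypothetical long-format data for one participant: 6 items
set.seed(1)
toy <- data.frame(item = 1:6, score = c(0, 1, 3, 2, 0, 1))

# Randomly assign half of the items to part 1, the rest to part 2
in_part_1 <- sample(nrow(toy)) <= nrow(toy) / 2

# Apply a sum-score function to each half
fn_sum <- function(ds) sum(ds$score)
score_1 <- fn_sum(toy[in_part_1, ])
score_2 <- fn_sum(toy[!in_part_1, ])
c(score_1 = score_1, score_2 = score_2)
```

Repeating this many times with different random assignments gives one pair of scores per replication, which is the shape of the output described above.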
Next, the output of by_split can be analyzed to estimate reliability. The package provides functions that calculate Spearman-Brown adjusted Pearson correlations (spearman_brown), Flanagan-Rulon (flanagan_rulon), Angoff-Feldt (angoff_feldt), and Intraclass Correlation (short_icc) coefficients. Each of these coefficient functions can be used with split_coefs to calculate the corresponding coefficients per split, which can then be plotted or averaged via a simple mean. A bias-corrected and accelerated bootstrap confidence interval can be calculated via split_ci. Note that estimating the confidence interval is computationally intensive, so it can take a long time to complete.
# Spearman-Brown adjusted Pearson correlations per replication
coefs <- split_coefs(split_scores, spearman_brown)
# Distribution of coefficients
hist(coefs)
# Mean of coefficients
mean(coefs)
# Confidence interval of coefficients
split_ci(split_scores, spearman_brown)
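To show what a Spearman-Brown adjusted correlation means for a single replication, the sketch below applies the textbook formula 2r / (1 + r) to hypothetical split scores. The values are made up, and the package's spearman_brown function may differ in details such as the handling of negative correlations.

```r
# Hypothetical split scores for 5 participants in a single replication
score_1 <- c(4, 7, 2, 9, 5)
score_2 <- c(5, 6, 3, 8, 4)

# Pearson correlation between the two half-test scores
r <- cor(score_1, score_2)

# Textbook Spearman-Brown adjustment to full test length
r_sb <- 2 * r / (1 + r)
c(r = r, r_sb = r_sb)
```

For a positive correlation below 1, the adjusted coefficient is larger than the raw correlation. This reflects that each half contains only half of the items of the full questionnaire.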