CRAN_Status_Badge CRAN RStudio mirror downloads

cNorm

The package cNorm provides methods for generating continuous standard scores, as f. e. for psychometric test development, biometrics (e. g. biological and physiological growth curves), and screenings in the medical domain. It is based on the approach suggested by A. Lenhard et al. (2016, 2019). For an in-depth tutorial please consult the project homepage https://www.psychometrica.de/cNorm_en.html and https://cnorm.shinyapps.io/cNORM/ for an online demonstration.

Approach

Conventional methods for producing test norms are often plagued with “jumps” or “gaps” (i.e., discontinuities) in norm tables and low confidence for assessing extreme scores. cNORM addresses these problems and also has the added advantage of not requiring assumptions about the distribution of the raw data: The standard scores are established from raw data by modeling the latter ones as a function of both percentile scores and an explanatory variable (e.g., age) through Taylor polynomials. The method minimizes bias arising from sampling and measurement error, while handling marked deviations from normality – such as are commonplace in clinical samples. It includes procedures for post stratification of norm samples to overcome bias in data collection and to mitigate violations of representativeness. Contrary to parametric approaches, it does not rely on distribution assumptions of the initial norm data and is thus a very robust approach in generating norm tables.

The rationale of the approach is model the relationship between location / norm score, age and raw score via multiple regression and to fit a 3-dimensional hyperplane. This hyperplane is used to close all gaps and to compute continuous norm scores:

Installation

cNORM can be installed via {r example} install.packages("cNORM", dependencies = TRUE)

Additionally, you can download a precompiled version or access the github development version via ```{r example} install.packages(“devtools”) library(devtools)

devtools::install_github(“WLenhard/cNORM”) library(cNORM)


Please report errors. Suggestions for improvement are always welcome!

## Example

Conducting the analysis consists of the following steps:
1.  Data preparation and establishing the regression model
1.  Validating the model
1.  Generating norm tables and plotting the results

cNORM offers functions for selecting the best fitting models and in generating the norm tables.

```{r example}
## In a nutshell:
## Basic example code for modeling the sample dataset
library(cNORM)

# Start the graphical user interface (needs shiny installed)
# The GUI includes the most important functions. For specific cases,
# please use cNORM on the console.
cNORM.GUI()

# Easy start: Conventional norming for one group without continuum over age
# with the inbuilt elfe dataset.
cnorm(raw = elfe$raw)

# Rank data within group and compute powers and interactions for the internal 
# dataset 'elfe' and compute model. The resulting object includes the ranked 
# data via object$data and model via object$model.
cnorm.elfe <- cnorm(raw = elfe$raw, group = elfe$group)

# Plot R2 of different model solutions in dependence of the number of predictors
plot(cnorm.elfe, "subset", type=0)        # plot R2
plot(cnorm.elfe, "subset", type=3)        # plot MSE

# NOTE! At this point, you usually select a good fitting model and rerun the process
# with a fixed number of terms, e. g. four. Avoid models with a high number of terms:
cnorm.elfe <- cnorm(raw = elfe$raw, group = elfe$group, terms = 4)

# Per default, the power parameter is set to k = 5 and t = 3. You can choose a value up 
# to 6, but higher values can lead to overfit. In case of overfit, please reduce these
# values. In case, only k is specified, cNORM uses this value for both k and t.
# In the following example, the distribution per age is modeled with power parameter 
# k = 3 (= cubic), while for the age, there is only a quadratic trajectory (-> 't = 2').
cnorm.elfe <- cnorm(raw = elfe$raw, group = elfe$group, k = 3, t = 2)

#  Visual inspection of the percentile curves of the fitted model
plot(cnorm.elfe, "percentiles")

# Visual inspection of the observed and fitted raw and norm scores
plot(cnorm.elfe, "norm")
plot(cnorm.elfe, "raw")
plot(cnorm.elfe, "raw", group = "group") # show fit per grouping variable

# In order to check, how other models perform, plot series of percentile plots with ascending
# number of predictors, in this example up to 14 predictors.
plot(cnorm.elfe, "series", end=14)

# Cross validation of number of terms with 20% of the data for validation and 80% training.
# Due to the time intensity, max terms is restricted to 10 in this example; 3 repetitions
cnorm.cv(cnorm.elfe$data, max=10, repetitions=3)

# Cross validation with pre-specified terms, e. g. of an already existing model
cnorm.cv(cnorm.elfe, repetitions=3)

# Print norm table (for grade 3, 3.2, 3.4, 3.6)
normTable(c(3, 3.2, 3.4, 3.6), cnorm.elfe)

# The other way round: Print raw table (for grade 3) together with 90% confidence intervalls
# for a test with a reliability of .94
rawTable(3, cnorm.elfe, CI = .9, reliability = .94)

# Get the predicted norm scores for a vector of raw scores and explanatory variable, e. g. age
predicted <- predictNorm(elfe$raw, elfe$group, cnorm.elfe)

# In case of unbalanced datasets deviating from the census, the norm data
# can be weighted by the means of raking / post stratification. Please generate
# the weights with the computeWeights() function and pass them as the weights
# parameter. For computing the weights, please specify a data.frame with the
# population margins (further information is available in the computeWeights
# function). A demonstration based on sex and migration status in vocabulary
# development (ppvt dataset):
margins <- data.frame(variables = c("sex", "sex",
                                    "migration", "migration"),
                      levels = c(1, 2, 0, 1),
                      share = c(.52, .48, .7, .3))
weights <- computeWeights(ppvt, margins)
model <- cnorm(raw = ppvt$raw, group=ppvt$group, weights = weights)


# start vignette for a complete walk through
vignette("cNORM-Demo", package = "cNORM")
vignette("WeightedRegression", package = "cNORM")

cNORM offers functions to choose the optimal model, both from a visual inspection of the percentiles, as well as by information criteria and model tests:

In this example, a Taylor polynomial with power k = 4 was computed in order to model a sample of the ELFE 1-6 reading comprehension test (sentence completion task; W. Lenhard & Schneider, 2006). In the plot, you can see the share of variance explained by the different models (with progressing number of predictors). Adjusted R2, Mallow’s Cp (an AIC like measure) and BIC is used (BIC is available through the option type = 2). The predefined adjusted R2 value of .99 is already reached with the third model and afterwards we only get minor improvements in adjusted R2. On the other hand, Cp rapidly declines afterwards, so model 3 seems to be a good candidate in terms of the relative information content per predictor and the captured information (adjusted R2). It is advisable to choose a model at the “elbow” in order to avoid over-fitting, but the solution should be tested for violations of model assumptions and the progression of the percentiles should be inspected visually, as well.

The predicted progression over age are displayed as lines and the manifest data as dots. Only three predictors were necessary to almost perfectly model the norm sample data with adjusted R2.

Sample Data

The package includes data from two large test norming projects, namely ELFE 1-6 (Lenhard & Schneider, 2006) and German adaption of the PPVT4 (A. Lenhard, Lenhard, Suggate & Seegerer, 2015), which can be used to run the analysis. Furthermore, large samples from the Center of Disease Control (CDC) on growth curves in childhood and adolescence (for computing Body Mass Index ‘BMI’ curves), life expectancy at birth and mortality per country from 1960 to 2017 (available from The World Bank). Type ?elfe, ?ppvt, ?CDC, ?epm, ?mortality or ?life to display information on the data sets.

Terms of use, license and declaration of interest

cNORM is licensed under GNU Affero General Public License v3 (AGPL-3.0). This means that copyrighted parts of cNORM can be used free of charge for commercial and non-commercial purposes that run under this same license, retain the copyright notice, provide their source code and correctly cite cNORM. Copyright protection includes, for example, the reproduction and distribution of source code or parts of the source code of cNORM or of graphics created with cNORM. The integration of the package into a server environment in order to access the functionality of the software (e.g. for online delivery of norm scores) is also subject to this license. However, a regression function determined with cNORM is not subject to copyright protection and may be used freely without preconditions. If you want to apply cNORM in a way that is not compatible with the terms of the AGPL 3.0 license, please do not hesitate to contact us to negotiate individual conditions. If you want to use cNORM for scientific publications, we would also ask you to quote the source.

The authors would like to thank WPS (https://www.wpspublish.com/) for providing funding for developing, integrating and evaluating weighting and post stratification in the cNORM package. The research project was conducted in 2022.

References