eatGADS allows importing data from SPSS files essentially without any
loss of meta data. Meta data are stored in a GADSdat object with the
following structure:
#> varName varLabel format display_width labeled value valLabel missings
#> 17 gender Gender F8.0 NA yes 1 Female valid
#> 18 gender Gender F8.0 NA yes 2 Male valid
varLabel: a variable label describing what a variable in the data represents
format: the SPSS format of a variable
display_width: the display width of a variable in the SPSS format
labeled: does the variable contain value labels or missing codes?
On value level, additional meta data can be stored, namely:
valLabel: value labels; labels describing what a certain value on a certain variable represents
missings: missing codes indicating whether a certain value represents a valid ("valid") or a missing ("miss") value
These meta data provide an overview of the actual data and are
essential to their understanding, comparable to a codebook. Most
importantly, they also allow other researchers to understand the data
without much effort and ensure transparency. In this vignette we
illustrate how meta data can be accessed and modified in eatGADS. We
also discuss some recode and data modification functions that lead to
automatic meta data changes.
For illustrative purposes we use a small example data set from the
campus files of the German PISA Plus assessment (called pisa). The
complete campus files and the original data set can be accessed here
and here.
library(eatGADS)
gads <- pisa
With the function extractMeta() you can inspect the existing meta data
of one, several or all variables in a GADSdat object. We will use this
function frequently throughout the vignette to check whether changes to
meta data have been successful.
extractMeta(gads, vars = c("hisei", "schtype"))
#> varName varLabel format display_width labeled value
#> 5 schtype School track F8.0 NA yes 1
#> 6 schtype School track F8.0 NA yes 2
#> 7 schtype School track F8.0 NA yes 3
#> 39 hisei Highest parental occupational status F8.2 NA no NA
#> valLabel missings
#> 5 Gymnasium (academic track) valid
#> 6 Realschule valid
#> 7 schools with several courses of education valid
#> 39 <NA> <NA>
Changes to meta data on variable level are straightforward. Variable
names can be changed with the changeVarNames() function. The old
variable names are overwritten.
# Changing variable names
extractMeta(gads, vars = "hisei")
#> varName varLabel format display_width labeled value valLabel
#> 39 hisei Highest parental occupational status F8.2 NA no NA <NA>
#> missings
#> 39 <NA>
gads_labeled <- changeVarNames(GADSdat = gads, oldNames = "hisei", newNames = "hisei_new")
extractMeta(gads_labeled, vars = "hisei_new")
#> varName varLabel format display_width labeled value valLabel
#> 39 hisei_new Highest parental occupational status F8.2 NA no NA <NA>
#> missings
#> 39 <NA>
Variable labels can be adjusted analogously via the changeVarLabels()
function.
# Changing variable labels
extractMeta(gads_labeled, vars = "hisei_new")
#> varName varLabel format display_width labeled value valLabel
#> 39 hisei_new Highest parental occupational status F8.2 NA no NA <NA>
#> missings
#> 39 <NA>
gads_labeled <- changeVarLabels(GADSdat = gads_labeled, varName = "hisei_new",
                                varLabel = "Parental occupational status (highest)")
extractMeta(gads_labeled, vars = "hisei_new")
#> varName varLabel format display_width labeled value valLabel
#> 39 hisei_new Parental occupational status (highest) F8.2 NA no NA <NA>
#> missings
#> 39 <NA>
The same applies to the SPSS format of a variable, which can be changed
using the changeSPSSformat() function.
# Changing SPSS format
extractMeta(gads_labeled, "hisei_new")
#> varName varLabel format display_width labeled value valLabel
#> 39 hisei_new Parental occupational status (highest) F8.2 NA no NA <NA>
#> missings
#> 39 <NA>
gads_labeled <- changeSPSSformat(GADSdat = gads_labeled, varName = "hisei_new",
                                 format = "F10.2")
extractMeta(gads_labeled, "hisei_new")
#> varName varLabel format display_width labeled value valLabel
#> 39 hisei_new Parental occupational status (highest) F10.2 NA no NA <NA>
#> missings
#> 39 <NA>
Changes to meta data on value level follow the same principle. With the
changeValLabels() function, value labels can be added or modified. Note
that value labels and missing codes should be assigned to numeric
values, even if the variable is a character variable.
# Adding value labels
extractMeta(gads_labeled, "hisei_new")
#> varName varLabel format display_width labeled value valLabel
#> 39 hisei_new Parental occupational status (highest) F10.2 NA no NA <NA>
#> missings
#> 39 <NA>
gads_labeled <- changeValLabels(GADSdat = gads_labeled, varName = "hisei_new",
                                value = c(-94, -99), valLabel = c("miss1", "miss2"))
extractMeta(gads_labeled, "hisei_new")
#> varName varLabel format display_width labeled value valLabel
#> 38 hisei_new Parental occupational status (highest) F10.2 NA yes -99 miss2
#> 39 hisei_new Parental occupational status (highest) F10.2 NA yes -94 miss1
#> missings
#> 38 valid
#> 39 valid
# Changing value labels
gads_labeled <- changeValLabels(GADSdat = gads_labeled, varName = "hisei_new",
                                value = c(-94, -99),
                                valLabel = c("missing: Question omitted",
                                             "missing: Not administered"))
extractMeta(gads_labeled, "hisei_new")
#> varName varLabel format display_width labeled value
#> 38 hisei_new Parental occupational status (highest) F10.2 NA yes -99
#> 39 hisei_new Parental occupational status (highest) F10.2 NA yes -94
#> valLabel missings
#> 38 missing: Not administered valid
#> 39 missing: Question omitted valid
Value labels can also be deleted using the removeValLabels() function.
# Removing value labels
extractMeta(gads_labeled, "schtype")
#> varName varLabel format display_width labeled value valLabel
#> 4 schtype School track F8.0 NA yes 1 Gymnasium (academic track)
#> 5 schtype School track F8.0 NA yes 2 Realschule
#> 6 schtype School track F8.0 NA yes 3 schools with several courses of education
#> missings
#> 4 valid
#> 5 valid
#> 6 valid
gads_labeled <- removeValLabels(GADSdat = gads_labeled, varName = "schtype",
                                value = 1:3)
extractMeta(gads_labeled, "schtype")
#> varName varLabel format display_width labeled value valLabel missings
#> 4 schtype School track F8.0 NA no NA <NA> <NA>
Missing codes can be modified using the changeMissings() function.
Valid entries for missings are "miss" and "valid".
# Defining missings
extractMeta(gads_labeled, "hisei_new")
#> varName varLabel format display_width labeled value
#> 38 hisei_new Parental occupational status (highest) F10.2 NA yes -99
#> 39 hisei_new Parental occupational status (highest) F10.2 NA yes -94
#> valLabel missings
#> 38 missing: Not administered valid
#> 39 missing: Question omitted valid
gads_labeled <- changeMissings(GADSdat = gads_labeled, varName = "hisei_new",
                               value = c(-94, -99), missings = c("miss", "miss"))
extractMeta(gads_labeled, "hisei_new")
#> varName varLabel format display_width labeled value
#> 36 hisei_new Parental occupational status (highest) F10.2 NA yes -99
#> 37 hisei_new Parental occupational status (highest) F10.2 NA yes -94
#> valLabel missings
#> 36 missing: Not administered miss
#> 37 missing: Question omitted miss
Usually, value labels and missing codes should be aligned. For example,
in our variable hisei_new the value -94 has received both a missing code
and the value label "missing: Question omitted". To make such alignment
easier, the checkMissings() function exists. It searches the value
labels for a regular expression and compares the matches with the
missing codes, and vice versa. By default, missing codes are adjusted
automatically (addMissingCode = TRUE) and value label mismatches are
only reported (addMissingLabel = FALSE).
# Creating a new value label for a missing value but leaving the missing code as valid
gads_labeled <- changeValLabels(GADSdat = gads_labeled, varName = "gender",
                                value = -94, valLabel = "missing: Question omitted")
# Creating a new missing code but leaving the value label empty
gads_labeled <- changeMissings(GADSdat = gads_labeled, varName = "gender",
                               value = -99, missings = "miss")
# Checking value label and missing code alignment
gads_labeled <- checkMissings(gads_labeled, missingLabel = "missing")
#> The following variables have value labels including the term 'missing' which are not coded as missing:
#> gender
#> 'miss' is inserted into column missings for 1 rows.
#> The following variables have values coded as missings but value labels do not include the term 'missing':
#> gender
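The idea behind such an alignment check can be sketched in base R. This is only an illustration of the logic on a hypothetical toy table that mimics extractMeta() output; it is not the implementation used by checkMissings(), and the object names (meta, label_hit, coded_miss) are made up for this example.

```r
# Hypothetical value-level meta data, mimicking the columns shown above
meta <- data.frame(
  varName  = c("gender", "gender", "gender", "gender"),
  value    = c(-99, -94, 1, 2),
  valLabel = c(NA, "missing: Question omitted", "Female", "Male"),
  missings = c("miss", "valid", "valid", "valid"),
  stringsAsFactors = FALSE
)

# Does the value label match the regular expression? (NA labels never match)
label_hit  <- !is.na(meta$valLabel) & grepl("missing", meta$valLabel)
# Is the value coded as missing?
coded_miss <- meta$missings == "miss"

# Labeled as missing but not coded as missing
label_not_coded <- meta[label_hit & !coded_miss, ]
# Coded as missing but label does not contain the term
coded_not_labeled <- meta[coded_miss & !label_hit, ]

# Mirror the default behavior described above: adjust the missing codes,
# but only report (do not change) the label mismatches
meta$missings[label_hit & !coded_miss] <- "miss"
```

In this toy table, the value -94 ends up in label_not_coded and the value -99 in coded_not_labeled, which corresponds to the two messages reported for gender above.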
Sometimes one variable already contains the meta data which should be
added to another variable. reuseMeta() can copy meta data from one
variable (other_varName) to another variable (varName), even across
different data sets. The function allows us to transfer the complete
meta data, only value labels or a specific selection of value labels
(only valid values or missing codes). In the example below we transfer
only the missing codes from variable hisei_new to variable age.
extractMeta(gads_labeled, "age")
#> varName varLabel format display_width labeled value valLabel missings
#> 18 age Age of student at T1 F8.2 NA no NA <NA> <NA>
gads_labeled <- reuseMeta(GADSdat = gads_labeled, varName = "age",
                          other_GADSdat = gads_labeled, other_varName = "hisei_new",
                          missingLabels = "only", addValueLabels = TRUE)
extractMeta(gads_labeled, "age")
#> varName varLabel format display_width labeled value valLabel
#> 18 age Age of student at T1 F8.2 NA yes -99 missing: Not administered
#> 19 age Age of student at T1 F8.2 NA yes -94 missing: Question omitted
#> missings
#> 18 miss
#> 19 miss
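The following sketch illustrates the transfer of meta data across different data sets. It assumes that a plain data.frame can be turned into a GADSdat object via import_DF(), and that reuseMeta() copies the complete meta data when missingLabels is left at its default; the small data set df and the object gads_small are hypothetical.

```r
library(eatGADS)

# Hypothetical second data set without any meta data
df <- data.frame(id = 1:3, schtype = c(1, 3, 2))
gads_small <- import_DF(df)  # assumption: converts a data.frame to a GADSdat

# Copy the complete meta data of schtype from the pisa data set
gads_small <- reuseMeta(GADSdat = gads_small, varName = "schtype",
                        other_GADSdat = pisa, other_varName = "schtype")
extractMeta(gads_small, "schtype")
```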
Note that in eatGADS meta data is stored together with the actual data
set. Therefore, changes to the actual data set imply changes to the meta
data. If a variable is removed from the data set, its meta data is no
longer needed. If a new variable is created, new meta data needs to be
created. If a variable is recoded, the meta data entries need to be
recoded accordingly.
If a certain subset of variables in the GADSdat is needed, individual
variables can either be extracted (extractVars()) or removed
(removeVars()) from the data set.
# Selecting variables
gads_motint <- extractVars(gads_labeled,
                           vars = c("int_a", "int_b", "int_c", "int_d", "instmot_a"))
#> Removing the following rows from meta data: idstud, idschool, idclass, schtype, sameteach, g8g9, ganztag, classsize, repeated, gender, age, language, migration, hisced, hisei_new, homepos, books, pared, computer_age, internet_age, int_use_a, int_use_b, truancy_a, truancy_b, truancy_c, instmot_b, instmot_c, instmot_d, norms_a, norms_b, norms_c, norms_d, norms_e, norms_f, anxiety_a, anxiety_b, anxiety_c, anxiety_d, anxiety_e, selfcon_a, selfcon_b, selfcon_c, selfcon_d, selfcon_e, worketh_a, worketh_b, worketh_c, worketh_d, worketh_e, worketh_f, worketh_g, worketh_h, worketh_i, intent_a, intent_b, intent_c, intent_d, intent_e, behav_a, behav_b, behav_c, behav_d, behav_e, behav_f, behav_g, behav_h, teach_a, teach_b, teach_c, teach_d, teach_e, cognact_a, cognact_b, cognact_c, cognact_d, cognact_e, cognact_f, cognact_g, cognact_h, cognact_i, discpline_a, discpline_b, discpline_c, discpline_d, discpline_e, relation_a, relation_b, relation_c, relation_d, relation_e, belong_a, belong_b, belong_c, belong_d, belong_e, belong_f, belong_g, belong_h, belong_i, attitud_a, attitud_b, attitud_c, attitud_d, attitud_e, attitud_f, attitud_g, attitud_h, grade_de, grade_ma, grade_bio, grade_che, grade_phy, grade_sci, ma_pv1, ma_pv2, ma_pv3, ma_pv4, ma_pv5, rea_pv1, rea_pv2, rea_pv3, rea_pv4, rea_pv5, sci_pv1, sci_pv2, sci_pv3, sci_pv4, sci_pv5
#> No rows added to meta data.
namesGADS(gads_motint)
#> [1] "int_a" "int_b" "int_c" "int_d" "instmot_a"
gads_int <- removeVars(gads_motint, vars = "instmot_a")
#> Removing the following rows from meta data: instmot_a
#> No rows added to meta data.
namesGADS(gads_int)
#> [1] "int_a" "int_b" "int_c" "int_d"
The function recodeGADS() allows the recoding of a variable.
# Original data and meta data
gads_labeled$dat$gender[1:10]
#> [1] 1 1 2 2 1 1 2 2 1 1
extractMeta(gads_labeled, "gender")
#> varName varLabel format display_width labeled value valLabel missings
#> 14 gender Gender F8.0 NA yes -99 <NA> miss
#> 15 gender Gender F8.0 NA yes -94 missing: Question omitted miss
#> 16 gender Gender F8.0 NA yes 1 Female valid
#> 17 gender Gender F8.0 NA yes 2 Male valid
# Recoding
gads_labeled <- recodeGADS(GADSdat = gads_labeled, varName = "gender",
                           oldValues = c(1, 2), newValues = c(10, 20))
# New data and meta data
gads_labeled$dat$gender[1:10]
#> [1] 10 10 20 20 10 10 20 20 10 10
extractMeta(gads_labeled, "gender")
#> varName varLabel format display_width labeled value valLabel missings
#> 14 gender Gender F8.0 NA yes -99 <NA> miss
#> 15 gender Gender F8.0 NA yes -94 missing: Question omitted miss
#> 16 gender Gender F8.0 NA yes 10 Female valid
#> 17 gender Gender F8.0 NA yes 20 Male valid
Moreover, the function allows recoding of values that were not assigned
value labels, or even of NA values.
# Recoding of NA values
gads_labeled$dat$int_a[1:10]
#> [1] 2 2 3 2 1 2 NA NA NA NA
gads_labeled <- recodeGADS(GADSdat = gads_labeled, varName = "int_a",
                           oldValues = NA, newValues = -94)
gads_labeled$dat$int_a[1:10]
#> [1] 2 2 3 2 1 2 -94 -94 -94 -94
For recoding specific values into NA values, the function recode2NA()
exists. It allows recoding a specific value across multiple variables
(while recodeGADS() allows recoding multiple values for a single
variable). Existing value labels for the specified values are deleted.
For each variable it is reported how many cases have been recoded.
# Recoding of values as Missing/NA
gads_labeled$dat$schtype[1:10]
#> [1] 2 3 1 3 2 3 1 3 2 1
gads_labeled <- recode2NA(gads_labeled, recodeVars = c("hisei_new", "schtype"),
                          value = "3")
#> Recodes in variable hisei_new: 0
#> Recodes in variable schtype: 111
gads_labeled$dat$schtype[1:10]
#> [1] 2 NA 1 NA 2 NA 1 NA 2 1
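What recoding one value to NA across several variables means for the data can be sketched in base R. The toy data frame and the names dat, recode_vars and n_recoded are hypothetical, and the sketch only covers the data part; unlike recode2NA(), it does not touch any meta data.

```r
# Hypothetical toy data
dat <- data.frame(schtype = c(2, 3, 1, 3, 2),
                  hisei   = c(45.5, 3, 60.1, 51, 3))
recode_vars <- c("schtype", "hisei")
value <- 3

# Count the affected cases per variable before recoding
n_recoded <- vapply(recode_vars,
                    function(v) sum(dat[[v]] == value, na.rm = TRUE),
                    integer(1))

# Recode the value to NA in every selected variable
for (v in recode_vars) {
  dat[[v]][which(dat[[v]] == value)] <- NA
}

n_recoded
dat
```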
Adding variables to a GADSdat object is unfortunately not
straightforward and requires some understanding of the underlying
object structure. A GADSdat object is basically a list with two entries,
the data (dat) and the meta data (labels). For adding variables we need
to extract the dat object and add the new variables to it. Afterwards,
we need to update the meta data using the updateMeta() function.
# Extract the data
newDat <- gads_labeled$dat
# Adding a variable
newDat$classsize_kat <- ifelse(newDat$classsize > 15,
                               yes = "big", no = "small")
# Updating meta data
gads_labeled2 <- updateMeta(gads_labeled, newDat = newDat)
#> No rows removed from meta data.
#> Adding meta data for the following variables: classsize_kat
extractMeta(gads_labeled2, "classsize_kat")
#> varName varLabel format display_width labeled value valLabel missings
#> classsize_kat classsize_kat <NA> <NA> NA no NA <NA> <NA>
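As the output shows, a newly added variable starts with empty meta data. A possible follow-up, sketched here with the functions introduced earlier, is to label it right away. The sketch starts from the unmodified pisa data and uses a numeric coding (0/1) instead of the character coding above, so that value labels can be attached to numeric values; the variable label and the object names dat and gads_new are made up for illustration.

```r
library(eatGADS)

# Extract the data and add a numerically coded variable
dat <- pisa$dat
dat$classsize_kat <- ifelse(dat$classsize > 15, yes = 1, no = 0)

# Update the meta data, then fill in the empty entries
gads_new <- updateMeta(pisa, newDat = dat)
gads_new <- changeVarLabels(GADSdat = gads_new, varName = "classsize_kat",
                            varLabel = "Class size (categorized)")
gads_new <- changeValLabels(GADSdat = gads_new, varName = "classsize_kat",
                            value = c(0, 1), valLabel = c("small", "big"))
extractMeta(gads_new, "classsize_kat")
```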
So far, the introduced functions work well for modifying the meta data
of small data sets or for individual variables. However, frequently we
are interested in modifying the meta data of a larger number of
variables simultaneously. For this purpose eatGADS provides a workflow
that works well with Excel spreadsheets. Changes to meta data are
divided into two levels: the variable level and the value level.
We start by extracting a change table via the getChangeMeta() function.
# variable level
meta_var <- getChangeMeta(GADSdat = pisa, level = "variable")
While in principle one could modify the change table directly in R, it
is more convenient to do this in Excel. The change table can be written
to .xlsx via the eatAnalysis::write_xlsx() function. To perform changes,
entries are made into the corresponding “_new”-columns.
# write to excel
eatAnalysis::write_xlsx(meta_var, row.names = FALSE, "variable_changes.xlsx")
The Excel file can be read back into R via readxl::read_excel().
# read from excel
meta_var_changed <- readxl::read_excel("variable_changes.xlsx", col_types = rep("text", 8))
The applyChangeMeta() function applies the meta data changes to the
GADSdat object.
gads2 <- applyChangeMeta(meta_var_changed, GADSdat = pisa)
extractMeta(gads2, vars = c("idstud", "idschool", "schoolType"))
#> varName varLabel format display_width labeled value
#> 2 idstud Student Identifier Variable F8.0 NA no NA
#> 3 idschool School-ID F10.0 NA no NA
#> 5 schoolType School track F8.0 NA yes 1
#> 6 schoolType School track F8.0 NA yes 2
#> 7 schoolType School track F8.0 NA yes 3
#> valLabel missings
#> 2 <NA> <NA>
#> 3 <NA> <NA>
#> 5 Gymnasium (academic track) valid
#> 6 Realschule valid
#> 7 schools with several courses of education valid
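For a handful of changes, the detour via Excel can be skipped by editing the change table directly in R. The following is a minimal sketch that reproduces the changes shown in the output above. It assumes that the variable-level change table contains the columns varName, varName_new, varLabel_new and format_new, following the “_new”-column convention described earlier; checking names(meta_var) first is advisable.

```r
library(eatGADS)

# Extract the variable-level change table
meta_var <- getChangeMeta(GADSdat = pisa, level = "variable")

# Enter the desired changes into the "_new" columns (column names assumed)
meta_var[meta_var$varName == "idstud", "varLabel_new"] <- "Student Identifier Variable"
meta_var[meta_var$varName == "idschool", "format_new"] <- "F10.0"
meta_var[meta_var$varName == "schtype", "varName_new"] <- "schoolType"

# Apply the changes and inspect the result
gads2 <- applyChangeMeta(meta_var, GADSdat = pisa)
extractMeta(gads2, vars = c("idstud", "idschool", "schoolType"))
```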
At value level, information on values, value labels or missings can be
changed. The general workflow is identical.
# value level
meta_val <- getChangeMeta(GADSdat = pisa, level = "value")
# write to excel
eatAnalysis::write_xlsx(meta_val, row.names = FALSE, "value_changes.xlsx")
# read from excel
meta_val_changed <- readxl::read_excel("value_changes.xlsx",
                                       col_types = c("text", rep(c("numeric", "text", "text"), 2)))
gads3 <- applyChangeMeta(meta_val_changed, GADSdat = pisa)
extractMeta(gads3, vars = c("schtype", "sameteach"))
#> varName varLabel format display_width labeled value
#> 4 schtype School track F8.0 NA yes 1
#> 5 schtype School track F8.0 NA yes 2
#> 6 schtype School track F8.0 NA yes 3
#> 7 sameteach Same math teacher in both school years F8.0 NA yes 10
#> 8 sameteach Same math teacher in both school years F8.0 NA yes 20
#> valLabel missings
#> 4 Acamedic Track valid
#> 5 Realschule valid
#> 6 schools with several courses of education valid
#> 7 No valid
#> 8 Yes valid