labelmachine is an R package that helps you assign meaningful labels to R data sets. Labels are managed in yaml files, so-called lama-dictionary files. This makes it easy to reuse the same label translations across multiple projects that share a similar data structure.
Labeling your data can be easy!
# Install release version from CRAN
install.packages("labelmachine")
# Install development version from GitHub
devtools::install_github('a-maldet/labelmachine', build_vignettes = TRUE)
Assume you want to label the following data frame df:
df <- data.frame(
  pupil_id = rep(1:4, each = 3),
  subject = rep(c("eng", "mat", "gym"), 4),
  result = c(1, 2, 2, NA, 2, NA, 1, 0, 1, 2, 3, NA),
  stringsAsFactors = FALSE
)
df
#>    pupil_id subject result
#> 1         1     eng      1
#> 2         1     mat      2
#> 3         1     gym      2
#> 4         2     eng     NA
#> 5         2     mat      2
#> 6         2     gym     NA
#> 7         3     eng      1
#> 8         3     mat      0
#> 9         3     gym      1
#> 10        4     eng      2
#> 11        4     mat      3
#> 12        4     gym     NA
The column subject contains the codes of the subjects the pupils were tested in, and the column result contains the test results.
In order to assign labels to the values in the columns of df, we need a lama-dictionary, which holds the translations of the variables. The command new_lama_dictionary() creates such a lama-dictionary:
library(labelmachine)
dict <- new_lama_dictionary(
  sub = c(eng = "English", mat = "Mathematics", gym = "Gymnastics"),
  res = c(
    "1" = "Good",
    "2" = "Passed",
    "3" = "Not passed",
    "4" = "Not passed",
    NA_ = "Missed",
    "0" = NA
  )
)
dict
#>
#> --- lama_dictionary ---
#> Variable 'sub':
#>           eng           mat           gym
#>     "English" "Mathematics"  "Gymnastics"
#>
#> Variable 'res':
#>            1            2            3            4          NA_
#>       "Good"     "Passed" "Not passed" "Not passed"     "Missed"
#>            0
#>           NA
Each entry in dict is a translation for a variable (column) of the data frame df. The translation sub can be used to assign labels to the values in column subject of df, and the translation res can be used to assign labels to the values in column result of df. The expression NA_ is used to escape the missing value symbol NA. Hence, the assignment NA_ = "Missed" defines that missing values should be labeled with the string "Missed". Conversely, the assignment "0" = NA defines that the value 0 should be translated to a missing value. For further details on creating lama-dictionaries, see Creating lama-dictionaries and Altering lama-dictionaries.
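To make the lookup semantics concrete, here is a minimal sketch in base R only. It is not how labelmachine is implemented; it merely treats the res translation as a plain named character vector and applies the NA_ escape by hand. The names res_map, result, keys and labels are made up for this illustration.

```r
# Base R only: a named character vector used as a lookup table.
# This imitates the 'res' translation; it is not labelmachine code.
res_map <- c(
  "1" = "Good",
  "2" = "Passed",
  "3" = "Not passed",
  "4" = "Not passed",
  NA_ = "Missed",
  "0" = NA
)

# Hypothetical input values, including a missing value and a 0
result <- c(1, NA, 0, 3)

# Escape missing values to the key "NA_", convert the rest to character keys
keys <- ifelse(is.na(result), "NA_", as.character(result))

# Look up the label for each key
labels <- unname(res_map[keys])
labels
#> [1] "Good"       "Missed"     NA           "Not passed"
```

The missing value is labeled "Missed", while the value 0 becomes NA, which mirrors the two special assignments in the dictionary.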
With the command lama_translate, we can use the lama-dictionary dict to translate the variables in the data frame df:
df_new <- lama_translate(
  df,
  dict,
  subject_lab = sub(subject),
  result_lab = res(result)
)
str(df_new)
#> 'data.frame': 12 obs. of 5 variables:
#> $ pupil_id : int 1 1 1 2 2 2 3 3 3 4 ...
#> $ subject : chr "eng" "mat" "gym" "eng" ...
#> $ result : num 1 2 2 NA 2 NA 1 0 1 2 ...
#> $ subject_lab: Factor w/ 3 levels "English","Mathematics",..: 1 2 3 1 2 3 1 2 3 1 ...
#> $ result_lab : Factor w/ 4 levels "Good","Passed",..: 1 2 2 4 2 4 1 NA 1 2 ...
As we can see, the resulting data frame df_new now has two extra columns, subject_lab and result_lab, which hold factor variables with the right labels. The command lama_translate has multiple features, which are described in more detail in Translating variables.
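For readers who want to see what the two new columns roughly correspond to, the following base R sketch builds equivalent factors by hand. It is an approximation under the assumption that factor levels follow the order of first appearance in the translation, which matches the str() output above; it is not the package's implementation, and sub_map, res_map and keys are illustrative names.

```r
# Base R only: rebuild the labeled columns without labelmachine.
df <- data.frame(
  pupil_id = rep(1:4, each = 3),
  subject = rep(c("eng", "mat", "gym"), 4),
  result = c(1, 2, 2, NA, 2, NA, 1, 0, 1, 2, 3, NA),
  stringsAsFactors = FALSE
)

sub_map <- c(eng = "English", mat = "Mathematics", gym = "Gymnastics")
res_map <- c(
  "1" = "Good", "2" = "Passed", "3" = "Not passed", "4" = "Not passed",
  NA_ = "Missed", "0" = NA
)

# Subject codes are already character keys
df$subject_lab <- factor(
  unname(sub_map[df$subject]),
  levels = unique(sub_map)
)

# Result values need the NA_ escape before the lookup
keys <- ifelse(is.na(df$result), "NA_", as.character(df$result))
df$result_lab <- factor(
  unname(res_map[keys]),
  # Duplicated labels collapse into one level; NA is not a level
  levels = unique(res_map[!is.na(res_map)])
)

str(df)
```

Note that "3" and "4" share the label "Not passed", so result_lab ends up with four levels rather than five.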
With the command lama_write, the lama-dictionary object can be saved to a yaml file:
path_to_file <- file.path(tempdir(), "my_dictionary.yaml")
lama_write(dict, path_to_file)
The resulting yaml file is a plain text file with a specific structure; see dictionary.yaml.
Lama-dictionary files make it easy to share lama-dictionaries with other projects that hold similar data structures. With the command lama_read, a lama-dictionary file can be read:
path_to_file <- system.file("extdata", "dictionary_exams.yaml", package = "labelmachine")
dict <- lama_read(path_to_file)