Get started

Adrian Maldet

2019-10-08

labelmachine is an R package that helps you assign meaningful labels to R data sets. You manage your labels in yaml files, so-called lama-dictionary files. This makes it easy to reuse the same label translations across multiple projects that share a similar data structure.

Labeling your data can be easy!

Installation

# Install release version from CRAN
install.packages("labelmachine")

# Install development version from GitHub
devtools::install_github('a-maldet/labelmachine', build_vignettes = TRUE)

The example data frame

Suppose you want to label the following data frame df:

df <- data.frame(
  pupil_id = rep(1:4, each = 3),
  subject = rep(c("eng", "mat", "gym"), 4),
  result = c(1, 2, 2, NA, 2, NA, 1, 0, 1, 2, 3, NA),
  stringsAsFactors = FALSE
)
df
#>    pupil_id subject result
#> 1         1     eng      1
#> 2         1     mat      2
#> 3         1     gym      2
#> 4         2     eng     NA
#> 5         2     mat      2
#> 6         2     gym     NA
#> 7         3     eng      1
#> 8         3     mat      0
#> 9         3     gym      1
#> 10        4     eng      2
#> 11        4     mat      3
#> 12        4     gym     NA

The column subject contains the codes of the subjects in which the pupils were tested, and the column result contains the test results.

Your first lama-dictionary

To assign labels to the values in the columns of df, we need a lama-dictionary that holds the translations for the variables. The command new_lama_dictionary() creates such a lama-dictionary:

library(labelmachine)
dict <- new_lama_dictionary(
  sub = c(eng = "English", mat = "Mathematics", gym = "Gymnastics"),
  res = c(
    "1" = "Good",
    "2" = "Passed",
    "3" = "Not passed",
    "4" = "Not passed",
    NA_ = "Missed",
    "0" = NA
  )
)
dict
#> 
#> --- lama_dictionary ---
#> Variable 'sub':
#>           eng           mat           gym 
#>     "English" "Mathematics"  "Gymnastics" 
#> 
#> Variable 'res':
#>            1            2            3            4          NA_ 
#>       "Good"     "Passed" "Not passed" "Not passed"     "Missed" 
#>            0 
#>           NA

Each entry in dict is a translation for a variable (column) of the data frame df. The translation sub can be used to assign labels to the values in column subject of df, and the translation res can be used to assign labels to the values in column result. The expression NA_ is used to escape the missing value symbol NA. Hence, the assignment NA_ = "Missed" defines that missing values should be labeled with the string "Missed". Conversely, the last assignment "0" = NA defines that the value 0 should be turned into a missing value. For further details on creating lama-dictionaries see Creating lama-dictionaries and Altering lama-dictionaries.
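To make the idea of a translation concrete, the following base-R sketch mimics what the translation res expresses for the column result. It is only an illustration under the assumption that a translation behaves like a lookup in a named character vector; it is not how labelmachine is implemented, and the helper names keys and labels are made up for this example.

```r
# Base-R sketch of what a translation expresses (NOT labelmachine's implementation).
res <- c(
  "1" = "Good",
  "2" = "Passed",
  "3" = "Not passed",
  "4" = "Not passed",
  "NA_" = "Missed",  # escaped key: label for missing values
  "0" = NA           # the value 0 is turned into a missing value
)
result <- c(1, 2, 2, NA, 2, NA, 1, 0, 1, 2, 3, NA)

# Build the lookup keys: missing values use the escaped key "NA_"
keys <- ifelse(is.na(result), "NA_", as.character(result))

# Look up the label for every key and drop the names
labels <- unname(res[keys])

# Factor levels: the distinct non-missing labels in dictionary order
result_lab <- factor(labels, levels = unique(unname(res[!is.na(res)])))

levels(result_lab)
as.integer(result_lab)
```

In this sketch the missing results end up with the label "Missed", while the result 0 ends up as a missing value.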

Translate the data frame columns

With the command lama_translate, we can use the lama-dictionary dict to translate the variables in the data frame df:

df_new <- lama_translate(
  df,
  dict,
  subject_lab = sub(subject),
  result_lab = res(result)
)
str(df_new)
#> 'data.frame':    12 obs. of  5 variables:
#>  $ pupil_id   : int  1 1 1 2 2 2 3 3 3 4 ...
#>  $ subject    : chr  "eng" "mat" "gym" "eng" ...
#>  $ result     : num  1 2 2 NA 2 NA 1 0 1 2 ...
#>  $ subject_lab: Factor w/ 3 levels "English","Mathematics",..: 1 2 3 1 2 3 1 2 3 1 ...
#>  $ result_lab : Factor w/ 4 levels "Good","Passed",..: 1 2 2 4 2 4 1 NA 1 2 ...

As we can see, the resulting data frame df_new now has two extra columns, subject_lab and result_lab, which hold the factor variables with the right labels. The command lama_translate has multiple features, which are described in more detail in Translating variables.
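One way to check the translated columns is to look at the factor levels and cross-tabulate the labels. The sketch below rebuilds df and dict from the sections above so that it runs on its own; it assumes labelmachine is installed and uses only the calls shown in this document plus base R.

```r
library(labelmachine)

# Same example data and dictionary as above
df <- data.frame(
  pupil_id = rep(1:4, each = 3),
  subject = rep(c("eng", "mat", "gym"), 4),
  result = c(1, 2, 2, NA, 2, NA, 1, 0, 1, 2, 3, NA),
  stringsAsFactors = FALSE
)
dict <- new_lama_dictionary(
  sub = c(eng = "English", mat = "Mathematics", gym = "Gymnastics"),
  res = c(
    "1" = "Good",
    "2" = "Passed",
    "3" = "Not passed",
    "4" = "Not passed",
    NA_ = "Missed",
    "0" = NA
  )
)
df_new <- lama_translate(
  df,
  dict,
  subject_lab = sub(subject),
  result_lab = res(result)
)

# Inspect the labels that were assigned
levels(df_new$subject_lab)
levels(df_new$result_lab)

# Cross-tabulate subjects and results; useNA = "ifany" keeps the
# row where the result 0 was turned into a missing value
table(df_new$subject_lab, df_new$result_lab, useNA = "ifany")
```

Passing useNA = "ifany" matters here, because table() would otherwise silently drop the observation whose label is missing.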

Save your lama-dictionary

With the command lama_write it is possible to save the lama-dictionary object to a yaml file:

path_to_file <- file.path(tempdir(), "my_dictionary.yaml")
lama_write(dict, path_to_file)

The resulting yaml file is a plain text file with a specific structure; see dictionary.yaml.

Lama-dictionary files make it easy to share lama-dictionaries between projects that have similar data structures. With the command lama_read, a lama-dictionary file can be read:

path_to_file <- system.file("extdata", "dictionary_exams.yaml", package = "labelmachine")
dict <- lama_read(path_to_file)
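Writing and reading can also be combined into a quick round trip, which is a simple way to look at the file that lama_write produces. This is a minimal sketch that assumes labelmachine is installed; the object name dict_copy is made up for this example, and no particular file content is promised here.

```r
library(labelmachine)

# A small dictionary with a single translation
dict <- new_lama_dictionary(
  sub = c(eng = "English", mat = "Mathematics", gym = "Gymnastics")
)

# Write the dictionary to a temporary yaml file
path_to_file <- file.path(tempdir(), "my_dictionary.yaml")
lama_write(dict, path_to_file)

# Look at the raw text of the file
readLines(path_to_file)

# Read the file back into a new lama-dictionary object
dict_copy <- lama_read(path_to_file)
dict_copy
```

Printing dict_copy next to dict shows whether the translations survived the round trip as expected.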

Further reading