The blaise package aims to provide an interface between the blaise software and R by enabling the reading and writing of blaise datafiles in a transparent manner. The aim is for an average user to be able to read or write such a datafile with a single command. Defaults are always set in such a way that the data is not changed if a user reads a datafile to a dataframe and immediately writes it to a blaise datafile afterwards.
For the purpose of this vignette we need to create some small examples.
model1 = "
DATAMODEL Test
FIELDS
A : STRING[1]
B : INTEGER[1]
C : REAL[3,1]
D : REAL[3]
E : (Male, Female)
F : 1..20
G : 1.00..100.00
ENDMODEL
"
model2 = "
DATAMODEL Test
FIELDS
A : STRING[1]
B : INTEGER[1]
C : REAL[3,1]
D : REAL[3]
E : (Male (1), Female (2), Unknown (9))
F : 1..20
G : 1.00..100.00
ENDMODEL
"
data1 =
"A12.30.11 1  1.00
B23.41.2210 20.20
C34.50.0120100.00"
data2 =
"A12,30,11 1  1,00
B23,41,2210 20,20
C34,50,0920100,00"
blafile1 = tempfile('testbla1', fileext = '.bla')
datafile1 = tempfile('testdata1', fileext = '.asc')
blafile2 = tempfile('testbla2', fileext = '.bla')
datafile2 = tempfile('testdata2', fileext = '.asc')
writeLines(data1, con = datafile1)
writeLines(model1, con = blafile1)
writeLines(data2, con = datafile2)
writeLines(model2, con = blafile2)
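To make the link between datamodel and datafile concrete, the following base R sketch cuts one row of the first datafile into its fields. The widths are an assumption read off the field definitions in the first datamodel (1, 1, 3, 3, 1, 2 and 6 characters); this only illustrates the fixed-width layout and is not how the package parses files internally.

```r
# Field widths assumed from the first datamodel: STRING[1], INTEGER[1],
# REAL[3,1], REAL[3], a two-level enum, 1..20 and 1.00..100.00
widths <- c(A = 1, B = 1, C = 3, D = 3, E = 1, F = 2, G = 6)

# Second row of the first example datafile
line <- "B23.41.2210 20.20"

# Start and end position of every field within the line
ends <- cumsum(widths)
starts <- ends - widths + 1

# Cut the line into one string per field
fields <- substring(line, starts, ends)
names(fields) <- names(widths)
fields
```

Each element of `fields` is still a string at this point; converting the strings to the types declared in the datamodel is a separate step.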
These files can then simply be read into a dataframe by using read_fwf_blaise.
df = read_fwf_blaise(datafile1, blafile1)
df
#> # A tibble: 3 x 7
#>   A         B     C     D E          F     G
#>   <chr> <int> <dbl> <dbl> <fct>  <int> <dbl>
#> 1 A         1   2.3   0.1 Male       1   1
#> 2 B         2   3.4   1.2 Female    10  20.2
#> 3 C         3   4.5   0   Male      20 100
If you try to read the second datafile with its model in the same way, however, you will get some warnings and the resulting dataframe will not look as expected.
df_comma = read_fwf_blaise(datafile2, blafile2)
#> Warning: 9 parsing failures.
#> row col expected actual file
#> 1 C no trailing characters 2,3 'C:\Users\kingmob\AppData\Local\Temp\RtmpAzoQB8\testdata2170029b3128c.asc'
#> 1 D no trailing characters 0,1 'C:\Users\kingmob\AppData\Local\Temp\RtmpAzoQB8\testdata2170029b3128c.asc'
#> 1 G no trailing characters 1,00 'C:\Users\kingmob\AppData\Local\Temp\RtmpAzoQB8\testdata2170029b3128c.asc'
#> 2 C no trailing characters 3,4 'C:\Users\kingmob\AppData\Local\Temp\RtmpAzoQB8\testdata2170029b3128c.asc'
#> 2 D no trailing characters 1,2 'C:\Users\kingmob\AppData\Local\Temp\RtmpAzoQB8\testdata2170029b3128c.asc'
#> ... ... ...................... ...... .................................................................................
#> See problems(...) for more details.
df_comma
#> # A tibble: 3 x 7
#>   A         B     C     D E         F     G
#>   <chr> <int> <dbl> <dbl> <fct> <int> <dbl>
#> 1 A         1    NA    NA 1         1    NA
#> 2 B         2    NA    NA 2        10    NA
#> 3 C         3    NA    NA 9        20    NA
The blaise package uses readr to actually read the file into memory. Reading problems can therefore be analysed by using readr::problems().
readr::problems(df_comma)
#> # A tibble: 9 x 5
#> row col expected actual file
#> <int> <chr> <chr> <chr> <chr>
#> 1 1 C no trailing cha~ 2,3 "'C:\\Users\\kingmob\\AppData\\Local\\Tem~
#> 2 1 D no trailing cha~ 0,1 "'C:\\Users\\kingmob\\AppData\\Local\\Tem~
#> 3 1 G no trailing cha~ 1,00 "'C:\\Users\\kingmob\\AppData\\Local\\Tem~
#> 4 2 C no trailing cha~ 3,4 "'C:\\Users\\kingmob\\AppData\\Local\\Tem~
#> 5 2 D no trailing cha~ 1,2 "'C:\\Users\\kingmob\\AppData\\Local\\Tem~
#> 6 2 G no trailing cha~ 20,20 "'C:\\Users\\kingmob\\AppData\\Local\\Tem~
#> 7 3 C no trailing cha~ 4,5 "'C:\\Users\\kingmob\\AppData\\Local\\Tem~
#> 8 3 D no trailing cha~ 0,0 "'C:\\Users\\kingmob\\AppData\\Local\\Tem~
#> 9 3 G no trailing cha~ 100,00 "'C:\\Users\\kingmob\\AppData\\Local\\Tem~
These results are somewhat easier to read but still hard to interpret. In this case it is clear that the comma is an unexpected character. This is because the locale is set to expect “.” as a decimal separator by default. This setting (and others, such as date format, encoding, etc.) can be changed by supplying a readr locale object using readr::locale().
df_comma = read_fwf_blaise(datafile2, blafile2, locale = readr::locale(decimal_mark = ","))
df_comma
#> # A tibble: 3 x 7
#>   A         B     C     D E         F     G
#>   <chr> <int> <dbl> <dbl> <fct> <int> <dbl>
#> 1 A         1   2.3   0.1 1         1   1
#> 2 B         2   3.4   1.2 2        10  20.2
#> 3 C         3   4.5   0   9        20 100
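The effect of the locale can also be seen in isolation. The sketch below uses only readr, with a few of the raw strings from the second datafile, and parses them with a locale whose decimal mark is a comma. It illustrates the readr setting itself and makes no claim about how the blaise package passes the locale on internally.

```r
library(readr)

# Raw field contents as they appear in the comma-separated example data
raw <- c("2,3", "0,1", "1,00")

# A locale that treats "," as the decimal mark
comma_locale <- locale(decimal_mark = ",")

# Parse the strings as doubles under that locale
parsed <- parse_double(raw, locale = comma_locale)
parsed
```

The same locale object can carry other settings, such as the encoding or the date format, so a single object can be reused for every read.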
The second datamodel contains a numbered enum, which is therefore read as a factor with the numbers as labels. Read this way, the file will be written out exactly as it was read, as can be seen later. This behaviour can be overridden by using the option numbered_enum = FALSE. If the resulting dataframe is written back to blaise using write_fwf_blaise, however, it will write the integers in the set 1,2,3 instead of 1,2,9.
df_enum = read_fwf_blaise(datafile2, blafile2, locale = readr::locale(decimal_mark = ","), numbered_enum = FALSE)
df_enum
#> # A tibble: 3 x 7
#>   A         B     C     D E           F     G
#>   <chr> <int> <dbl> <dbl> <fct>   <int> <dbl>
#> 1 A         1   2.3   0.1 Male        1   1
#> 2 B         2   3.4   1.2 Female     10  20.2
#> 3 C         3   4.5   0   Unknown    20 100
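Why the codes 1,2,9 turn into 1,2,3 follows from how R stores factors. The base R sketch below builds the two representations by hand: a factor whose labels are the numbers themselves, and a factor with text labels. It only illustrates general factor behaviour and is not the package's own code.

```r
# Codes as they appear in the second datafile
codes <- c(1L, 2L, 9L)

# Numbered enum read with number labels: the labels keep the original codes
f_numbered <- factor(codes, levels = c(1, 2, 9))

# The same column with text labels, as with numbered_enum = FALSE
f_labelled <- factor(codes, levels = c(1, 2, 9),
                     labels = c("Male", "Female", "Unknown"))

# The original codes can be recovered from the number labels
original_codes <- as.integer(as.character(f_numbered))

# A text-labelled factor only knows the position of each level
position_codes <- as.integer(f_labelled)

original_codes
position_codes
```

Once the labels are text, the original code 9 is no longer stored anywhere in the factor, so only the level positions remain available when writing.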
Finally, instead of reading the file into memory, a LaF object can be returned. For details see the documentation for the LaF package.
df_laf = read_fwf_blaise(datafile1, blafile1, output = "laf")
df_laf
#> Connection to fixed width ASCII file
#> Filename: C:\Users\kingmob\AppData\Local\Temp\RtmpAzoQB8\testdata117003f9f19a0.asc
#> Column 1: name = A, type = string, internal type = character, column width = 1
#> Column 2: name = B, type = integer, internal type = integer, column width = 1
#> Column 3: name = C, type = double, internal type = numeric, column width = 3
#> Column 4: name = D, type = double, internal type = numeric, column width = 3
#> Column 5: name = E, type = integer, internal type = integer, column width = 1
#> Column 6: name = F, type = integer, internal type = integer, column width = 2
#> Column 7: name = G, type = double, internal type = numeric, column width = 6
df_laf$E
#> Column number 5 of fixed width ASCII file
#> Filename: C:\Users\kingmob\AppData\Local\Temp\RtmpAzoQB8\testdata117003f9f19a0.asc
#> Column name = E
#> Column type = integer
#> Internal type = integer
#> Column width = 1
#> Showing first 10 elements:
#> [1] Male Female Male
#> Levels: Male Female
Dataframes can also be written out as blaise datafiles. By default this will also write a corresponding blaise datamodel with the same filename and a .bla extension.
outfile = tempfile(fileext = ".asc")
# escape the dot and anchor the pattern so only the extension is replaced
outbla = sub("\\.asc$", ".bla", outfile)
write_fwf_blaise(df, outfile)
readr::read_lines(outfile)
#> [1] "A12.30.11 1  1.0" "B23.41.2210 20.2" "C34.50.0120100.0"
readr::read_lines(outbla)
#> [1] "DATAMODEL df" "FIELDS" " A : STRING[1]" " B : INTEGER[1]"
#> [5] " C : REAL[3]" " D : REAL[3]" " E : (Male," " Female)"
#> [9] " F : INTEGER[2]" " G : REAL[5]" "ENDMODEL"
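As can be seen, this is equivalent to the input data and model. An optional name for the datamodel can be given with output_model, or the writing of a model can be suppressed entirely by using write_model = FALSE. For further options see the help file. The generated datamodel above illustrates the implicit conversions from R types to blaise types for this dataframe: character becomes STRING, integer becomes INTEGER, double becomes REAL and factor becomes an enumerated type.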
Note that information about the labels in the datamodel is lost for the numbered enum type. One way to solve this is by providing an existing datamodel and using write_fwf_blaise_with_model as follows.
outfile_model = tempfile(fileext = ".asc")
write_fwf_blaise_with_model(df_enum, outfile_model, blafile2)
readr::read_lines(outfile_model)
#> [1] "A12.30.11 1  1.00" "B23.41.2210 20.20" "C34.50.0920100.00"
This results in the same datafile here, but ensures conformity to the datamodel. One could for instance also force a different model on the same data.
model3 = "
DATAMODEL Test
FIELDS
A : (A, B, C)
B : (Male (1), Female (2), Unknown (3))
ENDMODEL
"
blafile3 = tempfile('testbla3', fileext = '.bla')
writeLines(model3, con = blafile3)
outfile_new_model = tempfile(fileext = ".asc")
write_fwf_blaise_with_model(df_enum, outfile_new_model, blafile3)
readr::read_lines(outfile_new_model)
#> [1] "11" "22" "33"
This explicitly checks for conformity, so if the data cannot be converted an error will be shown and nothing will be written to disk.
model4 = "
DATAMODEL Test
FIELDS
A : (A, B)
B : (Male (1), Female (2))
ENDMODEL
"
blafile4 = tempfile('testbla4', fileext = '.bla')
writeLines(model4, con = blafile4)
outfile_wrong_model = tempfile(fileext = ".asc")
write_fwf_blaise_with_model(df_enum, outfile_wrong_model, blafile4)
#> Error in cast_type(variables(model)[zonder_dummy][[i]], df[[i]]): numbers in dataframe column (A;B;C) do not correspond to range of indices in model (A;B) for variable A
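The check-before-write behaviour shown above can be sketched in base R. The helper `write_if_conforming` below is hypothetical and not part of the blaise package; it only illustrates the general pattern of validating all values against the allowed set first and touching the disk only when the check passes.

```r
# Hypothetical helper, not part of the blaise package: validate first,
# write only if every value is allowed by the (simplified) model
write_if_conforming <- function(values, allowed, path) {
  bad <- setdiff(unique(values), allowed)
  if (length(bad) > 0) {
    stop("values not allowed by the model: ", paste(bad, collapse = ";"))
  }
  # Write the position of each value within the allowed set
  writeLines(as.character(match(values, allowed)), con = path)
  invisible(path)
}

path <- tempfile(fileext = ".asc")

# Column A holds A, B and C, but this model only allows A and B
result <- tryCatch(
  write_if_conforming(c("A", "B", "C"), allowed = c("A", "B"), path = path),
  error = function(e) conditionMessage(e)
)

result
file.exists(path)
```

Because the error is raised before `writeLines` is reached, no partial file is left behind when the data does not fit the model.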