There are already a couple of packages to parse JSON (e.g. rjson
or jsonlite
). But for me they all had some drawbacks when working with JSON columns from a database:
RJSONIO
and rjson
silently use only the first element of the input; jsonlite
collapses the input, producing invalid JSON and then complains about trailing garbage.NA
and NULL
input produce unclear errors and one cannot control how to handle them.jsontools
relies on jsonlite
for parsing from JSON and conversion to JSON. For parsing it has the following functions:
parse_json()
to parse a character vector of length 1 (i.e. a scalar).parse_json_vector()
to parse a character vector of any length.As an example assume we got the following JSON vector
library(jsontools)
<- json2(c(
customer_infos '{"name": "Peter", "age": 19, "premium": true}',
'{"name": "Daniel", "age": 41}',
NA,
'{"name": "Pablo", "age": 27, "premium": false}'
))
customer_infos#> {"name": "Peter", "age": 19, "premium": true}
#> {"name": "Daniel", "age": 41}
#> NA
#> {"name": "Pablo", "age": 27, "premium": false}
By default the NA
produces an error
parse_json_vector(customer_infos)
#> Error: input is NA.
#> To use a default value use the argument `.na`.
but with the .na
argument we can specify which value should be used instead:
parse_json_vector(customer_infos, .na = NULL) %>% str()
#> List of 4
#> $ :List of 3
#> ..$ name : chr "Peter"
#> ..$ age : int 19
#> ..$ premium: logi TRUE
#> $ :List of 2
#> ..$ name: chr "Daniel"
#> ..$ age : int 41
#> $ : NULL
#> $ :List of 3
#> ..$ name : chr "Pablo"
#> ..$ age : int 27
#> ..$ premium: logi FALSE
Note that the default for simplifyDataFrame
and simplifyMatrix
has been changed to FALSE
. While they can be quite nice for interactive parsing of JSON they do not allow you to control how the result should look like. For a more controlled conversion of the resulting list to a data frame have a look at the tibblify package
.
For conversion there are the following functions:
format_json()
: convert an R object to JSON.format_json_vector()
: convert each element of an R object to JSON and return them as a JSON vector.format_json_rowwise()
: convert each row of a data frame to JSON.write_json()
: convert an R object to JSON and write it to disk.Let’s look at our customer informations from above. Assume we have converted it into a tibble:
<- tibble::tibble(
customer_infos_df name = c("Peter", "Daniel", NA, "Pablo"),
age = c(19L, 41L, NA, 27L),
premium = c(TRUE, NA, NA, FALSE)
)
and filled the missing fields:
3, ] <- tibble::tibble(
customer_infos_df[name = "Michael",
age = 38,
premium = FALSE
)$premium[2] <- TRUE
customer_infos_df
customer_infos_df#> # A tibble: 4 x 3
#> name age premium
#> <chr> <int> <lgl>
#> 1 Peter 19 TRUE
#> 2 Daniel 41 TRUE
#> 3 Michael 38 FALSE
#> 4 Pablo 27 FALSE
We can now easily convert it back into a JSON vector
format_json_rowwise(customer_infos_df)
#> {"name":"Peter","age":19,"premium":true}
#> {"name":"Daniel","age":41,"premium":true}
#> {"name":"Michael","age":38,"premium":false}
#> {"name":"Pablo","age":27,"premium":false}
Compare this to format_json()
which turns the data frame into an array of objects:
format_json(customer_infos_df)
#> [{"name":"Peter","age":19,"premium":true},{"name":"Daniel","age":41,"premium":true},{"name":"Michael","age":38,"premium":false},{"name":"Pablo","age":27,"premium":false}]
To format every element of a list to JSON instead of the whole list at once use format_json_list()
:
<- parse_json_vector(customer_infos, .na = NULL)
customer_infos_list str(customer_infos_list)
#> List of 4
#> $ :List of 3
#> ..$ name : chr "Peter"
#> ..$ age : int 19
#> ..$ premium: logi TRUE
#> $ :List of 2
#> ..$ name: chr "Daniel"
#> ..$ age : int 41
#> $ : NULL
#> $ :List of 3
#> ..$ name : chr "Pablo"
#> ..$ age : int 27
#> ..$ premium: logi FALSE
format_json_list(customer_infos_list)
#> {"name":["Peter"],"age":[19],"premium":[true]}
#> {"name":["Daniel"],"age":[41]}
#> {}
#> {"name":["Pablo"],"age":[27],"premium":[false]}
Notice that each value is wrapped in a JSON array? If you do not want this you have three possibilities:
json_u()
on the elements you do not want in an array.auto_unbox = TRUE
to only wrap elements of length > 1 in an array.While auto_unbox = TRUE
is quite handy you should only use it in non-interactive scripts as otherwise the output JSON might change the format. For example we a value customers
in a list. Usually, customers is a vector of length > 1.
<- list(customers = 1:2)
x format_json(x, auto_unbox = TRUE)
#> {"customers":[1,2]}
But in some rare cases there is only a single customer
<- list(customers = 1)
x format_json(x, auto_unbox = TRUE)
#> {"customers":1}
and instead of an array we get a scalar integer at the key “customers”.