Rectangling is the art and craft of taking a deeply nested list (often sourced from wild-caught JSON or XML) and taming it into a tidy data set of rows and columns. There are four functions from tidyr that are particularly useful for rectangling:

unnest_longer() takes each element of a list-column and makes a new row.

unnest_wider() takes each element of a list-column and makes a new column.

unnest_auto() guesses whether you want unnest_longer() or unnest_wider().

hoist() is similar to unnest_wider() but only plucks out selected components, and can reach down multiple levels.

A very large number of data rectangling problems can be solved by combining these functions with a splash of dplyr (largely eliminating prior approaches that combined mutate() with multiple purrr::map() calls). A small toy example after the setup code below illustrates the difference between unnest_longer() and unnest_wider().
To illustrate these techniques, we’ll use the repurrrsive package, which provides a number of deeply nested lists, mostly originally captured from web APIs.
library(tidyr)
library(dplyr)
library(repurrrsive)
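Before diving in, here’s a minimal sketch of the difference between the two unnest functions, using a made-up list-column (the data frame df and its column x are invented purely for illustration, not part of repurrrsive):

df <- tibble(x = list(
  list(a = 1, b = 2),
  list(a = 3, b = 4)
))

df %>% unnest_wider(x)   # one new column per component: a and b
df %>% unnest_longer(x)  # one new row per component, plus an x_id column of names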
We’ll start with gh_users
, a list which contains
information about six GitHub users. To begin, we put the
gh_users
list into a data frame:
users <- tibble(user = gh_users)
This seems a bit counter-intuitive: why is the first step in making a list simpler to make it more complicated? But a data frame has a big advantage: it bundles together multiple vectors so that everything is tracked together in a single object.
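As a minimal sketch of that advantage (the row column here is made up purely for illustration), any columns we create stay aligned with the list elements when we filter or rearrange:

users %>%
  mutate(row = row_number()) %>%
  filter(row > 3)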
Each user
is a named list, where each element represents
a column.
names(users$user[[1]])
#> [1] "login" "id" "avatar_url"
#> [4] "gravatar_id" "url" "html_url"
#> [7] "followers_url" "following_url" "gists_url"
#> [10] "starred_url" "subscriptions_url" "organizations_url"
#> [13] "repos_url" "events_url" "received_events_url"
#> [16] "type" "site_admin" "name"
#> [19] "company" "blog" "location"
#> [22] "email" "hireable" "bio"
#> [25] "public_repos" "public_gists" "followers"
#> [28] "following" "created_at" "updated_at"
There are two ways to turn the list components into columns.
unnest_wider()
takes every component and makes a new
column:
users %>% unnest_wider(user)
#> # A tibble: 6 × 30
#> login id avata…¹ grava…² url html_…³ follo…⁴ follo…⁵ gists…⁶ starr…⁷
#> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 gaborcsa… 6.60e5 https:… "" http… https:… https:… https:… https:… https:…
#> 2 jennybc 5.99e5 https:… "" http… https:… https:… https:… https:… https:…
#> 3 jtleek 1.57e6 https:… "" http… https:… https:… https:… https:… https:…
#> 4 juliasil… 1.25e7 https:… "" http… https:… https:… https:… https:… https:…
#> 5 leeper 3.51e6 https:… "" http… https:… https:… https:… https:… https:…
#> 6 masalmon 8.36e6 https:… "" http… https:… https:… https:… https:… https:…
#> # … with 20 more variables: subscriptions_url <chr>, organizations_url <chr>,
#> # repos_url <chr>, events_url <chr>, received_events_url <chr>, type <chr>,
#> # site_admin <lgl>, name <chr>, company <chr>, blog <chr>, location <chr>,
#> # email <chr>, hireable <lgl>, bio <chr>, public_repos <int>,
#> # public_gists <int>, followers <int>, following <int>, created_at <chr>,
#> # updated_at <chr>, and abbreviated variable names ¹avatar_url, ²gravatar_id,
#> # ³html_url, ⁴followers_url, ⁵following_url, ⁶gists_url, ⁷starred_url
But in this case, there are many components and we don’t need most of
them so we can instead use hoist()
. hoist()
allows us to pull out selected components using the same syntax as
purrr::pluck()
:
users %>% hoist(user,
  followers = "followers",
  login = "login",
  url = "html_url"
)
#> # A tibble: 6 × 4
#> followers login url user
#> <int> <chr> <chr> <list>
#> 1 303 gaborcsardi https://github.com/gaborcsardi <named list [27]>
#> 2 780 jennybc https://github.com/jennybc <named list [27]>
#> 3 3958 jtleek https://github.com/jtleek <named list [27]>
#> 4 115 juliasilge https://github.com/juliasilge <named list [27]>
#> 5 213 leeper https://github.com/leeper <named list [27]>
#> 6 34 masalmon https://github.com/masalmon <named list [27]>
hoist()
removes the named components from the
user
list-column, so you can think of it as moving
components out of the inner list into the top-level data frame.
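As a quick check of that (the n_remaining column is invented here just for illustration), the inner lists shrink from 30 components to 27 after hoisting three of them:

users %>%
  hoist(user, followers = "followers", login = "login", url = "html_url") %>%
  mutate(n_remaining = lengths(user))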
We start off gh_repos
similarly, by putting it in a
tibble:
repos <- tibble(repo = gh_repos)
repos
#> # A tibble: 6 × 1
#> repo
#> <list>
#> 1 <list [30]>
#> 2 <list [30]>
#> 3 <list [30]>
#> 4 <list [26]>
#> 5 <list [30]>
#> 6 <list [30]>
This time the elements of repo are lists of repositories that belong to that user. These are observations, so they should become new rows, and we use unnest_longer() rather than unnest_wider():
repos <- repos %>% unnest_longer(repo)
repos
#> # A tibble: 176 × 1
#> repo
#> <list>
#> 1 <named list [68]>
#> 2 <named list [68]>
#> 3 <named list [68]>
#> 4 <named list [68]>
#> 5 <named list [68]>
#> 6 <named list [68]>
#> # … with 170 more rows
Then we can use unnest_wider()
or
hoist()
:
repos %>% hoist(repo,
  login = c("owner", "login"),
  name = "name",
  homepage = "homepage",
  watchers = "watchers_count"
)
#> # A tibble: 176 × 5
#> login name homepage watchers repo
#> <chr> <chr> <chr> <int> <list>
#> 1 gaborcsardi after <NA> 5 <named list [65]>
#> 2 gaborcsardi argufy <NA> 19 <named list [65]>
#> 3 gaborcsardi ask <NA> 5 <named list [65]>
#> 4 gaborcsardi baseimports <NA> 0 <named list [65]>
#> 5 gaborcsardi citest <NA> 0 <named list [65]>
#> 6 gaborcsardi clisymbols "" 18 <named list [65]>
#> # … with 170 more rows
Note the use of c("owner", "login")
: this allows us to
reach two levels deep inside of a list. An alternative approach would be
to pull out just owner
and then put each element of it in a
column:
repos %>%
  hoist(repo, owner = "owner") %>%
  unnest_wider(owner)
#> # A tibble: 176 × 18
#> login id avata…¹ grava…² url html_…³ follo…⁴ follo…⁵ gists…⁶ starr…⁷
#> <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 gaborcsa… 660288 https:… "" http… https:… https:… https:… https:… https:…
#> 2 gaborcsa… 660288 https:… "" http… https:… https:… https:… https:… https:…
#> 3 gaborcsa… 660288 https:… "" http… https:… https:… https:… https:… https:…
#> 4 gaborcsa… 660288 https:… "" http… https:… https:… https:… https:… https:…
#> 5 gaborcsa… 660288 https:… "" http… https:… https:… https:… https:… https:…
#> 6 gaborcsa… 660288 https:… "" http… https:… https:… https:… https:… https:…
#> # … with 170 more rows, 8 more variables: subscriptions_url <chr>,
#> # organizations_url <chr>, repos_url <chr>, events_url <chr>,
#> # received_events_url <chr>, type <chr>, site_admin <lgl>, repo <list>, and
#> # abbreviated variable names ¹avatar_url, ²gravatar_id, ³html_url,
#> # ⁴followers_url, ⁵following_url, ⁶gists_url, ⁷starred_url
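For comparison, the pluck()-style path used above can be applied to a single element by hand. This sketch just inspects the first repository; hoist() is doing the same extraction for every row:

purrr::pluck(repos$repo[[1]], "owner", "login")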
Instead of looking at the list and carefully thinking about whether
it needs to become rows or columns, you can use
unnest_auto()
. It uses a handful of heuristics to figure
out whether unnest_longer()
or unnest_wider()
is appropriate, and tells you about its reasoning.
tibble(repo = gh_repos) %>%
unnest_auto(repo) %>%
unnest_auto(repo)
#> Using `unnest_longer(repo, indices_include = FALSE)`; no element has names
#> Using `unnest_wider(repo)`; elements have 68 names in common
#> # A tibble: 176 × 68
#> id name full_…¹ owner private html_…² descr…³ fork url forks…⁴
#> <int> <chr> <chr> <list> <lgl> <chr> <chr> <lgl> <chr> <chr>
#> 1 6.12e7 after gaborc… <named list> FALSE https:… Run Co… FALSE http… https:…
#> 2 4.05e7 argu… gaborc… <named list> FALSE https:… Declar… FALSE http… https:…
#> 3 3.64e7 ask gaborc… <named list> FALSE https:… Friend… FALSE http… https:…
#> 4 3.49e7 base… gaborc… <named list> FALSE https:… Do we … FALSE http… https:…
#> 5 6.16e7 cite… gaborc… <named list> FALSE https:… Test R… TRUE http… https:…
#> 6 3.39e7 clis… gaborc… <named list> FALSE https:… Unicod… FALSE http… https:…
#> # … with 170 more rows, 58 more variables: keys_url <chr>,
#> # collaborators_url <chr>, teams_url <chr>, hooks_url <chr>,
#> # issue_events_url <chr>, events_url <chr>, assignees_url <chr>,
#> # branches_url <chr>, tags_url <chr>, blobs_url <chr>, git_tags_url <chr>,
#> # git_refs_url <chr>, trees_url <chr>, statuses_url <chr>,
#> # languages_url <chr>, stargazers_url <chr>, contributors_url <chr>,
#> # subscribers_url <chr>, subscription_url <chr>, commits_url <chr>, …
got_chars
has a similar structure to
gh_users
: it’s a list of named lists, where each element of
the inner list describes some attribute of a GoT character. We start in
the same way, first by creating a data frame and then by unnesting each
component into a column:
chars <- tibble(char = got_chars)
chars
#> # A tibble: 30 × 1
#> char
#> <list>
#> 1 <named list [18]>
#> 2 <named list [18]>
#> 3 <named list [18]>
#> 4 <named list [18]>
#> 5 <named list [18]>
#> 6 <named list [18]>
#> # … with 24 more rows
chars2 <- chars %>% unnest_wider(char)
chars2
#> # A tibble: 30 × 18
#> url id name gender culture born died alive titles aliases father
#> <chr> <int> <chr> <chr> <chr> <chr> <chr> <lgl> <list> <list> <chr>
#> 1 https://ww… 1022 Theo… Male "Ironb… "In … "" TRUE <chr> <chr> ""
#> 2 https://ww… 1052 Tyri… Male "" "In … "" TRUE <chr> <chr> ""
#> 3 https://ww… 1074 Vict… Male "Ironb… "In … "" TRUE <chr> <chr> ""
#> 4 https://ww… 1109 Will Male "" "" "In … FALSE <chr> <chr> ""
#> 5 https://ww… 1166 Areo… Male "Norvo… "In … "" TRUE <chr> <chr> ""
#> 6 https://ww… 1267 Chett Male "" "At … "In … FALSE <chr> <chr> ""
#> # … with 24 more rows, and 7 more variables: mother <chr>, spouse <chr>,
#> # allegiances <list>, books <list>, povBooks <list>, tvSeries <list>,
#> # playedBy <list>
This is more complex than gh_users because some components of char are themselves lists, giving us a collection of list-columns:
chars2 %>% select_if(is.list)
#> # A tibble: 30 × 7
#> titles aliases allegiances books povBooks tvSeries playedBy
#> <list> <list> <list> <list> <list> <list> <list>
#> 1 <chr [3]> <chr [4]> <chr [1]> <chr [3]> <chr [2]> <chr [6]> <chr [1]>
#> 2 <chr [2]> <chr [11]> <chr [1]> <chr [2]> <chr [4]> <chr [6]> <chr [1]>
#> 3 <chr [2]> <chr [1]> <chr [1]> <chr [3]> <chr [2]> <chr [1]> <chr [1]>
#> 4 <chr [1]> <chr [1]> <NULL> <chr [1]> <chr [1]> <chr [1]> <chr [1]>
#> 5 <chr [1]> <chr [1]> <chr [1]> <chr [3]> <chr [2]> <chr [2]> <chr [1]>
#> 6 <chr [1]> <chr [1]> <NULL> <chr [2]> <chr [1]> <chr [1]> <chr [1]>
#> # … with 24 more rows
What you do next will depend on the purposes of the analysis. Maybe you want a row for every book and TV series that the character appears in:
chars2 %>%
  select(name, books, tvSeries) %>%
  pivot_longer(c(books, tvSeries), names_to = "media", values_to = "value") %>%
  unnest_longer(value)
#> # A tibble: 180 × 3
#> name media value
#> <chr> <chr> <chr>
#> 1 Theon Greyjoy books A Game of Thrones
#> 2 Theon Greyjoy books A Storm of Swords
#> 3 Theon Greyjoy books A Feast for Crows
#> 4 Theon Greyjoy tvSeries Season 1
#> 5 Theon Greyjoy tvSeries Season 2
#> 6 Theon Greyjoy tvSeries Season 3
#> # … with 174 more rows
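This long form also makes ordinary dplyr summaries easy. As a sketch of one possible follow-up (just an illustration, not part of the original workflow), we could count appearances per character and medium:

chars2 %>%
  select(name, books, tvSeries) %>%
  pivot_longer(c(books, tvSeries), names_to = "media", values_to = "value") %>%
  unnest_longer(value) %>%
  count(name, media)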
Or maybe you want to build a table that lets you match title to name:
chars2 %>%
  select(name, title = titles) %>%
  unnest_longer(title)
#> # A tibble: 60 × 2
#> name title
#> <chr> <chr>
#> 1 Theon Greyjoy Prince of Winterfell
#> 2 Theon Greyjoy Captain of Sea Bitch
#> 3 Theon Greyjoy Lord of the Iron Islands (by law of the green lands)
#> 4 Tyrion Lannister Acting Hand of the King (former)
#> 5 Tyrion Lannister Master of Coin (former)
#> 6 Victarion Greyjoy Lord Captain of the Iron Fleet
#> # … with 54 more rows
(Note that the empty titles (""
) are due to an
infelicity in the input got_chars
: ideally people without
titles would have a title vector of length 0, not a title vector of
length 1 containing an empty string.)
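If those empty strings get in the way, one small sketch of a workaround is to drop them after unnesting:

chars2 %>%
  select(name, title = titles) %>%
  unnest_longer(title) %>%
  filter(title != "")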
Again, we could rewrite this using unnest_auto(). This is convenient for exploration, but I wouldn’t rely on it in the long term: unnest_auto() has the undesirable property that it will always succeed. That means that if your data structure changes, unnest_auto() will continue to work, but might give very different output that causes cryptic failures from downstream functions. (A defensive sketch after the output below shows one way to guard against that.)
tibble(char = got_chars) %>%
unnest_auto(char) %>%
select(name, title = titles) %>%
unnest_auto(title)
#> Using `unnest_wider(char)`; elements have 18 names in common
#> Using `unnest_longer(title, indices_include = FALSE)`; no element has names
#> # A tibble: 60 × 2
#> name title
#> <chr> <chr>
#> 1 Theon Greyjoy Prince of Winterfell
#> 2 Theon Greyjoy Captain of Sea Bitch
#> 3 Theon Greyjoy Lord of the Iron Islands (by law of the green lands)
#> 4 Tyrion Lannister Acting Hand of the King (former)
#> 5 Tyrion Lannister Master of Coin (former)
#> 6 Victarion Greyjoy Lord Captain of the Iron Fleet
#> # … with 54 more rows
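One defensive sketch (the unnested name and the column check are invented just for illustration): after an exploratory unnest_auto(), assert that the columns you rely on actually exist, so a structural change fails loudly here rather than somewhere downstream.

unnested <- tibble(char = got_chars) %>% unnest_auto(char)
stopifnot(all(c("name", "titles") %in% names(unnested)))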
Next we’ll tackle a more complex form of data that comes from Google’s geocoding service. It’s against the terms of service to cache this data, so I first write a very simple wrapper around the API. This relies on having a Google Maps API key stored in an environment variable; if that’s not available, these code chunks won’t be run.
has_key <- !identical(Sys.getenv("GOOGLE_MAPS_API_KEY"), "")
if (!has_key) {
  message("No Google Maps API key found; code chunks will not be run")
}

# https://developers.google.com/maps/documentation/geocoding
geocode <- function(address, api_key = Sys.getenv("GOOGLE_MAPS_API_KEY")) {
  url <- "https://maps.googleapis.com/maps/api/geocode/json"
  url <- paste0(url, "?address=", URLencode(address), "&key=", api_key)

  jsonlite::read_json(url)
}
The list that this function returns is quite complex:
houston <- geocode("Houston TX")
str(houston)
#> List of 2
#> $ results:List of 1
#> ..$ :List of 5
#> .. ..$ address_components:List of 4
#> .. .. ..$ :List of 3
#> .. .. .. ..$ long_name : chr "Houston"
#> .. .. .. ..$ short_name: chr "Houston"
#> .. .. .. ..$ types :List of 2
#> .. .. .. .. ..$ : chr "locality"
#> .. .. .. .. ..$ : chr "political"
#> .. .. ..$ :List of 3
#> .. .. .. ..$ long_name : chr "Harris County"
#> .. .. .. ..$ short_name: chr "Harris County"
#> .. .. .. ..$ types :List of 2
#> .. .. .. .. ..$ : chr "administrative_area_level_2"
#> .. .. .. .. ..$ : chr "political"
#> .. .. ..$ :List of 3
#> .. .. .. ..$ long_name : chr "Texas"
#> .. .. .. ..$ short_name: chr "TX"
#> .. .. .. ..$ types :List of 2
#> .. .. .. .. ..$ : chr "administrative_area_level_1"
#> .. .. .. .. ..$ : chr "political"
#> .. .. ..$ :List of 3
#> .. .. .. ..$ long_name : chr "United States"
#> .. .. .. ..$ short_name: chr "US"
#> .. .. .. ..$ types :List of 2
#> .. .. .. .. ..$ : chr "country"
#> .. .. .. .. ..$ : chr "political"
#> .. ..$ formatted_address : chr "Houston, TX, USA"
#> .. ..$ geometry :List of 4
#> .. .. ..$ bounds :List of 2
#> .. .. .. ..$ northeast:List of 2
#> .. .. .. .. ..$ lat: num 30.1
#> .. .. .. .. ..$ lng: num -95
#> .. .. .. ..$ southwest:List of 2
#> .. .. .. .. ..$ lat: num 29.5
#> .. .. .. .. ..$ lng: num -95.8
#> .. .. ..$ location :List of 2
#> .. .. .. ..$ lat: num 29.8
#> .. .. .. ..$ lng: num -95.4
#> .. .. ..$ location_type: chr "APPROXIMATE"
#> .. .. ..$ viewport :List of 2
#> .. .. .. ..$ northeast:List of 2
#> .. .. .. .. ..$ lat: num 30.1
#> .. .. .. .. ..$ lng: num -95
#> .. .. .. ..$ southwest:List of 2
#> .. .. .. .. ..$ lat: num 29.5
#> .. .. .. .. ..$ lng: num -95.8
#> .. ..$ place_id : chr "ChIJAYWNSLS4QIYROwVl894CDco"
#> .. ..$ types :List of 2
#> .. .. ..$ : chr "locality"
#> .. .. ..$ : chr "political"
#> $ status : chr "OK"
Fortunately, we can attack the problem step by step with tidyr functions. To make the problem a bit harder (!) and more realistic, I’ll start by geocoding a few cities:
<- c("Houston", "LA", "New York", "Chicago", "Springfield")
city <- purrr::map(city, geocode) city_geo
I’ll put these results in a tibble, next to the original city name:
loc <- tibble(city = city, json = city_geo)
loc
#> # A tibble: 5 × 2
#> city json
#> <chr> <list>
#> 1 Houston <named list [2]>
#> 2 LA <named list [2]>
#> 3 New York <named list [2]>
#> 4 Chicago <named list [2]>
#> 5 Springfield <named list [2]>
The first level contains the components status and results, which we can reveal with unnest_wider():
loc %>%
  unnest_wider(json)
#> # A tibble: 5 × 3
#> city results status
#> <chr> <list> <chr>
#> 1 Houston <list [1]> OK
#> 2 LA <list [1]> OK
#> 3 New York <list [1]> OK
#> 4 Chicago <list [1]> OK
#> 5 Springfield <list [1]> OK
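As a defensive sketch (not part of the original workflow), the status column could be used to drop any responses that didn’t geocode cleanly before unnesting further:

loc %>%
  unnest_wider(json) %>%
  filter(status == "OK")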
Notice that results is a list of lists. Here each city happens to return a single match from the geocoding API, but an ambiguous address can return several, and unnest_longer() gives each match its own row:
loc %>%
  unnest_wider(json) %>%
  unnest_longer(results)
#> # A tibble: 5 × 3
#> city results status
#> <chr> <list> <chr>
#> 1 Houston <named list [5]> OK
#> 2 LA <named list [5]> OK
#> 3 New York <named list [5]> OK
#> 4 Chicago <named list [5]> OK
#> 5 Springfield <named list [7]> OK
Now each element of results is a named list, so we can use unnest_wider() to turn its components into columns:
loc %>%
  unnest_wider(json) %>%
  unnest_longer(results) %>%
  unnest_wider(results)
#> # A tibble: 5 × 9
#> city addre…¹ forma…² geometry place…³ types parti…⁴ plus_code status
#> <chr> <list> <chr> <list> <chr> <list> <lgl> <list> <chr>
#> 1 Houst… <list> Housto… <named list> ChIJAY… <list> NA <NULL> OK
#> 2 LA <list> Los An… <named list> ChIJE9… <list> NA <NULL> OK
#> 3 New Y… <list> New Yo… <named list> ChIJOw… <list> NA <NULL> OK
#> 4 Chica… <list> Chicag… <named list> ChIJ7c… <list> NA <NULL> OK
#> 5 Sprin… <list> Spring… <named list> ChIJ-x… <list> TRUE <named list> OK
#> # … with abbreviated variable names ¹address_components, ²formatted_address,
#> # ³place_id, ⁴partial_match
We can find the lat and lng coordinates by unnesting geometry:
loc %>%
  unnest_wider(json) %>%
  unnest_longer(results) %>%
  unnest_wider(results) %>%
  unnest_wider(geometry)
#> # A tibble: 5 × 12
#> city addre…¹ forma…² bounds location locat…³ viewport place…⁴
#> <chr> <list> <chr> <list> <list> <chr> <list> <chr>
#> 1 Houston <list> Housto… <named list> <named list> APPROX… <named list> ChIJAY…
#> 2 LA <list> Los An… <named list> <named list> APPROX… <named list> ChIJE9…
#> 3 New Yo… <list> New Yo… <named list> <named list> APPROX… <named list> ChIJOw…
#> 4 Chicago <list> Chicag… <named list> <named list> APPROX… <named list> ChIJ7c…
#> 5 Spring… <list> Spring… <NULL> <named list> ROOFTOP <named list> ChIJ-x…
#> # … with 4 more variables: types <list>, partial_match <lgl>, plus_code <list>,
#> # status <chr>, and abbreviated variable names ¹address_components,
#> # ²formatted_address, ³location_type, ⁴place_id
And then location:
loc %>%
  unnest_wider(json) %>%
  unnest_longer(results) %>%
  unnest_wider(results) %>%
  unnest_wider(geometry) %>%
  unnest_wider(location)
#> # A tibble: 5 × 13
#> city addre…¹ forma…² bounds lat lng locat…³ viewport place…⁴
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr> <list> <chr>
#> 1 Houston <list> Housto… <named list> 29.8 -95.4 APPROX… <named list> ChIJAY…
#> 2 LA <list> Los An… <named list> 34.1 -118. APPROX… <named list> ChIJE9…
#> 3 New Yo… <list> New Yo… <named list> 40.7 -74.0 APPROX… <named list> ChIJOw…
#> 4 Chicago <list> Chicag… <named list> 41.9 -87.6 APPROX… <named list> ChIJ7c…
#> 5 Spring… <list> Spring… <NULL> 42.1 -72.6 ROOFTOP <named list> ChIJ-x…
#> # … with 4 more variables: types <list>, partial_match <lgl>, plus_code <list>,
#> # status <chr>, and abbreviated variable names ¹address_components,
#> # ²formatted_address, ³location_type, ⁴place_id
Again, unnest_auto()
makes this simpler with the small
risk of failing in unexpected ways if the input structure changes:
loc %>%
  unnest_auto(json) %>%
  unnest_auto(results) %>%
  unnest_auto(results) %>%
  unnest_auto(geometry) %>%
  unnest_auto(location)
#> Using `unnest_wider(json)`; elements have 2 names in common
#> Using `unnest_longer(results, indices_include = FALSE)`; no element has names
#> Using `unnest_wider(results)`; elements have 5 names in common
#> Using `unnest_wider(geometry)`; elements have 3 names in common
#> Using `unnest_wider(location)`; elements have 2 names in common
#> # A tibble: 5 × 13
#> city addre…¹ forma…² bounds lat lng locat…³ viewport place…⁴
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr> <list> <chr>
#> 1 Houston <list> Housto… <named list> 29.8 -95.4 APPROX… <named list> ChIJAY…
#> 2 LA <list> Los An… <named list> 34.1 -118. APPROX… <named list> ChIJE9…
#> 3 New Yo… <list> New Yo… <named list> 40.7 -74.0 APPROX… <named list> ChIJOw…
#> 4 Chicago <list> Chicag… <named list> 41.9 -87.6 APPROX… <named list> ChIJ7c…
#> 5 Spring… <list> Spring… <NULL> 42.1 -72.6 ROOFTOP <named list> ChIJ-x…
#> # … with 4 more variables: types <list>, partial_match <lgl>, plus_code <list>,
#> # status <chr>, and abbreviated variable names ¹address_components,
#> # ²formatted_address, ³location_type, ⁴place_id
We could also just look at the first address for each city:
loc %>%
  unnest_wider(json) %>%
  hoist(results, first_result = 1) %>%
  unnest_wider(first_result) %>%
  unnest_wider(geometry) %>%
  unnest_wider(location)
#> # A tibble: 5 × 13
#> city addre…¹ forma…² bounds lat lng locat…³ viewport place…⁴
#> <chr> <list> <chr> <list> <dbl> <dbl> <chr> <list> <chr>
#> 1 Houston <list> Housto… <named list> 29.8 -95.4 APPROX… <named list> ChIJAY…
#> 2 LA <list> Los An… <named list> 34.1 -118. APPROX… <named list> ChIJE9…
#> 3 New Yo… <list> New Yo… <named list> 40.7 -74.0 APPROX… <named list> ChIJOw…
#> 4 Chicago <list> Chicag… <named list> 41.9 -87.6 APPROX… <named list> ChIJ7c…
#> 5 Spring… <list> Spring… <NULL> 42.1 -72.6 ROOFTOP <named list> ChIJ-x…
#> # … with 4 more variables: types <list>, partial_match <lgl>, plus_code <list>,
#> # status <chr>, and abbreviated variable names ¹address_components,
#> # ²formatted_address, ³location_type, ⁴place_id
Or use hoist()
to dive deeply to get directly to
lat
and lng
:
loc %>%
  hoist(json,
    lat = list("results", 1, "geometry", "location", "lat"),
    lng = list("results", 1, "geometry", "location", "lng")
  )
#> # A tibble: 5 × 4
#> city lat lng json
#> <chr> <dbl> <dbl> <list>
#> 1 Houston 29.8 -95.4 <named list [2]>
#> 2 LA 34.1 -118. <named list [2]>
#> 3 New York 40.7 -74.0 <named list [2]>
#> 4 Chicago 41.9 -87.6 <named list [2]>
#> 5 Springfield 42.1 -72.6 <named list [2]>
(I’d normally use readr::parse_datetime() or lubridate::ymd_hms() to parse date-time strings such as the created_at and updated_at columns above, but I can’t here because this is a vignette and I don’t want to add a dependency to tidyr just to simplify one example.)
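For completeness, here is a sketch of what that would look like outside this vignette, using the created_at strings from gh_users (assumes the readr package is installed; the column choice is just an example):

users %>%
  hoist(user, login = "login", created_at = "created_at") %>%
  mutate(created_at = readr::parse_datetime(created_at))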