The goal of this document is to show you the basics of httr2. You’ll learn how to create and submit HTTP requests and work with the HTTP responses that you get back. httr2 is designed to map closely to the underlying HTTP protocol, which I’ll explain as we go along. For more details, I also recommend “An overview of HTTP” from MDN.
library(httr2)
In httr2, you start by creating a request. If you’re familiar with httr, this is a big change: with httr you could only submit a request, immediately receiving a response. Having an explicit request object makes it easier to build up a complex request piece by piece and works well with the pipe.
Every request starts with a URL:
req <- request("https://httpbin.org/get")
req
#> <httr2_request>
#> GET https://httpbin.org/get
#> Body: empty
We can see exactly what this request will send to the server with a dry run:
req %>% req_dry_run()
#> GET /get HTTP/1.1
#> Host: httpbin.org
#> User-Agent: httr2/0.2.1 r-curl/4.3.2 libcurl/7.79.1
#> Accept: */*
#> Accept-Encoding: deflate, gzip
The first line of the request contains three important pieces of information:
The HTTP method, which is a verb that tells the server what you want to do. Here it’s GET, the most common verb, indicating that we want to get a resource. Other verbs include POST, to create a new resource, PUT, to replace an existing resource, and DELETE, to delete a resource.
The path, which is the URL stripped of details that the server already knows, i.e. the protocol (http or https), the host (httpbin.org), and the port (not used here).
The version of the HTTP protocol. This is unimportant for our purposes because it’s handled at a lower level.
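To see how a URL splits into the protocol, host, port, and path described above, you can take it apart with httr2's url_parse(). This is only a small sketch for exploration; the variable name parts is just for illustration.

```r
library(httr2)

# Break the request URL into its components
parts <- url_parse("https://httpbin.org/get")

parts$scheme    # the protocol
parts$hostname  # the host
parts$port      # NULL here, because the URL does not specify a port
parts$path      # the part that appears on the first line of the request
```

Only the path ends up on the first line of the request; the host is sent separately in the Host header.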
The following lines specify the HTTP headers, a series of name-value pairs separated by a colon (:). The headers in this request were automatically added by httr2, but you can override them or add your own with req_headers():
req %>%
  req_headers(
    Name = "Hadley",
    `Shoe-Size` = "11",
    Accept = "application/json"
  ) %>%
  req_dry_run()
#> GET /get HTTP/1.1
#> Host: httpbin.org
#> User-Agent: httr2/0.2.1 r-curl/4.3.2 libcurl/7.79.1
#> Accept-Encoding: deflate, gzip
#> Name: Hadley
#> Shoe-Size: 11
#> Accept: application/json
Header names are case-insensitive, and servers will ignore headers that they don’t understand.
The headers finish with a blank line which is followed by the body. The requests above (like all GET requests) don’t have a body, so let’s add one to see what happens. The req_body_*() functions provide a variety of ways to add data to the body. Here we’ll use req_body_json() to add some data encoded as JSON:
req %>%
  req_body_json(list(x = 1, y = "a")) %>%
  req_dry_run()
#> POST /get HTTP/1.1
#> Host: httpbin.org
#> User-Agent: httr2/0.2.1 r-curl/4.3.2 libcurl/7.79.1
#> Accept: */*
#> Accept-Encoding: deflate, gzip
#> Content-Type: application/json
#> Content-Length: 15
#>
#> {"x":1,"y":"a"}
What’s changed?
The method has changed from GET to POST. POST is the standard method for sending data to a website, and is automatically used whenever you add a body. Use req_method() to choose a different method.
There are two new headers: Content-Type and Content-Length. They tell the server how to interpret the body: it’s encoded as JSON and is 15 bytes long.
We have a body, consisting of some JSON.
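As a rough illustration of overriding the automatic switch to POST, the sketch below sends the same JSON body with PUT instead. The https://httpbin.org/put endpoint and the name req_put are assumptions made for this example.

```r
library(httr2)

# Adding a body would normally switch the method to POST;
# req_method() overrides that choice explicitly.
req_put <- request("https://httpbin.org/put") %>%
  req_body_json(list(x = 1, y = "a")) %>%
  req_method("PUT")

# Inspect what would be sent, without contacting the server
req_put %>% req_dry_run()
```

The first line of the dry run should now start with PUT rather than POST.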
Different servers want data encoded differently so httr2 provides a
selection of common formats. For example, req_body_form()
uses the encoding used when you submit a form from a web browser:
req %>%
  req_body_form(x = "1", y = "a") %>%
  req_dry_run()
#> POST /get HTTP/1.1
#> Host: httpbin.org
#> User-Agent: httr2/0.2.1 r-curl/4.3.2 libcurl/7.79.1
#> Accept: */*
#> Accept-Encoding: deflate, gzip
#> Content-Type: application/x-www-form-urlencoded
#> Content-Length: 7
#>
#> x=1&y=a
And req_body_multipart()
uses the multipart encoding
which is particularly important when you need to send larger amounts of
data or complete files:
req %>%
  req_body_multipart(x = "1", y = "a") %>%
  req_dry_run()
#> POST /get HTTP/1.1
#> Host: httpbin.org
#> User-Agent: httr2/0.2.1 r-curl/4.3.2 libcurl/7.79.1
#> Accept: */*
#> Accept-Encoding: deflate, gzip
#> Content-Length: 228
#> Content-Type: multipart/form-data; boundary=------------------------e9ed6552328d3847
#>
#> --------------------------e9ed6552328d3847
#> Content-Disposition: form-data; name="x"
#>
#> 1
#> --------------------------e9ed6552328d3847
#> Content-Disposition: form-data; name="y"
#>
#> a
#> --------------------------e9ed6552328d3847--
If you need to send data encoded in a different form, you can use
req_body_raw()
to add the data to the body and set the
Content-Type
header.
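For instance, a minimal sketch of sending XML might look like the following. The XML payload, the https://httpbin.org/post endpoint, and the variable names are made up for illustration; the Content-Type header is set by hand with req_headers(), as described above.

```r
library(httr2)

# A hypothetical XML payload, supplied as a literal string
xml <- "<note><to>Hadley</to></note>"

req_xml <- request("https://httpbin.org/post") %>%
  req_body_raw(xml) %>%                              # body is sent as is
  req_headers(`Content-Type` = "application/xml")    # tell the server how to read it

# Inspect what would be sent, without contacting the server
req_xml %>% req_dry_run()
```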
To actually perform a request and fetch the response back from the server, call req_perform():
req <- request("https://httpbin.org/json")
resp <- req %>% req_perform()
resp
#> <httr2_response>
#> GET https://httpbin.org/json
#> Status: 200 OK
#> Content-Type: application/json
#> Body: In memory (429 bytes)
You can see a simulation of what httr2 actually received with resp_raw():
resp %>% resp_raw()
#> HTTP/1.1 200 OK
#> date: Tue, 10 May 2022 19:08:52 GMT
#> content-type: application/json
#> content-length: 429
#> server: gunicorn/19.9.0
#> access-control-allow-origin: *
#> access-control-allow-credentials: true
#>
#> {
#> "slideshow": {
#> "author": "Yours Truly",
#> "date": "date of publication",
#> "slides": [
#> {
#> "title": "Wake up to WonderWidgets!",
#> "type": "all"
#> },
#> {
#> "items": [
#> "Why <em>WonderWidgets</em> are great",
#> "Who <em>buys</em> WonderWidgets"
#> ],
#> "title": "Overview",
#> "type": "all"
#> }
#> ],
#> "title": "Sample Slide Show"
#> }
#> }
An HTTP response has a very similar structure to an HTTP request. The first line gives the version of HTTP used, and a status code that’s optionally followed by a short description. Then we have the headers, followed by a blank line, followed by a body. The majority of responses will have a body, unlike requests.
You can extract data from the response using the resp_*() functions:
resp_status()
returns the status code and
resp_status_desc()
returns the description:
resp %>% resp_status()
#> [1] 200
resp %>% resp_status_desc()
#> [1] "OK"
You can extract all headers with resp_headers() or a specific header with resp_header():
resp %>% resp_headers()
#> <httr2_headers>
#> date: Tue, 10 May 2022 19:08:52 GMT
#> content-type: application/json
#> content-length: 429
#> server: gunicorn/19.9.0
#> access-control-allow-origin: *
#> access-control-allow-credentials: true
resp %>% resp_header("Content-Length")
#> [1] "429"
Header names are case-insensitive:
resp %>% resp_header("ConTEnT-LeNgTH")
#> [1] "429"
You can extract the body in various forms using the resp_body_*() family of functions. Since this response returns JSON we can use resp_body_json():
resp %>% resp_body_json() %>% str()
#> List of 1
#> $ slideshow:List of 4
#> ..$ author: chr "Yours Truly"
#> ..$ date : chr "date of publication"
#> ..$ slides:List of 2
#> .. ..$ :List of 2
#> .. .. ..$ title: chr "Wake up to WonderWidgets!"
#> .. .. ..$ type : chr "all"
#> .. ..$ :List of 3
#> .. .. ..$ items:List of 2
#> .. .. .. ..$ : chr "Why <em>WonderWidgets</em> are great"
#> .. .. .. ..$ : chr "Who <em>buys</em> WonderWidgets"
#> .. .. ..$ title: chr "Overview"
#> .. .. ..$ type : chr "all"
#> ..$ title : chr "Sample Slide Show"
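If you want to try the extraction functions above without a network connection, one option is to build a response by hand with httr2's response() constructor, which is intended for testing. The sketch below makes up a tiny JSON body and a single header; the name fake and the body contents are assumptions for illustration only.

```r
library(httr2)

# Build an in-memory response; no request is sent
fake <- response(
  status_code = 200,
  headers = list(`Content-Type` = "application/json"),
  body = charToRaw('{"x": 1, "y": "a"}')
)

fake %>% resp_status()
fake %>% resp_header("content-type")
fake %>% resp_body_json() %>% str()
```

The same functions work on a real response returned by req_perform().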
Responses with status codes 4xx and 5xx are HTTP errors. httr2 automatically turns these into R errors:
request("https://httpbin.org/status/404") %>% req_perform()
#> Error in `resp_abort()`:
#> ! HTTP 404 Not Found.
request("https://httpbin.org/status/500") %>% req_perform()
#> Error in `resp_abort()`:
#> ! HTTP 500 Internal Server Error.
This is another important difference from httr, which required that you explicitly call httr::stop_for_status() to turn HTTP errors into R errors. You can revert to the httr behaviour with req_error(req, is_error = ~ FALSE).
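As a sketch of what opting out of automatic errors could look like in practice, the hypothetical helper below (fetch_status() is not part of httr2) returns the status code of any response instead of raising an R error for 4xx and 5xx codes.

```r
library(httr2)

# Hypothetical helper: perform a request and return its status code,
# even when the server replies with a 4xx or 5xx status.
fetch_status <- function(url) {
  request(url) %>%
    req_error(is_error = ~ FALSE) %>%  # never treat an HTTP status as an R error
    req_perform() %>%
    resp_status()
}

# Requires network access:
# fetch_status("https://httpbin.org/status/404")
```

Failures that happen before any response arrives, such as a dropped connection, still surface as R errors, because there is no HTTP status to inspect.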
A number of req_*() functions don’t directly affect the HTTP request but instead control the overall process of submitting a request and handling the response. These include:
req_cache()
sets up a cache so if repeated requests
return the same results, you can avoid a trip to the server.
req_throttle()
will automatically add a small delay
before each request so you can avoid hammering a server with many
requests.
req_retry()
sets up a retry strategy so that if the
request either fails or you get a transient HTTP error, it’ll
automatically retry after a short delay.
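These functions can be combined on a single request. The following is a minimal sketch; the cache location (a temporary directory), the rate of one request every two seconds, and the limit of three tries are arbitrary values chosen for illustration, not recommendations.

```r
library(httr2)

req_polite <- request("https://httpbin.org/get") %>%
  req_cache(path = tempfile()) %>%   # store cacheable responses in a temporary directory
  req_throttle(rate = 30 / 60) %>%   # at most 30 requests per minute
  req_retry(max_tries = 3)           # try up to three times on transient errors

# Nothing is sent until the request is performed:
# resp <- req_polite %>% req_perform()
```

None of these calls changes the method, headers, or body; they only affect what happens when the request is performed.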
For more details see their documentation, as well as examples of their usage in real APIs in vignette("wrapping-apis").