Flight is a general-purpose client-server framework for high-performance transport of large datasets over network interfaces, built as part of the Apache Arrow project.
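Flight allows for highly efficient data transfer because it removes the need for deserialization during data transfer, allows for parallel data streaming, and is optimized to take advantage of Arrow's columnar format.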
The arrow package provides methods for connecting to Flight RPC servers to send and receive data.
The Flight functions in the package use reticulate to call methods in the pyarrow Python package.
Before using them for the first time, you’ll need to be sure you have reticulate and pyarrow installed:
install.packages("reticulate")
arrow::install_pyarrow()
See vignette("python", package = "arrow") for more details on setting up pyarrow.
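Before starting a server or client, it can help to confirm that the Python side is reachable. The following is a minimal sketch, assuming reticulate's `py_module_available()` is enough to detect a usable pyarrow; `pyarrow_ready` is a hypothetical helper name and is not part of the arrow package.

```r
# Hypothetical helper: TRUE only if reticulate is installed and
# Python can import pyarrow; FALSE otherwise (including on errors).
pyarrow_ready <- function() {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(FALSE)
  }
  isTRUE(tryCatch(
    reticulate::py_module_available("pyarrow"),
    error = function(e) FALSE
  ))
}

if (!pyarrow_ready()) {
  message("pyarrow not found; try arrow::install_pyarrow()")
}
```

If the check fails, rerunning the installation commands above in a fresh R session is usually the first thing to try.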
The package includes methods for starting a Python-based Flight server, as well as methods for connecting to a Flight server running elsewhere.
To illustrate both sides, in one process let’s start a demo server:
library(arrow)
demo_server <- load_flight_server("demo_flight_server")
server <- demo_server$DemoFlightServer(port = 8089)
server$serve()
We’ll leave that one running.
In a different R process, let’s connect to it and put some data in it.
library(arrow)
client <- flight_connect(port = 8089)
# Upload some data to our server so there's something to demo
flight_put(client, iris, path = "test_data/iris")
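To confirm that the upload landed, the arrow package also provides `list_flights()` and `flight_path_exists()`. The sketch below wraps them in a hypothetical helper, `confirm_upload` (not part of arrow), and assumes `client` is a connection returned by `flight_connect()` to a server that is still running.

```r
# Hypothetical helper: report what the server holds and whether
# `path` is among the stored flights.
# `client` is assumed to come from arrow::flight_connect().
confirm_upload <- function(client, path) {
  # All paths currently stored on the server
  available <- arrow::list_flights(client)
  # TRUE if `path` exists on the server, FALSE otherwise
  found <- arrow::flight_path_exists(client, path)
  list(available = available, found = found)
}

# Usage, with the demo server from above listening on port 8089:
# client <- arrow::flight_connect(port = 8089)
# confirm_upload(client, "test_data/iris")
```

When the client is no longer needed, `flight_disconnect(client)` closes the connection.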
Now, in a new R process, let’s connect to the server and pull the data we put there:
library(arrow)
library(dplyr)
client <- flight_connect(port = 8089)
client %>%
  flight_get("test_data/iris") %>%
  group_by(Species) %>%
  summarize(max_petal = max(Petal.Length))
## # A tibble: 3 x 2
##   Species    max_petal
##   <fct>          <dbl>
## 1 setosa           1.9
## 2 versicolor       5.1
## 3 virginica        6.9
Because flight_get() returns an Arrow data structure, you can directly pipe its result into a dplyr workflow. See vignette("dataset", package = "arrow") for more information on working with Arrow objects via a dplyr interface.
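The same pipeline can be tried without a running server. The sketch below is an illustration, not the vignette's own example: it assumes a local Arrow Table built with `arrow_table(iris)` is a reasonable stand-in for what `flight_get()` returns, and it calls `collect()` explicitly to bring the aggregated result back into R as a data frame.

```r
library(arrow)
library(dplyr)

# Stand-in for flight_get(client, "test_data/iris"): a local Arrow Table
iris_tbl <- arrow_table(iris)

result <- iris_tbl %>%
  group_by(Species) %>%
  summarize(max_petal = max(Petal.Length)) %>%
  collect() %>%        # pull the aggregated rows into an R data frame
  arrange(Species)     # sort in R so the row order is predictable

result
```

Sorting after `collect()` is a deliberate choice here, since the row order of a grouped Arrow query should not be relied on.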