The goal of vetiver is to provide fluent tooling for MLOps tasks for your trained model, including versioning, deploying, and monitoring.
Create a vetiver_model()
The vetiver package is extensible, with generics that can support many kinds of models. For this example, let’s consider one kind of model supported by vetiver, a tidymodels workflow that encompasses both feature engineering and model estimation.
library(parsnip)
library(recipes)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#>
#> step
library(workflows)
data(bivariate, package = "modeldata")
bivariate_train
#> # A tibble: 1,009 × 3
#>        A     B Class
#>    <dbl> <dbl> <fct>
#> 1 3279. 155. One
#> 2 1727. 84.6 Two
#> 3 1195. 101. One
#> 4 1027. 68.7 Two
#> 5 1036. 73.4 One
#> 6 1434. 79.5 One
#> 7 633. 67.4 One
#> 8 1262. 67.0 Two
#> 9 985. 62.0 Two
#> 10 893. 56.8 Two
#> # … with 999 more rows
#> # ℹ Use `print(n = ...)` to see more rows
biv_rec <-
  recipe(Class ~ ., data = bivariate_train) %>%
  step_BoxCox(all_predictors()) %>%
  step_normalize(all_predictors())

svm_spec <-
  svm_linear(mode = "classification") %>%
  set_engine("LiblineaR")

svm_fit <-
  workflow(biv_rec, svm_spec) %>%
  fit(sample_frac(bivariate_train, 0.7))
This svm_fit object is a fitted workflow, with both feature engineering and model parameters estimated using the training data bivariate_train. We can create a vetiver_model() from this trained model; a vetiver_model() collects the information needed to store, version, and deploy a trained model.
library(vetiver)
v <- vetiver_model(svm_fit, "biv_svm")
v
#>
#> ── biv_svm ─ <butchered_workflow> model for deployment
#> A LiblineaR classification modeling workflow using 2 features
Think of this vetiver_model() as a deployable model object.
You can store and version your model by choosing a pins “board” for it, including a
local folder, RStudio Connect, Amazon S3, and more. Most pins boards
have versioning turned on by default, but we can turn it on explicitly
for our temporary demo board. When we write the
vetiver_model()
to our board, the binary model object is
stored on our board together with necessary metadata, like the packages
needed to make a prediction and the model’s input data prototype for
checking new data at prediction time.
library(pins)
model_board <- board_temp(versioned = TRUE)
model_board %>% vetiver_pin_write(v)
Let’s train our model again with a new version of the dataset and write it once more to our board.
svm_fit <-
  workflow(biv_rec, svm_spec) %>%
  fit(sample_frac(bivariate_train, 0.7))

v <- vetiver_model(svm_fit, "biv_svm")
model_board %>% vetiver_pin_write(v)
Both versions are stored, and we have access to both.
model_board %>% pin_versions("biv_svm")
#> # A tibble: 2 × 3
#>   version                created             hash
#>   <chr>                  <dttm>              <chr>
#> 1 20220811T173416Z-2128f 2022-08-11 11:34:00 2128f
#> 2 20220811T173416Z-b8121 2022-08-11 11:34:00 b8121
The primary purpose of pins is to make it easy to share data
artifacts, so depending on the board you choose, your pinned
vetiver_model()
can be shareable with your
collaborators.
You can deploy your model by creating a Plumber router and adding a POST endpoint for making predictions.
library(plumber)
pr() %>%
vetiver_api(v)
#> # Plumber router with 2 endpoints, 4 filters, and 1 sub-router.
#> # Use `pr_run()` on this object to start the API.
#> ├──[queryString]
#> ├──[body]
#> ├──[cookieParser]
#> ├──[sharedSecret]
#> ├──/logo
#> │ │ # Plumber static router serving from directory: /private/var/folders/hv/hzsmmyk9393_m7q3nscx1slc0000gn/T/RtmpEUeJPS/Rinsta6082af3f41/vetiver
#> ├──/ping (GET)
#> └──/predict (POST)
To start a server using this object, pipe (%>%) to pr_run(port = 8088) or your port of choice. This allows you to interact with your vetiver API locally and debug it. Plumber APIs such as these can be hosted in a variety of ways. You can use the function vetiver_write_plumber() to create a ready-to-go plumber.R file that is especially suited for RStudio Connect.
vetiver_write_plumber(model_board, "biv_svm")
# Generated by the vetiver package; edit with care
library(pins)
library(plumber)
library(rapidoc)
library(vetiver)
# Packages needed to generate model predictions
if (FALSE) {
library(LiblineaR)
library(parsnip)
library(recipes)
library(workflows)
}
b <- board_folder(path = "/var/folders/hv/hzsmmyk9393_m7q3nscx1slc0000gn/T/RtmpuwJy2h/pins-a634208eedac")
v <- vetiver_pin_read(b, "biv_svm", version = "20220811T173416Z-2128f")
#* @plumber
function(pr) {
pr %>% vetiver_api(v)
}
In a real-world situation, you would see something like b <- board_rsconnect() or b <- board_s3() here instead of our temporary demo board. Notice that the deployment is strongly linked to a specific version of the pinned model; if you pin another version of the model after you deploy your model, your deployed model will not be affected.
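Because the generated plumber.R pins an explicit version string, updating a deployment means pointing it at a different version identifier. As a rough sketch of reading one specific version among several, again using a toy lm() model on a temporary board (model name and formulas are illustrative, not part of the example above):

```r
library(pins)
library(vetiver)

# write two versions of a toy model to a versioned temporary board
b <- board_temp(versioned = TRUE)
vetiver_pin_write(b, vetiver_model(lm(mpg ~ wt, data = mtcars), "cars_lm"))
vetiver_pin_write(b, vetiver_model(lm(mpg ~ wt + disp, data = mtcars), "cars_lm"))

# read an explicit version identifier, as the generated plumber.R does
versions <- pin_versions(b, "cars_lm")
v_pinned <- vetiver_pin_read(b, "cars_lm", version = versions$version[[1]])
```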
A model deployed via vetiver can be treated as a special vetiver_endpoint() object.
library(vetiver)
endpoint <- vetiver_endpoint("http://127.0.0.1:8088/predict")
endpoint
#>
#> ── A model API endpoint for prediction:
#> http://127.0.0.1:8088/predict
If such a deployed model endpoint is running via one R process (either remotely on a server or locally, perhaps via a background job in the RStudio IDE), you can make predictions with that deployed model and new data in another, separate R process.
data(bivariate, package = "modeldata")
predict(endpoint, bivariate_test)
#> # A tibble: 710 × 1
#> .pred_class
#> <chr>
#> 1 One
#> 2 Two
#> 3 One
#> 4 Two
#> 5 Two
#> 6 One
#> 7 Two
#> 8 Two
#> 9 Two
#> 10 One
#> # … with 700 more rows
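Under the hood, predict() on an endpoint serializes the new data as JSON and POSTs it to the /predict route. The request body looks roughly like this (a sketch assuming the jsonlite package; the exact serialization vetiver uses may differ in details):

```r
library(jsonlite)

# two new observations with the same columns as the training data
new_data <- data.frame(A = c(1000, 1500), B = c(70, 80))

# row-wise JSON, one object per observation
toJSON(new_data)
#> [{"A":1000,"B":70},{"A":1500,"B":80}]
```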
Being able to predict() on a vetiver model endpoint takes advantage of the model's input data prototype and other metadata that is stored with the model.
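The input data prototype is essentially a zero-row slice of the training data that records column names and types, so new data can be validated before prediction. The idea can be sketched in base R (a simplification for illustration, not vetiver's actual implementation):

```r
# a zero-row "prototype" capturing column names and classes
make_prototype <- function(df) df[0, , drop = FALSE]

# check that new data matches the prototype's structure
check_prototype <- function(new_data, prototype) {
  identical(names(new_data), names(prototype)) &&
    identical(
      vapply(new_data, function(x) class(x)[[1]], character(1)),
      vapply(prototype, function(x) class(x)[[1]], character(1))
    )
}

proto <- make_prototype(data.frame(A = c(3279, 1727), B = c(155, 84.6)))
check_prototype(data.frame(A = 1000, B = 70), proto)  # TRUE
check_prototype(data.frame(A = "oops"), proto)        # FALSE: wrong columns
```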