Productionizing with Civis Scripts

Patrick Miller

2020-06-22

Civis Scripts are the way to productionize your code with Civis Platform. You’ve probably already used three of the four types of scripts in the Civis Platform UI (“Code” –> “Scripts”): language (R, Python 3, JavaScript, and SQL), container, and custom. If you’ve run any of these scripts in Civis Platform, you’ve already started productionizing your code. Loosely speaking, productionizing means that your code now runs on a remote server instead of your local or development machine.

You probably already know some of the benefits too:

  1. Easily schedule and automate tasks, and include tasks in workflows.
  2. Ensure your code doesn’t break in the future when dependencies change.
  3. Share code with others without them worrying about dependencies or language compatibility.
  4. Rapidly deploy fixes and changes.

This guide will cover how to programmatically do the same tasks using the API that you are used to doing in the GUI. Instead of typing in values for the parameters or clicking to download outputs, you can do the same thing in your programs. Hooray for automation!

Specifically, this guide will cover how to programmatically read outputs, kick off new script runs, and publish your own script templates to share your code with others. It will make heavy use of API functions directly, but highlight convenient wrappers for common tasks where they have been implemented already.

Ready? Buckle in!

Script Concepts and Overview

A script is a job that executes code in Civis Platform. A script accepts user input through parameters, gives values back to the user as run outputs, and records any logs along the way.

A script author can share language and container scripts with others by letting users clone the script. But if an author makes a change to the script such as fixing a bug or adding a feature, users will have to re-clone the script to get access to those changes.

A better way to share code with others is with template scripts. A template script is a ‘published’ language or container script. The script that the template runs is the backing script of the template.

Once a container or language script is published as a template, users can create their own instances of the template. These instances are called custom scripts and they inherit all changes made to the template. This feature makes it easy to share code with others and to rapidly deploy changes and fixes.

Quick Start

# create a container script with a parameter
script <- scripts_post_containers(
  required_resources = list(cpu = 1024, memory = 50, diskSpace = 15),
  docker_command = 'cd /package_dir && Rscript inst/run_script.R',
  docker_image_name = 'civisanalytics/datascience-r',
  name = 'SCRIPT NAME',
  params = list(
    list(name = 'NAME_OF_ENV_VAR',
         label = 'Name User Sees', 
         type = 'string',
         required = TRUE)
  )
)

# publish the container script as a template 
template <- templates_post_scripts(script$id, name = 'TEMPLATE NAME', note = 'Markdown Docs')

# run a template script with arguments for its parameters,
# returning file ids of run outputs
out <- run_template(template$id, arguments = list(NAME_OF_ENV_VAR = 'value'))

# post a file or JSONValue run output within a script
write_job_output('filename.csv')
json_values_post(jsonlite::toJSON(my_list), 'my_list.json')

# get run output file ids of a script
out <- fetch_output_file_ids(civis_script(id))

# get csv run outputs of a script (the regex matches names ending in .csv)
df <- read_civis(civis_script(id), regex = '\\.csv$', using = read.csv)

# get JSONValue run outputs
my_list <- read_civis(civis_script(id))

Creating and Running Scripts

Let’s make these concepts concrete with an example! We’ll use the R language script throughout, but container scripts work the same way through their own endpoints. Later, in Sharing Scripts with Templates, we’ll cover custom and template scripts.

An Example Script

The post method creates the job and returns a list of metadata about it, including its type.

source <- c('
 print("Hello World!")
')
job <- scripts_post_r(name = 'Hello!', source = source)

Each script can be uniquely identified by its job id. If you have a job id but don’t know what kind of script it is, use jobs_get(id) to look it up.

Each script type is associated with its own API endpoints. For instance, R, container, and custom scripts are created with scripts_post_r, scripts_post_containers, and scripts_post_custom respectively, while templates are published with templates_post_scripts.
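Because each script type has its own set of endpoints, it can help to see the naming pattern in one place. The sketch below is a hypothetical helper, script_endpoints, which assumes the R, container, and custom endpoints all follow the pattern scripts_post_<type>, scripts_post_<type>_runs, and scripts_get_<type>_runs. It only builds function names and does not call the API.

```r
# Hypothetical helper: build endpoint function names for one script type,
# assuming the scripts_post_<type> / _runs / scripts_get_<type>_runs pattern
script_endpoints <- function(type) {
  list(
    post    = paste0("scripts_post_", type),          # create the script
    run     = paste0("scripts_post_", type, "_runs"), # start a run
    get_run = paste0("scripts_get_", type, "_runs")   # check a run's status
  )
}

endpoints <- script_endpoints("containers")
endpoints$run  # "scripts_post_containers_runs"
```

With civis loaded, you could then look up a function by its name, for example match.fun(endpoints$run)(job$id), to start a run of a container script.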

This job hasn’t been run yet. To kick off a run, use scripts_post_r_runs:

run <- scripts_post_r_runs(job$id)

# check the status
scripts_get_r_runs(job$id, run$id)

# automatically poll until the job completes
await(scripts_get_r_runs, id = job$id, run_id = run$id)

Since kicking off a job and polling until it completes is a really common task for this guide, let’s make it a function:

run_script <- function(source, name = 'Cool') {
  job <- scripts_post_r(name = name, source = source)
  run <- scripts_post_r_runs(job$id)
  await(scripts_get_r_runs, id = job$id, run_id = run$id)
}
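The await call above hides a simple idea: ask for the run's state, wait, and ask again until the run reaches a final state. The sketch below shows that idea in base R. It is not how civis implements await; poll_until_done and make_fake_status are hypothetical names, and the list of final state names is an assumption about what the runs endpoints report.

```r
# Hypothetical sketch of the polling idea behind await(); not the civis code.
# The final-state names are an assumption.
poll_until_done <- function(get_status, interval = 1, max_tries = 60,
                            done_states = c("succeeded", "failed", "cancelled")) {
  for (i in seq_len(max_tries)) {
    state <- get_status()                 # ask for the current state
    if (state %in% done_states) return(state)
    Sys.sleep(interval)                   # wait before asking again
  }
  stop("Run did not reach a final state after ", max_tries, " checks")
}

# A fake status source that reports "running" twice, then "succeeded"
make_fake_status <- function() {
  calls <- 0
  function() {
    calls <<- calls + 1
    if (calls < 3) "running" else "succeeded"
  }
}

poll_until_done(make_fake_status(), interval = 0)  # "succeeded"
```

On Civis Platform, get_status might be function() scripts_get_r_runs(job$id, run$id)$state. In practice, prefer await itself.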

Run Outputs

This script isn’t very useful because it doesn’t produce any output that we can access. To add an output to a job, we can use scripts_post_r_runs_outputs. The two most common types of run outputs are Files and JSONValues.

Files

We can add a File as a run output by uploading the object to S3 with write_civis_file and then calling scripts_post_r_runs_outputs with object_type set to 'File'. Notice that Civis Platform automatically sets the environment variables CIVIS_JOB_ID and CIVIS_RUN_ID inside the running script, so the script can look up its own job and run ids.

source <- c("
 library(civis)
 data(iris)
 write.csv(iris, 'iris.csv')
 job_id <- as.numeric(Sys.getenv('CIVIS_JOB_ID'))
 run_id <- as.numeric(Sys.getenv('CIVIS_RUN_ID'))
 file_id <- write_civis_file('iris.csv')
 scripts_post_r_runs_outputs(job_id, run_id, object_type = 'File', object_id = file_id)
")
run <- run_script(source)
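When you develop a script locally, CIVIS_JOB_ID and CIVIS_RUN_ID are not set, so as.numeric(Sys.getenv(...)) returns NA and the call to scripts_post_r_runs_outputs cannot succeed. A hypothetical helper, civis_run_ids, that reports whether the code is running on Civis Platform might look like this:

```r
# Hypothetical helper: read the ids Civis Platform sets in a running script.
# Locally the variables are unset, so the ids are NA and on_platform is FALSE.
civis_run_ids <- function() {
  job_id <- suppressWarnings(as.numeric(Sys.getenv("CIVIS_JOB_ID", unset = NA)))
  run_id <- suppressWarnings(as.numeric(Sys.getenv("CIVIS_RUN_ID", unset = NA)))
  list(job_id = job_id, run_id = run_id,
       on_platform = !is.na(job_id) && !is.na(run_id))
}

ids <- civis_run_ids()
```

Inside the script, you could then guard the upload: if (ids$on_platform) scripts_post_r_runs_outputs(ids$job_id, ids$run_id, object_type = 'File', object_id = file_id).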

Since this pattern is so common, civis wraps it in the function write_job_output, which posts a local file as a run output from any script type.

source <- c("
 library(civis)
 data(iris)
 write.csv(iris, 'iris.csv')
 write_job_output('iris.csv')
")
run <- run_script(source)

JSONValues

It is best practice to make run outputs as portable as possible, because the script can be called from any language. For arbitrary data, JSONValues are often the best choice. Whatever the format, it is user-friendly to include the file extension in the name of the run output.

Adding JSONValue run outputs is common enough for it to be implemented directly as a Civis API endpoint, json_values_post:

source <- c("
 library(civis)
 library(jsonlite)
 my_farm <- list(cows = 1, ducks = list(mallard = 2, goldeneye = 1))
 json_values_post(jsonlite::toJSON(my_farm), name = 'my_farm.json')
")
run_farm <- run_script(source)
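Since a JSONValue may be read from any language, it is worth checking what jsonlite::toJSON actually writes. By default it stores length-one vectors as JSON arrays, so a Python consumer would see [1] rather than 1; the auto_unbox argument changes that. The example below only uses jsonlite and does not touch the API.

```r
library(jsonlite)

my_farm <- list(cows = 1, ducks = list(mallard = 2, goldeneye = 1))

# Default: length-one vectors are written as arrays
boxed <- toJSON(my_farm)
# {"cows":[1],"ducks":{"mallard":[2],"goldeneye":[1]}}

# auto_unbox = TRUE writes length-one vectors as plain values
unboxed <- toJSON(my_farm, auto_unbox = TRUE)
# {"cows":1,"ducks":{"mallard":2,"goldeneye":1}}
```

Whether to unbox depends on who reads the output; if consumers expect scalars, you could post jsonlite::toJSON(my_farm, auto_unbox = TRUE) instead.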

To retrieve script outputs we can use scripts_list_r_runs_outputs, which returns a list with one element per run output:

out <- scripts_list_r_runs_outputs(run$rId, run$id)
iris <- read_civis(out[[1]]$objectId, using = read.csv)

Since this pattern is also common, you can simply call read_civis on a civis_script directly; this works for any script type. Use regex to filter run outputs by name (for example, by file extension) and using to supply the appropriate reading function. JSONValues are read automatically.

# get csv run outputs
iris <- read_civis(civis_script(run$rId), regex = '\\.csv$', using = read.csv)

# get JSONValues
my_farm <- read_civis(civis_script(run_farm$rId))
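The regex argument is a regular expression, so a '.' matches any character. The base R example below uses made-up output names and assumes read_civis matches regex against run output names with ordinary R regular expressions. It shows why an escaped, anchored pattern such as '\\.csv$' is a safer filter than '.csv'.

```r
# Made-up run output names
outputs <- c("iris.csv", "model_csv_notes.json", "iris.csv.gz")

# '.' matches any character and the pattern is unanchored,
# so all three names match
grepl(".csv", outputs)     # TRUE TRUE TRUE

# An escaped dot anchored at the end keeps only names ending in .csv
grepl("\\.csv$", outputs)  # TRUE FALSE FALSE
```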

Script Parameters

Scripts are more useful when users can configure their behavior, which is what script parameters are for. Script parameters are placeholders for user input; the specific values a user supplies for them are called arguments. Here, we modify run_script to add a parameter to the script and to accept an argument for it. Inside the script, the argument is available as an environment variable named after the parameter.

# Add 'params' and 'arguments' to run_script
run_script <- function(source, args, name = 'Cool') {
  params <- list(          # params is a list of individual parameters
    list(
      name = 'PET_NAME',   # name of the environment variable with the user value
      label = 'Pet Name',  # name displayed to the user
      type = 'string',     # parameter type
      required = TRUE      # must the user supply a value?
    )
  )
  job <- scripts_post_r(name = name, 
                        source = source, 
                        params = params, 
                        arguments = args)
  run <- scripts_post_r_runs(job$id)
  await(scripts_get_r_runs, id = job$id, run_id = run$id)
}

# Access the PET_NAME variable
source <- c('
  library(civis)
  pet_name <- Sys.getenv("PET_NAME")
  msg <- paste0("Hello ", pet_name, "!")
  print(msg)
')

# Let's run it! Here we pass the argument 'Fitzgerald' to the 
# parameter 'PET_NAME' that we created.
run_script(source, name = 'Pet Greeting', args = list(PET_NAME = 'Fitzgerald'))
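Because PET_NAME is marked as required, it is convenient to catch a missing argument before anything is posted. A hypothetical helper, missing_required_args, compares the required parameters against the names of the supplied arguments:

```r
# Hypothetical helper: names of required parameters with no argument supplied
missing_required_args <- function(params, args) {
  is_required <- vapply(params, function(p) isTRUE(p$required), logical(1))
  required_names <- vapply(params[is_required], function(p) p$name, character(1))
  setdiff(required_names, names(args))
}

params <- list(list(name = 'PET_NAME', label = 'Pet Name',
                    type = 'string', required = TRUE))
my_args <- list(PET_NAME = 'Fitzgerald')

# Stop early if any required argument is absent
absent <- missing_required_args(params, my_args)
if (length(absent) > 0) {
  stop("Missing required arguments: ", paste(absent, collapse = ", "))
}
```

A check like this could sit at the top of run_script, before scripts_post_r is called.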

Sharing Scripts with Templates

Now we have a script. How can we share it with others so that they can use it? The best way to share scripts is with templates. Let’s start by simply posting the script above:

params <- list(          
  list(
    name = 'PET_NAME',   
    label = 'Pet Name',  
    type = 'string',     
    required = TRUE      
  )
)
job <- scripts_post_r(name = 'Pet Greeter', 
                      source = source, 
                      params = params)
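Writing out parameter definitions by hand gets repetitive as scripts grow. A hypothetical helper, string_param, builds one string parameter with the same fields used above, so the params list fits on one line:

```r
# Hypothetical helper: one string parameter with the fields used above
string_param <- function(name, label, required = TRUE) {
  list(name = name, label = label, type = 'string', required = required)
}

params <- list(string_param('PET_NAME', 'Pet Name'))
```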

To make this job a template, use templates_post_scripts. Adding a note (in Markdown) that describes what the script does, what its parameters are, and what outputs it posts is often helpful for users.

note <- c("
# Pet Greeter

Greets your pet, given its name! 
 
For your pet to receive the greeting, it must be a Civis Platform
user with the ability to read.
 
Parameters:
  * Pet Name: string, Name of pet.

  
Returns:
  * Nothing
")
template <- templates_post_scripts(script_id = job$id, note = note, name = 'Pet Greeter')
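The Parameters section of the note repeats information already in params, and the two can drift apart when the script changes. The hypothetical params_to_markdown helper below sketches one way to generate that section from the params list itself; the output format is a choice, not a Civis requirement.

```r
# Hypothetical helper: render a "Parameters:" section from a params list
params_to_markdown <- function(params) {
  items <- vapply(params, function(p) {
    sprintf("  * %s: %s%s", p$label, p$type,
            if (isTRUE(p$required)) " (required)" else "")
  }, character(1))
  paste(c("Parameters:", items), collapse = "\n")
}

params <- list(list(name = 'PET_NAME', label = 'Pet Name',
                    type = 'string', required = TRUE))
cat(params_to_markdown(params))
```

You could then assemble the note with, for example, paste('# Pet Greeter', params_to_markdown(params), sep = '\n\n') before calling templates_post_scripts.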

Custom Scripts

scripts_post_custom creates an instance of a template that inherits all changes made to the template. We can now make a simple program to call and run an instance of the template.

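run_custom_script <- function(template_id, arguments = NULL, ...) {
  # create a custom script from the template, run it, and wait for it
  job <- scripts_post_custom(template_id, arguments = arguments, ...)
  run <- scripts_post_custom_runs(job$id)
  await(scripts_get_custom_runs, id = job$id, run_id = run$id)
}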

Conveniently, run_template does exactly this and is already provided in civis. It returns the output file ids of the job for you to use later on in your program.

out <- run_template(template$id, arguments = list(PET_NAME = 'CHARLES'))

To stay organized, let’s automatically add the script to an existing project:

# We might need to find the project id first (assuming the best match is first)
results <- search_list(query = 'My Project Name', type = 'project')
project_id <- results$results[[1]]$id
out <- run_template(template$id, arguments = list(PET_NAME = 'CHARLES'),
                    target_project_id = project_id)

Making Changes

To make changes to the template note or name, use templates_patch_scripts. Here, new_note is the updated Markdown note:

templates_patch_scripts(template$id, note = new_note)

To change the behavior, name, or parameters of the script, update the backing script using scripts_patch_r.

source <- c('
  library(civis)
  pet_name <- Sys.getenv("PET_NAME")
  msg <- paste0("Hello ", pet_name, "! Would you care for a sandwich?")
  print(msg)
')
scripts_patch_r(id = job$id, name = 'Pet Greeter',
                source = source,
                params = params)

Discoverability

To help share your template with others, use this link: https://platform.civisanalytics.com/spa/#/scripts/new/{your template id}.

This link will automatically direct the user to a new instance of the template.
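If you hand out links programmatically, for example in a deployment log, a hypothetical template_url helper can fill in the URL pattern above:

```r
# Hypothetical helper: share link for a template, using the URL pattern above
template_url <- function(template_id) {
  sprintf("https://platform.civisanalytics.com/spa/#/scripts/new/%d",
          as.integer(template_id))
}

template_url(123)  # "https://platform.civisanalytics.com/spa/#/scripts/new/123"
```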

It’s a good idea to archive unused templates so that users can quickly find the right one. This is especially important if you deploy templates automatically.

Let’s clean up our experiment by archiving our Pet Greeter Template:

templates_patch_scripts(template$id, archived = TRUE)
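If you deploy templates automatically, you may need to archive several at once. The sketch below, archive_templates, is a hypothetical wrapper that calls templates_patch_scripts(id, archived = TRUE) for each id; the patch function is an argument so the loop can be tried with a stand-in instead of the live API.

```r
# Hypothetical helper: archive each template id. `patch` defaults to the
# civis function but can be swapped for a stand-in when trying the loop.
archive_templates <- function(template_ids,
                              patch = civis::templates_patch_scripts) {
  results <- lapply(template_ids, function(id) patch(id, archived = TRUE))
  invisible(results)
}
```

With civis loaded, archive_templates(old_ids) would archive every template in old_ids, a hypothetical vector of ids you no longer want users to find.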

Conclusion

That’s it! Now go forth and productionize!