OCR text and handwritten forms via http://captricity.com/.
To get the current development version from GitHub:
install.packages("devtools")
devtools::install_github("soodoku/captr")
library(captr)
Start by getting an application token and setting it using:
# Not a real token
set_token("6dbee39a047c4de2b576b966")
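Hard-coding a token in a script makes it easy to leak. Here is a minimal sketch of one alternative, assuming you have stored the token in an environment variable. The variable name `CAPTRICITY_TOKEN` and the helper `read_token` are hypothetical and not part of captr.

```r
# Hypothetical helper (not part of captr): read an API token from an
# environment variable and fail loudly if it is missing.
read_token <- function(var = "CAPTRICITY_TOKEN") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop("Environment variable ", var, " is not set", call. = FALSE)
  }
  token
}

# Usage (requires captr and a real token):
# set_token(read_token())
```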
Then, create a batch using:
batch <- create_batch("wisc_ads")
batch$id
Next, upload image(s) to the batch:
path <- system.file("extdata/wisc_ads", package = "captr")
files <- dir(path, full.names = TRUE)
upimage <- lapply(files, upload_image, batch_id = batch$id)
names(upimage[[5]])
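With many files, a single failed upload would stop the `lapply` call above. One possible workaround is sketched below: it wraps each call in `tryCatch` so that the remaining files are still attempted. The helper `upload_all` is hypothetical and not part of captr. It takes the upload function as an argument and assumes only that the function accepts a file path as its first argument, as `upload_image` does above.

```r
# Hypothetical wrapper (not part of captr): call `upload_fn` on each file,
# capturing errors so that one failed upload does not abort the rest.
upload_all <- function(files, upload_fn, ...) {
  results <- lapply(files, function(f) {
    tryCatch(upload_fn(f, ...), error = function(e) e)
  })
  names(results) <- basename(files)
  # TRUE for entries where tryCatch returned an error condition
  failed <- vapply(results, inherits, logical(1), what = "error")
  list(ok = results[!failed], failed = results[failed])
}

# Usage (requires captr and a valid token):
# out <- upload_all(files, upload_image, batch_id = batch$id)
# names(out$failed)
```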
Once you have created a batch, you need to go online and create a template that tells Captricity what data to pull from where. (Captricity requires a template for each job, and it appears that templates can only be built online.)
For instance, the template for this project looked like this:
Once you have a template, go to your inbox and click on process batch, which brings up the potential templates. Pick the template you want and click OK.
Next, check whether the batch is ready to be processed:
tester <- test_readiness(batch_id=batch$id)
tester$errors
You may also want to find out how much processing the batch will cost:
price <- batch_price(batch_id=batch$id)
price$total_user_cost_in_cents
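The field name suggests the cost is reported in cents. Here is a small sketch for displaying it in dollars, assuming the value is numeric (or coercible to numeric). The helper `cents_to_dollars` is hypothetical and not part of captr, and the amount below is made up.

```r
# Hypothetical helper (not part of captr): format an amount in cents as dollars.
cents_to_dollars <- function(cents) {
  sprintf("$%.2f", as.numeric(cents) / 100)
}

cents_to_dollars(1250)  # made-up amount, not a real quote

# Usage with a real batch (requires captr and a valid token):
# cents_to_dollars(price$total_user_cost_in_cents)
```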
Next, submit the batch for processing. At this point, the batch becomes a job.
submit <- submit_batch(batch_id=batch$id)
submit$related_job_id
To track progress of a job, use:
progress <- track_progress(submit$related_job_id)
progress$percent_completed
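Jobs can take a while, so you may want to poll rather than check by hand. A rough sketch follows, assuming `percent_completed` is a number between 0 and 100. The helper `wait_until_done` and its default interval are hypothetical and not part of captr.

```r
# Hypothetical helper (not part of captr): call `check_fn` repeatedly until it
# reports 100 percent or `max_tries` is reached. `check_fn` must return a
# number between 0 and 100. Returns TRUE if the job finished, FALSE otherwise.
wait_until_done <- function(check_fn, interval = 30, max_tries = 20) {
  for (i in seq_len(max_tries)) {
    pct <- check_fn()
    if (pct >= 100) return(TRUE)
    # Sleep between attempts, but not after the last one
    if (i < max_tries) Sys.sleep(interval)
  }
  FALSE
}

# Usage (requires captr and a submitted job):
# wait_until_done(function() track_progress(submit$related_job_id)$percent_completed)
```

Passing the check as a function keeps the loop independent of the API, so it can be tried out with a stub before spending money on a real job.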
List all forms (instance sets) associated with a job:
list_instances <- list_instance_sets(job_id=submit$related_job_id)
list_instances$id
If you want to download data from a particular form, use list_instance_sets
to get the form (instance_set) id and then run:
res1 <- get_instance_set(instance_set_id=list_instances$id[1])
res1$best_estimate
Get a CSV of all the results from a job:
get_all(job_id=submit$related_job_id)
Unfortunately, Captricity doesn’t do a particularly good job of extracting the text. For instance, Captricity considers extracting text from these fields ‘impossible’:
You can check out the final csv here.