High-level functions for supporting encryption and decryption of data from R. This allows secure storage and exchange of information, while trying to keep the encryption/decryption code from taking over your analyses. cyphr wraps the lower-level support from sodium and openssl. This package is designed to be easy to use, rather than the most secure thing (you’re using R, remember; for examples of what cyphr can’t protect against, see jammr, rpwnd and evil.R).
cyphr provides high-level functions to:
Encrypt and decrypt strings: encrypt_string / decrypt_string
Encrypt and decrypt objects: encrypt_object / decrypt_object
Encrypt and decrypt raw data: encrypt_data / decrypt_data
Encrypt and decrypt files: encrypt_file / decrypt_file
Wrap R’s file reading and writing functions (with encrypt and decrypt) to enable transparent encryption (support included for readRDS/saveRDS, read.csv/write.csv, etc).
The package aims to make encrypting and decrypting as easy as
cyphr::encrypt(write.csv(dat, "file.csv"), key)
and
dat <- cyphr::decrypt(read.csv("file.csv", stringsAsFactors = FALSE), key)
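The string and object functions listed above follow the same pattern. A minimal sketch, assuming the cyphr and sodium packages are installed and using a throwaway key generated on the spot (the message and the list are made-up example data):

```r
# Throwaway symmetric key, for illustration only
key <- cyphr::key_sodium(sodium::keygen())

# Strings: the encrypted result is a raw vector
secret <- cyphr::encrypt_string("my secret message", key)
msg <- cyphr::decrypt_string(secret, key)

# Arbitrary R objects are serialised before being encrypted
secret_obj <- cyphr::encrypt_object(list(a = 1, b = "two"), key)
obj <- cyphr::decrypt_object(secret_obj, key)
identical(obj, list(a = 1, b = "two"))
```

Both round trips should return the original value, provided the same key is used for decryption.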
In addition, the package implements a workflow that allows a group to securely share data by encrypting it with a shared (“symmetric”) key that is in turn encrypted with each user’s ssh key. The use case is a group of researchers who are collaborating on a dataset that cannot be made public, for example because it contains sensitive data. However, they have decided, or need, to store it somewhere whose security they are not 100% confident about. The data are therefore encrypted at each write and decrypted at each read.
Install cyphr from CRAN with
install.packages("cyphr")
To install the development version of cyphr from GitHub, you can use remotes:
remotes::install_github("ropensci/cyphr", upgrade = FALSE)
The scope of the package is to protect data that has been saved to disk. It is not designed to stop an attacker targeting the R process itself to determine the contents of sensitive data. The package does try to prevent you accidentally saving to disk the contents of sensitive information, including the keys that could decrypt such information.
Decide on a style of encryption and create a key object:
key_sodium: Symmetric encryption, using sodium. Everyone shares the same key (which must be kept secret!) and can encrypt and decrypt data with it. This is used as a building block but is inflexible because of the need to keep the key secret.
key_openssl: Symmetric encryption, using openssl.
keypair_sodium: Public key encryption, using sodium. This lets people encrypt messages using your public key that only you can read using your private key.
keypair_openssl: Public key encryption, using openssl, which has the big advantage that many people already have compatible (ssh) keys in standard places with standard file formats (see ?encrypt_envelope in the openssl package).
cyphr does not include wrappers for key generation for sodium; sodium keys do not have a file format. So a secret symmetric key in sodium might be:
k <- sodium::keygen()
k
## [1] 48 35 a2 6c 05 27 65 75 cb 08 01 de 76 8f 71 fe 3f d7 e4 7a df bf d8 e7 08
## [26] d5 fb e9 61 c8 5f d1
With this key we can create the key_sodium object:
key <- cyphr::key_sodium(k)
key
## <cyphr_key: sodium>
If the key was saved to file that would work too:
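A minimal sketch of the file-based case, assuming the key is stored as raw bytes with writeBin. The path is a hypothetical temporary file, and the bytes are read back explicitly with readBin rather than assuming that key_sodium accepts a path directly:

```r
# Hypothetical location for the key; a real key file must be kept private
path_key <- tempfile()
k <- sodium::keygen()
writeBin(k, path_key)

# Read the key bytes back and build the key object from them
k_from_file <- readBin(path_key, raw(), n = file.size(path_key))
key <- cyphr::key_sodium(k_from_file)
```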
If you load a password protected ssh key you will be prompted for your passphrase. cyphr will ensure that this is not echoed onto the console.
key <- cyphr::key_openssl()
## Please enter private key passphrase:
key
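For the public key style (keypair_sodium), a sketch with two hypothetical users, called Alice and Bob here, may help. Each generates a private key and shares only the public half; the names and the message are illustrative assumptions:

```r
# Each user generates a private key and derives the public key from it
key_alice <- sodium::keygen()
pub_alice <- sodium::pubkey(key_alice)
key_bob <- sodium::keygen()
pub_bob <- sodium::pubkey(key_bob)

# Alice encrypts for Bob using Bob's public key and her own private key
pair_alice <- cyphr::keypair_sodium(pub_bob, key_alice)
secret <- cyphr::encrypt_string("for Bob only", pair_alice)

# Bob decrypts using Alice's public key and his own private key
pair_bob <- cyphr::keypair_sodium(pub_alice, key_bob)
msg <- cyphr::decrypt_string(secret, pair_bob)
```

Neither private key ever has to leave its owner, which is what makes this style more flexible than a single shared secret.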
If you have files that already exist and you want to encrypt or decrypt, the functions cyphr::encrypt_file and cyphr::decrypt_file will do that (these are workhorse functions that are used internally throughout the package):
saveRDS(iris, "myfile")
cyphr::encrypt_file("myfile", key, "myfile.encrypted")
The file is encrypted now:
readRDS("myfile.encrypted")
## Error in readRDS("myfile.encrypted"): unknown input format
Decrypt the file and read it:
cyphr::decrypt_file("myfile.encrypted", key, "myfile.clear")
identical(readRDS("myfile.clear"), iris)
## [1] TRUE
Encrypting files like the above risks leaving a cleartext (i.e., unencrypted) version around. If you want to wrap the output of something like write.csv or saveRDS you really have no choice but to write out the file first, encrypt it, and delete the clear version. Making sure that this happens even if a step fails is error-prone and takes a surprising number of repetitive lines of code.
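Done by hand, that pattern might look like the sketch below. save_rds_encrypted is a hypothetical helper written for illustration, not part of cyphr; it uses on.exit so that the cleartext temporary file is removed even if saving or encrypting fails:

```r
# Hypothetical helper: write, encrypt, and always clean up the cleartext
save_rds_encrypted <- function(object, dest, key) {
  tmp <- tempfile()
  # Runs when the function exits, whether normally or through an error
  on.exit(unlink(tmp), add = TRUE)
  saveRDS(object, tmp)
  cyphr::encrypt_file(tmp, key, dest)
  invisible(dest)
}

# Usage with a throwaway key and a temporary destination file
key <- cyphr::key_sodium(sodium::keygen())
dest <- tempfile(fileext = ".rds")
save_rds_encrypted(iris, dest, key)
```

Every file-writing function would need a similar wrapper, which is the repetition the text describes.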
Alternatively, to encrypt the output of a file-producing command, just wrap it in cyphr::encrypt:
cyphr::encrypt(saveRDS(iris, "myfile.rds"), key)
Then to decrypt a file to feed into a file-consuming command, wrap it in cyphr::decrypt:
dat <- cyphr::decrypt(readRDS("myfile.rds"), key)
The round-trip preserves the data:
identical(dat, iris) # yay
## [1] TRUE
But without the key, it cannot be read:
readRDS("myfile.rds")
## Error in readRDS("myfile.rds"): unknown input format
The above commands work through computing on the language, rewriting the readRDS and saveRDS commands. Commands for reading and writing tabular and plain text files (read.csv, readLines, etc) are also supported, and the way the rewriting is done is designed to be extensible.
The argument to the wrapped functions can be connection objects. In this case the actual command is written to a file and the contents of that file are encrypted and written to the connection. When reading/writing multiple objects from/to a single connection though, this is likely to go very badly.
The functions supported so far are:
readLines / writeLines
readRDS / saveRDS
load / save
read.table / write.table
read.csv / read.csv2 / write.csv
read.delim / read.delim2
However, there are bound to be more functions that could be useful to add here (e.g., readxl::read_excel). Either pass the name of the file argument to cyphr::encrypt / cyphr::decrypt as
cyphr::decrypt(readxl::read_excel("myfile.xlsx"), key, file_arg = "path")
or register the function with the package using rewrite_register:
cyphr::rewrite_register("readxl", "read_excel", "path")
Then you can use
cyphr::decrypt(readxl::read_excel("myfile.xlsx"), key)
to decrypt the file. These are equivalent, but the former will likely be more convenient if you’re only dealing with a couple of files, and the latter if you are dealing with many.
Even with high-level functions to ease encrypting and decrypting things given a key, there is some work to be done to distribute a set of keys across a group of people who are working together so that everyone can encrypt and decrypt the data but so that the keys themselves are not compromised.
The package contains support for a group of people who are working on a sensitive data set. The data will be encrypted with a symmetric key. However, we never actually store the key directly; instead we store a copy for each user that is encrypted with that user’s key. Any user with access to the data can authorise another user to access the data. This is described in more detail in the vignette (in R: vignette("data", package = "cyphr")).
The low-level functions in sodium and openssl work with raw data, for generality. Few users encounter raw vectors in their typical use of R, so R objects must be serialised to raw vectors before they can be encrypted. Most of the encryption also involves a little extra random data (the “nonce” in sodium and similar additional pieces with openssl). These need storing with the data, and then separating from the data when decryption happens.
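To make that bookkeeping concrete, here is a sketch that uses sodium directly. Storing the nonce in front of the ciphertext is an assumption chosen for illustration and is not necessarily the layout cyphr uses internally:

```r
key <- sodium::keygen()

# Serialise an R object to a raw vector, then encrypt with a fresh nonce
msg <- serialize(list(x = 1:3), NULL)
nonce <- sodium::random(24)
cipher <- sodium::data_encrypt(msg, key, nonce)

# Store the nonce in front of the ciphertext (illustrative layout only)
stored <- c(nonce, as.vector(cipher))

# To decrypt, separate the two pieces again
nonce_in <- stored[seq_len(24)]
cipher_in <- stored[-seq_len(24)]
obj <- unserialize(sodium::data_decrypt(cipher_in, key, nonce_in))
```

The high-level functions in cyphr take care of these steps so that calling code only ever handles the key and the data.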
MIT © Rich FitzJohn.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.