How to protect/encrypt R objects in RData files due to EU-GDPR

I want to protect the content of my RData files with a strong encryption algorithm since they may contain sensitive personal data which must not be disclosed due to (legal) EU-GDPR requirements.

How can I do this from within R?

I want to avoid a second manual step to encrypt the RData files after creating them to minimize the risk of forgetting it or overlooking any RData files.

I am working with Windows in this scenario...

Nelle answered 17/10, 2018 at 9:38 Comment(4)
Maybe you should talk to a lawyer first, to make sure you actually have to do this, before you ask how to do it. - Aggappe
GDPR doesn't care about encryption. It cares if you store the data or not, for what purpose and how long. If you can decrypt the data, it means you have the data. - Jordon
I shouldn't have mentioned GDPR since the legal and compliance side is clarified in my case. BTW: Article 32 ("security of processing") states that "the processor... shall implement appropriate technical... measures to ensure a level of security... inter alia... encryption of personal data..." - Nelle
That doesn't mean what you think it means. - Jordon
R
13
library(openssl)

x <- serialize(list(1,2,3), NULL)

passphrase <- charToRaw("This is super secret")
key <- sha256(passphrase)

encrypted_x <- aes_cbc_encrypt(x, key = key)

saveRDS(encrypted_x, "secret-x.rds")

encrypted_y <- readRDS("secret-x.rds")

y <- unserialize(aes_cbc_decrypt(encrypted_y, key = key))

You need to deal with secrets management (i.e. the key) but this general idiom should work (with a tad more bulletproofing).
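As one possible take on the secrets-management part, here is a minimal sketch that reads the passphrase from an environment variable instead of hard-coding it. The variable name `RDS_PASSPHRASE` and the helper `key_from_env()` are made up for illustration and are not part of openssl.

```r
library(openssl)

# Hypothetical helper: derive a 256-bit AES key from a passphrase that is
# kept in an environment variable rather than in the script itself.
key_from_env <- function(var = "RDS_PASSPHRASE") {
  passphrase <- Sys.getenv(var, unset = NA)
  if (is.na(passphrase) || !nzchar(passphrase)) {
    stop("Environment variable ", var, " is not set")
  }
  sha256(charToRaw(passphrase))
}

# For demonstration only; in practice the variable would be set outside
# the script, e.g. in a user-level .Renviron file.
Sys.setenv(RDS_PASSPHRASE = "This is super secret")
key <- key_from_env()
```

The resulting `key` can be passed to `aes_cbc_encrypt()` and `aes_cbc_decrypt()` exactly as above.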

Rothko answered 17/10, 2018 at 10:11 Comment(2)
Good code snippet! I wanted to avoid encrypting my objects before saving them, but base R does not offer this and I have still not found a package that provides a single function for it. I am sure your solution works (especially in combination with the package secret to store the password: cran.r-project.org/package=secret). - Nelle
Aye. There's no encrypting serialize in base R, but you could write a wrapper for saveRDS() and put it into a locally sourced package. - Rothko
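The wrapper suggested in the comment could look roughly like this. It is only a sketch: the names `save_rds_encrypted()` and `read_rds_encrypted()` are hypothetical, and the object is serialized in memory so that no unencrypted copy is written to disk.

```r
library(openssl)

# Hypothetical wrapper names; neither function exists in base R or openssl.
save_rds_encrypted <- function(object, file, key) {
  # Serialize in memory so that no plaintext is written to disk.
  plain <- serialize(object, connection = NULL)
  # aes_cbc_encrypt() draws a random IV and attaches it as the "iv" attribute.
  encrypted <- aes_cbc_encrypt(plain, key = key)
  saveRDS(encrypted, file)
  invisible(file)
}

read_rds_encrypted <- function(file, key) {
  encrypted <- readRDS(file)
  # The IV is picked up from the "iv" attribute stored with the data.
  unserialize(aes_cbc_decrypt(encrypted, key = key))
}

key <- sha256(charToRaw("This is super secret"))
path <- tempfile(fileext = ".rds")
save_rds_encrypted(list(a = 1, b = "two"), path, key)
restored <- read_rds_encrypted(path, key)
```

To store several objects in one file, similar to save(), they can be passed as a single named list.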
H
1

I know it's very late, but check out the package endecrypt.

Installation:

devtools::install_github("RevanthNemani/endecrypt")

Use the following functions for column encryption:

airquality <- EncryptDf(x = airquality, pub.key = pubkey, encryption.type = "aes256")

For column decryption:

airquality <- DecryptDf(x = airquality, prv.key = prvkey, encryption.type = "aes256")

Check out the GitHub page.

Just remember to generate your keys and save them before first use. Load the keys when required and supply the key object to the functions.

E.g.:

SaveGenKey(bits = 2048,
              private.key.path = "Encription/private.pem",
              public.key.path = "Encription/public.pem")

# Load keys already stored using this function
prvkey <- LoadKey(key.path = "Encription/private.pem", Private = TRUE)

It is very easy to use, and the encrypted data frames can be stored in a database or an RData file.

Hom answered 26/7, 2019 at 21:18 Comment(0)
R
1

Building on hrbrmstr's answer, I wrote two functions: saveRDSEnc and readRDSEnc.

For large objects it is much better to save the object first, read the saved file back as a raw vector, encrypt it, and then save the encrypted content without compression (encrypted data does not compress well, so compression has to happen before encryption). The code below does this.

library(openssl)

###
#' Serialization Interface for Single Objects with encryption
#' 
#' @details Function to write a single R object to a file with encryption using
#'  symmetric AES encryption
#'
#' @param object R object to serialize
#' @param file Path of the file to write
#' @param ... further arguments passed to saveRDS function
#' @param password Encryption password
#'
#' @return Invisible NULL
#' @export
#' 
#' @examples
#' x <- "Hello world!"
#' saveRDSEnc(x, file = 'test.rds', compress = 'xz', password = '1234')
###
saveRDSEnc <- function(object, file, ..., password) {
  stopifnot("Missing password!" = !missing(password))
  
  key <- openssl::sha256(charToRaw(as.character(password)))
  # Write the (optionally compressed) RDS file first. Note that this leaves
  # unencrypted data on disk until it is overwritten below.
  saveRDS(object, file = file, ...)
  
  # Read the file back as raw bytes, encrypt them and overwrite the file.
  x <- readBin(con = file, what = raw(), n = file.size(file))
  x <- openssl::aes_cbc_encrypt(data = x, key = key)
  saveRDS(object = x, file = file, compress = FALSE)
  
  invisible(NULL)
}

###
#' Serialization Interface for Single Objects with encryption
#'
#' @details Function to read a single R object from an encrypted file using
#'  symmetric AES decryption.
#'
#' @param file Path of the encrypted file to read
#' @param ... further arguments passed to readRDS function
#' @param password Decryption password 
#'
#' @return Restored object
#' @export
#' 
#' @examples
#' x <- readRDSEnc('test.rds', password = '1234')
#' print(x) # Hello world!
###
readRDSEnc <- function(file, ..., password) {
  stopifnot("Missing password!" = !missing(password))
  
  key <- openssl::sha256(charToRaw(as.character(password)))
  tmpf <- tempfile()
  
  tryCatch({
    # The outer file holds the encrypted raw vector (with its IV attribute).
    x <- readRDS(file)
    x <- openssl::aes_cbc_decrypt(data = x, key = key)
    # The decrypted bytes are the original RDS file; restore it via a
    # temporary file that is removed again in any case.
    writeBin(object = x, con = tmpf)
    x <- readRDS(tmpf, ...)
  }, finally = unlink(tmpf))

  x
}
Rosenfeld answered 5/5, 2023 at 10:6 Comment(1)
Thanks for your code snippet. Please note that saving the unencrypted data in a file first is what I wanted to avoid; journaling file systems, for example, could even allow the unencrypted file to be recovered after it has been "overwritten" with the encrypted version. Also, if an error occurs after the unencrypted file has been written but before it is overwritten with the encrypted version, the unencrypted file is left on the disk. - Nelle
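A possible way around this, sketched here under the assumption that the object fits into memory a few times over, is to compress with memCompress() before encrypting, so that nothing unencrypted touches the disk. The function names `save_rds_enc_mem()` and `read_rds_enc_mem()` are hypothetical.

```r
library(openssl)

# Hypothetical helper names; a sketch, not a drop-in replacement.
save_rds_enc_mem <- function(object, file, password, compress = "xz") {
  key <- sha256(charToRaw(as.character(password)))
  # Serialize and compress entirely in memory.
  plain <- memCompress(serialize(object, connection = NULL), type = compress)
  encrypted <- aes_cbc_encrypt(plain, key = key)
  # Store the compression type next to the data so reading can reverse it.
  payload <- list(compress = compress, data = encrypted)
  # Encrypted bytes do not compress further, so skip compression here.
  saveRDS(payload, file, compress = FALSE)
  invisible(NULL)
}

read_rds_enc_mem <- function(file, password) {
  key <- sha256(charToRaw(as.character(password)))
  payload <- readRDS(file)
  # payload$data still carries the "iv" attribute set by aes_cbc_encrypt().
  plain <- aes_cbc_decrypt(payload$data, key = key)
  unserialize(memDecompress(plain, type = payload$compress))
}

path <- tempfile(fileext = ".rds")
save_rds_enc_mem(mtcars, path, password = "1234")
restored <- read_rds_enc_mem(path, password = "1234")
```

Only the compression type is stored unencrypted; the data itself is only ever written in encrypted form.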
