Dataflow, loading a file with a customer supplied encryption key

1

9

When I try to load a GCS file that is encrypted with a CSEK, I get a Dataflow error:

[ERROR] The target object is encrypted by a customer-supplied encryption key

I was going to try AES decryption on the Dataflow side, but it turns out I can't even fetch the file without passing the encryption key.

Is there another way to load CSEK-encrypted Google Cloud Storage files from within Dataflow? For example, could I use the Google Cloud Storage API to get a stream handle and then pass that to Dataflow?

    // Fails
    p.apply("Read from source", TextIO.read().from("gs://my_bucket/myfile")).apply(..); 
Sunn answered 6/8, 2018 at 20:19 Comment(0)
8

According to the documentation, Cloud Dataflow does not currently support objects encrypted with customer-supplied encryption keys. I opened a feature request for this to be implemented.

Note that you can't read a Cloud Storage object that was uploaded with a customer-supplied encryption key (CSEK) unless you have that encryption key.

According to the documentation:

If you use customer-supplied encryption keys or client-side encryption, you must securely manage your keys and ensure that they are not lost. If you lose your keys, you are no longer able to read your data, and you continue to be charged for storage of your objects until you delete them.
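For context on what "having the key" means in practice: a CSEK is a 256-bit AES key that is sent base64-encoded with every request, together with the base64-encoded SHA-256 hash of the key. Below is a minimal sketch, using only the JDK, of generating such a key and building the three `x-goog-encryption-*` request headers. The class and method names (`CsekHeaders`, `newBase64Key`, `headersFor`) are made up for illustration and are not part of any Google library.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of any Google client library.
final class CsekHeaders {

  /** Generates a random 256-bit key and returns it base64-encoded. */
  static String newBase64Key() {
    byte[] key = new byte[32];
    new SecureRandom().nextBytes(key);
    return Base64.getEncoder().encodeToString(key);
  }

  /** Builds the CSEK request headers for a base64-encoded AES-256 key. */
  static Map<String, String> headersFor(String base64Key) {
    byte[] rawKey = Base64.getDecoder().decode(base64Key);
    if (rawKey.length != 32) {
      throw new IllegalArgumentException("A CSEK must be exactly 32 bytes (AES-256)");
    }
    byte[] hash;
    try {
      hash = MessageDigest.getInstance("SHA-256").digest(rawKey);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-256 is required by the JDK spec", e);
    }
    Map<String, String> headers = new LinkedHashMap<>();
    headers.put("x-goog-encryption-algorithm", "AES256");
    headers.put("x-goog-encryption-key", base64Key);
    headers.put("x-goog-encryption-key-sha256", Base64.getEncoder().encodeToString(hash));
    return headers;
  }

  public static void main(String[] args) {
    // Demo only: never print or log a real key.
    String key = newBase64Key();
    headersFor(key).forEach((name, value) -> System.out.println(name + ": " + value));
  }
}
```

The client libraries build these headers for you when you pass a decryption key option, so this is mainly useful for understanding what has to be kept safe: lose the 32 bytes and the object is unreadable.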

If you still have the CSEK, sample Java code to read the file is:

    byte[] content = storage.readAllBytes(
        bucketName, blobName, BlobSourceOption.decryptionKey(decryptionKey));

The other ways of reading a CSEK-encrypted object are described here.
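Since `TextIO` cannot pass the key, one possible workaround is to call the Cloud Storage client yourself from inside a `DoFn`. The following is only a rough sketch under several assumptions: it needs the `google-cloud-storage` and Beam Java SDK dependencies on the classpath, `ReadCsekObjectFn` is a made-up name, and passing the base64 key through the constructor means it is serialized with the pipeline, which you may well not want (fetching it from a secret store in `@Setup` would be safer). It takes object names as input and emits the lines of each object.

```java
import com.google.cloud.ReadChannel;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.charset.StandardCharsets;
import org.apache.beam.sdk.transforms.DoFn;

// Hypothetical DoFn: reads CSEK-encrypted objects line by line.
public class ReadCsekObjectFn extends DoFn<String, String> {
  private final String bucket;
  // Assumption: base64-encoded AES-256 key, serialized with the DoFn.
  private final String base64Key;
  // The client is not serializable, so it is created on each worker.
  private transient Storage storage;

  public ReadCsekObjectFn(String bucket, String base64Key) {
    this.bucket = bucket;
    this.base64Key = base64Key;
  }

  @Setup
  public void setup() {
    storage = StorageOptions.getDefaultInstance().getService();
  }

  @ProcessElement
  public void processElement(@Element String objectName, OutputReceiver<String> out)
      throws IOException {
    BlobId blobId = BlobId.of(bucket, objectName);
    // The decryption key option makes the service decrypt the object for us.
    try (ReadChannel channel =
            storage.reader(blobId, Storage.BlobSourceOption.decryptionKey(base64Key));
        BufferedReader lines =
            new BufferedReader(Channels.newReader(channel, StandardCharsets.UTF_8.name()))) {
      String line;
      while ((line = lines.readLine()) != null) {
        out.output(line);
      }
    }
  }
}
```

It could then replace the failing `TextIO` read with something like `p.apply(Create.of("myfile")).apply(ParDo.of(new ReadCsekObjectFn("my_bucket", key)))`. Note that each object is read by a single worker, so unlike `TextIO` there is no splitting of large files.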

Merchantable answered 7/8, 2018 at 14:17 Comment(3)
Thanks, but I need to read it from Dataflow (Apache Beam). If Apache Beam could take an input stream from the Google Storage API, that would also be enough, but I cannot find a way to do that. - Sunn
Really unfortunate that Dataflow does not have an out-of-the-box way to access encrypted files in GCS. It probably would not be too painful to write something that uses the Google APIs library to get a stream and surface that as a value provider. - Sunn
To add to this good answer: even if you could access the raw encrypted object, the data chunks are not encrypted with the CSEK itself. The CSEK is used to encrypt (wrap) the raw chunk keys (DEKs) that actually encrypt the data. A different DEK is used for each data chunk. - Guillotine
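To illustrate the key-wrapping idea from the last comment, here is a small JDK-only sketch of envelope encryption in general. It is not Cloud Storage's actual implementation: the choice of AES-GCM for the data and RFC 3394 key wrap (`AESWrap`) for the data key is an assumption made for the example, and `EnvelopeSketch` is a made-up name. One key (standing in for the CSEK) wraps a data key (DEK), and only the DEK ever touches the data.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Generic envelope-encryption illustration, not Google's implementation.
final class EnvelopeSketch {

  static SecretKey newAes256Key() throws Exception {
    KeyGenerator generator = KeyGenerator.getInstance("AES");
    generator.init(256);
    return generator.generateKey();
  }

  /** Encrypts (wraps) the data key with the key-encryption key. */
  static byte[] wrap(SecretKey kek, SecretKey dek) throws Exception {
    Cipher cipher = Cipher.getInstance("AESWrap");
    cipher.init(Cipher.WRAP_MODE, kek);
    return cipher.wrap(dek);
  }

  /** Recovers the data key; fails if the wrong key-encryption key is used. */
  static SecretKey unwrap(SecretKey kek, byte[] wrappedDek) throws Exception {
    Cipher cipher = Cipher.getInstance("AESWrap");
    cipher.init(Cipher.UNWRAP_MODE, kek);
    return (SecretKey) cipher.unwrap(wrappedDek, "AES", Cipher.SECRET_KEY);
  }

  /** Encrypts or decrypts one chunk with the data key, depending on mode. */
  static byte[] crypt(int mode, SecretKey dek, byte[] iv, byte[] input) throws Exception {
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(mode, dek, new GCMParameterSpec(128, iv));
    return cipher.doFinal(input);
  }

  public static void main(String[] args) throws Exception {
    SecretKey kek = newAes256Key(); // plays the role of the CSEK
    SecretKey dek = newAes256Key(); // per-chunk data key
    byte[] iv = new byte[12];
    new SecureRandom().nextBytes(iv);

    byte[] chunk = "one data chunk".getBytes(StandardCharsets.UTF_8);
    byte[] ciphertext = crypt(Cipher.ENCRYPT_MODE, dek, iv, chunk);
    byte[] wrappedDek = wrap(kek, dek); // only this depends on the CSEK

    SecretKey recovered = unwrap(kek, wrappedDek);
    byte[] plaintext = crypt(Cipher.DECRYPT_MODE, recovered, iv, ciphertext);
    System.out.println(new String(plaintext, StandardCharsets.UTF_8));
  }
}
```

This is why decrypting the raw stored bytes yourself with the CSEK would not work even if you could download them: without first unwrapping the per-chunk key, the CSEK alone does not decrypt the data.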
