How to encrypt a large file using Python?

I'm trying to encrypt a file that is larger than 1 GB, and I don't want to read it all into memory. I chose Fernet (cryptography.fernet) for this task because it was the most recommended option (and faster than asymmetric solutions).

I generated a key, then wrote a script to encrypt:

    key = Fernet(read_key())

    with open(source, "rb") as src, open(destination, "wb") as dest:
        for chunk in iter(lambda: src.read(4096), b""):
            encrypted = key.encrypt(chunk)
            dest.write(encrypted)

and for decryption:

    key = Fernet(read_key())

    with open(source, "rb") as src, open(destination, "wb") as dest:
        for chunk in iter(lambda: src.read(4096), b""):
            decrypted = key.decrypt(chunk)
            dest.write(decrypted)

Encryption works, which is no surprise, but decryption does not. At first I thought it might work, but it doesn't. I guess each chunk grows when it is encrypted, so when I read 4096 bytes back I don't get a whole encrypted chunk. I get this error when trying to decrypt:

Traceback (most recent call last):
  File "/redacted/path/venv/lib/python3.7/site-packages/cryptography/fernet.py", line 119, in _verify_signature
    h.verify(data[-32:])
  File "/redacted/path/venv/lib/python3.7/site-packages/cryptography/hazmat/primitives/hmac.py", line 74, in verify
    ctx.verify(signature)
  File "/redacted/path/venv/lib/python3.7/site-packages/cryptography/hazmat/backends/openssl/hmac.py", line 75, in verify
    raise InvalidSignature("Signature did not match digest.")
cryptography.exceptions.InvalidSignature: Signature did not match digest.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/redacted/path/main.py", line 63, in <module>
    decrypted = key.decrypt(chunk)
  File "/redacted/path/venv/lib/python3.7/site-packages/cryptography/fernet.py", line 80, in decrypt
    return self._decrypt_data(data, timestamp, time_info)
  File "/redacted/path/venv/lib/python3.7/site-packages/cryptography/fernet.py", line 137, in _decrypt_data
    self._verify_signature(data)
  File "/redacted/path/venv/lib/python3.7/site-packages/cryptography/fernet.py", line 121, in _verify_signature
    raise InvalidToken
cryptography.fernet.InvalidToken

Is there a way to solve this? Or is there a better (simpler) approach that uses something other than Fernet?

Scrawl answered 24/9, 2021 at 9:46 Comment(2)
Does this answer your question? Cryptography token object raises exception and cannot decrypt even though the tokens are identical - Taunyataupe
@Taunyataupe No. I generated the key once and stored it in a file, which I read with read_key(). I saw that question earlier while looking for an answer, but I couldn't find anything helpful there. - Scrawl
U
4

Fernet is not meant to be used in a streaming fashion. The last section of its documentation explains why:

Limitations

Fernet is ideal for encrypting data that easily fits in memory. As a design feature it does not expose unauthenticated bytes. This means that the complete message contents must be available in memory, making Fernet generally unsuitable for very large files at this time.

Unsuspecting answered 24/9, 2021 at 10:34 Comment(0)
D
5

I just ran into the same issue - I feel your pain brother.

There are some issues with Fernet that make it incompatible with your approach:

  1. Fernet outputs urlsafe base64-encoded data. This means that for every 3 bytes it encodes, Fernet emits 4 bytes of output.

This prevents you from using the same "chunk size" when encrypting and decrypting, as the decryption chunk size must necessarily be bigger. Unfortunately, treating the data with urlsafe_b64decode/urlsafe_b64encode also doesn't do the trick because:

  2. Fernet adds metadata to the encrypted data: each token carries a version byte, a timestamp, an IV, and an HMAC signature alongside the padded ciphertext.

There is probably a straightforward way to work out how big this overhead is and adjust the decryption chunk size to accommodate it, but I wanted to avoid "magic constants" as that felt quite gross.
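To see the base64 growth on its own, here is a small standard-library sketch. The helper name b64_len is made up for illustration; it only models the padded base64 step, not the rest of the Fernet token.

```python
import base64


def b64_len(n: int) -> int:
    """Length of the padded base64 encoding of n bytes: 4 * ceil(n / 3)."""
    return 4 * ((n + 2) // 3)


# Check the formula against the real encoder for a few input sizes.
for n in (1, 3, 57, 4096):
    assert len(base64.urlsafe_b64encode(bytes(n))) == b64_len(n)

print(b64_len(4096))  # 5464: already bigger than the 4096 bytes read back
```

So even before any metadata is added, a 4096-byte read on the decryption side can never line up with what was written.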

The solution I settled on actually ended up being quite elegant. It works as follows:

Encryption:

  1. Read n bytes of data (raw_chunk)
  2. Encrypt n bytes with Fernet to create an m bytes chunk (enc_chunk).
  3. Use len(enc_chunk).to_bytes(4, "big") to write the size of the encrypted chunk to the file
  4. Write the encrypted chunk to the file
  5. Repeat until the read returns b"", then stop

Decryption:

  1. Read 4 bytes of data (size)
  2. Break if the data is a b""
  3. Convert those 4 bytes into an integer using int.from_bytes(size, "big") (num_bytes)
  4. Read num_bytes of encrypted data
  5. Decrypt this data with Fernet with no problems
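
The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the function names encrypt_stream and decrypt_stream, the LEN_BYTES constant, and the idea of passing the cipher in as a callable are my own assumptions, chosen so the framing logic stays independent of Fernet.

```python
from typing import BinaryIO, Callable

LEN_BYTES = 4  # size of the big-endian length prefix written before each chunk


def encrypt_stream(src: BinaryIO, dst: BinaryIO,
                   encrypt: Callable[[bytes], bytes],
                   chunk_size: int = 64 * 1024) -> None:
    """Encrypt src chunk by chunk, writing each chunk preceded by its length."""
    for raw_chunk in iter(lambda: src.read(chunk_size), b""):
        enc_chunk = encrypt(raw_chunk)
        dst.write(len(enc_chunk).to_bytes(LEN_BYTES, "big"))
        dst.write(enc_chunk)


def decrypt_stream(src: BinaryIO, dst: BinaryIO,
                   decrypt: Callable[[bytes], bytes]) -> None:
    """Read length-prefixed chunks from src and write the decrypted bytes."""
    while True:
        size = src.read(LEN_BYTES)
        if size == b"":
            break  # clean end of file
        if len(size) != LEN_BYTES:
            raise ValueError("truncated length prefix")
        num_bytes = int.from_bytes(size, "big")
        enc_chunk = src.read(num_bytes)
        if len(enc_chunk) != num_bytes:
            raise ValueError("truncated chunk")
        dst.write(decrypt(enc_chunk))
```

With Fernet you would pass the bound methods, for example f = Fernet(key), then encrypt_stream(src, dst, f.encrypt) and decrypt_stream(src, dst, f.decrypt), with both files opened in binary mode. Note that each chunk is authenticated on its own, so this framing does not detect chunks that were dropped or reordered.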
Disharoon answered 10/2, 2022 at 16:3 Comment(0)
T
3

You can easily turn any non-streaming algorithm (like Fernet) into a streaming one just by slicing the input data into chunks and storing each chunk's length inside the encrypted file, as @tlonny already suggested. This only works if you are free to choose the format of the encrypted file.

Converting the chunk size to bytes can be done in different ways. One is to use struct.pack() and struct.unpack(), as I did in the following code. Another is to use size.to_bytes(4, 'little') and size = int.from_bytes(size_bytes, 'little').
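
As a quick check that the two conversions are interchangeable, here is a short sketch using only the standard library. The value 699148 is just an example chunk length.

```python
import struct

size = 699148  # example chunk length

packed = struct.pack("<I", size)      # 4-byte little-endian unsigned int
via_int = size.to_bytes(4, "little")  # same bytes without the struct module

assert packed == via_int
assert struct.unpack("<I", packed)[0] == size
assert int.from_bytes(via_int, "little") == size
```

Either form works as long as the writer and the reader agree on the width (4 bytes) and the byte order.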

The following code has a full implementation of encrypt() and decrypt(), together with a usage example (encrypting 2 MB of random data sliced into 64 KB chunks).

def encrypt(key, fin, fout, *, block = 1 << 16):
    import cryptography.fernet, struct
    fernet = cryptography.fernet.Fernet(key)
    with open(fin, 'rb') as fi, open(fout, 'wb') as fo:
        while True:
            chunk = fi.read(block)
            if len(chunk) == 0:
                break
            enc = fernet.encrypt(chunk)
            fo.write(struct.pack('<I', len(enc)))
            fo.write(enc)
            if len(chunk) < block:
                break

def decrypt(key, fin, fout):
    import cryptography.fernet, struct
    fernet = cryptography.fernet.Fernet(key)
    with open(fin, 'rb') as fi, open(fout, 'wb') as fo:
        while True:
            size_data = fi.read(4)
            if len(size_data) == 0:
                break
            chunk = fi.read(struct.unpack('<I', size_data)[0])
            dec = fernet.decrypt(chunk)
            fo.write(dec)

def test():
    import cryptography.fernet, secrets
    key = cryptography.fernet.Fernet.generate_key()
    with open('data.in', 'wb') as f:
        data = secrets.token_bytes(1 << 21)
        f.write(data)
    encrypt(key, 'data.in', 'data.enc')
    decrypt(key, 'data.enc', 'data.out')
    with open('data.out', 'rb') as f:
        assert data == f.read()

if __name__ == '__main__':
    test()
Teufert answered 23/2, 2022 at 19:48 Comment(0)
T
1

Because memory is limited, we can encrypt and decrypt in chunks.

#
# encrypt
#
from cryptography.fernet import Fernet

key = b'Ke0Ft_85-bXQ8GLOOsEI6JeT2mD-GeI8pkcP_re8wio='
in_file_name = 'plain.txt'
out_file_name = 'encrypted.txt'
with open(in_file_name, "rb") as fin, open(out_file_name, "wb") as fout:
    while True:
        block = fin.read(524288)  # 2^19
        if not block:
            break
        f = Fernet(key)
        output = f.encrypt(block)
        # size of the encrypted token, not of the plaintext block
        print('encrypted block size: ' + str(len(output)))  # returns 699148
        fout.write(output)

#
# decrypt
#
in_file_name = 'encrypted.txt'
out_file_name = 'plain2.txt'
with open(in_file_name, "rb") as fin, open(out_file_name, "wb") as fout:
    while True:
        block = fin.read(699148)
        if not block:
            break
        f = Fernet(key)
        output = f.decrypt(block)
        fout.write(output)

The block sizes were determined as follows:

Starting with an encryption block size of 4096, every encrypted chunk came out at the same constant size, except for the final chunk produced from fewer than 4096 bytes. The block size was then raised to 524288, and again every encrypted chunk had the same size, 699148 bytes, except for the final, shorter one.

Using an encryption block size of 524288 and a decryption block size of 699148 bytes, files larger than 35 GB were encrypted and decrypted successfully.

block = fin.read(524288) # 2^19
output = f.encrypt(block)
print('encrypted block size: ' + str(len(output)))  # returns 699148
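
The 699148 figure can also be derived rather than measured. The sketch below is based on the token layout in the Fernet specification (version, timestamp, IV, AES-CBC ciphertext with PKCS7 padding, HMAC-SHA256, all base64-encoded); the helper name fernet_token_len is made up for illustration.

```python
def fernet_token_len(plaintext_len: int) -> int:
    """Size in bytes of the Fernet token for a plaintext of the given length."""
    # PKCS7 always pads up to the next multiple of 16, adding a whole extra
    # block when the length is already a multiple of 16.
    ciphertext_len = (plaintext_len // 16 + 1) * 16
    # version (1) + timestamp (8) + IV (16) + ciphertext + HMAC-SHA256 (32)
    raw_len = 1 + 8 + 16 + ciphertext_len + 32
    # URL-safe base64 with padding: 4 output bytes per 3 input bytes
    return 4 * ((raw_len + 2) // 3)


print(fernet_token_len(524288))  # 699148
print(fernet_token_len(4096))    # 5560
```

This is why every full block produces a token of the same size, and why only the last, shorter block differs.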
Theone answered 12/11, 2021 at 1:0 Comment(3)
How are you determining your block size (524288 and 699148)? - Neisse
I'm a bit late, but I really don't understand how you found your block size values (524288 and 699148), because I got: block = fin.read(524288) # 2^19 print('encrypted block size: ' + str(len(block))) # returns 524288 - Aldric
OK, I understand: 699148 is just the size of the encrypted chunks. - Aldric
