encrypting and/or decrypting large files (AES) on a memory and storage constrained system, with "catastrophe recovery"
I have a fairly generic question, so please pardon if it is a bit vague.

So, let's assume a file of 1 GB that needs to be encrypted and later decrypted on a given system.

Problem is that the system has less than 512 MB of free memory and about 1.5 GB of storage space (give or take), so, with the file "onboard", we have roughly 500 MB of "hard drive scratch space" and less than 512 MB of RAM to "play with".

The system is liable to experience an "unscheduled power down" at any moment during encryption or decryption, and needs to be able to successfully resume the encryption/decryption process after being powered up again (and this seems like an extra-unpleasant nut to crack).

The questions are:

1) is it at all doable :) ?

2) what would be the best strategy to go about

a) encrypting/decrypting with so little scratch space (can't have the entire file lying around while decrypting/encrypting, need to truncate it "on the fly" somehow...)

and

b) implementing a disaster recovery that would work in such a constrained environment?

P.S.: The cipher used has to be AES.

I looked into AES-CTR specifically, but it does not seem to lend itself well to the disaster recovery shenanigans in an environment where you can't keep the entire decrypted file around till the end...

[edited to add] I think I'll be doing it the Iserni way after all.

Liquidambar answered 2/7, 2012 at 19:42 Comment(1)
There is nothing that CBC brings to the table that CTR doesn't. Except for integrity, CTR is certainly a very good choice for such a scheme.Plowman

It is doable, provided you have a means to save the AES status vector together with the file position.

  1. Save AES status and file position P to files STAGE1 and STAGE2
  2. Read one chunk (say, 10 megabytes) of encrypted/decrypted data
  3. Write the decrypted/encrypted chunk to external scratch SCRATCH
  4. Log the fact that SCRATCH is completed
  5. Write SCRATCH over the original file at the same position
  6. Log the fact that SCRATCH has been successfully copied
  7. Goto 1
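The loop above can be sketched as follows. A SHA-256-based XOR keystream stands in for AES-CTR (so the resumable state is just a counter plus a file position), and the stage-4/stage-6 log markers are folded away for brevity; apart from STAGE1/STAGE2/SCRATCH, all names are illustrative, not anything from the answer.

```python
import hashlib
import json
import os

CHUNK = 1024 * 1024  # 10 MB in the answer; smaller here for the sketch

def keystream(key, counter, n):
    # Toy CTR-style keystream: SHA-256(key || counter), repeated.
    # A real implementation would use AES-CTR, which is resumable the
    # same way: the counter alone restores the cipher state.
    out = b""
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n], counter

def save_state(path, pos, ctr):
    # Durable state write: temp file, fsync, atomic rename.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"pos": pos, "ctr": ctr}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)

def load_state(path):
    try:
        with open(path) as f:
            return json.load(f)
    except (OSError, ValueError):
        return None  # missing or torn state file

def process(filename, key, statedir):
    s1 = os.path.join(statedir, "STAGE1")
    s2 = os.path.join(statedir, "STAGE2")
    # Recovery rule from the answer: if the copies disagree,
    # trust the one with the earliest position P.
    states = [s for s in (load_state(s1), load_state(s2)) if s]
    state = min(states, key=lambda s: s["pos"],
                default={"pos": 0, "ctr": 0})
    pos, ctr = state["pos"], state["ctr"]
    scratch = os.path.join(statedir, "SCRATCH")
    with open(filename, "r+b") as f:
        size = os.fstat(f.fileno()).st_size
        while pos < size:
            # 1. save cipher state and P before touching any data
            save_state(s1, pos, ctr)
            save_state(s2, pos, ctr)
            # 2-3. read a chunk, transform it, write it to SCRATCH
            f.seek(pos)
            data = f.read(CHUNK)
            ks, ctr = keystream(key, ctr, len(data))
            out = bytes(a ^ b for a, b in zip(data, ks))
            with open(scratch, "wb") as sf:
                sf.write(out)
                sf.flush()
                os.fsync(sf.fileno())
            # 5. copy SCRATCH over the original at the same position
            f.seek(pos)
            f.write(out)
            f.flush()
            os.fsync(f.fileno())
            pos += len(data)
    save_state(s1, pos, ctr)
    save_state(s2, pos, ctr)
```

Because the XOR transform is its own inverse, running `process` a second time (with fresh state files) decrypts the file in place; crashing and rerunning mid-way just repeats at most one chunk.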

If you get a hard crash after stage 1, and STAGE1 and STAGE2 disagree, you just restart and assume the state file with the earliest P to be good.

If you get a hard crash during or after stage 2, you lose 10 megabytes' worth of work: but the AES state and P are good, so you just repeat stage 2.

If you crash at stage 3, then on recovery you won't find the marker of stage 4, and so will know that SCRATCH is unreliable and must be regenerated. Having STAGE1/STAGE2, you are able to do so.

If you crash at stage 4, you will believe that SCRATCH must be regenerated, even if you could avoid this -- but you lose nothing in regenerating except a little time. By the same token, if you crash during 5, or before 6 is committed to disk, you just repeat stages 5 and 6; you know you don't have to regenerate SCRATCH, because stage 4 was committed to disk. If you crash after stage 1, you will still have a good SCRATCH to copy.

All this assumes that 10 MB is more than a cache's (OS + hard disk if writeback) worth of data. If it is not, raise to 32 or 64 MB. Recovery will be proportionately slower.

It might help to flush() and sync(), if these functions are available, after every write-stage has been completed.
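A minimal sketch of that idea, assuming POSIX semantics (names are illustrative): flush the user-space buffers, fsync to push the data toward the device, then rename atomically, so a half-written file can never be mistaken for a completed one after a crash.

```python
import os

def durable_write(path, data):
    # Write-to-temp, fsync, atomic rename: after a crash the file at
    # `path` is either the old complete version or the new complete
    # version, never a torn mix (assuming POSIX rename semantics).
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()              # flush user-space (stdio) buffers
        os.fsync(f.fileno())   # ask the OS to push to the device
    os.replace(tmp, path)      # atomic replacement on POSIX
```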

Total write time is a bit more than twice the normal time, because of the need to "write twice" in order to be sure.

Cellar answered 2/7, 2012 at 20:27 Comment(3)
So... basically, when encrypting, I 1) Initialize AES-CBC 2) save relevant crypto info 3) copy first x MB from beginning of plaintext to scratch 4) encrypt them 5) save last 16 bytes as "next step AES info" 6) write encrypted 10 MB back into source and start messing with next 10 MB, journaling every step as I go so I can act accordingly after a crash? Seems smart :)Liquidambar
Having more disk space to store both source and output file would increase performance quite a bit. Or if the files might be "chunked" in, say, 100 MB chunks - then one could read PLAIN_1, write CRYPT_1, read PLAIN_2, write CRYPT_2, delete PLAIN_1 and so on. At the end, you would have all CRYPT_* chunks with no need of extra reads or writes, except a separate "assurance" at the end of each chunk (a few bytes). It would also be more straightforward to implement, I think, and 100 MB should defeat most caching schemes.Cellar
Well, after some consideration, I think I'll do it this way. Thanks IserniLiquidambar

You have to work with the large file in chunks. Break a piece of the file off, encrypt it, and save it to disk; once saved, discard the unencrypted piece. Repeat. To decrypt, grab an encrypted piece, decrypt it, store the unencrypted chunk. Discard the encrypted piece. Repeat. When done decrypting the pieces, concatenate them.
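A sketch of the piece-by-piece idea, under one extra assumption not stated in the answer: the transform is position-independent per piece (as AES-CTR with a per-piece counter offset would be), so pieces can be cut off the *end* of the source with truncate(), keeping the storage overhead to roughly one piece. This back-to-front order would not suit plain CBC, which must chain forward; all names here are illustrative.

```python
import os

def transform_in_pieces(src, dst_prefix, xform, piece=64 * 1024):
    # truncate() can only shorten a file from the end, so pieces come
    # off the END of src: piece file 0 holds the LAST piece of the
    # original, and the final concatenation reads the numbered files
    # in reverse. Peak extra storage stays around one piece.
    n = 0
    with open(src, "r+b") as f:
        size = os.fstat(f.fileno()).st_size
        while size > 0:
            start = max(0, size - piece)
            f.seek(start)
            data = f.read(size - start)
            with open(f"{dst_prefix}{n}", "wb") as out:
                out.write(xform(data))   # e.g. AES-CTR keyed by offset
            f.truncate(start)            # discard the processed piece
            size = start
            n += 1
    os.remove(src)
    return n                             # number of piece files written
```

Decryption is the same walk in reverse: transform each piece file back, append it to the rebuilt plaintext, and delete the piece.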

Cornhusk answered 2/7, 2012 at 19:51 Comment(1)
Hmmmm... well, yes, it seems doable with AES-CBC since every next chunk is essentially encrypted with last 16 bytes of previous one, so as long as the "last good encrypted chunk" or, in case of encryption, "last good plaintext chunk" are known (via some kind of journaling trick, like having an additional file that stores data on last 10 operations or so) a straightforward route to "disaster recovery" should be available, amrite ?Liquidambar
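That chaining property can be checked with a toy block "cipher" standing in for AES (a byte rotation, emphatically not real crypto): encrypting two chunks while carrying the 16-byte state between them yields the same ciphertext as encrypting everything in one go, which is exactly what makes saved-state resumption work.

```python
def toy_block_encrypt(block):
    # Stand-in for AES-128 block encryption; NOT a real cipher.
    return bytes((b + 1) % 256 for b in block)

def cbc_encrypt(chunk, iv):
    # Textbook CBC over 16-byte blocks: XOR with the previous
    # ciphertext block, then encrypt. Returns the ciphertext and the
    # last ciphertext block, which is the IV for the next chunk.
    out = bytearray()
    prev = iv
    for i in range(0, len(chunk), 16):
        block = bytes(a ^ b for a, b in zip(chunk[i:i + 16], prev))
        prev = toy_block_encrypt(block)
        out += prev
    return bytes(out), prev

iv = bytes(16)
chunk1 = bytes(range(32))
chunk2 = bytes(range(32, 64))
# Encrypting in one go...
whole, _ = cbc_encrypt(chunk1 + chunk2, iv)
# ...equals encrypting chunk1, saving the 16-byte state, then chunk2.
c1, state = cbc_encrypt(chunk1, iv)
c2, _ = cbc_encrypt(chunk2, state)
assert whole == c1 + c2
```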

Surely this is doable.

The "largest" (not large at all, however) problem is that when you encrypt, say, 128 MB of original data, you need to remove it from the source file. To do this you need to copy the remainder of the file to the beginning and then truncate the file, which takes time. Power can be lost during this step, but you don't care much -- you know the size of the data you've encrypted (if you encrypt data in blocks whose size is a multiple of 16 bytes, the size of the encrypted data will equal the size that was, or has to be, removed from the plaintext file). Unfortunately it seems to be easier to invent the scheme than to explain it :), but I really see no problem other than the extra copy operations, which will slow down the process. And no, there's no generic way to strip data from the beginning of a file without copying the remainder to the front.
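The copy-and-truncate step described above might look like this sketch (buffered forward copy of the remainder, then truncate; names are illustrative):

```python
import os

def strip_prefix(path, nbytes, bufsize=64 * 1024):
    # Remove the first nbytes of the file by sliding the remainder to
    # the front and truncating -- the "extra copy" the answer mentions.
    # Copying forward in small buffers keeps memory use bounded.
    with open(path, "r+b") as f:
        size = os.fstat(f.fileno()).st_size
        read_pos, write_pos = nbytes, 0
        while read_pos < size:
            f.seek(read_pos)
            buf = f.read(bufsize)
            f.seek(write_pos)
            f.write(buf)
            read_pos += len(buf)
            write_pos += len(buf)
        f.truncate(size - nbytes)
```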

Osyth answered 2/7, 2012 at 20:22 Comment(5)
Ah yes, indeed, truncation only works for the end of a file, but I see what you're getting at, I think :) Thanks, will try to do it this way (and use the abominable performance resulting from all the copying to badger for a bigger HDD on this...thing :) )Liquidambar
Why can't you just copy the encrypted data over the plain data? You can save the state, can't you?Plowman
@owlstead in case of power failure there's a big chance for corruption or data loss.Phew
Just saving state, mapping the file, writing to the scratchpad, saving state, encrypting to original and saving state again won't work? Of course you need a scratchpad and state, but moving much of the file each time seems a bit of a waste (or actually, a lot of waste).Plowman
@owlstead yes, this is how journalling filesystems work (including our SolFS). I didn't consider that copying the remainder of the file can damage the original just as easily as writing encrypted data in place. So you are right, my idea is not a guarantee against possible file corruption.Phew

© 2022 - 2024 — McMap. All rights reserved.