Base64 Encode File Using NSData Chunks

Update 4
Per Greg's suggestion, I've created two image/text pairs. The first shows the base64 output for a 37 KB image encoded using 100 KB chunks. Since the file is only 37 KB, the loop iterated just once, so nothing was appended. The second shows the base64 output for the same 37 KB image encoded using 10 KB chunks. Here the loop iterated four times, so data was definitely appended.

A diff of the two text files shows that the 10 KB-chunk output contains a large difference that begins on line 214 and ends on line 640.

Update 3
Here's where my code is now. Cleaned up a bit but still producing the same effect:

// Read data in chunks from the original file
[originalFile seekToEndOfFile];
NSUInteger fileLength = [originalFile offsetInFile];
[originalFile seekToFileOffset:0];
NSUInteger chunkSize = 100 * 1024;
NSUInteger offset = 0;

while(offset < fileLength) {
    NSData *chunk = [originalFile readDataOfLength:chunkSize];
    offset += chunkSize;

    // Convert the chunk to a base64 encoded string and back into NSData
    NSString *base64EncodedChunkString = [chunk base64EncodedString];
    NSData *base64EncodedChunk = [base64EncodedChunkString dataUsingEncoding:NSASCIIStringEncoding];

    // Write the encoded chunk to our output file
    [encodedFile writeData:base64EncodedChunk];

    // Cleanup
    base64EncodedChunkString = nil;
    base64EncodedChunk = nil;

    // Update progress bar
    [self updateProgress:[NSNumber numberWithInt:offset] total:[NSNumber numberWithInt:fileLength]];
}

Update 2
So it looks like files that are larger than 100 KB get scrambled, but files under 100 KB are fine. It's obvious that something is off on my buffer/math/etc, but I'm lost on this one. Might be time to call it a day, but I'd love to go to sleep with this one resolved.

Here's an example:

Update 1
After doing some testing I have found that the same code will work fine for a small image, but will not work for a large image or video of any size. Definitely looks like a buffer issue, right?


Hey there, trying to base64 encode a large file by looping through and doing it one small chunk at a time. Everything seems to work but the files always end up corrupted. I was curious if anyone could point out where I might be going wrong here:

    NSFileHandle *originalFile, *encodedFile;
    self.localEncodedURL = [NSString stringWithFormat:@"%@-base64.xml", self.localURL];

    // Open the original file for reading
    originalFile = [NSFileHandle fileHandleForReadingAtPath:self.localURL];
    if (originalFile == nil) {
        [self performSelectorOnMainThread:@selector(updateStatus:) withObject:@"Encoding failed." waitUntilDone:NO];
        return;
    }
    encodedFile = [NSFileHandle fileHandleForWritingAtPath:self.localEncodedURL];
    if (encodedFile == nil) {
        [self performSelectorOnMainThread:@selector(updateStatus:) withObject:@"Encoding failed." waitUntilDone:NO];
        return;
    }

    // Read data in chunks from the original file
    [originalFile seekToEndOfFile];
    NSUInteger length = [originalFile offsetInFile];
    [originalFile seekToFileOffset:0];
    NSUInteger chunkSize = 100 * 1024;
    NSUInteger offset = 0;
    do {
        NSUInteger thisChunkSize = length - offset > chunkSize ? chunkSize : length - offset;
        NSData *chunk = [originalFile readDataOfLength:thisChunkSize];
        offset += [chunk length];

        NSString *base64EncodedChunkString = [chunk base64EncodedString];
        NSData *base64EncodedChunk = [base64EncodedChunkString dataUsingEncoding:NSASCIIStringEncoding];

        [encodedFile writeData:base64EncodedChunk];

        base64EncodedChunkString = nil;
        base64EncodedChunk = nil;

    } while (offset < length);
Andaman answered 4/10, 2010 at 21:55 Comment(12)
In what way is the output corrupted? – Boughpot
For both images (JPEG) and videos (QuickTime), the files are unreadable. – Andaman
Which is leading me to believe there's something wrong with my offset. When I use the same base64 category on NSData without splitting it into chunks, it works just fine. – Andaman
Have you tried comparing your chunked base64 output to the good base64 output? Base64 will pad the output in some cases, so maybe some of the chunks are being padded. – Boughpot
Not yet - that's a great idea. Thanks! – Andaman
OK so if I use the same code above to upload an image, it works. If I use it to upload a video file, even one of just 2 seconds, it fails. The video is corrupted. – Andaman
FYI, you can remove thisChunkSize and just do NSData *chunk = [originalFile readDataOfLength:chunkSize]. readDataOfLength:length reads length bytes or to the end of the file, whichever comes first. – Boughpot
Also, check out this page, which goes into a lot of detail about reading files: cocoadev.com/index.pl?ReadAFilePieceByPiece I think between this and my previous comments, there's probably an answer. :-) I'll post an answer if these comments helped. – Boughpot
Thanks Greg. This helped a lot, I understand everything that's going on, but I still can't pin down why the offset seems to be getting out of whack. – Andaman
Can you try this: lower your chunk size to something small enough that it will cause corruption on one of your smaller images, produce the corrupt output, then raise it high enough that it produces a good file. Then, diff the two Base64 output files and let us know what the diff is. That might help others to have an idea what's up. – Boughpot
Thanks. Updated post to reflect your suggestion. – Andaman
Greg, just wanted to let you know I gave you a shout out in the final code. Thanks again. ios.trac.wordpress.org/browser/trunk/Classes/WPMediaUploader.m – Andaman

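I wish I could give credit to GregInYEG, because his original point about padding was the underlying issue. Base64 encodes each group of 3 input bytes as 4 output characters and pads a shorter final group with "=", so every chunk except the last has to be a multiple of 3 bytes long. Otherwise padding ends up in the middle of the output. My 100 KB chunks (102,400 bytes) leave a remainder of 1 when divided by 3. So this resolved the issue: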

chunkSize = 3600
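
To make the padding effect concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file). It is not the `base64EncodedString` category used in the question, whose implementation isn't shown in this thread. The names `b64_append`, `encode_chunked`, and `aligned_chunk_size` are made up for illustration. It encodes a buffer chunk by chunk, the way the loop in the question does, so the output can be compared for different chunk sizes.

```c
#include <stddef.h>

static const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Appends the base64 encoding of src[0..len) to out, NUL-terminates it,
   and returns a pointer to the terminator. (Hypothetical helper.) */
static char *b64_append(char *out, const unsigned char *src, size_t len) {
    size_t i;
    /* Full groups: 3 input bytes become 4 output characters. */
    for (i = 0; i + 2 < len; i += 3) {
        unsigned v = ((unsigned)src[i] << 16) | ((unsigned)src[i + 1] << 8) | src[i + 2];
        *out++ = B64[(v >> 18) & 63];
        *out++ = B64[(v >> 12) & 63];
        *out++ = B64[(v >> 6) & 63];
        *out++ = B64[v & 63];
    }
    /* A trailing group of 1 or 2 bytes is padded with '='. */
    if (i < len) {
        unsigned v = (unsigned)src[i] << 16;
        if (i + 1 < len) v |= (unsigned)src[i + 1] << 8;
        *out++ = B64[(v >> 18) & 63];
        *out++ = B64[(v >> 12) & 63];
        *out++ = (i + 1 < len) ? B64[(v >> 6) & 63] : '=';
        *out++ = '=';
    }
    *out = '\0';
    return out;
}

/* Encodes each chunk independently and concatenates the results.
   out needs room for 4 characters per started group of 3 bytes in
   every chunk, plus the terminator. */
static void encode_chunked(char *out, const unsigned char *src,
                           size_t len, size_t chunk_size) {
    size_t offset = 0;
    *out = '\0';
    while (offset < len) {
        size_t n = len - offset < chunk_size ? len - offset : chunk_size;
        out = b64_append(out, src + offset, n);
        offset += n;
    }
}

/* Rounds a chunk size down to the nearest multiple of 3. */
static size_t aligned_chunk_size(size_t chunk_size) {
    return chunk_size - chunk_size % 3;
}
```

For the 7 bytes "abcdefg", a chunk size of 7 (the whole input) or 3 yields "YWJjZGVmZw==", while a chunk size of 4 yields "YWJjZA==ZWZn": the "==" lands in the middle of the stream, and a decoder reading it as one base64 document stops or misreads from there. `aligned_chunk_size(100 * 1024)` gives 102399, so a round number like 100 KB only needs trimming by one byte.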

Once I had that, the corruption went away. But then I ran into memory leak issues, so I added the autorelease pool approach taken from this post: http://www.cocoadev.com/index.pl?ReadAFilePieceByPiece

Final code:

// Read data in chunks from the original file
[originalFile seekToEndOfFile];
NSUInteger fileLength = [originalFile offsetInFile];
[originalFile seekToFileOffset:0];

// For base64, each chunk *MUST* be a multiple of 3
NSUInteger chunkSize = 24000;
NSUInteger offset = 0;
NSAutoreleasePool *chunkPool = [[NSAutoreleasePool alloc] init];

while(offset < fileLength) {
    // Read the next chunk from the input file
    [originalFile seekToFileOffset:offset];
    NSData *chunk = [originalFile readDataOfLength:chunkSize];

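    // Stop if nothing could be read, so the loop cannot spin forever
    if ([chunk length] == 0) break;

    // Advance by the bytes actually read, so the offset never overshoots the file length
    offset += [chunk length];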

    // Base64 encode the input chunk
    NSData *serializedChunk = [NSPropertyListSerialization dataFromPropertyList:chunk format:NSPropertyListXMLFormat_v1_0 errorDescription:NULL];
    NSString *serializedString = [[[NSString alloc] initWithData:serializedChunk encoding:NSASCIIStringEncoding] autorelease];
    NSRange r = [serializedString rangeOfString:@"<data>"];
    serializedString = [serializedString substringFromIndex:r.location+7];
    r = [serializedString rangeOfString:@"</data>"];
    serializedString = [serializedString substringToIndex:r.location-1];

    // Write the base64 encoded chunk to our output file
    NSData *base64EncodedChunk = [serializedString dataUsingEncoding:NSASCIIStringEncoding];
    [encodedFile truncateFileAtOffset:[encodedFile seekToEndOfFile]];
    [encodedFile writeData:base64EncodedChunk];

    // Cleanup
    base64EncodedChunk = nil;
    serializedChunk = nil;
    serializedString = nil;
    chunk = nil;

    // Update the progress bar
    [self updateProgress:[NSNumber numberWithUnsignedInteger:offset] total:[NSNumber numberWithUnsignedInteger:fileLength]];

    // Drain and recreate the pool
    [chunkPool release];
    chunkPool = [[NSAutoreleasePool alloc] init];
}
[chunkPool release];
Andaman answered 6/10, 2010 at 14:17 Comment(1)
Hey @Andaman, thanks for posting the answer, but the code looks incomplete (especially the while loop). Can you please post the complete code if you can? Thanks. – Longbow

How are you converting the base64 data back to an image? Some implementations limit the maximum line length they will accept. Try inserting a line break every so many characters.

Pidgin answered 5/10, 2010 at 2:37 Comment(3)
Just tried adding a newline for every loop but it didn't make a difference. – Andaman
But each iteration in your loop is more than 25,000 characters long! Try something like 80 characters per line. – Pidgin
OK, just tried adding newline every 80 characters and got the same result. Thanks for the suggestion though! – Andaman
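
For anyone who does need fixed-width lines, the wrapping suggested above could look roughly like this in plain C. The function name `wrap_lines` is made up for illustration, and the sketch assumes the decoder on the other end ignores line breaks. It did not fix the corruption in this question, which came from the chunk size.

```c
#include <stddef.h>

/* Copies `in` to `out`, inserting '\n' after every `width` characters
   (no trailing newline). `width` must be greater than 0, and `out` needs
   room for strlen(in) + strlen(in) / width + 1 bytes. (Hypothetical helper.) */
static void wrap_lines(char *out, const char *in, size_t width) {
    size_t col = 0;
    while (*in != '\0') {
        if (col == width) {
            /* The current line is full: start a new one. */
            *out++ = '\n';
            col = 0;
        }
        *out++ = *in++;
        col++;
    }
    *out = '\0';
}
```

Wrapping has to be applied to the complete base64 stream, or the column count has to be carried across chunks. Otherwise each chunk restarts its own line count and the line lengths come out uneven.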
