base64 decoded file is not equal to the original unencoded file - McMap


Asked 24/1, 2012 at 17:13 Answered 24/1, 2012 at 17:23

Solved java base64 encode java-io


I have a normal PDF file, A.pdf. A third party encodes the file in Base64 and sends it to me through a web service as one long string (I have no control over the third party).

My problem is that when I decode the string with Java's org.apache.commons.codec.binary.Base64 and write the output to a file called B.pdf, I expect B.pdf to be identical to A.pdf, but B.pdf turns out slightly different from A.pdf. As a result, B.pdf is not recognized as a valid PDF by Acrobat.

Does Base64 have different types of encoding/charset mechanisms? Can I detect how the string I received is encoded so that B.pdf = A.pdf?

EDIT: this is the file I want to decode; after decoding, it should open as a PDF:

my encoded file

These are the headers of the two files, opened in Notepad++:

**A.pdf**
        %PDF-1.4
        %±²³´
        %Created by Wnv/EP PDF Tools v6.1
        1 0 obj
        <<
        /PageMode /UseNone
        /ViewerPreferences 2 0 R
        /Type /Catalog

**B.pdf**
        %PDF-1.4
        %±²³´
        %Created by Wnv/EP PDF Tools v6.1
        1 0! bj
        <<
        /PageMode /UseNone
        /ViewerPreferences 2 0 R
        /]
        pe /Catalog

This is how I decode the string:

    private static void decodeStringToFile(String encodedInputStr,
            String outputFileName) throws IOException {
        BufferedReader in = null;
        BufferedOutputStream out = null;
        try {
            in = new BufferedReader(new StringReader(encodedInputStr));
            out = new BufferedOutputStream(new FileOutputStream(outputFileName));
            decodeStream(in, out);
            out.flush();
        } finally {
            if (in != null)
                in.close();
            if (out != null)
                out.close();
        }
    }

    private static void decodeStream(BufferedReader in, OutputStream out)
            throws IOException {
        while (true) {
            String s = in.readLine();
            if (s == null)
                break;
            //System.out.println(s);
            byte[] buf = Base64.decodeBase64(s);
            out.write(buf);
        }
    }

Straddle asked 24/1, 2012 at 17:13 Comment(4)

I've seen similar results in the past when using Strings. You might just try using the raw byte[]s instead and see if it makes a difference. – Wootan 24/1, 2012 at 17:15

You need to show the block of code that's doing the base64 encoding as well. – Transverse 24/1, 2012 at 17:16

I only get a string from the third party. Should I convert the string to bytes with String.getBytes(charset)? How do I know what charset to use? – Straddle 24/1, 2012 at 17:18

I don't have the encoding code. As I said, it's from a third party that is not available to me (it's not even in Java). – Straddle 24/1, 2012 at 17:19


You are breaking your decoding by working line-by-line. Base64 decoders simply ignore whitespace, which means that a byte in the original content could very well be broken into two Base64 text lines. You should concatenate all the lines together and decode the file in one go.
Prefer using byte[] rather than String when supplying content to the Base64 class methods. String implies character set encoding, which may not do what you want.
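
As a rough illustration of decoding in one go, here is a minimal sketch. It assumes Java 8 or later and uses the JDK's `java.util.Base64` MIME decoder rather than the commons-codec class from the question, because that decoder skips line separators by itself. The class name `WholeStringDecoder` and the `wrap` helper are hypothetical and exist only to simulate input wrapped at 74 characters.

```java
import java.util.Arrays;
import java.util.Base64;

class WholeStringDecoder {

    /** Decodes line-wrapped Base64 text in one pass instead of line by line. */
    public static byte[] decodeAll(String encoded) {
        // The MIME decoder ignores line separators and any other character
        // outside the Base64 alphabet, so no manual concatenation is needed.
        return Base64.getMimeDecoder().decode(encoded);
    }

    /** Hypothetical helper: wraps text at an arbitrary width to simulate the sender. */
    public static String wrap(String text, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i += width) {
            sb.append(text, i, Math.min(i + width, text.length())).append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample binary payload standing in for the PDF bytes.
        byte[] original = new byte[300];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) i;
        }
        String wrapped = wrap(Base64.getEncoder().encodeToString(original), 74);
        System.out.println(Arrays.equals(original, decodeAll(wrapped)));
    }
}
```

The decoded bytes can then be written to the output file with a single `out.write(...)` call, with no `readLine()` loop involved.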

Distorted answered 24/1, 2012 at 17:23 Comment(5)

Concatenating all the lines into one string did not work, and using String.getBytes() did not work either. Actually, these two approaches gave a worse result than the original one (B was very different from A). – Straddle 24/1, 2012 at 18:2

@dov.amir: Perhaps you should post a sample of the Base64 data sent by the server, so that we could see what is going on. Regardless of that, decoding Base64 content line-by-line is still broken in the general case, unless the lines are split in multiples of 4 characters. – Distorted 24/1, 2012 at 18:13

@dov.amir: 1. You definitely have an issue with the newlines, since there are exactly 74 base64 characters in each line. This is probably the reason for the mangling you see in the PDF header. 2. Are you certain about the base64 stream? The file that you uploaded contains exactly 118,602 characters, which is not a multiple of 4 as it should be. If your link is really supposed to contain an entire file, then the problem seems to be somewhere before the Base64 decoding. What is the size of the source PDF file? – Distorted 25/1, 2012 at 1:17
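
The length check described in this comment can be sketched roughly as follows. The class and method names are hypothetical, and the sketch assumes a padded Base64 stream, where the number of non-whitespace characters must be divisible by 4. The comment does not say whether the 118,602 figure includes line breaks, so the `main` method only prints the remainders of the numbers quoted in this thread.

```java
class Base64Sanity {

    /** Counts the characters that are not whitespace (line breaks, spaces, tabs). */
    public static int payloadLength(String encoded) {
        int n = 0;
        for (int i = 0; i < encoded.length(); i++) {
            if (!Character.isWhitespace(encoded.charAt(i))) {
                n++;
            }
        }
        return n;
    }

    /** A complete, padded Base64 stream consists of whole 4-character groups. */
    public static boolean hasWholeGroups(String encoded) {
        return payloadLength(encoded) % 4 == 0;
    }

    public static void main(String[] args) {
        // Remainders for the total character count and the two line widths
        // mentioned in the comments.
        System.out.println("118602 % 4 = " + (118602 % 4));
        System.out.println("74 % 4 = " + (74 % 4));
        System.out.println("72 % 4 = " + (72 % 4));
    }
}
```

A non-zero remainder for the whole stream suggests truncated or altered data; a non-zero remainder for the line width means that line breaks fall in the middle of a 4-character group.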

Solved it! My decoding worked after I managed to get the third party to encode the text in lines of length 72 instead of lines of length 74. – Straddle 31/1, 2012 at 12:38

@dov.amir: That would make the line length a multiple of 4 which would work around the issue in your code. IMO, though, it would be better to fix your own code: Be liberal in what you accept, and conservative in what you send. After all, 74 is a rather typical line length for Base64 encoders... – Distorted 31/1, 2012 at 13:2
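
To see why the line width matters for per-line decoding, here is a small sketch that splits the same Base64 text at 72 and at 74 characters and decodes each line separately, the way the question's `decodeStream` does. It assumes Java 8 or later and uses `java.util.Base64` instead of commons-codec, so the exact handling of dangling characters may differ from the original setup; the class and helper names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.Base64;

class LineWidthDemo {

    /** Hypothetical helper: splits Base64 text into lines of the given width. */
    public static String[] split(String text, int width) {
        int count = (text.length() + width - 1) / width;
        String[] lines = new String[count];
        for (int i = 0; i < count; i++) {
            lines[i] = text.substring(i * width, Math.min((i + 1) * width, text.length()));
        }
        return lines;
    }

    /** Decodes every line independently and appends the results. */
    public static byte[] decodeLineByLine(String[] lines) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String line : lines) {
            byte[] buf = Base64.getDecoder().decode(line);
            out.write(buf, 0, buf.length);
        }
        return out.toByteArray();
    }

    /** True if splitting at this width and decoding per line restores the input. */
    public static boolean roundTrips(byte[] original, int width) {
        String encoded = Base64.getEncoder().encodeToString(original);
        return Arrays.equals(original, decodeLineByLine(split(encoded, width)));
    }

    public static void main(String[] args) {
        // Sample binary payload standing in for the PDF bytes.
        byte[] original = new byte[300];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) i;
        }
        System.out.println("width 72: " + roundTrips(original, 72));
        System.out.println("width 74: " + roundTrips(original, 74));
    }
}
```

With this sample input, the 72-character split should round-trip, while the 74-character split should not, because each 74-character line ends with two characters of a group whose other half starts the next line.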
