Why does the Blowfish output in Java and PHP differ by only 2 chars?

Asked 20/7, 2011 at 9:22 Answered 20/8, 2011 at 16:37

Solved java php encryption interop blowfish

I have Blowfish encryption scripts in PHP and Java that encrypt and decrypt each other's output. They were working fine until today, when I came across a problem.

The same content is encrypted differently in Java vs PHP by only 2 chars, which is really weird.

PHP

wTHzxfxLHdMm/JMFnoh0hciS/JADvFFg

Java

wTHzxfxLHdMm/JMFnoh0hciS/D8DvFFg
-------------------------^^

As you can see, those two positions do not match. Unfortunately the value is a real email address and I can't share it. I was also unable to reproduce the problem with the few other values I've tested. I've tried switching Base64 encoder classes in Java, and that did not help either.

The source code for PHP is here, and for Java is here.

What could I do to resolve this problem?

Hartz answered 20/7, 2011 at 9:22 Comment(10)

Could be something to do with the character encoding used to represent the email address in Java and PHP? Is there a non-ASCII character in the address? – Fated 20/7, 2011 at 9:37

No there is not, only alphas and dot. – Hartz 20/7, 2011 at 9:42

only blowfish does this? How about md5/sha? – Actinomycosis 20/7, 2011 at 10:33

When base64-decoded the difference is in exactly one byte, the 20th. I looked at the code and didn't immediately notice any issues. – Lusitania 20/7, 2011 at 10:43

@Actinomycosis I need to have encryption, not hashing as I need to decrypt the values on the other end. – Hartz 20/7, 2011 at 11:14

Quamis' suggestion of using a hash seems peculiar, but trying a different symmetric algorithm on the same dataset seems like a good diagnostic approach. – Anneal 20/7, 2011 at 11:52

GregS says that the difference is at byte 20. Have you tried other plaintexts with the same value for byte 20 (and possibly the two bytes surrounding it)? Try comparing the byte values of the plaintexts in PHP and Java. – Jam 20/7, 2011 at 11:57

Your PHP code says the cleartext is padded with NULL, but in Java you use PKCS5Padding. – Vibrate 8/8, 2011 at 18:56

@Vibrate how to correct that? – Hartz 11/8, 2011 at 18:32

I don't know PHP to tell you how to implement it, but you can manually pad the data in Java with 0's before encrypting. – Vibrate 12/8, 2011 at 15:42
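
A minimal sketch of the manual zero padding suggested in the comment above, assuming Blowfish's 8-byte block size and that the PHP side adds NUL bytes only up to the next block boundary (an assumption worth checking against the PHP code). `ZeroPadding.zeroPad` is a hypothetical helper; its result would be passed to a cipher created with a `NoPadding` transformation instead of `PKCS5Padding`.

```java
import java.util.Arrays;

public class ZeroPadding {
    // Pads data with trailing zero bytes up to the next multiple of blockSize.
    // Input that is already block-aligned is returned as an unchanged copy.
    public static byte[] zeroPad(byte[] data, int blockSize) {
        int paddedLength = ((data.length + blockSize - 1) / blockSize) * blockSize;
        // Arrays.copyOf fills the added positions with 0.
        return Arrays.copyOf(data, paddedLength);
    }

    public static void main(String[] args) {
        // A 19-byte plaintext (hypothetical length) grows to three 8-byte blocks.
        byte[] padded = zeroPad(new byte[19], 8);
        System.out.println(padded.length); // prints 24
    }
}
```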

Let's have a look at your Java code:

String c = new String(Test.encrypt((new String("thevalue")).getBytes(),
                                   (new String("mykey")).getBytes()));
...
System.out.println("Base64 encoded String:" +
                   new sun.misc.BASE64Encoder().encode(c.getBytes()));

What you are doing here is:

1. Convert the plaintext string to bytes, using the system's default encoding.
2. Convert the key to bytes, using the system's default encoding.
3. Encrypt the bytes.
4. Convert the encrypted bytes back to a string, using the system's default encoding.
5. Convert the encrypted string back to bytes, using the system's default encoding.
6. Encode these encrypted bytes using Base64.

The problem is in step 4. It assumes that an arbitrary byte array represents a string in your system's default encoding, and that encoding this string back gives the same byte[]. This holds for some encodings (ISO-8859-1, for example, where every byte value maps to a character), but not for others. In Java, when a byte (or byte sequence) is not valid in the given encoding, it is replaced by some other character, which is then mapped to byte 63 (ASCII ?) when the string is converted back to bytes. Actually, the documentation even says:

The behavior of this constructor when the given bytes are not valid in the default charset is unspecified.
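
The lossy round trip can be reproduced with two charsets that every JVM is required to provide. This is only a sketch: the class name `CharsetRoundTrip` and the sample bytes are made up for illustration, and US-ASCII merely stands in for whatever default charset the failing machine uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {
    // Decodes bytes into a String and encodes that String back,
    // mirroring what the original code does with the cipher output.
    public static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // Stand-in for cipher output; 0x90 is not a valid US-ASCII byte.
        byte[] cipherLike = {0x01, (byte) 0x90, 0x03};

        // ISO-8859-1 maps every byte value to a character, so nothing is lost.
        System.out.println(Arrays.toString(roundTrip(cipherLike, StandardCharsets.ISO_8859_1)));
        // prints [1, -112, 3]

        // US-ASCII cannot represent 0x90; it comes back as 63, i.e. '?'.
        System.out.println(Arrays.toString(roundTrip(cipherLike, StandardCharsets.US_ASCII)));
        // prints [1, 63, 3]
    }
}
```

Passing the charset explicitly, as done here, also avoids depending on the platform default.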

In your case, there is no reason to do this at all - simply use the bytes which your encrypt method outputs directly to convert them to Base64.

byte[] encrypted = Test.encrypt("thevalue".getBytes(),
                                "mykey".getBytes());
System.out.println("Base64 encoded String:"+ new sun.misc.BASE64Encoder().encode(encrypted));

(Also note that I removed the superfluous new String("...") constructor calls here, though this does not relate to your problem.)
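
On newer JDKs the same fix can be written with `java.util.Base64` (Java 8+) instead of `sun.misc.BASE64Encoder`, which is an internal, unsupported class that recent JDKs no longer ship. The sketch below rests on assumptions: the original `Test.encrypt` source is not shown here, so `BlowfishBase64Sketch`, its `encrypt`/`decrypt` helpers, and the `Blowfish/ECB/PKCS5Padding` transformation are illustrative choices, not the asker's actual settings. Mode and padding must match whatever the PHP side uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

public class BlowfishBase64Sketch {
    // Assumed transformation, for illustration only; it must match the PHP side.
    private static final String TRANSFORMATION = "Blowfish/ECB/PKCS5Padding";

    public static byte[] encrypt(byte[] plain, byte[] key) throws Exception {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "Blowfish"));
        return cipher.doFinal(plain);
    }

    public static byte[] decrypt(byte[] encrypted, byte[] key) throws Exception {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "Blowfish"));
        return cipher.doFinal(encrypted);
    }

    public static void main(String[] args) throws Exception {
        // Explicit charsets for the text-to-bytes conversions.
        byte[] key = "mykey".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = encrypt("thevalue".getBytes(StandardCharsets.UTF_8), key);

        // The cipher output goes straight into Base64; the raw bytes are never
        // turned into a String.
        String encoded = Base64.getEncoder().encodeToString(encrypted);
        System.out.println("Base64 encoded String:" + encoded);

        // Round trip: Base64-decode, decrypt, and only then decode the text.
        byte[] decrypted = decrypt(Base64.getDecoder().decode(encoded), key);
        System.out.println(new String(decrypted, StandardCharsets.UTF_8));
    }
}
```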

The point to remember: Never ever convert an arbitrary byte[], which did not come from encoding a string, to a string. Output of an encryption algorithm (and most other cryptographic algorithms, except decryption) certainly belongs to the category of data which should not be converted to a string.

And never ever use the system's default encoding if you want portable programs.

Guenevere answered 20/8, 2011 at 16:37 Comment(0)

Your code seems right to me.

It looks like you have a trailing white space in the input to one of these programs, and only one of them. I'll tell you why:

Each of these 4-char blocks represents 3 bytes of the encrypted data. The differing part (JA and D8 in the 7th block) actually comes from a single differing byte.

wTHz xfxL HdMm /JMF noh0 hciS /JAD vFFg

wTHz xfxL HdMm /JMF noh0 hciS /D8D vFFg

If I have got it right your email address is 19 characters long. The 20th character in one of your input strings is a white space.
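
The block arithmetic can be checked by Base64-decoding the two strings from the question and comparing the raw bytes. A minimal sketch (the class `Base64Diff` and its helper are hypothetical names; `java.util.Base64` needs Java 8 or later):

```java
import java.util.Base64;

public class Base64Diff {
    // Returns the index of the first differing byte, or -1 if the arrays are identical.
    public static int firstDifference(byte[] a, byte[] b) {
        int common = Math.min(a.length, b.length);
        for (int i = 0; i < common; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : common;
    }

    public static void main(String[] args) {
        // The two outputs quoted in the question.
        byte[] fromPhp = Base64.getDecoder().decode("wTHzxfxLHdMm/JMFnoh0hciS/JADvFFg");
        byte[] fromJava = Base64.getDecoder().decode("wTHzxfxLHdMm/JMFnoh0hciS/D8DvFFg");

        int i = firstDifference(fromPhp, fromJava);
        // Report a 1-based position and the unsigned byte values in hex.
        System.out.printf("byte %d differs: PHP=0x%02X Java=0x%02X%n",
                i + 1, fromPhp[i] & 0xFF, fromJava[i] & 0xFF);
        // prints: byte 20 differs: PHP=0x90 Java=0x3F
    }
}
```

Both strings decode to 24 bytes, and the Java side carries 0x3F, the ASCII code of `?`, where the PHP side has 0x90.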

Heathheathberry answered 21/7, 2011 at 3:10 Comment(5)

The observation that only one output byte is wrong is good (and helped me to write my answer), but this does not translate to one wrong input byte. Blowfish is a block cipher, and it is used in CBC mode, i.e. one differing bit in the input yields a completely different 64-bit block (8 bytes) in the output. (For counter mode your observation would be right.) – Cytochemistry 20/8, 2011 at 18:12

The point with CBC is that once you have something different, the rest of the resulting ciphertext will be different. But within the same block (of, say, 8 bytes) the first bytes aren't necessarily affected by later bytes in that block being different. – Heathheathberry 20/8, 2011 at 23:17

The important part is that a block cipher (if it is a good one) is a pseudorandom permutation, which means that changing one bit in the input changes (on average) half the bits in the output. CBC applies the block cipher to the plaintext (combined with the ciphertext of the previous block), so we would get a totally unrecognizable block. The following blocks would be destroyed, too, of course. For comparison: ECB would only "destroy" one block, while CTR mode would only flip this bit in the output (since the block cipher is not applied to the plaintext). – Cytochemistry 20/8, 2011 at 23:27

CFB would flip one bit and destroy the following blocks. OFB is like CTR here. – Cytochemistry 20/8, 2011 at 23:28

Thanks Paŭlo for the explanation. – Heathheathberry 21/8, 2011 at 13:28

Question: Have you tried the associated PHP decryption library to decrypt the PHP-generated encrypted text? Have you tried the associated Java decryption library to decrypt the Java-generated encrypted text?

If both produce differing outputs, then one MUST fail decrypting.

Is that one PHP, or Java?

Whichever one it is -- I would try to duplicate another such failure with a publicly shareable string... give that string as a unit test -- to the developer or developers that created the encrypt/decrypt code in the language that the round-trip encrypt/decrypt fails in.

Then... wait for them to fix it.

Not sure of any faster solutions -- except maybe change encryption/decryption library providers... or roll your own...

Ecthyma answered 12/8, 2011 at 2:6 Comment(1)

I have tried those and they seem to work fine. The problem happens for only about 1 in 1000 values, seemingly at random, but consistently for those values. – Hartz 12/8, 2011 at 6:20
