How is this valid Java code? (obfuscated Java)
C

3

7

This code looks obviously incorrect and yet it happily compiles and runs on my machine. Can someone explain how this works? For example, what makes the ")" after the class name valid? What about the random words strewn around?

class M‮{public static void main(String[]a‭){System.out.print(new char[]{'H','e','l','l','o',' ','W','o','r','l','d','!'});}}

Test online: https://ideone.com/t1W5Vm
Source: https://codegolf.stackexchange.com/a/60561

Cultivated answered 24/4, 2016 at 14:29 Comment(5)
Have you tried opening it in a hex editor? There may be some "reverse" characters in there, which make the letters look mirrored.Raid
Yes, there are zero-width Unicode characters that make this appear malformed. If you try to indent it properly you'll notice the text flow in confusing ways.Gam
Peter Lawrey explained it on his blog if I remember correctly. Let me search for it.Minima
From the source, there is a comment which states "There is a unicode "RIGHT-TO-LEFT OVERRIDE" character just after the M, and the opposite (left to right) just before the a[]"Weymouth
There it is: vanillajava.blogspot.com/2012/09/hidden-code.html also mentioned #12857840Minima
C
9

One way to decipher what is going on is to look at the program character-by-character (demo).

There you may discover that the characters at positions 7 and 42 are the special Unicode characters RLO (right-to-left override, U+202E) and LRO (left-to-right override, U+202D).
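If the linked demo is unavailable, a scan along these lines can reproduce the character-by-character inspection. This is a minimal sketch, not the code behind the demo: the class and method names are made up for illustration, and the sample string contains only the start of the program, with the two hidden characters written as explicit escapes so they stay visible.

```java
import java.util.ArrayList;
import java.util.List;

class FormatCharScan {
    // Returns the zero-based indices of all Unicode format (category Cf) characters.
    static List<Integer> formatCharPositions(String text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (Character.getType(text.charAt(i)) == Character.FORMAT) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Start of the obfuscated program; the hidden characters are written as escapes.
        String source = "class M\u202E{public static void main(String[]a\u202D){";
        for (int i : formatCharPositions(source)) {
            char c = source.charAt(i);
            // Print the index, the code point in hex, and the official Unicode name.
            System.out.printf("%d: U+%04X %s%n", i, (int) c, Character.getName(c));
        }
    }
}
```

For this sample string the scan reports indices 7 and 42, counting from zero.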

Once you remove them, the program starts to look normal:

class M{public static void main(String[]a){System.out.print(new char[]{'H','e','l','l','o',' ','W','o','r','l','d','!'});}}
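One possible way to do the removal programmatically is a regular expression over the Unicode "format" category. This is only a sketch: the helper name is hypothetical, and it strips every Cf character, which is broader than the two overrides discussed here.

```java
class StripFormatChars {
    // Removes every character in Unicode general category Cf (format).
    static String stripFormatChars(String text) {
        return text.replaceAll("\\p{Cf}", "");
    }

    public static void main(String[] args) {
        // Shortened version of the obfuscated program, hidden characters as escapes.
        String obfuscated = "class M\u202E{public static void main(String[]a\u202D){}}";
        System.out.println(stripFormatChars(obfuscated));
    }
}
```

Run on the shortened sample, this prints the class with the overrides gone, so the text reads in its logical order.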

The obfuscated program compiles because the Java compiler ignores these special characters: they are Unicode format characters, which Java treats as ignorable when they appear inside an identifier.
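The standard library exposes the same classification, so it can be checked directly. The sketch below only shows what java.lang.Character reports for the two code points; the class and helper names are illustrative, and it does not inspect the compiler itself.

```java
class IgnorableCheck {
    // True if the code point may appear inside a Java identifier and is ignorable there.
    static boolean ignoredInIdentifier(int codePoint) {
        return Character.isJavaIdentifierPart(codePoint)
                && Character.isIdentifierIgnorable(codePoint);
    }

    public static void main(String[] args) {
        int[] codePoints = {0x202E, 0x202D}; // RLO and LRO
        for (int cp : codePoints) {
            System.out.printf("U+%04X %s: ignoredInIdentifier=%b, canStartIdentifier=%b%n",
                    cp, Character.getName(cp),
                    ignoredInIdentifier(cp),
                    Character.isJavaIdentifierStart(cp));
        }
    }
}
```

Both characters are reported as ignorable identifier parts but not as valid identifier starts, which fits their placement right after "M" and right after "a" in the obfuscated source.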

Caines answered 24/4, 2016 at 14:46 Comment(1)
Ah! That also explains why it kept behaving strangely in my text editor!Cultivated
E
1

This is valid Java code, but it uses invisible, zero-width Unicode characters that control text direction (the kind used for right-to-left scripts such as Arabic). Try placing your cursor in the text and pressing the right arrow key. There's one between "M" and ")", and one between "char[]" and "a[]".

I tried to format the code, but it's just frustrating to navigate in it.

Epitasis answered 24/4, 2016 at 14:44 Comment(0)
P
1

You will find two UTF-8 byte sequences in your source:

0xE2 0x80 0xAE (U+202E, right-to-left override) http://www.fileformat.info/info/unicode/char/202e/index.htm

0xE2 0x80 0xAD (U+202D, left-to-right override) http://www.fileformat.info/info/unicode/char/202d/index.htm

These effectively cause the part "{public static void main(String[]a" to be displayed right to left.
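To confirm that those bytes are the UTF-8 encodings of the two override characters, a small check like the following can be used. It is a sketch with a hypothetical helper name, not a tool for scanning files.

```java
import java.nio.charset.StandardCharsets;

class Utf8Bytes {
    // Formats the UTF-8 encoding of the text as space-separated hex bytes.
    static String utf8Hex(String text) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            hex.append(String.format("0x%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("\u202E")); // right-to-left override
        System.out.println(utf8Hex("\u202D")); // left-to-right override
    }
}
```

The two printed lines are 0xE2 0x80 0xAE and 0xE2 0x80 0xAD, matching the sequences listed above.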

Provincetown answered 24/4, 2016 at 14:47 Comment(0)
