String literals, interning and reflection

I'm trying to find a third solution to this question.

I can't understand why this doesn't print false.

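import java.lang.reflect.Field;

public class MyClass {

    public MyClass() {
        try {
            // Make the String object for the literal "true" share the
            // backing array of the literal "false".
            Field f = String.class.getDeclaredField("value");
            f.setAccessible(true);
            f.set("true", f.get("false"));
        } catch (Exception e) {
            // Deliberately ignored. Note: on JDK 16+ setAccessible throws
            // InaccessibleObjectException unless java.base/java.lang is
            // opened (--add-opens), so the hack then silently does nothing.
        }
    }

    public static void main(String[] args) {
        MyClass m = new MyClass();
        System.out.println(m.equals(m));
    }
}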

Surely, because of string interning, the "true" instance being modified is exactly the same one used in the print method of PrintStream?

public void print(boolean b) {
    write(b ? "true" : "false");
}

What am I missing?
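For reference, the guarantee the question relies on can be checked directly on an unmodified JVM. This is a minimal sketch (the class and method names are made up for illustration) showing that identical literals in different classes resolve to the same String instance, and that intern() returns that same instance.

```java
// Sketch: literals with the same contents resolve to the same String
// instance, even across classes (JLS §3.10.5). No reflection involved.
class LiteralIdentity {

    // A nested class gets its own class file and constant pool.
    static class Other {
        static String trueLiteral() {
            return "true";
        }
    }

    static boolean sameInstanceAcrossClasses() {
        return "true" == Other.trueLiteral();
    }

    static boolean internMatchesLiteral() {
        // new String(...) always creates a distinct object; intern()
        // returns the canonical pooled instance, i.e. the literal's.
        String copy = new String(new char[] {'t', 'r', 'u', 'e'});
        return copy != "true" && copy.intern() == "true";
    }

    public static void main(String[] args) {
        System.out.println(sameInstanceAcrossClasses()); // true
        System.out.println(internMatchesLiteral());      // true
    }
}
```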

Edit

An interesting point by @yshavit is that if you add the line

System.out.println(true);

before the try, the output is

true
false
Robotize answered 1/4, 2016 at 22:20 Comment(21)
I tried it locally, and it works for me as I'd expect. System.out.println("true") prints false. – Narton
@Narton Locally it works, but I don't understand why it doesn't work globally. I thought string interning meant that all literals "true" should give the same instance. It's late, so I could be missing something really obvious. – Robotize
Interesting indeed. I had the same expectation as you. – Bailly
@Bailly Thank you, you've put my mind at rest. Whenever I post a question, I get nervous that I'm going to be hit by loads of downvotes for asking a stupid question. – Robotize
Huh, that is weird! – Narton
And an added twist: if you do System.out.println(true) before you set the field, then it works like we'd expect. Basically, it looks like the writer has its own cache-or-something of the string. – Narton
@Narton You mean System.out.println("true")? – Jaddan
@Jaddan No, true, the boolean. If I change Paul's main method to be System.out.println(true); new MyClass(); System.out.println(true); then I get true false. – Narton
A little off-topic rant: I'd love to send people who ask these questions to work on a business application for a couple of days... just to feel the difference. – Cerebrate
@Baldurian You can always downvote the question. You have a point. – Robotize
@PaulBoddington Nothing against the question, just a personal consideration. – Cerebrate
@Narton The point is that you use the same form both times: if you print true the first time, then you need to print true the second time. Using true and then "true" doesn't work, and it looks like it has to do with when the literal is changed and when the print method is called, since that makes the JVM check whether "true" is already in the pool, which it no longer is (according to equals). If you use print(true), MyClass alters the literal of the PrintStream class; if you use print("true"), it alters the literal of your test class. – Jaddan
@Jaddan I don't follow you, sorry. I think in either case, "true" is still in the pool; it's just that after the f.set call, its char[] value points to {'f', 'a', 'l', 's', 'e'}. – Narton
@Narton And how does the JVM know it is in the pool? ;) It can only use hashCode and equals. The first one is the same, the second one isn't. – Jaddan
This could be an explanation: https://mcmap.net/q/24725/-garbage-collection-of-string-literals/1743880 (not sure, kinda late here also :D) – Bailly
@Jaddan I think you're onto something. Note that I, unlike the OP, am using println(true) both times. The first and third statements in my example are identical, and yet they print "true" and "false" respectively. If you remove the first call, then the remaining call prints "true". In other words, the side effects of new MyClass() only seem to take effect if you first class-load PrintStream. I think you're right that when the classloader does its work, it uses hashCode/equals. – Narton
There is only one open question: why doesn't System.out.println(true); MyClass m = new MyClass(); System.out.println("true"); work? The OP's "hack" only works if println gets the same kind of value both times (I mean a boolean both times, or the String literal both times). – Jaddan
@Jaddan It works for me; it prints "true" then "false". – Narton
This makes no sense to me at all. The strange behaviour probably has something to do with you screwing around with constant strings once they've been interned. I suspect that this is the equivalent of being very naughty, and the JVM is being stroppy about it. If you do something truly weird, then you can expect truly weird results... – Entreaty
@EngineerDollery That's true. I really should just forget about it, but I love these ludicrous theoretical questions. – Robotize
Oh, don't forget about it. It's really interesting: breaking things to see what's inside. It's how we learn :) – Entreaty

This is arguably a HotSpot JVM bug.

The problem is in the string literal interning mechanism.

  • java.lang.String instances for the string literals are created lazily during constant pool resolution.
  • Initially a string literal is represented in the constant pool by CONSTANT_String_info structure that points to CONSTANT_Utf8_info.
  • Each class has its own constant pool. That is, MyClass and PrintStream have their own pair of CONSTANT_String_info / CONSTANT_Utf8_info cpool entries for the literal 'true'.
  • When a CONSTANT_String_info entry is accessed for the first time, the JVM initiates resolution. String interning is part of this process.
  • To find a match for a literal being interned, the JVM compares the contents of CONSTANT_Utf8_info with the contents of the string instances in the StringTable.
  • And here is the problem: raw UTF data from the constant pool is compared with the contents of Java char[] arrays, which a user can spoof via reflection.

So, what's happening in your test?

  1. f.set("true", f.get("false")) initiates the resolution of the literal 'true' in MyClass.
  2. The JVM finds no instance in the StringTable matching the sequence 'true', so it creates a new java.lang.String and stores it in the StringTable.
  3. The value of that String in the StringTable is replaced via reflection.
  4. System.out.println(true) initiates the resolution of the literal 'true' in the PrintStream class.
  5. The JVM compares the UTF sequence 'true' with the Strings in the StringTable but finds no match, since that String now holds 'false'. Another String for 'true' is created and placed in the StringTable.
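The steps above can be replayed in a small simulation. This is a hypothetical toy model, not HotSpot code: ToyJvm, ToyString and ldc are invented names, the per-class map stands in for each class's resolved constant-pool entries, and the list stands in for the StringTable, which is searched by the current contents of each entry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of literal resolution (NOT real HotSpot code).
class ToyJvm {

    // Stand-in for java.lang.String: the array is mutable, just as
    // String.value is reachable through reflection.
    static final class ToyString {
        char[] value;

        ToyString(String s) {
            value = s.toCharArray();
        }

        @Override
        public String toString() {
            return new String(value);
        }
    }

    // Stand-in for the global StringTable.
    private final List<ToyString> stringTable = new ArrayList<>();

    // Resolved constant-pool entries per class: class -> (UTF text -> String).
    private final Map<String, Map<String, ToyString>> resolved = new HashMap<>();

    // Models executing ldc "utf8" in className.
    ToyString ldc(String className, String utf8) {
        Map<String, ToyString> cpool =
                resolved.computeIfAbsent(className, k -> new HashMap<>());
        ToyString cached = cpool.get(utf8);
        if (cached != null) {
            return cached; // already resolved: the table is not consulted again
        }
        ToyString match = null;
        for (ToyString s : stringTable) {
            // The flaw: constant-pool text vs. the *current* array contents.
            if (Arrays.equals(s.value, utf8.toCharArray())) {
                match = s;
                break;
            }
        }
        if (match == null) {
            match = new ToyString(utf8);
            stringTable.add(match);
        }
        cpool.put(utf8, match);
        return match;
    }

    public static void main(String[] args) {
        ToyJvm jvm = new ToyJvm();
        // Steps 1-2: MyClass resolves "true"; a new table entry is created.
        ToyString mine = jvm.ldc("MyClass", "true");
        // Step 3: the reflection hack swaps in the array of "false".
        mine.value = jvm.ldc("MyClass", "false").value;
        // Steps 4-5: PrintStream resolves "true"; nothing matches any more.
        ToyString printStreams = jvm.ldc("PrintStream", "true");
        System.out.println(mine);                 // false
        System.out.println(printStreams);         // true
        System.out.println(mine == printStreams); // false
    }
}
```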

Why do I think this is a bug?

JLS §3.10.5 and JVMS §5.1 require that string literals containing the same sequence of characters must point to the same instance of java.lang.String.

However, in the following code the resolution of two string literals with the same sequence of characters results in different instances.

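import java.lang.reflect.Field;

public class Test {

    // A separate class with its own constant pool entry for "true".
    static class Inner {
        static String trueLiteral = "true";
    }

    public static void main(String[] args) throws Exception {
        // Resolve "true" in Test and overwrite its backing array.
        Field f = String.class.getDeclaredField("value");
        f.setAccessible(true);
        f.set("true", f.get("false"));

        // Inner's "true" is resolved only now, after the tampering.
        if ("true" == Inner.trueLiteral) {
            System.out.println("OK");
        } else {
            System.out.println("BUG!");
        }
    }
}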

A possible fix in the JVM would be to store a pointer to the original UTF sequence in the StringTable along with the java.lang.String object, so that the interning process no longer compares constant-pool data (inaccessible to the user) with value arrays (accessible via reflection).
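A minimal sketch of that proposed fix, again as a hypothetical toy model rather than real JVM code: the table is keyed by the original constant-pool text, which reflection cannot touch, so every class resolving "true" gets the same (possibly tampered-with) instance. Per-class caching is omitted because it does not change the outcome here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed fix (NOT real HotSpot code): entries
// are matched by the original constant-pool text, not by current contents.
class ToyJvmWithFix {

    static final class ToyString {
        char[] value; // still mutable, as with reflection

        ToyString(String s) {
            value = s.toCharArray();
        }

        @Override
        public String toString() {
            return new String(value);
        }
    }

    // Key: immutable original UTF text; value: the pooled instance.
    private final Map<String, ToyString> stringTable = new HashMap<>();

    ToyString ldc(String utf8) {
        return stringTable.computeIfAbsent(utf8, ToyString::new);
    }

    public static void main(String[] args) {
        ToyJvmWithFix jvm = new ToyJvmWithFix();
        ToyString mine = jvm.ldc("true");          // resolved in MyClass
        mine.value = jvm.ldc("false").value;       // the reflection hack
        ToyString printStreams = jvm.ldc("true");  // resolved in PrintStream
        System.out.println(mine == printStreams);  // true: one shared instance
        System.out.println(printStreams);          // false: the hack is seen everywhere
    }
}
```

With this design the spec's identity guarantee holds even after tampering; the tampering simply becomes visible consistently in every class.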

Magnification answered 2/4, 2016 at 15:55 Comment(2)
I can’t follow the reasoning of your example program. After you patched the object created for the String literal "true" in Test, it contains the character sequence false, in other words, not the same sequence of characters as the String literal inside Test.Inner, which contains the character sequence true. So there is no reason to expect them to be the same object instance. – Banderole
@Banderole "To derive a string literal, the Java Virtual Machine examines the sequence of code points given by the CONSTANT_String_info structure" (JVMS §5.1). Even if I replace the underlying char[] array via reflection, the cpool entry is not changed, so if the JVM followed the spec literally, the behaviour would be different. That said, I don't insist on calling this a bug, because changing the internal structure of VM objects is not something that should be considered a normal operation. – Magnification

I've written this as a community wiki as I don't know if it's right and don't understand the details anyway.

What appears to happen is that when a string literal is encountered at runtime, the JVM checks the string pool (using equals) to see if the string is already there. If it isn't, a new instance is created and added to the pool. This object (either the new one or the one that was already in the string pool) is the one that will be used from then on for all identical string literals in that class.

Consider this example:

import java.lang.reflect.Field;

public class MyClass {

    public MyClass() {
        try {
            Field f = String.class.getDeclaredField("value");
            f.setAccessible(true);
            f.set("true", f.get("false"));
        } catch (Exception e) {
            // Deliberately ignored for brevity.
        }
    }

    public static void main(String[] args) {
        System.out.println(true);       // 1
        new MyClass();
        System.out.println(true);       // 2
        System.out.println("true");     // 3
        printTrue();
        OtherClass.printTrue();
    }

    public static void printTrue() {
        System.out.println("true");     // 4
    }
}

// Package-private so both classes can live in one source file;
// it still compiles to its own class file with its own constant pool.
class OtherClass {

    static void printTrue() {
        System.out.println("true");     // 5
    }
}

This prints:

true
false
false
false
true

My explanation:

In line 1, the JVM encounters the literal "true" in the PrintStream class. A new string is added to the pool. Then new MyClass() is invoked. Inside this constructor, the JVM encounters the string literal "true" in the MyClass class. This string is already in the pool, so the instance in the pool is the one that will be used, but crucially it is also the one that will later be used in lines 3 and 4. Then the array backing this string is modified. Lines 2, 3 and 4 therefore all print false. Next, OtherClass.printTrue() is invoked and the JVM encounters the string literal "true" for the first time in OtherClass. This string is not equal to the one in the pool because the one in the pool now has backing array [f, a, l, s, e]. Therefore a new string instance is used and true is printed at line 5.

Now suppose we comment out line 1:

//        System.out.println(true);       // 1

This time the output is:

true
false
false
true

Why does line 2 produce a different result? The difference here is the literal "true" is not encountered in the PrintStream class until after the backing array has been modified. So the "wrong" string is not the one used in the PrintStream class. However, lines 3 and 4 continue to print "false" for the same reason as above.
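Both runs can be replayed with a toy model that follows this explanation: each class resolves a literal once and then reuses it, and resolution searches the pool by current contents. All names here (WikiScenario, Str, ldc, run) are hypothetical, and the class names passed to ldc are only labels for the constant pools of PrintStream, MyClass and OtherClass; this is not how the JVM is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model (NOT real HotSpot code) replaying the scenario above.
class WikiScenario {

    // Stand-in for java.lang.String with a mutable backing array.
    static final class Str {
        char[] value;

        Str(String s) {
            value = s.toCharArray();
        }

        @Override
        public String toString() {
            return new String(value);
        }
    }

    private final List<Str> pool = new ArrayList<>();
    private final Map<String, Map<String, Str>> cpools = new HashMap<>();

    // A class resolves each literal once, searching the pool by current contents.
    Str ldc(String cls, String utf8) {
        Map<String, Str> cp = cpools.computeIfAbsent(cls, k -> new HashMap<>());
        Str s = cp.get(utf8);
        if (s == null) {
            s = pool.stream()
                    .filter(p -> Arrays.equals(p.value, utf8.toCharArray()))
                    .findFirst()
                    .orElse(null);
            if (s == null) {
                s = new Str(utf8);
                pool.add(s);
            }
            cp.put(utf8, s);
        }
        return s;
    }

    // Returns the printed lines; withLine1 toggles the first println(true).
    static List<String> run(boolean withLine1) {
        WikiScenario jvm = new WikiScenario();
        List<String> out = new ArrayList<>();
        if (withLine1) {
            out.add(jvm.ldc("PrintStream", "true").toString()); // line 1
        }
        Str mine = jvm.ldc("MyClass", "true");                  // new MyClass()
        mine.value = jvm.ldc("MyClass", "false").value;
        out.add(jvm.ldc("PrintStream", "true").toString());     // line 2
        out.add(jvm.ldc("MyClass", "true").toString());         // line 3
        out.add(jvm.ldc("MyClass", "true").toString());         // line 4 (printTrue)
        out.add(jvm.ldc("OtherClass", "true").toString());      // line 5
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // [true, false, false, false, true]
        System.out.println(run(false)); // [true, false, false, true]
    }
}
```

run(true) reproduces the five-line output and run(false) the four-line output, which is consistent with (though of course not proof of) the explanation above.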

Robotize answered 1/4, 2016 at 22:20 Comment(7)
I'm not sure this is right. Check the second answer here: https://mcmap.net/q/22374/-what-is-java-string-interning – Entreaty
Ah man... I've been looking at this so long that the word true looks alien to me now :/ – Whole
@EngineerDollery Please use the "share" link under each post to share/link them. Saying "the second" doesn't quite work since we don't know how you sorted the answers ("active", "votes" or "oldest") :) – Jaddan
"but crucially it is also the one that will later be used in lines 3 and 4." But why? My guess is that the JVM remembers that the String constant (from the class file constant pool) was already interned and doesn't check whether the String from line 3 is already in the pool. Normally the interning happens when the specific line is reached, but the constant pool (the one inside the class file) seems to play an important part here. – Jaddan
@Jaddan That sounds very plausible. Later on, if I have time, I'll see what happens with the various types of nested classes. – Robotize
On the JVM level, there are no nested classes. Each class has its own class file with its own constant pool, so don't expect any differences for nested classes. But note that unique items in the class' pool are not mandatory. You could hand-craft a class file having multiple occurrences of the same string constant in the pool and check the behavior for ldc instructions referring to different pool items bearing the same string… – Banderole
While the explanation sounds reasonable, I think it might be invalidated by the fact that writing "true".intern(); as the first line of the main method does not cause the program to print false, although the String "true" would then already be in the pool. I dug a bit through the OpenJDK source code and played a bit with stackoverflow.com/questions/22094111 etc., but some interning seems to happen on another path (ConstantPool, hg.openjdk.java.net/jdk8/jdk8/hotspot/file/a5ac0873476c/src/…), so definite answers are difficult here. – Gilead

© 2022 - 2024 — McMap. All rights reserved.