How does this Java code snippet work? (String pool and reflection) [duplicate]
K

7

87

The Java string pool, coupled with reflection, can produce some surprising results:

import java.lang.reflect.Field;

class MessingWithString {
    public static void main (String[] args) {
        String str = "Mario";
        toLuigi(str);
        System.out.println(str + " " + "Mario");
    }

    public static void toLuigi(String original) {
        try {
            Field stringValue = String.class.getDeclaredField("value");
            stringValue.setAccessible(true);
            stringValue.set(original, "Luigi".toCharArray());
        } catch (Exception ex) {
            // Ignore exceptions
        }
    }
}

The above code will print:

"Luigi Luigi" 

What happened to Mario?

Kleenex answered 17/9, 2015 at 6:29 Comment(2)
@Joe I'd say let it pass. Jeff Atwood: "I have learned to stop worrying and love (some) duplication. And you should too." – Amortization
@Mindwin: It doesn't mean we should stop closing questions as duplicates if they really are so. In fact, Jeff's article encourages us to close questions as duplicates – because that's the way to link them. – Maximin
O
98

What happened to Mario ??

You changed it, basically. Yes, with reflection you can violate the immutability of strings... and due to string interning, that means any use of "Mario" (other than in a larger string constant expression, which would have been resolved at compile-time) will end up as "Luigi" in the rest of the program.

This kind of thing is why reflection requires security permissions...

Note that the expression str + " " + "Mario" does not perform any compile-time concatenation, due to the left-associativity of +. It's effectively (str + " ") + "Mario", which is why you still see Luigi Luigi. If you change the code to:

System.out.println(str + (" " + "Mario"));

... then you'll see Luigi Mario as the compiler will have interned " Mario" to a different string to "Mario".
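The compile-time folding described above can be observed without any reflection by comparing references with ==. This is only a minimal sketch; the class and method names are made up for illustration. It relies on the language rules that constant expressions are folded and interned, and that a final local variable initialized with a literal counts as a constant too.

```java
class ConstantFolding {
    // Both operands are literals, so the compiler folds this into the
    // single interned constant " Mario".
    static boolean literalsFolded() {
        return (" " + "Mario") == " Mario";
    }

    // str is not final, so (str + " ") + "Mario" is built at run time
    // and yields a fresh String object that is not the pooled one.
    static boolean runtimeConcatIsPooled() {
        String str = "Mario";
        return (str + " " + "Mario") == "Mario Mario";
    }

    // A final variable with a constant initializer is itself a constant,
    // so the whole expression is folded at compile time.
    static boolean finalVariableFolded() {
        final String str = "Mario";
        return (str + " " + "Mario") == "Mario Mario";
    }

    public static void main(String[] args) {
        System.out.println(literalsFolded());        // true
        System.out.println(runtimeConcatIsPooled()); // false
        System.out.println(finalVariableFolded());   // true
    }
}
```

The last case is why declaring str as final changes the outcome of the original program: the println argument then becomes one constant that never touches the modified "Mario" object.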

Okeechobee answered 17/9, 2015 at 6:34 Comment(5)
The "other than in a larger string constant expression" bit may not be 100% true, all of the time. In the question, the System.out.println call uses a compile-time constant expression (" " + "Mario"), yet that instance of "Mario" still ends up changed. I suspect this is due to an optimization whereby " Mario" is interned and "Mario" refers to the same memory space due to being a suffix match, though I haven't confirmed it. An interesting edge case to what is a generally true statement, though. (Or I'm just misinterpreting whether this is a compile-time constant.) – Philippians
@ChrisHayes: No, that isn't a compile-time constant expression due to the associativity of +. It's evaluated as (str + " ") + "Mario". If you print just " " + "Mario" or " " + "Mario" + str then you have compile-time concatenation, and you still get Mario in the output. – Okeechobee
Ah, I see. That makes sense, even if it's not immediately intuitive. Thanks for the explanation. – Philippians
@ChrisHayes: Have added more explanation in the answer, given that it's generally useful. – Okeechobee
Yeah, if you declare the str variable final, it’ll change the outcome radically. Note that in theory, it is possible that the manipulated "Mario" instance gets garbage collected after the first concatenation has been performed and a new canonical "Mario" gets created for the next occurrence of that string. But it’s very unlikely. It’s also possible that the string deduplication feature of recent JVM implementations causes the wrong array to get copied to other, non-interned "Mario" instances. Also, there might remain a cached hash code reflecting the old contents, causing funny effects… – Parallelogram
M
24

It was set to Luigi. Strings in Java are immutable; thus, the compiler can interpret all mentions of "Mario" as references to the same String constant pool item (roughly, "memory location"). You used reflection to change that item, so every "Mario" in your code now behaves as if you had written "Luigi".

Mush answered 17/9, 2015 at 6:35 Comment(3)
"... as references to the same memory location ..." - The compiler doesn't deal in memory locations, and the runtime system can't either, because the memory location of any String can be changed at any time by the garbage collector. (I understand what you are trying to say ... but you are expressing it incorrectly. If you were talking about C or C++, this explanation would be roughly correct. For Java it isn't.) – Laboratory
@StephenC: While it would have been better to say "same index in the String constant pool", in the end the effect is identical: "Mario" is stored in a memory location (because even the JVM needs to eventually be interpreted on the underlying architecture, where it will be allocated somewhere), and if the GC moves it, it still remains true that all mentions of "Mario" will refer to the same (moved) location. Still, you have a point - I should use Java-appropriate jargon, so I'll change it. – Mush
The best way to put it is to say that they are all the same object. And it is ultimately the Java runtime system that ensures this, not the compiler. – Laboratory
C
16

To explain the existing answers a bit more, let's take a look at your generated byte code (Only the main() method here).

[Image: byte code of the main() method]

Now, any change to the contents of that location will affect both references (and any others you add, too).

Crichton answered 17/9, 2015 at 6:41 Comment(0)
P
9

String literals are stored in the string pool and their canonical value is used. Both "Mario" literals aren't just strings with the same value, they are the same object. Manipulating one of them (using reflection) will modify "both" of them, as they are just two references to the same object.
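To illustrate that the pool is shared across the whole program and not just within one method, here is a small sketch with two separate classes that each contain their own "Mario" literal. The class names are hypothetical and only there for the demonstration.

```java
class SharedLiteral {
    // Two independent classes, each compiled with its own constant pool entry.
    static class PlayerOne {
        static String name() { return "Mario"; }
    }

    static class PlayerTwo {
        static String name() { return "Mario"; }
    }

    // Identity comparison: true only if both methods return the very same object.
    static boolean sameObject() {
        return PlayerOne.name() == PlayerTwo.name();
    }

    public static void main(String[] args) {
        System.out.println(sameObject()); // true
    }
}
```

Because both literals resolve to one object at run time, a change made through either reference is visible through the other.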

Pretended answered 17/9, 2015 at 6:35 Comment(0)
C
8

You just changed the contents of the "Mario" entry in the String constant pool to Luigi. That entry is referenced by every "Mario" literal, so every such literal now reads Luigi.

Field stringValue = String.class.getDeclaredField("value");

This fetches the field named value, a char[], from class String.

stringValue.setAccessible(true);

Make it accessible.

stringValue.set(original, "Luigi".toCharArray());

You changed the value field of the original String to Luigi. But original is the String literal "Mario", and literals belong to the String pool, where they are all interned. This means that all literals with the same content refer to the same object.

String a = "Mario"; // Created in the String pool
String b = "Mario"; // Refers to the same "Mario" in the String pool
System.out.println(a == b); // true
// You changed 'a' to Luigi through reflection, and 'b' doesn't know
// that 'a' has been changed internally:
// 'b' still refers to the same object.

Basically, you have changed the "Mario" in the String pool, and the change is reflected in every reference to it. If you create a String object (i.e. new String("Mario")) instead of using a literal, you will not face this behavior, because then you will have two different Marios.
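The "two different Marios" part can be checked without reflection. The following is just a sketch with made-up names; it only shows that new String("Mario") is a separate object with equal contents, which is the premise of the paragraph above.

```java
class LiteralVersusNew {
    // new String(...) always allocates a fresh object, so the
    // identity comparison with the pooled literal fails.
    static boolean sameObject() {
        String copy = new String("Mario");
        return copy == "Mario";
    }

    // The contents are still equal, which is what equals() compares.
    static boolean sameContents() {
        String copy = new String("Mario");
        return copy.equals("Mario");
    }

    public static void main(String[] args) {
        System.out.println(sameObject());   // false
        System.out.println(sameContents()); // true
    }
}
```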

Connell answered 17/9, 2015 at 6:50 Comment(0)
F
5

The other answers adequately explain what's going on. I just wanted to add the point that this only works if there is no security manager installed. When running code from the command line, there is none by default, and you can do things like this. However, in an environment where trusted code is mixed with untrusted code, such as an application server in a production environment or an applet sandbox in a browser, there would typically be a security manager present and you would not be allowed these kinds of shenanigans, so this is less of a terrible security hole than it seems.
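A side note for newer JDKs, offered as a sketch and not as a definitive statement: since JDK 16 the internals of java.base are strongly encapsulated by default, so the setAccessible(true) call is expected to fail with an InaccessibleObjectException even without a security manager, unless the JVM is started with an option such as --add-opens java.base/java.lang=ALL-UNNAMED. The class and method names below are made up for illustration. The probe only asks for access and never writes to the field, and it reports the failure instead of swallowing it as the code in the question does.

```java
import java.lang.reflect.Field;

class AccessProbe {
    // Tries to open String's private "value" field and reports what happened.
    static String probe() {
        try {
            Field value = String.class.getDeclaredField("value");
            value.setAccessible(true); // may be rejected by the runtime
            return "accessible";
        } catch (NoSuchFieldException e) {
            return "no such field";
        } catch (RuntimeException e) {
            // Covers SecurityException and, on newer JDKs,
            // InaccessibleObjectException.
            return "blocked: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(probe());
    }
}
```

If the probe reports that access is blocked, the program from the question would silently print "Mario Mario", because its catch block ignores the exception.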

Flasher answered 17/9, 2015 at 10:33 Comment(0)
F
3

Another related point: you can make use of the constant pool to improve the performance of string comparisons in some circumstances, by using the String.intern() method.

That method returns the String instance from the String constant pool that has the same contents as the String on which it is invoked, adding it if it is not yet present. In other words, after using intern(), all Strings with the same contents are guaranteed to be the same String instance as each other and as any String constants with those contents, meaning you can then use the equality operator (==) on them.

This is just an example which is not very useful on its own, but it illustrates the point:

class Key {
    Key(String keyComponent) {
        this.keyComponent = keyComponent.intern();
    }

    public boolean equals(Object o) {
        // String comparison using the equals operator allowed due to the
        // intern() in the constructor, which guarantees that all values
        // of keyComponent with the same content will refer to the same
        // instance of String:
        return (o instanceof Key) && (keyComponent == ((Key) o).keyComponent);
    }

    public int hashCode() {
        return keyComponent.hashCode();
    }

    boolean isSpecialCase() {
        // String comparison using equals operator valid due to use of
        // intern() in constructor, which guarantees that any keyComponent
        // with the same contents as the SPECIAL_CASE constant will
        // refer to the same instance of String:
        return keyComponent == SPECIAL_CASE;
    }

    private final String keyComponent;

    private static final String SPECIAL_CASE = "SpecialCase";
}

This little trick isn't worth designing your code around, but it is worth keeping in mind for the day when you notice that a little more speed could be eked out of some bit of performance-sensitive code by using the == operator on a string with judicious use of intern().
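For a quick standalone check of what intern() buys you, here is a minimal sketch (hypothetical names) that builds the string at run time, so it does not start out as the pooled instance.

```java
class InternDemo {
    // Assembled at run time, so the result is a fresh object
    // and not the pooled constant.
    static String build() {
        return new StringBuilder("Special").append("Case").toString();
    }

    // Without intern(), == compares two different objects.
    static boolean beforeIntern() {
        return build() == "SpecialCase";
    }

    // intern() returns the canonical pooled instance, which is
    // the same object the literal refers to.
    static boolean afterIntern() {
        return build().intern() == "SpecialCase";
    }

    public static void main(String[] args) {
        System.out.println(beforeIntern()); // false
        System.out.println(afterIntern());  // true
    }
}
```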

Flasher answered 18/9, 2015 at 12:48 Comment(0)
