When are string literals loaded into the StringTable in the Java HotSpot VM?

This question came up while I was studying the java.lang.String API.

I found an article in Chinese titled: In Java's new String("literal"), when does "literal" enter the string constant pool?

It says that CONSTANT_String entries are resolved lazily in the HotSpot VM, so a string literal is not loaded into the StringTable until it is first used.

I also found some relevant statements.

JVMS §5.4, Linking, says:

For example, a Java Virtual Machine implementation may choose to resolve each symbolic reference in a class or interface individually when it is used ("lazy" or "late" resolution), or to resolve them all at once when the class is being verified ("eager" or "static" resolution).

I found some OpenJDK code that handles the ldc bytecode:

IRT_ENTRY(void, InterpreterRuntime::ldc(JavaThread* thread, bool wide))  
  // access constant pool  
  constantPoolOop pool = method(thread)->constants();  
  int index = wide ? get_index_u2(thread, Bytecodes::_ldc_w) : get_index_u1(thread, Bytecodes::_ldc);  
  constantTag tag = pool->tag_at(index);  

  if (tag.is_unresolved_klass() || tag.is_klass()) {  
    klassOop klass = pool->klass_at(index, CHECK);  
    oop java_class = klass->java_mirror();  
    thread->set_vm_result(java_class);  
  } else {  
#ifdef ASSERT  
    // If we entered this runtime routine, we believed the tag contained  
    // an unresolved string, an unresolved class or a resolved class.  
    // However, another thread could have resolved the unresolved string  
    // or class by the time we go there.  
    assert(tag.is_unresolved_string()|| tag.is_string(), "expected string");  
#endif  
    oop s_oop = pool->string_at(index, CHECK);  
    thread->set_vm_result(s_oop);  
  }  
IRT_END  

and the code behind pool->string_at(index, CHECK):

oop constantPoolOopDesc::string_at_impl(constantPoolHandle this_oop, int which, TRAPS) {  
  oop str = NULL;  
  CPSlot entry = this_oop->slot_at(which);  
  if (entry.is_metadata()) {  
    ObjectLocker ol(this_oop, THREAD);  
    if (this_oop->tag_at(which).is_unresolved_string()) {  
      // Intern string  
      Symbol* sym = this_oop->unresolved_string_at(which);  
      str = StringTable::intern(sym, CHECK_(constantPoolOop(NULL)));  
      this_oop->string_at_put(which, str);  
    } else {  
      // Another thread beat us and interned string, read string from constant pool  
      str = this_oop->resolved_string_at(which);  
    }  
  } else {  
    str = entry.get_oop();  
  }  
  assert(java_lang_String::is_instance(str), "must be string");  
  return str;  
}  

But this code only shows that a string literal may be loaded into the StringTable when ldc executes. It does not prove that HotSpot resolves string literals lazily, since nothing here rules out the literal having been resolved earlier.

Could someone explain this in detail?

FYI, I know a little C but no C++.

Thanks!

Orban answered 5/7, 2017 at 11:7 Comment(3)
The only thing you need to know about all this is that it is undetectable by Java code. Whether eager or lazy is strictly an implementation detail that need not concern you in any way, unless you're the implementor.Hallett
From the point of view of your code, literals exist from the moment the type is initialized and even before.Diction
@EJP, Lew Bloch: actually, we can detect the lazy resolving…Alchemize

There is a corner case that lets a Java application check whether a string existed in the pool before the test, but it works only once per string. Combined with string literals of the same content, it makes the lazy loading detectable:

public class Test {
    public static void main(String[] args) {
        test('h', 'e', 'l', 'l', 'o');
        test('m', 'a', 'i', 'n');
    }
    static void test(char... arg) {
        String s1 = new String(arg), s2 = s1.intern();
        System.out.println('"'+s1+'"'
            +(s1!=s2? " existed": " did not exist")+" in the pool before");
        System.out.println("is the same as \"hello\": "+(s2=="hello"));
        System.out.println("is the same as \"main\": "+(s2=="main"));
        System.out.println();
    }
}

The test first creates a new string instance which does not exist in the pool. Then it calls intern() on it and compares the references. There are three possible scenarios:

  1. If a string with the same contents already exists in the pool, that string is returned. It must be a different object from our string, which is not in the pool.

  2. Our string is added to the pool and returned. In this case, the two references are identical.

  3. A new string with the same contents will be created and added to the pool. Then, the returned reference will be different.

We can’t distinguish between 1 and 3, so if a JVM generally adds new strings to the pool in intern(), we are out of luck. But if it adds the instance we’re calling intern() on, we can identify scenario 2 and know for sure that the string wasn’t in the pool, but has been added as a side effect of our test.
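The three scenarios can be condensed into a single identity check. The following is a minimal sketch, not part of the test above: the names InternProbe and wasAbsentBefore are made up for illustration, and a true result is only possible on a JVM whose intern() adds the receiver itself to the pool (scenario 2).

```java
public class InternProbe {
    /**
     * Returns true only when intern() handed back the very instance we created,
     * i.e. scenario 2: the contents were not pooled before and have now been added.
     * A false result covers scenarios 1 and 3, which cannot be told apart.
     */
    public static boolean wasAbsentBefore(char... chars) {
        String fresh = new String(chars);   // a new object, never the pooled instance
        return fresh.intern() == fresh;     // identity comparison is intentional
    }

    public static void main(String[] args) {
        char[] probe = {'p', 'r', 'o', 'b', 'e', '-', '1'};
        System.out.println("first call:  " + wasAbsentBefore(probe));
        // Whatever the first call returned, the contents are pooled now,
        // so a second fresh copy can never be the pooled instance.
        System.out.println("second call: " + wasAbsentBefore(probe));
    }
}
```

The second call always reports false, which is why the check can be used only once per string.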

On my machine, the Test class above prints:

"hello" did not exist in the pool before
is the same as "hello": true
is the same as "main": false

"main" existed in the pool before
is the same as "hello": false
is the same as "main": true

The same output appears on Ideone.

This shows that "hello" did not exist in the pool when the test method was entered for the first time, even though the method contains the string literal "hello" further down. This proves that the string literal is resolved lazily. Since we had already added a "hello" string manually, the string literal with the same contents resolves to that same instance.
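The point that a literal resolves to whatever instance is already pooled can be isolated in a small sketch. The class name and the string contents are arbitrary choices for illustration. Only String.intern() and reference comparison are used.

```java
public class LiteralResolvesToPooled {
    /** Interns a hand-built string, then compares the pooled instance with a literal of equal contents. */
    public static boolean literalIsPooledInstance() {
        String built = new String(new char[] {'l', 'a', 'z', 'y', '-', 'l', 'i', 't'});
        String pooled = built.intern();   // the pool holds a "lazy-lit" string from here on
        // Whether the literal was resolved earlier or only now,
        // it refers to the single pooled instance with these contents.
        return pooled == "lazy-lit";
    }

    public static void main(String[] args) {
        System.out.println(literalIsPooledInstance()); // prints true
    }
}
```

The result is true on any conforming JVM. Only the question of which object ended up in the pool depends on lazy resolution.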

In contrast, the "main" string already exists in the pool, which is easy to explain: the Java launcher searches for the main method to execute and adds that string to the pool as a side effect.

If we swap the order of the tests to test('m', 'a', 'i', 'n'); test('h', 'e', 'l', 'l', 'o'); the "hello" string literal will be used in the first test invocation and remains in the pool, so when we test it in the second invocation the string will already exist.
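The effect of swapping the calls can be reproduced in isolation: once a literal has been executed, a later probe for the same contents must find it in the pool. Here is a minimal sketch with hypothetical names. Unlike the first-run result, this outcome does not depend on how intern() is implemented.

```java
public class LiteralFirst {
    /** Executes the literal first, then probes the pool with a fresh copy of the same contents. */
    public static boolean existsAfterLiteralUse() {
        String literal = "swapped-order";                  // once executed, the literal is in the pool
        String fresh = new String(literal.toCharArray());  // distinct object, same contents
        String pooled = fresh.intern();
        // The literal is already pooled, so intern() returns it rather than 'fresh'.
        return pooled != fresh && pooled == literal;
    }

    public static void main(String[] args) {
        System.out.println(existsAfterLiteralUse()); // prints true
    }
}
```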

Alchemize answered 5/7, 2017 at 15:6 Comment(4)
that's probably the only case I can think of where intern would be acceptable. Just recently there was an excellent article about this: shipilev.net/jvm-anatomy-park/10-string-intern Courtly
@Eugene: indeed, intern() is good for debugging but nothing else…Alchemize
The question is about string literals. This answer is exclusively concerned with the behaviour of String.intern() at run time, starting with character literals.Hallett
@EJP maybe you didn’t read the code correctly. Inside the method test, there are the string literals "hello" and "main". What the code does, is dealing with intern() to prove that these string literals are not in the pool at the time it does call intern() on strings which are definitely not literals. This is sufficient to prove that the string literal "hello" contained in the test method is not put into the pool at class loading time, as then, they would be in the pool prior to the invocation of the main method. While "main" is already in the pool, due to the launcher.Alchemize
