Misunderstanding about interned string literals?

I don't understand:

MSDN says:

http://msdn.microsoft.com/en-us/library/system.string.intern.aspx

Consequently, an instance of a literal string with a particular value only exists once in the system.

For example, if you assign the same literal string to several variables, the runtime retrieves the same reference to the literal string from the intern pool and assigns it to each variable.

Is this behavior the default (without calling Intern), or does it only apply when I use the Intern method?

  • If it is the default, why would I ever want to use Intern? (There will already be only one instance.)

  • If it is NOT the default: suppose I write this line 1000 times:

    Console.WriteLine("lalala");

1) Will I get 1000 occurrences of "lalala" in memory (without using Intern)?

2) Will "lalala" eventually be GC'd?

3) Is "lalala" already interned? If it is, why would I need to "get" it from the pool instead of just writing "lalala" again?

I'm a bit confused.

Eliza answered 1/1, 2012 at 7:49 Comment(0)

String literals get interned automatically (so, if your code contains "lalala" 1000 times, only one instance will exist).

Such strings will not get GC'd and any time they are referenced the reference will be the interned one.
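As a small illustration of the point above, the following sketch compares references rather than contents. The class name and variable names are made up for this example, and it assumes all the literals live in the same assembly.

```csharp
using System;

class LiteralInterningDemo
{
    static void Main()
    {
        string a = "lalala";
        string b = "lalala";

        // Both variables refer to the single interned instance of the literal.
        Console.WriteLine(object.ReferenceEquals(a, b)); // True

        // A string assembled at run time is a separate object with the same contents.
        string c = new string(new[] { 'l', 'a', 'l', 'a', 'l', 'a' });
        Console.WriteLine(a == c);                       // True (same characters)
        Console.WriteLine(object.ReferenceEquals(a, c)); // False (different objects)
    }
}
```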


string.Intern is there for strings that are not literals, for example strings that come from user input or are read from a file or database, which you know will be repeated very often and are therefore worth interning for the lifetime of the process.
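A minimal sketch of what that looks like in practice. Here `fromInput` is a hypothetical stand-in for text that arrives at run time; it is built from a char array only so that the example is self-contained.

```csharp
using System;

class InternDemo
{
    static void Main()
    {
        // Hypothetical stand-in for a string read from a file or typed by a user.
        string fromInput = new string("lalala".ToCharArray());

        // Same characters as the literal, but a different object.
        Console.WriteLine(object.ReferenceEquals(fromInput, "lalala")); // False

        // string.Intern returns the pooled reference for this value.
        string interned = string.Intern(fromInput);
        Console.WriteLine(object.ReferenceEquals(interned, "lalala"));  // True

        // string.IsInterned returns the pooled reference, or null if the value is not pooled.
        Console.WriteLine(string.IsInterned(fromInput) != null);        // True
    }
}
```

Note that `fromInput` itself is still an ordinary object that can be collected; only the reference returned by `string.Intern` points at the pooled instance.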

Adonic answered 1/1, 2012 at 7:54 Comment(8)
So why would I ever want to use string.Intern("lalala") if it is automatically fetched from the pool? I can just use "lalala"... please clarify. :) Thanks. - Eliza
@RoyiNamir - Added some information about string.Intern. Hope that clarifies a bit. - Adonic
As always, thank you. The second part clarified things for me. - Eliza
If I write string g = "aaa"; this is not a literal, is it? Is it interned? Does it get GC'd? - Eliza
@RoyiNamir - "aaa" is a string literal. It will get interned and will not get GC'd. - Adonic
@RoyiNamir - A string literal is a string you have in code, literally. A string that is not a literal is one that gets built up or comes from an external source. For example, in a WinForms application, when a user enters their username into a textbox, that is not a string literal within your code (how could it be?): string nonLiteral = myText.Text;. - Adonic
#2423611 Please see the accepted answer: "Literal strings are interned per default, so even if your application no longer references it, it will not be collected, as it is referenced by the internal interning structure. Other strings are just like any other managed object. As soon as they are no longer referenced by your application they are eligible for garbage collection." So what are those "other" strings? Thank you. - Eliza
@RoyiNamir - String literals would be the strings that exist in the program when it is compiled. - Adonic

Interning is something that happens behind the scenes, so you as a programmer rarely have to worry about it. You generally do not have to put anything into the pool, or get anything from the pool. It is like garbage collection: you never have to invoke it, or worry that it may happen, or worry that it may not happen. (Well, in 99.999% of cases. The remaining 0.001% is when you are doing very weird stuff.)

The compiler and the runtime between them take care of interning all string literals contained in your source code: the compiler stores each literal in the compiled assembly, and the runtime places it in the intern pool when it loads the literal. So "lalala" will be interned without you having to do anything, or having any control over the matter. And whenever you refer to "lalala" in your program, you get the reference from the intern pool, again without having to do anything.

The intern pool contains a more-or-less fixed number of strings, generally of a very small total size (only a fraction of the total size of your .exe), so it does not matter that they never get garbage-collected.


EDIT

The purpose of interning strings is to speed up certain string operations such as Equals(). The Equals() method of String first checks whether the two strings are equal by reference, which is extremely fast; if the references are equal, it returns true immediately. If the references differ, it cannot conclude from that alone that the contents differ, so it compares the lengths and, if they match, proceeds with a character-by-character comparison. It does not check whether both strings are interned. What interning guarantees is that two interned strings with the same contents are the same object, so if you know that both strings are interned you can compare them with object.ReferenceEquals() alone.
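To make the role of the reference check concrete, here is a rough sketch of the order in which an ordinal string comparison can short-circuit. `OrdinalEquals` is a hypothetical helper written for this illustration, not the runtime's actual source.

```csharp
using System;

static class EqualsSketch
{
    // Hypothetical helper: illustrates the short-circuit order only.
    public static bool OrdinalEquals(string a, string b)
    {
        if (object.ReferenceEquals(a, b)) return true; // same object, e.g. both interned
        if (a is null || b is null) return false;
        if (a.Length != b.Length) return false;        // different lengths can never match

        // Fall back to comparing the characters one by one.
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }

    static void Main()
    {
        // Build "cat" at run time, then intern it so it shares the literal's reference.
        string token = string.Intern(new string("cat".ToCharArray()));

        Console.WriteLine(OrdinalEquals(token, "cat")); // True (decided by the reference check)
        Console.WriteLine(OrdinalEquals(token, "dog")); // False
    }
}
```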

So, suppose that you are reading tokens from a file in string s, and you have a switch statement of the following form:

switch (s)
{
    case "cat": /* ... */ break;
    case "dog": /* ... */ break;
    case "tod": /* ... */ break;
}

The string literals "cat", "dog", and "tod" have all been interned, but you are comparing each of them against s, which has not been interned, so you are not reaping the benefits of the intern pool. If you intern s right before the switch statement, the comparison against the matching literal can succeed on the reference check alone. Whether that outweighs the cost of the pool lookup performed by string.Intern depends on how often the same tokens recur, so it is worth measuring.

Of course, if there is any possibility that your file might contain garbage, you do NOT want to do this: interned strings are unlikely to be released until the runtime terminates, so loading lots of random strings into the intern pool will hurt the performance of your program and may eventually exhaust memory.

Circlet answered 1/1, 2012 at 8:25 Comment(3)
"the compiler makes sure to fetch it from the intern pool" ... so when will I want to use string.Intern explicitly? - Eliza
@Adonic has already answered this. I will try to also give an example within my answer. - Circlet
Thank you very much for the extended example. - Eliza
