String interning?

The second ReferenceEquals call returns false. Why isn't the string in s4 interned? (I don't care about the advantages of StringBuilder over string concatenation.)

string s1 = "tom";
string s2 = "tom";


Console.Write(object.ReferenceEquals(s2, s1)); //true

string s3 = "tom";
string s4 = "to";
s4 += "m";

Console.Write(object.ReferenceEquals(s3, s4)); //false

When I do String.Intern(s4);, I still get false.

Here, both s3 and s4 are interned but their references are not equal?

string s3 = "tom";
string s4 = "to";
s4 += "m";
String.Intern(s4);

Console.WriteLine(s3 == s4); //true
Console.WriteLine(object.ReferenceEquals(s3, s4)); //false
Console.WriteLine(string.IsInterned(s3) != null);  //true (s3 is interned)
Console.WriteLine(string.IsInterned(s4) != null);  //true (s4 is interned)
Terminable answered 24/4, 2010 at 23:40 Comment(4)
Please verify once more with s4 = String.Intern(s4); Console.Write(object.ReferenceEquals(s3, s4)); It returns true on .NET 2.0, 3.0, 3.5, and 4.0. Moreover, if you test s3 = String.Intern(s3); Console.Write(object.ReferenceEquals(s3, s1)); you can see that s3 = String.Intern(s3); does nothing because, as Scott Dorman correctly wrote, s1 through s3 are already interned and only s4 points to a unique heap object until we change it with s4 = String.Intern(s4); Hendrick
A non-null result from string.IsInterned() doesn't mean the string object passed in was created as an interned string; it means that there's one in the intern pool that has the same value. Confusing, huh! Enoch
Makes sense. But String.Intern(s4) does not intern the string then?Terminable
Yes, it does intern the string but you still aren't comparing the interned reference. Look at the update to my answer for more information. From MSDN: The Intern method uses the intern pool to search for a string equal to the value of str. If such a string exists, its reference in the intern pool is returned. If the string does not exist, a reference to str is added to the intern pool, then that reference is returned.Caphaitien

The literal "to" initially assigned to s4 is interned. However, when you execute s4 += "m";, you create a new string that is not interned, because its value is the result of a runtime concatenation rather than a string literal. As a result, s3 and s4 refer to two different string instances in two different memory locations.

For more information on string interning, look here, specifically at the last example. When you call String.Intern(s4), the runtime does look the value up in the intern pool, but you discard the result, so you are still not comparing against the interned reference. String.Intern returns the interned string, so you need to use its return value:

string s1 = "tom";
string s2 = "tom";

Console.Write(object.ReferenceEquals(s2, s1)); //true 

string s3 = "tom";
string s4 = "to";
s4 += "m";

Console.Write(object.ReferenceEquals(s3, s4)); //false

string s5 = String.Intern(s4);

Console.Write(object.ReferenceEquals(s3, s5)); //true
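
The IsInterned checks in the question can be read the same way. A minimal sketch, assuming an ordinary JIT-compiled console program (the class name InternCheck is made up for illustration): string.IsInterned reports whether the pool holds an equal value, not whether the instance passed in is the pooled one.

```csharp
using System;

class InternCheck
{
    static void Main()
    {
        string s3 = "tom";      // literal, placed in the intern pool
        string s4 = "to";
        s4 += "m";              // built at runtime, a separate instance

        // IsInterned returns the pooled reference with the same value, or null.
        var pooled = string.IsInterned(s4);

        // Is there an equal value in the pool?
        Console.WriteLine(pooled != null);
        // Is the pooled instance the one s3 refers to?
        Console.WriteLine(object.ReferenceEquals(pooled, s3));
        // Is s4 itself the pooled instance?
        Console.WriteLine(object.ReferenceEquals(pooled, s4));
    }
}
```

On a typical runtime this should print True, True, False: the pool contains "tom", but the object s4 refers to is not the pooled one.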
Caphaitien answered 24/4, 2010 at 23:44 Comment(2)
Marked as answer thanks. Still weird stuff. I tell it to intern s4 and it returns a reference to an already interned string in the pool. While poor s4 simply hangs out as a non interned string in the heap.Terminable
Thanks. It returns a reference, but if the string value passed as the argument isn't already interned, it will be interned and then the reference is returned. String interning is really an optimization technique predominately used by the compiler to reduce the number of string instances across the application. Unless you are creating a lot of string you probably won't see much benefit doing it yourself.Caphaitien

Strings are immutable. This means their contents can't be changed.

When you do s4 += "m";, the CLR allocates a new string in another memory location that contains the original characters plus the appended part, and s4 then refers to that new string.

See MSDN string reference.
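
A small sketch of this behaviour, assuming a plain console program (the class name ConcatDemo and the variable names are made up). It contrasts runtime concatenation, which allocates a new string, with concatenation of constants, which the C# compiler folds into a single literal.

```csharp
using System;

class ConcatDemo
{
    static void Main()
    {
        string literal = "tom";

        string original = "to";
        string appended = original;
        appended += "m";                    // allocates a new string at runtime

        const string folded = "to" + "m";   // constant expression, folded to "tom" at compile time

        // The old string was not modified, so this still prints "to".
        Console.WriteLine(original);
        // The runtime result is a different instance from the literal.
        Console.WriteLine(object.ReferenceEquals(literal, appended));
        // The folded constant is emitted as the same literal "tom".
        Console.WriteLine(object.ReferenceEquals(literal, folded));
    }
}
```

With default compiler and runtime settings this should print to, False, True.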

Palmira answered 24/4, 2010 at 23:43 Comment(3)
I understand strings are immutable. But the whole point of interning is to save memory right? Why can't the CLR say, hey I have this same value in my intern pool, I am just going to point to it.Terminable
@rkrauter: it's quite expensive to check whether any string in the intern pool equals the result of an operation, and to do so after every operation! So the CLR sacrifices memory efficiency for execution speed. String calculation at compile time can afford to be slow, so its results can be interned. Calculations at runtime must be fast, so checking each result against a set of other strings seems impracticable. Lindemann
So string interning is primarily done at compile time? Just noticed that when I do String.Intern(s4);, I still get false. Please explain.Terminable

First of all, everything written so far about immutable strings is correct. But some important things have not been mentioned. The code

string s1 = "tom";
string s2 = "tom";
Console.Write(object.ReferenceEquals(s2, s1)); //true

really does display "True", but only because of a small compiler optimization or, as here, because the CLR ignores the C# compiler's attributes (see the book "CLR via C#") and places only one string "tom" on the heap.

Second, you can fix the situation with the following lines:

s3 = String.Intern(s3);
s4 = String.Intern(s4);
Console.Write (object.ReferenceEquals (s3, s4)); //true

String.Intern calculates a hash code of the string and searches the internal hash table for an equal string. Because it finds one here, it returns the reference to the already existing String object. If the string doesn't exist in the internal hash table, it is added to the table and its reference is returned. The garbage collector doesn't free the memory for an interned string, because it is referenced by the hash table.
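
The lookup described above can be sketched with a dictionary. This is a hypothetical toy pool for illustration only (ToyInternPool and its Intern method are made-up names), not how the CLR actually implements String.Intern.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy pool for illustration; not the CLR's implementation.
class ToyInternPool
{
    private readonly Dictionary<string, string> pool = new Dictionary<string, string>();

    public string Intern(string value)
    {
        // The dictionary hashes the string and compares candidates by value.
        if (pool.TryGetValue(value, out var existing))
            return existing;        // an equal string is already pooled

        pool[value] = value;        // the table now keeps this instance alive
        return value;
    }
}

class Program
{
    static void Main()
    {
        var pool = new ToyInternPool();
        string a = new string("tom".ToCharArray());
        string b = new string("tom".ToCharArray());

        Console.WriteLine(object.ReferenceEquals(a, b));                           // False: two instances
        Console.WriteLine(object.ReferenceEquals(pool.Intern(a), pool.Intern(b))); // True: both map to a
    }
}
```

As with the real pool, the caller has to keep the returned reference; calling Intern and ignoring the result leaves the original variable pointing at its own instance.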

Hendrick answered 25/4, 2010 at 0:16 Comment(0)

Source: https://blogs.msdn.microsoft.com/ericlippert/2009/09/28/string-interning-and-string-empty/

String interning is an optimization technique used by the compiler. If you have two identical string literals in one compilation unit, the generated code ensures that only one string object is created for all instances of that literal (characters enclosed in double quotes) within the assembly.

I come from a C# background, so I can explain with an example from that language:

object obj = "Int32";
string str1 = "Int32";
string str2 = typeof(int).Name;

Output of the following comparisons:

Console.WriteLine(obj == str1); // true
Console.WriteLine(str1 == str2); // true    
Console.WriteLine(obj == str2); // false !?

Note 1: Operands typed as object are compared by reference.

Note 2: typeof(int).Name is evaluated via reflection, so it is not evaluated at compile time. The choice of comparison operator, by contrast, is made at compile time from the static types of the operands.

Analysis of the results:

1) true because both refer to the same literal, so the generated code has only one object for "Int32". See Note 1.

2) true because both operands are typed as string, so their contents are compared, and the contents are the same.

3) false because str2 is created at runtime and is not the same object as the literal that obj refers to, and the comparison is by reference. See Notes 1 and 2.
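
A hedged sketch of how to get a value comparison in case 3, assuming the same obj and str2 as above (the class name ObjectCompare is made up). Casting to string or calling the virtual Equals method compares contents instead of references.

```csharp
using System;

class ObjectCompare
{
    static void Main()
    {
        object obj = "Int32";
        string str2 = typeof(int).Name;     // produced at runtime via reflection

        // Reference comparison: the operator is chosen from the static type object.
        // (The compiler warns about a possible unintended reference comparison.)
        Console.WriteLine(obj == str2);

        // string's operator == is chosen after the cast, so contents are compared.
        Console.WriteLine((string)obj == str2);

        // Equals is virtual, so string.Equals runs even through an object reference.
        Console.WriteLine(obj.Equals(str2));
    }
}
```

The last two lines print True because the contents match; the first line is the reference comparison that the answer reports as false.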

Wreckful answered 24/9, 2017 at 5:3 Comment(0)

In C#, each string is a distinct object, and cannot be edited. You are creating references to them, but each string is distinct. The behaviour is consistent and easy to understand.

Might I suggest examining the StringBuilder class for manipulating strings without creating new instances? It should be sufficient for anything you want to do with strings.

Spacetime answered 24/4, 2010 at 23:44 Comment(1)
Only use a StringBuilder if you need to concatenate large strings together. In all other cases the time saved is so minimal that it doesn't matter. Also, string literals are kept in an internal pool of already created strings; this means that if you use the literal "Hello", any further literal "Hello" will point to the same reference in memory. Decode

When comparing two operands typed as object rather than string, the string equality operator is not called, since it is a static method resolved at compile time without polymorphism.

Here is a test:

static void Test()
{
  object o1 = "a";
  object o2 = new string("a".ToCharArray());

  string o3 = "a";
  string o4 = new string("a".ToCharArray());

  object o5 = "a"; // Compiler optimization addr(o5) = addr(o6)
  object o6 = "a";

  string o7 = "a"; // Compiler optimization addr(o7) = addr(o8)
  string o8 = "a";

  Console.WriteLine("Enter same text 4 times:");

  object o9 = Console.ReadLine();
  object o10 = Console.ReadLine();

  string o11 = Console.ReadLine();
  string o12 = Console.ReadLine();

  Console.WriteLine("object arr   o1  == o2  ? " + ( o1 == o2 ).ToString());
  Console.WriteLine("string arr   o3  == o4  ? " + ( o3 == o4 ).ToString());
  Console.WriteLine("object const o5  == o6  ? " + ( o5 == o6 ).ToString());
  Console.WriteLine("string const o7  == o8  ? " + ( o7 == o8 ).ToString());
  Console.WriteLine("object cnsl  o9  == o10 ? " + ( o9 == o10 ).ToString());
  Console.WriteLine("string cnsl  o11 == o12 ? " + ( o11 == o12 ).ToString());
  Console.WriteLine("o1.Equals(o2) ? " + o1.Equals(o2).ToString());
  Console.WriteLine("o3.Equals(o4) ? " + o3.Equals(o4).ToString());
  Console.WriteLine("o5.Equals(o6) ? " + o5.Equals(o6).ToString());
  Console.WriteLine("o7.Equals(o8) ? " + o7.Equals(o8).ToString());
  Console.WriteLine("o9.Equals(o10) ? " + o9.Equals(o10).ToString());
  Console.WriteLine("o11.Equals(o12) ? " + o11.Equals(o12).ToString());
}

Results:

object arr   o1  == o2  ? False
string arr   o3  == o4  ? True
object const o5  == o6  ? True
string const o7  == o8  ? True
object cnsl  o9  == o10 ? False
string cnsl  o11 == o12 ? True
o1.Equals(o2) ? True
o3.Equals(o4) ? True
o5.Equals(o6) ? True
o7.Equals(o8) ? True
o9.Equals(o10) ? True
o11.Equals(o12) ? True

https://referencesource.microsoft.com/#mscorlib/system/string.cs

Thereon answered 7/10, 2019 at 16:21 Comment(1)
@CodeCaster Ok, thank you for helping me improve myself.Thereon
