C# string interning
I am trying to understand string interning and why it doesn't seem to work in my example. The point of the example is to show that Example 1 uses a lot less memory, as it should only have 10 distinct strings in memory. However, in the code below both examples use roughly the same amount of memory (virtual size and working set).

Please advise: why isn't Example 1 using a lot less memory? Thanks.

Example 1:

        IList<string> list = new List<string>(10000);

        for (int i = 0; i < 10000; i++)
        {
            for (int k = 0; k < 10; k++)
            {
                list.Add(string.Intern(k.ToString()));
            }

        }

        Console.WriteLine("intern Done");
        Console.ReadLine();

Example 2:

        IList<string> list = new List<string>(10000);

        for (int i = 0; i < 10000; i++)
        {
            for (int k = 0; k < 10; k++)
            {
                list.Add(k.ToString());
            }

        }

        Console.WriteLine("intern Done");
        Console.ReadLine();
Soche answered 24/3, 2010 at 9:43 Comment(1)
Isn't this question a lot like the one you asked yesterday? #2503022 - Vincentia
S
2

From MSDN: "Second, to intern a string, you must first create the string. The memory used by the String object must still be allocated, even though the memory will eventually be garbage collected."

Sagittarius answered 24/3, 2010 at 9:56 Comment(1)
In that case, what am I gaining? As far as I can see, there is almost no benefit to interning non-literal strings that get created at runtime? - Trend
B
17

The problem is that ToString() will still allocate a new string, and then intern it. If the garbage collector doesn't run to collect those "temporary" strings, then the memory usage will be the same.
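To make the "allocate first, then intern" point concrete, here is a minimal sketch. The class and method names are made up for illustration, and `new string('x', 3)` stands in for any string built at run time. It only relies on `string.Intern` and `object.ReferenceEquals`.

```csharp
using System;

static class InternIdentity
{
    // Returns three reference comparisons, in the order explained below.
    public static bool[] Run()
    {
        // Two separately allocated strings with identical contents.
        string a = new string('x', 3);
        string b = new string('x', 3);

        // Intern returns the single pooled instance for that content.
        string ia = string.Intern(a);
        string ib = string.Intern(b);

        return new[]
        {
            object.ReferenceEquals(a, b),   // two distinct objects were allocated
            object.ReferenceEquals(ia, ib), // both interned references are the same object
            object.ReferenceEquals(ib, b)   // b itself was only a temporary
        };
    }

    static void Main()
    {
        foreach (bool result in Run())
            Console.WriteLine(result); // prints False, True, False
    }
}
```

The temporary `b` is still on the heap after the call to `string.Intern`; it only disappears once the garbage collector reclaims it, which is why memory measured before a collection looks the same with and without interning.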

Also, your strings are pretty short. The loops create 100,000 one-character strings, which adds up to only about 2 MB (roughly 20 bytes per string object on a 32-bit CLR), so you're probably not going to notice the difference in process-level counters. Try using longer strings (or a lot more of them) and doing a garbage collect before you check the memory usage.

Here is an example that does show a difference:

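using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        int n = 100000;

        // Pass "1" as the first argument to run the interned version.
        if (args.Length > 0 && args[0] == "1")
            WithIntern(n);
        else
            WithoutIntern(n);
    }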

    static void WithIntern(int n)
    {
        var list = new List<string>(n);

        for (int i = 0; i < n; i++)
        {
            for (int k = 0; k < 10; k++)
            {
                list.Add(string.Intern(new string('x', k * 1000)));
            }
        }

        GC.Collect();
        Console.WriteLine("Done.");
        Console.ReadLine();
        GC.KeepAlive(list); // keep the list reachable while memory is inspected
    }

    static void WithoutIntern(int n)
    {
        var list = new List<string>(n);

        for (int i = 0; i < n; i++)
        {
            for (int k = 0; k < 10; k++)
            {
                list.Add(new string('x', k * 1000));
            }
        }

        GC.Collect();
        Console.WriteLine("Done.");
        Console.ReadLine();
        GC.KeepAlive(list); // keep the list reachable while memory is inspected
    }
}
Bellybutton answered 24/3, 2010 at 9:51 Comment(2)
Typical micro-optimization that simply does not show what it is supposed to do. - Thorlie
I've edited my answer to show an example that does show a difference. The WithIntern version uses about 14MB of memory (according to Task Manager). The second one gives an OutOfMemoryException after a second or so. You're simply not going to see any difference unless you have tens or hundreds of megabytes of strings allocated. - Bellybutton
E
7

Remember, the CLR manages memory on behalf of your process, so it is really hard to figure out the managed memory footprint from looking at virtual size and working set. The CLR will generally allocate and free memory in chunks. The size of these varies according to implementation details, but due to this it is next to impossible to measure managed heap usage based on memory counters for the process.
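If attaching a debugger is not convenient, the managed heap can also be queried from inside the process. The following is only a sketch: the class name, the string length of 100 and the counts are arbitrary choices for illustration, and `GC.GetTotalMemory(true)` gives an approximation of live managed bytes, not an exact figure.

```csharp
using System;
using System.Collections.Generic;

static class InternMeasure
{
    // Builds a list of count * 10 strings, optionally interning each one,
    // and returns the approximate growth of the managed heap in bytes.
    public static long Measure(int count, bool intern)
    {
        long before = GC.GetTotalMemory(true); // force a collection first

        var list = new List<string>(count * 10);
        for (int i = 0; i < count; i++)
        {
            for (int k = 0; k < 10; k++)
            {
                // Always allocates a fresh 100-character string.
                string s = new string((char)('a' + k), 100);
                list.Add(intern ? string.Intern(s) : s);
            }
        }

        long after = GC.GetTotalMemory(true); // unreferenced temporaries are collected here
        GC.KeepAlive(list); // the list must stay reachable until after the measurement
        return after - before;
    }

    static void Main()
    {
        Console.WriteLine("interned:     " + Measure(10000, true) + " bytes");
        Console.WriteLine("not interned: " + Measure(10000, false) + " bytes");
    }
}
```

Both runs pay for the list's backing array; only the non-interned run also keeps 100,000 separate string objects alive, so its figure should be clearly larger.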

However, if you look at the actual managed heap usage for the examples (here with the SOS debugger commands !dumpheap and !eeheap), you'll see a difference.

Example 1

0:005> !dumpheap -stat
...
00b6911c      137         4500 System.String
0016be60        8       480188      Free
00b684c4       14       649184 System.Object[]
Total 316 objects
0:005> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x01592dcc
generation 1 starts at 0x01592dc0
generation 2 starts at 0x01591000
ephemeral segment allocation context: none
 segment    begin allocated     size
01590000 01591000  01594dd8 0x00003dd8(15832)
Large object heap starts at 0x02591000
 segment    begin allocated     size
02590000 02591000  026a49a0 0x001139a0(1128864)
Total Size  0x117778(1144696)
------------------------------
GC Heap Size  0x117778(1144696)

Example 2

0:006> !dumpheap -stat
...
00b684c4       14       649184 System.Object[]
00b6911c   100137      2004500 System.String
Total 100350 objects
0:006> !eeheap -gc
Number of GC Heaps: 1
generation 0 starts at 0x0179967c
generation 1 starts at 0x01791038
generation 2 starts at 0x01591000
ephemeral segment allocation context: none
 segment    begin allocated     size
01590000 01591000  0179b688 0x0020a688(2139784)
Large object heap starts at 0x02591000
 segment    begin allocated     size
02590000 02591000  026a49a0 0x001139a0(1128864)
Total Size  0x31e028(3268648)
------------------------------
GC Heap Size  0x31e028(3268648)

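As you can see from the output above, the second example does use more memory on the managed heap: 100,137 String objects totalling 2,004,500 bytes versus 137 objects totalling 4,500 bytes, and a GC heap size of 3,268,648 bytes versus 1,144,696 bytes.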

Ealing answered 24/3, 2010 at 10:8 Comment(0)
R
0

Source: https://blogs.msdn.microsoft.com/ericlippert/2009/09/28/string-interning-and-string-empty/

String interning is an optimization applied to string literals by the compiler. If two identical string literals appear in one compilation unit, the generated code ensures that only one string object is created for all instances of that literal (characters enclosed in double quotes) within the assembly.

Example:

object obj = "Int32";
string str1 = "Int32";
string str2 = typeof(int).Name;

Output of the following comparisons:

Console.WriteLine(obj == str1); // true
Console.WriteLine(str1 == str2); // true    
Console.WriteLine(obj == str2); // false !?

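Note 1: When either operand has the static type object, == compares references, not contents.

Note 2: typeof(int).Name is obtained through reflection at run time, so its value is not a compile-time literal. The choice between reference comparison and string comparison for each == is made at compile time, from the static types of the operands.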

Analysis of the Results:

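  1. true because both variables are initialized from the same literal, so the generated code has only one object for "Int32" and both references point to it. See Note 1.

  2. true because both operands are of type string, so the contents are compared, and they are the same.

  3. false because this is a reference comparison (see Note 1) and str2 refers to a string created at run time rather than to the interned literal. See Note 2.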
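For contrast with the three results above, the following sketch shows comparisons that do not depend on which object happens to hold the text. The class and method names are hypothetical; the calls used (`Equals`, `string.Equals`, `string.Intern`, `object.ReferenceEquals`) are standard.

```csharp
using System;

static class CompareDemo
{
    // Returns four comparisons between a literal and a run-time string.
    public static bool[] Run()
    {
        object obj = "Int32";
        string str2 = typeof(int).Name; // "Int32", produced at run time

        return new[]
        {
            obj.Equals(str2),                 // virtual call, compares contents
            string.Equals((string)obj, str2), // static helper, compares contents
            (string)obj == str2,              // string ==, chosen because both sides are string
            // Interning both sides yields the same pooled instance.
            object.ReferenceEquals(string.Intern((string)obj), string.Intern(str2))
        };
    }

    static void Main()
    {
        foreach (bool result in Run())
            Console.WriteLine(result); // prints True four times
    }
}
```

Casting obj to string (or calling Equals) is usually the simpler fix when the intent is to compare contents; interning is only needed when reference identity itself matters.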

Rosenda answered 24/9, 2017 at 5:13 Comment(1)
Please don't post the same answer to more than one question. If the questions are basically the same, flag them as duplicates. Otherwise, customize the answer to the question. - Marsden
