String interning in .NET Framework - What are the benefits and when to use interning
E

6

58

I want to understand the process and internals of string interning specific to the .NET Framework. I would also like to know the benefits of interning and the scenarios where we should use it to improve performance. I have studied interning in Jeffrey Richter's CLR book, but I am still confused and would like to understand it in more detail.

[Edit] To ask a specific question, here is some sample code:

private void MethodA()
{
    string s = "String"; // line 1 - interned literal as explained in the answer        

    //s = string.Intern(s); // line 2 - what would happen in line 3 if we uncomment this line, will it make any difference?
}

private bool MethodB(string compareThis)
{
    if (compareThis == "String") // line 3 - will this line use interning (with and without uncommenting line 2 above)?
    {
        return true;
    }
    return false;
}
Effluent answered 8/11, 2011 at 17:19 Comment(0)
C
27

Interning is an internal implementation detail. Unlike boxing, I do not think there is any benefit in knowing more than what you have read in Richter's book.

The micro-optimisation benefits of interning strings manually are minimal, so it is generally not recommended.

This probably describes it:

using System;

class Program
{
    const string SomeString = "Some String"; // gets interned

    static void Main(string[] args)
    {
        var s1 = SomeString; // use interned string
        var s2 = SomeString; // use interned string
        var s = "String";
        var s3 = "Some " + s; // no interning 

        Console.WriteLine(s1 == s2); // uses interning comparison
        Console.WriteLine(s1 == s3); // do NOT use interning comparison
    }
}
Cordula answered 8/11, 2011 at 17:22 Comment(5)
Just FYI - Your "no interning" line is going to still use two interned strings to generate the non-interned string. Also, string comparisons always use the same comparison (there is no "interning comparison" or "other comparison") - but there's a short circuit that detects if the members point to the same instance. - Mention
Yes, constants and literals get interned. Cheers - Cordula
@Cordula - So for understanding, after the 'no interning' line; if we want to intern the s3 variable we would need to use s3.intern() and then the s1 == s3 comparison would use interning comparison - right? - Effluent
Being blind to implementation details is a bad thing. Consider that many people are currently using work-arounds due to the perceived lack of string interning. Knowing that it exists and where it can improve the performance of your code might actually allow you to remove 'micro-optimisations' which are already in place, ones which trade performance for readability. Edit: I suppose there are two schools of thought regarding implementation details but many would argue that a good programmer's knowledge goes as far down the stack as possible, and especially to the idiosyncrasies of the compiler. - Intrigant
If you add compilers from C# to other platforms/languages to the mix, it's better to not assume any internal behaviour. - Cue
M
48

In general, interning is something that just happens, automatically, when you use literal string values. Interning provides the benefit of only having one copy of the literal in memory, no matter how often it's used.

That being said, it's rare that there is a reason to intern your own strings that are generated at runtime, or even to think about string interning during normal development.

There are potentially some benefits if you're going to be doing a lot of work with comparisons of potentially identical runtime generated strings (as interning can speed up comparisons via ReferenceEquals). However, this is a highly specialized usage, and would require a fair amount of profiling and testing, and wouldn't be an optimization I'd consider unless there was a measured problem in place.
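
As a rough illustration of the points above, the following sketch contrasts literals with a string built at run time. The class and method names are made up for this example, and the comments describe what is normally observed rather than a guarantee for every runtime configuration.

```csharp
using System;

static class LiteralInterningDemo
{
    // Hypothetical helper: builds "String" at run time, so the result is a
    // freshly allocated instance rather than the pooled literal.
    public static string BuildAtRuntime()
    {
        return new string(new[] { 'S', 't', 'r', 'i', 'n', 'g' });
    }

    public static void Main()
    {
        string literal1 = "String";
        string literal2 = "String";
        string built = BuildAtRuntime();
        string interned = string.Intern(built);

        // Two identical literals: normally one shared, pooled instance.
        Console.WriteLine(object.ReferenceEquals(literal1, literal2));

        // A run-time string is a separate instance from the literal...
        Console.WriteLine(object.ReferenceEquals(literal1, built));

        // ...although it still compares equal by value.
        Console.WriteLine(literal1 == built);

        // string.Intern hands back the pooled instance for that value.
        Console.WriteLine(object.ReferenceEquals(literal1, interned));
    }
}
```

Note that the value comparison works with or without interning; interning only changes whether the two references happen to be the same object.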

Mention answered 8/11, 2011 at 17:23 Comment(2)
@Vijay: Calling intern on that string will have no effect - it's already an interned string (since it's assigned to a literal). The literal in MethodB will also be an interned string (all literal strings are interned automatically). - Mention
You have missed another important use case discussed in some other answers. If you are storing a truly gigantic amount of data that has many of the same string, there can be a large memory savings. This was a lifesaver for me when I needed to load and keep in memory very large (multiple gigabyte) data files containing many repeated strings. - Penance
T
30

This is an "old" question, but I have a different angle on it.

If you're going to have a lot of long-lived strings from a small pool, interning can improve memory efficiency.

In my case, I was interning another type of object in a static dictionary because they were reused frequently, and this served as a fast cache before persisting them to disk.

Most of the fields in these objects are strings, and the pool of values is fairly small (much smaller than the number of instances, anyway).

If these were transient objects, it wouldn't matter because the string fields would be garbage collected often. But because references to them were being held, their memory usage started to accumulate (even when no new unique values were being added).

So interning the objects reduced the memory usage substantially, and so did interning their string values while they were being interned.
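
The answer does not show its classes, so the following is only a minimal sketch of the idea under assumed names: `Customer`, `CustomerPool` and the `"|"`-joined key are all hypothetical. It pools long-lived objects in a static dictionary and interns their string fields as they are created.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical long-lived object whose fields come from a small set of values.
sealed class Customer
{
    public readonly string Country;
    public readonly string Status;

    public Customer(string country, string status)
    {
        // Intern the field values so equal strings share one instance.
        Country = string.Intern(country);
        Status = string.Intern(status);
    }
}

static class CustomerPool
{
    // Static dictionary acting as the object pool (not thread-safe in this sketch).
    private static readonly Dictionary<string, Customer> Pool =
        new Dictionary<string, Customer>();

    public static Customer GetOrAdd(string country, string status)
    {
        // Simplistic composite key; assumes the values never contain '|'.
        string key = country + "|" + status;

        Customer existing;
        if (!Pool.TryGetValue(key, out existing))
        {
            existing = new Customer(country, status);
            Pool[key] = existing;
        }
        return existing;
    }
}
```

Calling `CustomerPool.GetOrAdd("DE", "active")` twice would return the same `Customer` instance, and every pooled object with the country `"DE"` would reference a single string.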

Tribal answered 13/9, 2013 at 6:25 Comment(0)
C
22

Interned strings have the following characteristics:

  • Two interned strings that are identical will have the same address in memory.
  • Memory occupied by interned strings is not freed until your application terminates.
  • Interning a string involves calculating a hash and looking it up in a dictionary which consumes CPU cycles.
  • If multiple threads intern strings at the same time they will block each other because accesses to the dictionary of interned strings are serialized.

The consequences of these characteristics are:

  • You can test two interned strings for equality by just comparing their references, which is a lot faster than comparing each character in the string. This is especially true if the strings are very long and start with the same characters. You can compare interned strings with the Object.ReferenceEquals method, but it is safer to use the string == operator: it first checks whether both operands are the same instance and only falls back to comparing characters when they are not, so it stays correct even if one of the strings was not interned.

  • If you use the same string many times in your application, your application will only store one copy of the string in memory reducing the memory required to run your application.

  • If you intern many different strings this will allocate memory for those strings that will never be freed, and your application will consume ever increasing amounts of memory.

  • If you have a very large number of interned strings, string interning can become slow, and threads will block each other when accessing the interned string dictionary.

You should use string interning only if:

  1. The set of strings you are interning is fairly small.
  2. You compare these strings many times for each time that you intern them.
  3. You really care about minute performance optimizations.
  4. You don't have many threads aggressively interning strings.
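
To make the difference between reference comparison and value comparison concrete, here is a small sketch. The class name and the `Copy` helper are invented for this example; `string.Intern`, `string.IsInterned` and `object.ReferenceEquals` are the standard framework methods. A GUID is used only to get a value that is very unlikely to be in the intern pool already.

```csharp
using System;

static class InternedComparisonDemo
{
    // Hypothetical helper: creates a distinct instance with the same characters.
    public static string Copy(string s)
    {
        return new string(s.ToCharArray());
    }

    public static void Main()
    {
        string original = Guid.NewGuid().ToString();
        string copy = Copy(original);

        // IsInterned returns null when the value is not in the pool.
        Console.WriteLine(string.IsInterned(original) == null);

        // Equal by value, but two separate objects, so ReferenceEquals
        // alone would report them as different.
        Console.WriteLine(original == copy);
        Console.WriteLine(object.ReferenceEquals(original, copy));

        // After interning, both lookups yield one shared instance.
        string a = string.Intern(original);
        string b = string.Intern(copy);
        Console.WriteLine(object.ReferenceEquals(a, b));
        Console.WriteLine(object.ReferenceEquals(string.IsInterned(copy), a));
    }
}
```

The sketch also shows why `==` is the safer default: it gives the right answer for `original` and `copy` even though neither was interned at that point.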
Carabineer answered 26/6, 2017 at 1:11 Comment(6)
"Two interned strings that are identical will have the same address in memory." That is a tautology ‒ any two identical objects have the same address in memory because it is one and the same object by definition. "Equal" would be a better fit here ‒ two interned strings with the same character data are indeed one and the same, i.e. equality implies identity in that case. - Haletky
Not true, two strings that contain the same character sequence but are not interned will be equal, but will not be at the same address in memory, and ReferenceEquals will return false. - Carabineer
That is complementary to what I said, and it is true, but I don't see how it makes what I said false. Two non-interned strings that are equal may indeed not have the same address in memory (in which case ReferenceEquals would be false). Two interned strings that are equal have the same address, i.e. they are identical. Where is the contradiction? - Haletky
Any two identical objects do not have to have the same address in memory, they can be distinct and identical. Two strings are not the same object unless they are interned, and the distinction is important here. Describing them as "Equal" would be misleading, because two strings with the same character sequence would be "Equal" whether they were interned or not. - Carabineer
You are misunderstanding what "identical" means. Identical is the opposite of distinct ‒ if two objects are identical, they have the same identity, i.e. it is in reality one and the same object. Two identical objects must have the same address in memory, because they are indistinguishable by any inherent property. Two strings can also be the same object (and thus identical) without being interned ‒ check out var s1 = 5.ToString(); var s2 = 5.ToString(); Console.WriteLine(String.IsInterned(s1) != null); Console.WriteLine(Object.ReferenceEquals(s1, s2)); ‒ prints False, True. - Haletky
Interning is not necessary to obtain strings with the same identity ‒ int and some other types cache the resulting string for small values (try 5000 instead of 5), and there are also types whose sole purpose is to mimic what interning does, such as the XmlNameTable. Identity always implies equality (since identity is the strongest possible equivalence), but the converse does not hold in general ‒ identity and equality are really the same only for specific sources of strings, such as the above. - Haletky
H
16

Interning strings affects memory consumption.

For example, if you read strings and keep them in a list for caching, and the exact same string occurs 10 times, the string is actually stored only once in memory if string.Intern is used. If not, the string is stored 10 times.

In the example below, the string.Intern variant consumes about 44 MB, while the variant without it (the commented-out line) consumes about 1195 MB.

// Requires: using System; using System.Collections.Generic; using System.Diagnostics;
static void Main(string[] args)
{
    var list = new List<string>();

    for (int i = 0; i < 5 * 1000 * 1000; i++)
    {
        var s = ReadFromDb();
        list.Add(string.Intern(s));
        //list.Add(s);
    }

    Console.WriteLine(Process.GetCurrentProcess().PrivateMemorySize64 / 1024 / 1024 + " MB");
}

private static string ReadFromDb()
{
    return "abcdefghijklmnopqrstuvyxz0123456789abcdefghijklmnopqrstuvyxz0123456789abcdefghijklmnopqrstuvyxz0123456789" + 1;
}

Interning also improves performance for equality comparisons. In the example below, the interned version takes about 1 time unit, while the non-interned version takes about 7 time units.

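// Requires: using System; using System.Diagnostics; reuses ReadFromDb() from the previous example.
static void Main(string[] args)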
{
    var a = string.Intern(ReadFromDb());
    var b = string.Intern(ReadFromDb());
    //var a = ReadFromDb();
    //var b = ReadFromDb();

    int equals = 0;
    var stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < 250 * 1000 * 1000; i++)
    {
        if (a == b) equals++;
    }
    stopwatch.Stop();

    Console.WriteLine(stopwatch.Elapsed + ", equals: " + equals);
}
Homomorphism answered 2/3, 2017 at 20:4 Comment(3)
Why aren't these strings interned by default by the C# optimizer, since they are the same? - Draper
Interned strings are kept in memory and are not freed until the process terminates, so they carry a cost. Intern only if you will be doing a lot of comparisons during a large part of the process lifetime, and only for a small number of strings, to keep the memory cost down. - Homomorphism
String literals are automatically interned by the compiler. Read my answer to understand why the optimizer does not automatically intern all strings. - Carabineer
I
-2

From Microsoft's documentation:

Performance Considerations

If you are trying to reduce the total amount of memory your application allocates, keep in mind that interning a string has two unwanted side effects. First, the memory allocated for interned String objects is not likely to be released until the common language runtime (CLR) terminates. The reason is that the CLR's reference to the interned String object can persist after your application, or even your application domain, terminates. Second, to intern a string, you must first create the string. The memory used by the String object must still be allocated, even though the memory will eventually be garbage collected.

The CompilationRelaxations.NoStringInterning enumeration member marks an assembly as not requiring string-literal interning. You can apply NoStringInterning to an assembly using the CompilationRelaxationsAttribute attribute. Also, when you use Ngen.exe (Native Image Generator) to compile an assembly in advance of run time, strings are not interned across modules.
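
For reference, applying the attribute described above looks roughly like the following assembly-level declaration, usually placed in a file such as AssemblyInfo.cs (the file name is only a convention). As far as I know, the C# compiler typically emits this relaxation on its own, so an explicit declaration mostly documents intent.

```csharp
using System.Runtime.CompilerServices;

// Marks the assembly as not requiring string-literal interning.
[assembly: CompilationRelaxations(CompilationRelaxations.NoStringInterning)]
```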

Insensibility answered 28/10, 2023 at 6:7 Comment(2)
If you're going to quote something you need to actually quote it. Otherwise, it's plagiarism. - Isabel
Welcome to Stack Overflow! Thank you for your answer. Please provide more details about your solution. Code snippets, high quality descriptions, or any relevant information would be great. Clear and concise answers are more helpful and easier to understand for everyone. Edit your answer with specifics to raise the quality of your answer. For more information: How To: Write good answers. Happy coding! - Pyxidium
