Will Interning strings help performance in a parser?

If you are parsing, let's just say, HTML, will it be beneficial to intern the element name once you have read it? The logic here is that the parser will see the same strings (element names) over and over again, and several documents will be parsed.

Theory:

// elemName is checked for null.
MarkupNode node = new MarkupNode() 
{
   // String.IsInterned returns the pooled string or null (not a bool), so use ?? instead of ?:
   Name = String.IsInterned(elemName) ?? String.Intern(elemName),
   ...
};

This question was motivated by the question string-interning-memory.

Lawley answered 31/8, 2009 at 8:1 Comment(2)
My generic suggestion to this would be - try it yourself and measure if it makes any difference... (Although I know it is not really what you are after...)Until
@Until I intend to test it, but I also think it's a valid question, and I didn't see a directly related question regarding text parsing, such as HTML or XML-based content. :)Lawley

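I can't say for certain whether this would help your performance. It depends on how many distinct strings you use and how often you create new instances of them. Interning happens automatically only for string literals; strings built at run time, such as the names a parser reads, are interned only when you ask for it, and every String.Intern or String.IsInterned call costs a pool lookup. Explicitly checking whether a string is interned before interning it can therefore add a second lookup and reduce performance rather than improve it. As for memory usage, interned strings can use less memory when the same values recur often, although pooled strings are not likely to be released until the runtime shuts down.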

If you do want to use string interning, there are better ways to go about it. First and foremost, I would put your element names in a static class full of public string constants. String literals in your program source are interned automatically by default, so they are already in the intern pool by the time your code uses them. Keep in mind that a name read from the input is still a separate instance until you intern it or map it onto the matching constant. If your strings cannot be defined as constants at compile time, I would simply call String.Intern(...) rather than checking String.IsInterned(...) first and falling back to String.Intern(...). The Intern method checks the pool itself: it returns the pooled instance if there is one, and otherwise adds the string to the pool and returns it. There is no need to call IsInterned yourself, which in any case returns the pooled string or null rather than a bool.
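To make the String.Intern behaviour described above concrete, here is a minimal sketch. The names InternDemo, MakeName, and Canonical are made up for the illustration; only String.Intern, String.IsInterned, and Object.ReferenceEquals are real framework calls. The name is built from a char array on purpose, to mimic a parser producing a fresh string rather than a literal.

```csharp
using System;

public static class InternDemo
{
    // Hypothetical helper: builds the name at run time, the way a parser would,
    // so every call returns a new string instance rather than a pooled literal.
    public static string MakeName()
    {
        return new string(new[] { 'd', 'i', 'v' });
    }

    // String.Intern looks the value up in the pool and adds it if it is missing,
    // so no separate IsInterned check is needed beforehand.
    public static string Canonical(string elemName)
    {
        return String.Intern(elemName);
    }

    public static void Main()
    {
        string first = MakeName();
        string second = MakeName();

        // Equal values, but two separate objects before interning.
        Console.WriteLine(Object.ReferenceEquals(first, second));   // False

        // After interning, both calls hand back the same pooled instance.
        Console.WriteLine(Object.ReferenceEquals(Canonical(first), Canonical(second)));   // True

        // IsInterned returns the pooled string or null, never a bool.
        Console.WriteLine(String.IsInterned(first) != null);        // True
    }
}
```

Whether the saved allocations outweigh the pool lookup in a real parser is something only a measurement on your own documents can tell you.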

Again, I cannot say whether manually interning strings will improve performance. If you use constants, they are interned for you automatically, and that is the simplest way to improve performance and memory usage for regularly reused strings. I would honestly recommend staying away from manual interning and letting the compiler and runtime handle the optimization for you.
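One possible shape for the constants approach, as a sketch only: the ElementNames class, its three members, and the Canonicalize helper are hypothetical, and a real parser would list whatever names it actually cares about. The switch is there because a string read from the input does not become the constant on its own; it has to be mapped onto it.

```csharp
using System;

// Hypothetical set of element names for the illustration.
public static class ElementNames
{
    public const string Div = "div";
    public const string Span = "span";
    public const string P = "p";

    // Maps a freshly parsed name onto the shared constant when it is a known
    // element, and returns the input unchanged otherwise.
    public static string Canonicalize(string parsed)
    {
        switch (parsed)
        {
            case Div: return Div;
            case Span: return Span;
            case P: return P;
            default: return parsed;
        }
    }
}

public static class ElementNamesDemo
{
    public static void Main()
    {
        // Built at run time to stand in for text read from a document.
        string fromInput = new string(new[] { 's', 'p', 'a', 'n' });
        string name = ElementNames.Canonicalize(fromInput);

        Console.WriteLine(name == ElementNames.Span);                 // True: same value
        Console.WriteLine(Object.ReferenceEquals(name, fromInput));   // False: the shared constant was returned
    }
}
```

Unknown names pass through untouched here, so the intern pool never grows with arbitrary input; that is a design choice of this sketch, not something the answer above prescribes.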

Bluecoat answered 31/8, 2009 at 8:21 Comment(0)

Of course, interning strings helps performance, but as @jrista said, "If you use constants, they will be automatically interned for you,...".

Here are some articles that might help you:

Optimizing C# String Performance

SUMMARY: Sharing Memory, C# maintains something called an "intern table." This is a list of strings that are currently referenced. If a new string is created, then the intern table is checked. If your string is already in there, then both variables will point at the same block of memory maintained by the intern table.

http://blog.cumps.be/string-concatenation-vs-memory-allocation/

Inguinal answered 31/8, 2009 at 8:28 Comment(0)