Is there a way in C# to reduce allocation of new string in a large array?
Asked Answered
D

4

7

I am trying to understand memory allocation and how we can reduce them.

I created the following long list for testing

var list = new List<int>(Enumerable.Range(0, 1_000_000).ToArray());

Then I looped through it and printed a string like this

for (var i = 0; i < list.Count; i++) 
{
    Console.WriteLine("Item # " + list[i]); 
}

The previous loop generated over 2 million allocations. I believe that the line "Item # " + list[i] generated 2 million allocations.

** Question 1** Why would it allocate 2 million string in this case? Does list[i] have to be stored in the heap as string which is 1 allocation and then the combined string in another allocation, thus 2 allocations per loop?

Next step I thought about using a string builder to reduce allocations

var builder = new StringBuilder();
for (var i = 0; i < list.Count; i++)
{
    builder.Append("Item # ");
    builder.Append(list[i]);

    Console.WriteLine(builder.ToString());
 
    builder.Clear();
}

The above code has 1 million allocations, which is half of the previous allocation.

But, that still a lot of allocations.

I know string are immutable objects in C# and each string has it's own allocation in the heap. But, is there a way we reuse memory allocations so that we can create 1 string, then reuse the same allocation inside that loop over and over? In my case, once I print the string, I no longer need it. It would be safe for me to reuse the same allocation and just update it's value.

** Question 2** Is it possible to re-use memory allocations to reduce the amount of allocations?

** Question 3** What other tricks can I try to improve my loop?

Duong answered 24/12, 2023 at 18:8 Comment(7)
what about Console.Write("Item # "); Console.WriteLine(list[i]); instead of concatenation and thus allocation?Tentation
What are you trying to achieve? I don't think that composing million messages to console is really what you're after. So answering your question #3 in the provided context would tell you how to optimize 1M messages composed of constant prefix and a number written to console. Is this truly what you're after?Squall
Perhaps using some Spans could help...Yesman
@ZdeněkJelínek as mentioned in the OP, I am just trying to understand memory allocations and how to reduce them. So the objective of this example is to print out the message on the console. But looking for a way to reduce allocations while doing so.Duong
@Yesman how would I use Span to reduce allocations here?Duong
It's worth noting that although you are doing 2 million allocations, each can be quickly forgotten, allowing for Gen0 cleanup. But, I'm not sure if the JITter is smart enough to leave hints for the GC as to what is no longer referenced in a tight loop like you have. Unintuitively, you may get better perf by doing the allocations in a separate method. To test this hypothesis, set up a run and use Performance Monitor (PerfMon) and the .NET Memory Counters to see the allocations and the GCs over time. The moral of this is that when you have a perf hit, consider optimizing things selectivelyCharmain
By the way, one way to reduce allocations is to size your list when you initialize it (since you know it will need 1 million entries). Otherwise it will initially have 4 entries, then 8, then 16, then 32, etc. You'll need about 20 allocations to create that list (granted, small potatoes in this scenario, but it's still something to think about when filling stretchy containers backed by an array)Charmain
Y
9

Is it possible to re-use memory allocations to reduce the amount of allocations?

Yes. Put the prefix "Item #" into a separate variable, and make a Span<char> to store the string representation of list[i]. You can reuse that same Span each time you get a new number.

var prefix = "Item # ";
char[] arr = new char[6];
Span<char> number = new(arr);
for (int i = 0 ; i < list.Count ; i++) {
    Console.Write(prefix);
    list[i].TryFormat(number, out int charsWritten);
    Console.Out.WriteLine(number[0..charsWritten]);
}

Notes:

  • We make a Span from an char[]. Note that the array should be large enough to store "999999", the longest (in terms of string representation) number you want to print.
  • TryFormat writes into the span and tells us how many characters it wrote. We make use of this information so that we only WriteLine the chars it wrote, and not anymore.
  • TryFormat returns a bool indicating whether the integer is successfully written to the span. Consider checking it if that is something you are concerned about.
  • We have to get Console.Out in order to use the WriteLine overload that takes a span. Console itself doesn't have such a static method.
Yesman answered 24/12, 2023 at 19:0 Comment(9)
Is it worth checking the return value of TryFormat in case it failed?Murphy
@AndrewMorton In this case, it can only fail if the span is not large enough, as far as I know. And charsWritten can't be greater than the span's length anyway. It would be safe to just print out number[0..charsWritten] anyway, unless OP has some specific thing they want to do in the case of failure.Yesman
It looks to me like it would be a good idea to add a format specifier. For example, the OP wants to go up to 999999 which would often be formatted to the seven characters "999,999". Yes, I read the notes, but the first one isn't necessarily consistent with the char[6].Murphy
@AndrewMorton The default format doesn't produce "999,999" though. Sure, if OP wants a different format, then the length of the array will probably have to be adjusted.Yesman
Yeah, I suppose my suggestions lean towards due diligence being needed by anyone using the code in your answer. Let's hope :)Murphy
Due to string interning it makes no difference if prefix is in a variable or a constant literal.Disconformity
@Disconformity I know. I just wanted to make it extra clear what are allocated, so that anyone can understand at first sight, I suppose. My wording might have unintentionally suggested that it makes a difference.Yesman
@Yesman amazing optimization and explanation! Indeed that saved nearly 2 million allocations. By the way number[0..charsWritten] means read from the beginning of the array to the the index of charsWritten. With every new version, C# is becoming less sharp/readable ;). In my example, why are the allocations being directly written to the heap instead of the stack?Duong
@Duong That's just how reference types behave. In your code, you create a lot of new string objects, by string concatenation and Int32.ToString(), both of which I avoided. Note that the char[] in my answer is also allocated on the heap - the crucial difference is that we are reusing it.Yesman
J
5

You can completely remove allocations for the print loop by using trick similar to what StringBuilder does internally (it uses Span's and ISpanFormattable interface to reduce allocations) and using Console.Out stream:

// allocate big enough buffer to hold the largest formatted number:
char[] buffer = new char[8]; 
var span = buffer.AsSpan();

var allocatedBefore = GC.GetTotalAllocatedBytes(); // to check allocations

for (var i = 0; i < list.Count; i++) 
{
    // write prefix, due to interning it won't be allocated every time
    // in other cases can be moved to outside scope as variable/parameter
    Console.Write("Item # "); 

    // Note - returns bool, in general case might need to check
    // can be false if the span is not big enough
    list[i].TryFormat(span, out int written); 

    // write formatted line to stdout via bufffer
    Console.Out.WriteLine(buffer, 0, written); 
}

// allocations checks
var allocatedAfter = GC.GetTotalAllocatedBytes();
Console.WriteLine(allocatedAfter - allocatedBefore); 

Read more:

Johnette answered 24/12, 2023 at 19:23 Comment(0)
C
3

Question 1: Why 2 Million Allocations?

In C#, strings are immutable, meaning once created, they cannot be changed. Each concatenation operation creates a new string object. In your loop, "Item # " + list[i] creates a new string for each iteration. Here's what happens:

  1. list[i]: The integer is boxed into an object to be concatenated with the string (if not optimized by the JIT compiler). This might not always result in a separate heap allocation, but it's an operation to consider.
  2. Concatenation: "Item # " + list[i] results in a new string object. This is definitely a separate heap allocation.

Thus, you end up with approximately 2 million string allocations.

Question 2: Reusing Memory Allocations

The StringBuilder approach you used is a good start. It avoids creating a new string on each iteration for concatenation. However, StringBuilder.ToString() creates a new string each time it's called. Here's a more efficient approach:

  • Use a Single StringBuilder Instance: You're already doing this, which is good. But instead of converting it to a string on each iteration, consider using it differently.
  • Print Directly from StringBuilder: Unfortunately, Console.WriteLine doesn't have an overload for StringBuilder. So, this approach might not be directly applicable here, but it's useful in other contexts where you can avoid converting StringBuilder to a string.
  • Buffer Management: In more advanced scenarios, you could use buffer pooling (like ArrayPool<T>) to reuse memory allocations, but this is more complex and might be overkill for simple scenarios.

Question 3: Further Optimization Tips

  1. Avoid String Concatenation in Loops: As demonstrated, this leads to many allocations.
  2. Precomputed Strings: If the range of integers is known and not too large, precomputing the strings outside the loop and storing them in an array can be more efficient.
  3. Using Span<T> or Memory<T>: These types allow for more efficient memory operations without allocations, but they require a different approach to handling data.
  4. Profile Your Code: Use diagnostic tools to understand where allocations are happening and focus on optimizing those areas.
Citral answered 24/12, 2023 at 19:41 Comment(1)
The chars in the StringBuilder could be printed one at a time with Console.Write and thus, no conversion to string would be required.Cowry
R
1

** Question 1** Why would it allocate 2 million string in this case? Does list[i] have to be stored in the heap as string which is 1 allocation and then the combined string in another allocation, thus 2 allocations per loop?

Performing

"Item # " + list[i]

does two allocations: it creates a string from the list[i] integer and then another one by concatenating constant "Item #" with the intermediate received prior.

The StringBuilder-based version eliminates the conversion of list[i] into a string and writes it directly to its backing buffer.

** Question 2** Is it possible to re-use memory allocations to reduce the amount of allocations?

Yes, that's what StringBuilder does. However, as long as you want to get a string out, it will need to allocate. If you need a million strings, you get a million allocations.

You could reduce memory allocations further by allocating a single character array (as in char[]) and reusing it between the iterations. But then you have to also handle the actual length of its content (i.e. what string and StringBuilder do for you for the cost of allocating). And also account for the fact that not all APIs support char buffers instead of strings.

As mentioned by @sweeper, the Console.Write as suggested by comments and originally in this answer still allocates for writing int values. To work around this, you can use allocation-free formatting from ISpanFormattable implemented on Int32. If you're not on .NET 6+, you can hand-roll the implementation, although it will not be equivalent.

** Question 3** What other tricks can I try to improve my loop?

See above. It depends on what you're trying to do.

Radicand answered 24/12, 2023 at 18:18 Comment(5)
Console.Write still creates a string, even if you write an int. At least that's how I understand it from source.dot.netYesman
as long as you want to get a string out, it will need to allocate. If you need a million strings, you get a million allocations. I think this is the key. I do need a string. Is there a way to get a string and specify where to place it on the heap so we can reuse the original location where the string was placed in?Duong
@Yesman You are entirely correct, thank you for noticing, I have updated the answer to mention this and also suggest an alternative.Squall
@Duong yes, this can be done. I do believe this is not a great idea and should only be used as a last resort. There are typically other, less risky way of optimizing allocations. But if you really must do this, I suggest you seek out the internet resources to also find out about the risks and consequences of mutating strings.Squall
Basically the idea is that string is immutable. To achieve this, it has to own the memory used to store the characters. So it is not possible to just point a string to a place in memory and have it use it. The implementation will always make a defensive copy of the data passed in. What can be done is using reflection to overwrite string's content. This results in fragile and complicated code.Squall

© 2022 - 2024 — McMap. All rights reserved.