Boxing and Unboxing in String.Format(...) ... is the following rationalized?

7

26

I was doing some reading regarding boxing/unboxing, and it turns out that if you do an ordinary String.Format() where you have a value type in your list of object[] arguments, it will cause a boxing operation. For instance, if you're trying to print out the value of an integer and do string.Format("My value is {0}",myVal), it will stick your myVal int in a box and run the ToString function on it.

Browsing around, I found this article.

It appears you can avoid the boxing penalty simply by doing the .ToString on the value type before handing it on to the string.Format function: string.Format("My value is {0}",myVal.ToString())

  1. Is this really true? I'm inclined to believe the author's evidence.
  2. If this is true, why doesn't the compiler simply do this for you? Maybe it's changed since 2006? Does anybody know? (I don't have the time/experience to do the whole IL analysis)
Heteromorphic answered 12/12, 2011 at 16:18 Comment(2)
Nitpick: With string.Format("My value is {0}", myVal) it's calling the form string.Format(string, object) rather than string.Format(string, object[]). This has no bearing on the question, but it's worth noting that it has no bearing on the question - it applies across several similar calls.Hankins
One point to consider is if you use formatting like {0:n}, then myVal shows 1,234.00 and myVal.ToString() shows 1234. It's not the compiler's job to know what the resulting format will be.Taproom
18

The compiler doesn't do this for you because string.Format takes a params Object[]. The boxing happens because of the conversion to Object.

I don't think the compiler tends to special case methods, so it won't remove boxing in cases like this.

Yes, in many cases it is true that no boxing occurs if you call ToString() first. If the type uses the implementation inherited from Object, I think it would still have to box.

Ultimately the string.Format parsing of the format string itself is going to be much slower than any boxing operation, so the overhead is negligible.

Haroun answered 12/12, 2011 at 16:21 Comment(1)
The MSIL for String.Format("test {0}", someInt) contains a box operation, whilst the MSIL for String.Format("test {0}", someInt.ToString()) doesn't. Empirically, I agree, the parsing of the format string is definitely the slow part.Gast
12

1: yes, as long as the value-type overrides ToString(), which all the inbuilt types do.

2: because no such behaviour is defined in the spec, and the correct handling of a params object[] (wrt value-types) is: boxing

string.Format is just like any other opaque method; the fact that it is going to do that is opaque to the compiler. It would also be functionally incorrect if the pattern included a format like {0:n2} (which requires a specific transformation, not just ToString()). Trying to understand the pattern is undesirable and unreliable since the pattern may not be known until runtime.
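To illustrate the {0:n2} point, here is a minimal sketch. The invariant culture is pinned as an assumption of the example so that the output does not depend on machine settings; the variable names are illustrative only.

```csharp
// C# 9+ top-level program.
using System;
using System.Globalization;

int myVal = 1234;
CultureInfo inv = CultureInfo.InvariantCulture; // assumption: pinned for stable output

// The boxed int is still IFormattable, so the "n2" specifier is applied.
string formatted = string.Format(inv, "{0:n2}", myVal);

// A string argument is not IFormattable, so "n2" is silently ignored.
string preConverted = string.Format(inv, "{0:n2}", myVal.ToString(inv));

Console.WriteLine(formatted);    // 1,234.00
Console.WriteLine(preConverted); // 1234
```

A compiler that rewrote the first call into the second would therefore change the program's output.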

Democritus answered 12/12, 2011 at 16:22 Comment(1)
+1; Good point about alternate styles of format string, and not being able to deduce the format string until runtime. It also shouldn't call ToString in cases where that parameter didn't appear in the format string, because it shouldn't assume that it will appear and ToString could be expensive.Haroun
7

It would be better to avoid the boxing by constructing the string with StringBuilder or StringWriter and using the typed overloads.
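As a rough sketch of the StringBuilder approach (the variable names are just for illustration):

```csharp
// C# 9+ top-level program.
using System;
using System.Text;

int myVal = 6;

// Append(int) is a typed overload, so the int is never converted to object.
var builder = new StringBuilder();
builder.Append("My value is ").Append(myVal);
string result = builder.ToString();

Console.WriteLine(result); // My value is 6
```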

Most of the time the boxing should be of little concern and not worth you even being aware of it.

Unction answered 12/12, 2011 at 16:55 Comment(2)
I'm impressed. The StringBuilder method showed a 65% reduction over any other method listed above.Taste
I shouldn't have used "above" in my comment. Drat. But anyhow, the StringWriter fared with the others in the 23s area on my box. So StringBuilder for the win!Taste
7

The easy one first. The reason that the compiler doesn't turn string.Format("{0}", myVal) into string.Format("{0}", myVal.ToString()) is that there's no reason why it should. Should it turn BlahFooBlahBlah(myVal) into BlahFooBlahBlah(myVal.ToString())? Maybe that'll have the same effect but with better performance, but chances are it'll introduce a bug. Bad compiler! No biscuit!

Unless something can be reasoned about from general principles, the compiler should leave it alone.
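A small sketch of how such a rewrite could introduce a bug. Describe is a hypothetical method invented for this example, standing in for any method that accepts an object:

```csharp
// C# 9+ top-level program.
using System;

// Hypothetical method: its result depends on the runtime type of the argument.
static string Describe(object arg) => arg is int ? "an integer" : "something else";

int myVal = 6;

string direct = Describe(myVal);               // the int is boxed, type is preserved
string rewritten = Describe(myVal.ToString()); // a string arrives instead

Console.WriteLine(direct);    // an integer
Console.WriteLine(rewritten); // something else
```

The compiler cannot know in general whether a method looks at the argument's type, so inserting ToString() on the caller's behalf would not be a safe transformation.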

Now for the interesting bit IMO: Why does the former cause boxing and the latter not?

For the former, since the only matching signature is string.Format(string, object) the integer has to be turned into an object (boxed) to be passed to the method, which expects to receive a string and an object.
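What "turned into an object" means can be sketched like this, assuming nothing beyond plain assignments to object (the same conversion that happens when an int is passed to an object parameter):

```csharp
// C# 9+ top-level program.
using System;

int myVal = 6;

// Each conversion to object allocates a separate box holding a copy of the value.
object first = myVal;
object second = myVal;

bool sameBox = object.ReferenceEquals(first, second); // false: two distinct boxes
bool sameValue = first.Equals(second);                // true: both hold 6

// Changing the original afterwards does not affect the boxed copy.
myVal = 7;
int unboxed = (int)first; // still 6

Console.WriteLine(sameBox);
Console.WriteLine(sameValue);
Console.WriteLine(unboxed);
```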

The other half of this though, is why doesn't myVal.ToString() box too?

When the compiler comes to this bit of code it has the following knowledge:

  1. myVal is an Int32.
  2. ToString() is defined by Int32
  3. Int32 is a value-type and therefore:
  4. myVal cannot possibly be a null reference* and:
  5. There cannot possibly be a more derived override - Int32.ToString() is effectively sealed.

Now, generally the C# compiler uses callvirt for all method calls for two reasons. The first is that sometimes you do want it to be a virtual call after all. The second is that (more controversially) they decided to ban any method call on a null reference, and callvirt has a built-in test for that.

In this case though, neither of those apply. There can't be a more derived class that overrides Int32.ToString(), and myVal cannot be null. It can therefore introduce a call to the ToString() method that passes the Int32 without boxing.

This combination (value can't be null, method can't be overridden elsewhere) comes up much less often with reference types, so the compiler can't take as much advantage of it then (it also wouldn't cost as much, since they wouldn't have to be boxed).

This isn't the case if Int32 inherits a method implementation. For instance myVal.GetType() would box myVal as there is no Int32 override - there can't be, it's not virtual - so it can only be accessed by treating myVal as an object, by boxing it.

The fact that this means that the C# compiler will use callvirt for non-virtual methods and sometimes call for virtual methods, is not without a degree of irony.

*Note that even a nullable integer set to null is not the same as a null reference in this regard.
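One way to observe both the footnote and the GetType() point is with an empty nullable integer. This is only a sketch; the variable names are illustrative:

```csharp
// C# 9+ top-level program.
using System;

int? missing = null;

// ToString() is overridden by Nullable<int>, so it is called on the value
// itself without boxing, and it returns an empty string.
var text = missing.ToString();

// GetType() is inherited from object, so the value must be boxed first.
// Boxing an empty Nullable<int> yields a null reference, hence the exception.
bool threw = false;
try
{
    missing.GetType();
}
catch (NullReferenceException)
{
    threw = true;
}

Console.WriteLine(text.Length); // 0
Console.WriteLine(threw);       // True
```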

Hankins answered 12/12, 2011 at 18:2 Comment(0)
4

Why not try each approach a hundred million times or so and see how long it takes:

using System;
using System.Diagnostics;

static class Program
{
    static void Main(string[] args)
    {
        const int Iterations = 100000000;
        int myVal = 6;

        // Time the call that passes the int directly (boxed to object).
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            string string1 = string.Format("My value is {0}", myVal);
        }
        sw.Stop();
        Console.WriteLine("Original method - {0} milliseconds", sw.ElapsedMilliseconds);

        // Time the call that converts to a string first (no box).
        sw.Restart();
        for (int i = 0; i < Iterations; i++)
        {
            string string2 = string.Format("My value is {0}", myVal.ToString());
        }
        sw.Stop();
        Console.WriteLine("ToStringed method - {0} milliseconds", sw.ElapsedMilliseconds);

        Console.ReadLine();
    }
}

On my machine I'm finding that the .ToStringed version is running in about 95% of the time that the original version takes, so some empirical evidence for a slight performance benefit.

Dentilabial answered 12/12, 2011 at 16:29 Comment(2)
I also added a test for myVal.ToString("My value is 0"); as well and the differences between all three were negligible on my system: 24830ms, 22775ms and 22339ms, respectively. I'd say this is a definite micro-optimization that will make no difference in any real-world application.Taste
@Yan Nelson Empirically you're right :) I think it's about being aware of the boxing that occurs there. Another consideration is how many times this code gets invoked. If it isn't called very frequently, there's no need to optimize.Flood
1
string.Format("My value is {0}", myVal)
myVal is an object

string.Format("My value is {0}", myVal.ToString())
myVal.ToString() is a string

ToString is overloaded and therefore the compiler cannot decide for you.

Bowrah answered 12/12, 2011 at 16:24 Comment(0)
0

I've found a StringFormatter project on GitHub. Description sounds very promising:

The built-in string formatting facilities in .NET are robust and quite usable. Unfortunately, they also perform a ridiculous number of GC allocations. Mostly these are short lived, and on the desktop GC they generally aren't noticeable. On more constrained systems however, they can be painful. Additionally, if you're trying to track your GC usage via live reporting in your program, you might quickly notice that attempts to print out the current GC state cause additional allocations, defeating the entire attempt at instrumentation.

Thus the existence of this library. It's not completely allocation free; there are several one-time setup costs. The steady state though is entirely allocation-free. You can freely use the string formatting utilities in the main loop of a game without it causing a steady churn of garbage.

I've quickly checked the library's interface. Instead of a params array, the author uses methods with manually defined generic arguments, which makes complete sense to me if you care about garbage.
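The general idea of generic arguments can be sketched as follows. AppendFormatted is a hypothetical helper written for this example, not the StringFormatter library's actual API, and unlike that library it still allocates a string for each argument:

```csharp
// C# 9+ top-level program.
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper. Because T is a generic parameter constrained to
// IFormattable, ToString(...) is called on the value itself: there is no
// object[] and no box for value types.
static void AppendFormatted<T>(StringBuilder target, string prefix, T arg)
    where T : IFormattable
{
    target.Append(prefix);
    target.Append(arg.ToString(null, CultureInfo.InvariantCulture));
}

var builder = new StringBuilder();
AppendFormatted(builder, "My value is ", 6);
AppendFormatted(builder, " and the ratio is ", 0.5);
string result = builder.ToString();

Console.WriteLine(result); // My value is 6 and the ratio is 0.5
```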

Loosejointed answered 10/6, 2019 at 11:20 Comment(0)
