String.Join vs. StringBuilder: which is faster?
Asked Answered
U

5

98

In a previous question about formatting a double[][] to CSV format, it was suggested that using StringBuilder would be faster than String.Join. Is this true?

Ungenerous answered 25/2, 2009 at 12:49 Comment(2)
For readers clarity, it was about using a single StringBuilder, vs multiple string.Join, which were then joined (n+1 joins)Bolter
The difference in performance quickly runs up to several orders of magnitude. If you do more than a handful of joins, you can gain a lot of performance by switching to stringbuilderAirplane
A
139

Short answer: it depends.

Long answer: if you already have an array of strings to concatenate together (with a delimiter), String.Join is the fastest way of doing it.

String.Join can look through all of the strings to work out the exact length it needs, then go again and copy all the data. This means there will be no extra copying involved. The only downside is that it has to go through the strings twice, which means potentially blowing the memory cache more times than necessary.

If you don't have the strings as an array beforehand, it's probably faster to use StringBuilder - but there will be situations where it isn't. If using a StringBuilder means doing lots and lots of copies, then building an array and then calling String.Join may well be faster.

EDIT: This is in terms of a single call to String.Join vs a bunch of calls to StringBuilder.Append. In the original question, we had two different levels of String.Join calls, so each of the nested calls would have created an intermediate string. In other words, it's even more complex and harder to guess about. I would be surprised to see either way "win" significantly (in complexity terms) with typical data.

EDIT: When I'm at home, I'll write up a benchmark which is as painful as possibly for StringBuilder. Basically if you have an array where each element is about twice the size of the previous one, and you get it just right, you should be able to force a copy for every append (of elements, not of the delimiter, although that needs to be taken into account too). At that point it's nearly as bad as simple string concatenation - but String.Join will have no problems.

Ashjian answered 25/2, 2009 at 13:0 Comment(7)
Even when I don't have the strings beforehand, it seems faster to use String.Join. Please check my answer...Ungenerous
At will depend on how the array is produced, its size etc. I'm happy to give a fairly definitive "In <this> case String.Join is going to be at least as fast" - I wouldn't like to do the reverse.Ashjian
(In particular, look at Marc's answer, where StringBuilder beats out String.Join, just about. Life is complicated.)Ashjian
I think you've implied this, but my understanding is that you can improve StringBuilder's performance by setting its capacity high enough up front so that few or no extra copies are done. Of course, this assumes your code can know something about the sizes involved.Ogletree
But in case of a single call to String.Join vs single call to StringBuilder, then String.Join wins, do you agree? (I find it important to note because this page get hits thanks to its bombastic title regardless to the particular scenario mentioned in the OP's question)Fourhanded
@BornToCode: Do you mean constructing a StringBuilder with an original string, then calling Append once? Yes, I'd expect string.Join to win there.Ashjian
[Thread necromancy]: Current (.NET 4.5) implementation of string.Join uses StringBuilder.Underneath
B
36

Here's my test rig, using int[][] for simplicity; results first:

Join: 9420ms (chk: 210710000
OneBuilder: 9021ms (chk: 210710000

(update for double results:)

Join: 11635ms (chk: 210710000
OneBuilder: 11385ms (chk: 210710000

(update re 2048 * 64 * 150)

Join: 11620ms (chk: 206409600
OneBuilder: 11132ms (chk: 206409600

and with OptimizeForTesting enabled:

Join: 11180ms (chk: 206409600
OneBuilder: 10784ms (chk: 206409600

So faster, but not massively so; rig (run at console, in release mode, etc):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

namespace ConsoleApplication2
{
    class Program
    {
        static void Collect()
        {
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            GC.WaitForPendingFinalizers();
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
            GC.WaitForPendingFinalizers();
        }
        static void Main(string[] args)
        {
            const int ROWS = 500, COLS = 20, LOOPS = 2000;
            int[][] data = new int[ROWS][];
            Random rand = new Random(123456);
            for (int row = 0; row < ROWS; row++)
            {
                int[] cells = new int[COLS];
                for (int col = 0; col < COLS; col++)
                {
                    cells[col] = rand.Next();
                }
                data[row] = cells;
            }
            Collect();
            int chksum = 0;
            Stopwatch watch = Stopwatch.StartNew();
            for (int i = 0; i < LOOPS; i++)
            {
                chksum += Join(data).Length;
            }
            watch.Stop();
            Console.WriteLine("Join: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

            Collect();
            chksum = 0;
            watch = Stopwatch.StartNew();
            for (int i = 0; i < LOOPS; i++)
            {
                chksum += OneBuilder(data).Length;
            }
            watch.Stop();
            Console.WriteLine("OneBuilder: {0}ms (chk: {1}", watch.ElapsedMilliseconds, chksum);

            Console.WriteLine("done");
            Console.ReadLine();
        }
        public static string Join(int[][] array)
        {
            return String.Join(Environment.NewLine,
                    Array.ConvertAll(array,
                      row => String.Join(",",
                        Array.ConvertAll(row, x => x.ToString()))));
        }
        public static string OneBuilder(IEnumerable<int[]> source)
        {
            StringBuilder sb = new StringBuilder();
            bool firstRow = true;
            foreach (var row in source)
            {
                if (firstRow)
                {
                    firstRow = false;
                }
                else
                {
                    sb.AppendLine();
                }
                if (row.Length > 0)
                {
                    sb.Append(row[0]);
                    for (int i = 1; i < row.Length; i++)
                    {
                        sb.Append(',').Append(row[i]);
                    }
                }
            }
            return sb.ToString();
        }
    }
}
Bolter answered 25/2, 2009 at 13:5 Comment(6)
Thanks Marc. What do you get for larger arrays? I'm using [2048][64] for example (about 1 MB). Also do your results anyhow differ if you use the OptimizeForTesting() method I'm using?Ungenerous
Thanks a lot Marc. But I notice that this is not the first time we get different results for micro-benchmarks. Do you have any idea why this may be?Ungenerous
Karma? Cosmic rays? Who knows... it shows the dangers of micro-optimising, though ;-pBolter
Are you using an AMD processor for example? ET64? Maybe I have too little cache memory (512 KB)? Or maybe the .NET framework on Windows Vista is more optimized than that for XP SP3? What do you think? I'm really interested in why this is happening...Ungenerous
XP SP3, x86, Intel Core2 Duo T7250@2GHzBolter
Sorry, but any time someone mentions "micro-optimizing" and strings, you have to link to Atwood's classic, "The Sad Tragedy of Micro-Optimization Theater". So I did, a few years late. ;^)Neela
U
24

I don't think so. Looking through Reflector, the implementation of String.Join looks very optimized. It also has the added benefit of knowing the total size of the string to be created in advance, so it doesn't need any reallocation.

I have created two test methods to compare them:

public static string TestStringJoin(double[][] array)
{
    return String.Join(Environment.NewLine,
        Array.ConvertAll(array,
            row => String.Join(",",
                       Array.ConvertAll(row, x => x.ToString()))));
}

public static string TestStringBuilder(double[][] source)
{
    // based on Marc Gravell's code

    StringBuilder sb = new StringBuilder();
    foreach (var row in source)
    {
        if (row.Length > 0)
        {
            sb.Append(row[0]);
            for (int i = 1; i < row.Length; i++)
            {
                sb.Append(',').Append(row[i]);
            }
        }
    }
    return sb.ToString();
}

I ran each method 50 times, passing in an array of size [2048][64]. I did this for two arrays; one filled with zeros and another filled with random values. I got the following results on my machine (P4 3.0 GHz, single-core, no HT, running Release mode from CMD):

// with zeros:
TestStringJoin    took 00:00:02.2755280
TestStringBuilder took 00:00:02.3536041

// with random values:
TestStringJoin    took 00:00:05.6412147
TestStringBuilder took 00:00:05.8394650

Increasing the size of the array to [2048][512], while decreasing the number of iterations to 10 got me the following results:

// with zeros:
TestStringJoin    took 00:00:03.7146628
TestStringBuilder took 00:00:03.8886978

// with random values:
TestStringJoin    took 00:00:09.4991765
TestStringBuilder took 00:00:09.3033365

The results are repeatable (almost; with small fluctuations caused by different random values). Apparently String.Join is a little faster most of the time (although by a very small margin).

This is the code I used for testing:

const int Iterations = 50;
const int Rows = 2048;
const int Cols = 64; // 512

static void Main()
{
    OptimizeForTesting(); // set process priority to RealTime

    // test 1: zeros
    double[][] array = new double[Rows][];
    for (int i = 0; i < array.Length; ++i)
        array[i] = new double[Cols];

    CompareMethods(array);

    // test 2: random values
    Random random = new Random();
    double[] template = new double[Cols];
    for (int i = 0; i < template.Length; ++i)
        template[i] = random.NextDouble();

    for (int i = 0; i < array.Length; ++i)
        array[i] = template;

    CompareMethods(array);
}

static void CompareMethods(double[][] array)
{
    Stopwatch stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < Iterations; ++i)
        TestStringJoin(array);
    stopwatch.Stop();
    Console.WriteLine("TestStringJoin    took " + stopwatch.Elapsed);

    stopwatch.Reset(); stopwatch.Start();
    for (int i = 0; i < Iterations; ++i)
        TestStringBuilder(array);
    stopwatch.Stop();
    Console.WriteLine("TestStringBuilder took " + stopwatch.Elapsed);

}

static void OptimizeForTesting()
{
    Thread.CurrentThread.Priority = ThreadPriority.Highest;
    Process currentProcess = Process.GetCurrentProcess();
    currentProcess.PriorityClass = ProcessPriorityClass.RealTime;
    if (Environment.ProcessorCount > 1) {
        // use last core only
        currentProcess.ProcessorAffinity
            = new IntPtr(1 << (Environment.ProcessorCount - 1));
    }
}
Ungenerous answered 25/2, 2009 at 13:3 Comment(0)
U
16

Unless the 1% difference turns into something significant in terms of the time the entire program takes to run, this looks like micro-optimization. I'd write the code that's the most readable/understandable and not worry about the 1% performance difference.

Unearthly answered 25/2, 2009 at 13:18 Comment(3)
I believe the String.Join is more understandable, but the post was more of a fun challenge. :) It's also useful (IMHO) to learn that using a few built-in methods can be better than doing it by hand, even when intuition might suggest otherwise. ...Ungenerous
... Normally, many people would have suggested using the StringBuilder. Even if String.Join proved to be 1% slower, many people wouldn't have thought about it, just because they think StringBuilder is faster.Ungenerous
I don't have any problem with the investigation, but now that you have an answer I'm not sure that performance is the overriding concern. Since I can think of any reason to construct a string in CSV except to write it out to a stream, I probably wouldn't construct the intermediate string at all.Unearthly
A
-2

yes. If you do more than a couple of joins, it will be a lot faster.

When you do a string.join, the runtime has to:

  1. Allocate memory for the resulting string
  2. copy the contents of the first string to the beginning of the output string
  3. copy the contents of the second string to the end of the output string.

If you do two joins, it has to copy the data twice, and so on.

StringBuilder allocates one buffer with space to spare, so data can be appended without having to copy the original string. As there is space left over in the buffer, the appended string can be written into the buffer directly. Then it just has to copy the entire string once, at the end.

Airplane answered 25/2, 2009 at 12:53 Comment(2)
But String.Join knows in advance how much to allocate, while StringBuilder doesn't. Please see my answer for more clarification.Ungenerous
@erikkallen: You can see the code for String.Join in Reflector. red-gate.com/products/reflector/index.htmUngenerous

© 2022 - 2024 — McMap. All rights reserved.