Trim whitespace from the end of a StringBuilder without calling ToString().Trim() and back to a new SB
D

7

36

What is an efficient way to trim whitespace from the end of a StringBuilder without calling ToString().Trim() and copying the result back into a new SB, i.e. new StringBuilder(sb.ToString().Trim())?

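For comparison, here is a minimal sketch of the round trip being avoided. The class and method names are hypothetical, and TrimEnd() is used instead of Trim() because only the end needs trimming. Each step copies the content again:

```csharp
using System.Text;

public static class NaiveTrim
{
    // Hypothetical helper: the ToString/Trim/new StringBuilder round trip.
    public static StringBuilder NaiveTrimEnd(StringBuilder sb)
    {
        string all = sb.ToString();        // copies every character into a new string
        string trimmed = all.TrimEnd();    // allocates another string if anything is trimmed
        return new StringBuilder(trimmed); // copies the characters yet again
    }
}
```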
Deathbed answered 15/7, 2014 at 23:17 Comment(19)
The problem with this one is that it uses the very subjective word "fastest" in its title. It makes it sound like a competition. – Pimento
I know that, since I've done it myself after getting a few answers on my own questions and some hours of testing, but I didn't know doing it right away was "ok". – Hen
@Bobo 'Convert it to a string and .Trim()?' is horribly unperformant. paqogomez: I will change it to 'most performant' if that makes you feel better. Bobo's answer is, no offense, typical of the kind of junk I found in related questions. Some of us C# devs actually care about things like not making a hundred or a thousand wasted allocations; I wish more devs cared about not being needlessly wasteful and slow. – Deathbed
@NicholasPetersen, @Bobo answered so simply because best practice in software development is to make it work first. Then, if there are any related issues, you know where it counts to optimize; otherwise it is overkill optimization. I say this because the way you speak here makes you look very arrogant to me. It is not that we C# programmers don't care about performance; it is that we concentrate on what's important given the customer's requirements. We get to performance optimization once it works, and once the customer requires it. – Reine
And if you care that much about performance, write it in C/C++, or even in assembler working directly with the registers, without using any interrupts. There you'll have raw performance. – Reine
@WillMarcouiller I did not mean to sound arrogant. I do feel that those of us C# devs who reasonably care about performance and about not being wasteful get beaten up all the time in exactly situations like this, where we should receive ZERO flak, and yet here we are being beaten up again. Note: 1) it took me about 5 minutes to write this function, 2) it is an extension method that can be used for the rest of one's days, 3) in some scenarios it would be THOUSANDS of times more performant, and yet you act like I put forth some ugly micro-optimization code. – Deathbed
@NicholasPetersen Perhaps providing some metrics would move this out of the realm of subjectivity. I agree that your approach would be fast if you intend to keep the StringBuilder. If you intend to discard it and use the resulting string before you need to trim, then a TrimEnd() on that string will be faster. I'd be interested to see a case where your code is thousands of times faster than any other implementation. – Cosimo
@RyanEmerle thank you for your response. Yes, the only circumstance in which one would do this is when keeping the StringBuilder instance, which was my case. And yes, thousands, easily, especially when you consider that often no trim is needed at all. Even when it is needed, say your SB has 1000+ characters, it merely iterates back, say, 2 characters until it finds a non-whitespace one. Setting Length then does nothing in the StringBuilder except set the internal length integer. No allocations, etc. – Deathbed
Re "is thousands of times faster than any other implementation": I did not say that with regard to any implementation, only with regard to the ToString().TrimEnd()-then-back-to-a-StringBuilder approach. – Deathbed
Just to add a new consideration to this discussion: look at the accepted answer in this question. I don't know if it also applies here, but it seems worth considering. – Bartizan
Thanks Steve, lots of good StringBuilder discussions there. – Deathbed
Oh, come on! What could possibly be 'opinion-based' about looking for the 'fastest' solution?? I'd be hard put to find a more objective objective! – Heshvan
@NicholasPetersen: I now better understand your point. I partially agree with you, on the basis that many .NET programmers don't strive for performance for the reasons I mentioned above. It is also easier not to care much about such micro-optimizations because computer hardware has improved a lot over the last 10 years. Historically, programmers had to take care of every single bit they needed because resources were scarce. Then programmers got stuck because of the limits of the machines. Then came the time of big hardware improvements; that is what happened. – Reine
Now, younger programmers never knew the era when computers lacked resources. To really slow down good hardware these days, enough that the user notices the difference, you have to be seriously neglecting memory usage, skipping using blocks, etc. Depending on your needs, it might be relevant to optimize such behaviour. I agree with @RyanEmerle: it would be interesting to see the code of two unit tests that demonstrate the time difference between the two approaches. Other community users could then try it at home! ;) – Reine
@WillMarcouiller thanks for the nice comments and for the historical perspective. – Deathbed
@NicholasPetersen you didn't specify at first that you wanted to keep the SB around. My comment is still valid for people who want a string as their end result. – Hodgkin
@Hodgkin okay, but isn't that a misreading on your part? Where did I ever say anything about converting it to a string? Also, even though this isn't the use case I had in mind, even when converting to a string, sb.ToString().TrimEnd() is still twice the waste. Why? Because it ultimately creates two separate strings (string functions like TrimEnd return a new instance whenever something is actually trimmed), whereas with sb.TrimEnd().ToString() you take care of the very minor trim operation (often just a few characters) at the SB stage. – Deathbed
Sure, I guess it was a misreading, but that is why a clear question (like it is now, after your edit) is better. And I actually do like your extension; it seems very useful in these cases. But you didn't have to instantly turn rude and assume we are all idiots if we don't agree with you right away, especially since for most people the amount of "waste" without your extension is minimal and optimizing it could be considered overkill. – Hodgkin
@Hodgkin Sounds good Bobo, I apologize for getting rude. – Deathbed
D
55

The following is an extension method, so you can call it like this:

sb.TrimEnd();

Also, it returns the SB instance, allowing you to chain other calls (sb.TrimEnd().AppendLine()).

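using System.Text;

public static class StringBuilderExtensions
{
    public static StringBuilder TrimEnd(this StringBuilder sb)
    {
        // Nothing to trim for a null or empty builder.
        if (sb == null || sb.Length == 0) return sb;

        int i = sb.Length - 1;

        // Walk backwards until the first non-whitespace character.
        for (; i >= 0; i--)
            if (!char.IsWhiteSpace(sb[i]))
                break;

        // Only touch Length when trailing whitespace was actually found.
        if (i < sb.Length - 1)
            sb.Length = i + 1;

        return sb;
    }
}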

Notes:

  1. If the builder is null or empty, it returns immediately.

  2. If no trim is actually needed, it returns very quickly; probably the most expensive call is the single call to char.IsWhiteSpace. So calling TrimEnd when it isn't needed costs practically nothing, as opposed to the ToString().Trim()-back-to-SB routes.

  3. Otherwise, if a trim is needed, the most expensive part is the repeated calls to char.IsWhiteSpace (the loop breaks on the first non-whitespace char). The loop iterates backwards; if every character is whitespace, you end up with an SB.Length of 0.

  4. If whitespace was encountered, the i index, which is declared outside the loop, lets us cut the Length appropriately. In StringBuilder this is incredibly performant: it simply sets an internal length integer (the internal char[] keeps the same length).

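Since the comments on the question ask for measurements rather than claims, here is a minimal Stopwatch sketch for comparing the two routes yourself. The class name TrimBenchmark, the Run method and the sizes are hypothetical, and TrimEndInPlace is a local copy of the extension above so that the block compiles on its own. No timings are claimed here, since they depend on runtime, machine and build configuration:

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class TrimBenchmark
{
    // Local copy of the TrimEnd logic above.
    static StringBuilder TrimEndInPlace(StringBuilder sb)
    {
        int i = sb.Length - 1;
        while (i >= 0 && char.IsWhiteSpace(sb[i])) i--;
        if (i < sb.Length - 1) sb.Length = i + 1;
        return sb;
    }

    // Appends a little trailing whitespace and trims it again 'iterations' times,
    // once via the string round trip and once in place, printing both timings.
    public static (string RoundTrip, string InPlace) Run(int size, int iterations)
    {
        var a = new StringBuilder().Append('x', size);
        var sw = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
        {
            a.Append(" \t\r\n");
            a = new StringBuilder(a.ToString().TrimEnd()); // ToString + TrimEnd + new SB
        }
        Console.WriteLine($"round trip: {sw.ElapsedMilliseconds} ms");

        var b = new StringBuilder().Append('x', size);
        sw.Restart();
        for (int n = 0; n < iterations; n++)
        {
            b.Append(" \t\r\n");
            TrimEndInPlace(b); // only Length changes
        }
        Console.WriteLine($"in place:   {sw.ElapsedMilliseconds} ms");

        return (a.ToString(), b.ToString());
    }
}
```

Run it in a Release build with something like TrimBenchmark.Run(100_000, 1_000). For trustworthy numbers, a dedicated harness such as BenchmarkDotNet is preferable to a single Stopwatch pass.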
Update: See the excellent notes by Ryan Emerle as follows, which correct some of my misunderstandings (the internal workings of SB are a little more complicated than I made them out to be):

The StringBuilder is technically a linked list of blocks of char[], so we don't end up in the LOH (large object heap). Adjusting the length isn't technically as simple as changing the end index because if you move into a different chunk the Capacity must be maintained, so a new chunk may need to be allocated. Nevertheless, you only set the Length property at the end, so this seems like a great solution. Relevant details from Eric Lippert: https://mcmap.net/q/427772/-how-does-stringbuilder-work-internally-in-c

Also, see this nice article discussing the new StringBuilder implementation in .NET 4.0: http://1024strongoxen.blogspot.com/2010/02/net-40-stringbuilder-implementation.html

Update: The following illustrates what happens when a StringBuilder's Length is altered (the only real operation done to the SB here, and only when needed):

StringBuilder sb = new StringBuilder("cool  \t \r\n ");

Console.WriteLine(sb.Capacity); // 16
Console.WriteLine(sb.Length);   // 11

sb.TrimEnd();

Console.WriteLine(sb.Capacity); // 16
Console.WriteLine(sb.Length);   // 4

You can see that the internal array (m_ChunkChars) stays the same size after changing the Length; in fact, you can see in the debugger that it doesn't even overwrite the (in this case whitespace) characters. They are simply orphaned.

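Some of the comments ask whether it is safe to keep appending after the trim. Here is a minimal sketch (class and method names hypothetical; TrimEndInPlace copies the logic above). Because Append writes at the current Length, the orphaned characters do not reappear in the content:

```csharp
using System.Text;

public static class TrimThenAppendDemo
{
    // Same Length-based trim as the TrimEnd extension above.
    static StringBuilder TrimEndInPlace(StringBuilder sb)
    {
        int i = sb.Length - 1;
        while (i >= 0 && char.IsWhiteSpace(sb[i])) i--;
        if (i < sb.Length - 1) sb.Length = i + 1;
        return sb;
    }

    public static string Run()
    {
        var sb = new StringBuilder("cool  \t \r\n ");
        TrimEndInPlace(sb).Append('!'); // Append continues at the new Length
        return sb.ToString();           // "cool!"
    }
}
```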
Deathbed answered 15/7, 2014 at 23:17 Comment(13)
Would you consider explaining that code, and why it meets the requirements of the question? That way it can help future readers learn. – Maite
Sure, I will put it in the body. – Deathbed
Cool. Just for clarity: does this method avoid rebuilding the output string for each character? The indexer just hits the internal buffer, right? – Maite
Right, the indexer accesses the internal char array (see sb.Capacity for its size); StringBuilder is really just a glorified char[] with a Length field, which acts as a pointer to where to add to the internal array. Importantly, the only operation this method performs on the SB is altering the Length field if needed, and this does not shrink the internal char[] (it only grows). If it did, that would require a new array alloc and copy, which would defeat the purpose. Thus a full Trim wouldn't make sense (and it would so rarely be needed anyway), because that requires altering the internal array. – Deathbed
Excellent, and good edit. You might add a relevant question about it in the question itself, too. That'll be on many people's minds. – Maite
The StringBuilder is technically a linked list of blocks of char[], so we don't end up in the LOH. Adjusting the length isn't technically as simple as changing the end index because if you move into a different chunk the Capacity must be maintained, so a new chunk may need to be allocated. Nevertheless, you only set the Length property at the end, so this seems like a great solution. – Cosimo
@RyanEmerle Adjusting the Length isn't a huge deal, because conceptually there's no real way of avoiding actually removing those characters no matter what you do. The somewhat more concerning issue is the indexer, which is used an unknown number of times and isn't as simple as just getting the item from the array, since it needs to find the correct chunk to index, making it somewhat more work. – Illimitable
If part of @RyanEmerle's concern was that the length is only set at the end, does that raise concern if you want to do things like append to the SB afterwards? Does this depend somewhat on the framework version? – Directions
I'm having a little trouble following the part about moving into another chunk. I understand that it probably involves appending more characters to the SB than will fit into the pre-existing arrays, either the orphaned ones or the last one actually in use, but does that mean it won't simply overwrite the orphaned ones to save space and time? – Directions
We had problems serializing a 270 MB JSON string. After switching to this method, the time needed for a Release build went from 22 minutes to 12 seconds. – Aerometry
I really like this answer! I expanded on the idea to create a TrimStart() variant as well, and encapsulated it into a StringBuilderExtensions partial class. It's freely available as a gist here: gist.github.com/ST-Emanuel/a079845848369e1f78eb2931f39e831c – Emu
Thanks @EmanuelStrömgren! I would recommend against a TrimStart because I don't believe sb.Remove is performant, though I could stand to be corrected. A better way seems to be to wait until the sb has to be serialized to a string and trim it at that time, i.e. when sb.ToString is called, since it allows a start index to be passed in. I wrote an extension method for this here: github.com/copernicus365/DotNetXtensions/blob/master/… – Deathbed
That is definitely true, though I must argue that TrimStart is at times handy to have while still modifying the string. – Emu
B
4

You can try this:

StringBuilder b = new StringBuilder();
b.Append("some words");
b.Append(" to test   ");

int count = 0;
for (int i = b.Length - 1; i >= 0; i--)
{
    if (b[i] == ' ')
        count++;
    else
        break;
}

b.Remove(b.Length - count, count);
string result = b.ToString();

It iterates backwards from the end while the characters are spaces, breaks out of the loop at the first non-space character, and then removes the counted spaces with a single Remove call.

Or even like this:

StringBuilder b = new StringBuilder();
b.Append("some words");
b.Append(" to test   ");

while (b.Length > 0 && char.IsWhiteSpace(b[b.Length - 1]))
{
    // Remove one trailing whitespace char at a time; the Length check
    // avoids indexing at -1 once the builder is empty.
    b.Remove(b.Length - 1, 1);
}

string get = b.ToString();
Beauvoir answered 16/7, 2014 at 0:14 Comment(0)
A
1
using System.Text;

public static class StringBuilderExtensions
{
    public static StringBuilder Trim(this StringBuilder builder)
    {
        if (builder.Length == 0)
            return builder;

        var count = 0;
        for (var i = 0; i < builder.Length; i++)
        {
            if (!char.IsWhiteSpace(builder[i]))
                break;
            count++;
        }

        if (count > 0)
        {
            builder.Remove(0, count);
            count = 0;
        }

        for (var i = builder.Length - 1; i >= 0; i--)
        {
            if (!char.IsWhiteSpace(builder[i]))
                break;
            count++;
        }

        if (count > 0)
            builder.Remove(builder.Length - count, count);

        return builder;
    }
}
Aye answered 18/4, 2019 at 19:22 Comment(3)
Nice idea. The problem is that trimming from the beginning is not performant, I believe. A better idea therefore seems to be a final trim operation when getting the string. Imagine your method returned a string, let's say called TrimToString: if the beginning needs trimming, you use the ToString overload to set the start index to get the string from (after trimming the end the normal way). I've been using this for a while; see my new post with it in a minute. – Deathbed
Removing whitespace from the end is more performant if I set Length (like in your example). But at the start you convert the StringBuilder to a string, and I want to return a StringBuilder; that's the reason I use Remove. If you want to return a string, you can make your method more performant: remember the valid start index and the valid end index (don't set Length and don't call Remove), and at the end of the method call ToString(startValidIndex, Length - validStartIndex - validEndIndex). – Aye
I took this one for TrimStart and Petersen's for TrimEnd; I think this is really the best one can do. – Euromarket
D
1

For a full trim, it's not performant or advisable to do it at the StringBuilder level; do it at ToString time instead, as in this TrimToString implementation (it relies on the TrimEnd extension above):

    public static string TrimToString(this StringBuilder sb)
    {
        if (sb == null) return null;

        sb.TrimEnd(); // handles null and is very inexpensive, unlike trimming the start

        if (sb.Length > 0 && char.IsWhiteSpace(sb[0])) {
            for (int i = 0; i < sb.Length; i++)
                if (!char.IsWhiteSpace(sb[i]))
                    return sb.ToString(i, sb.Length - i); // StringBuilder has no single-argument ToString(int)
            return ""; // shouldn't reach here, because TrimEnd should have caught all-whitespace strings, but ...
        }

        return sb.ToString();
    }
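The comment on the previous answer suggests an alternative that leaves the builder untouched: remember the first and last valid indexes and call the two-argument ToString once. Here is a minimal sketch of that idea (the class and method names are hypothetical). Unlike TrimToString above, it does not shorten sb, which may or may not be what you want:

```csharp
using System.Text;

public static class StringBuilderTrimToStringSketch
{
    // Hypothetical non-mutating variant: finds the first and last
    // non-whitespace characters and copies only that range out.
    public static string TrimToStringNoMutate(this StringBuilder sb)
    {
        if (sb == null) return null;

        int end = sb.Length - 1;
        while (end >= 0 && char.IsWhiteSpace(sb[end])) end--;
        if (end < 0) return string.Empty; // empty, or whitespace only

        int start = 0;
        while (char.IsWhiteSpace(sb[start])) start++; // stops at 'end' at the latest

        // Two-argument overload: ToString(startIndex, length).
        return sb.ToString(start, end - start + 1);
    }
}
```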
Deathbed answered 18/4, 2019 at 19:57 Comment(4)
One thing: on .NET Core 2.1 I had to add a second argument to ToString: return sb.ToString(i, sb.Length - i) – Unpile
@СергейРыбаков without looking at that code sample: your code requires prior knowledge of whether it needs trimming and, beyond that, of exactly how many leading chars are whitespace. The point of the extension is for it to handle all of that. – Deathbed
I'm sorry, I meant that in .NET Core 2.1 there is no ToString overload with one argument, only with two. The example in my first comment lets you use your good TrimToString on .NET Core 2.1. – Unpile
There's no such overload on .NET 4.7.2 either. – Stabile
M
1

I extended Nicholas Petersen's version to support optional additional chars:

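/// <summary>
/// Trims the end of the StringBuilder content. By default, only whitespace chars are trimmed.
/// </summary>
/// <param name="pStringBuilder">The StringBuilder to trim in place.</param>
/// <param name="pTrimChars">Array of additional chars to be trimmed.</param>
/// <returns>The same StringBuilder instance, to allow chaining.</returns>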
public static StringBuilder TrimEnd(this StringBuilder pStringBuilder, char[] pTrimChars = null)
{
    if (pStringBuilder == null || pStringBuilder.Length == 0)
        return pStringBuilder;

    int i = pStringBuilder.Length - 1;

    var lTrimChars = new HashSet<char>();
    if (pTrimChars != null)
        lTrimChars = pTrimChars.ToHashSet();

    for (; i >= 0; i--)
    {
        var lChar = pStringBuilder[i];
        if ((char.IsWhiteSpace(lChar) == false) && (lTrimChars.Contains(lChar) == false))
            break;
    }

    if (i < pStringBuilder.Length - 1)
        pStringBuilder.Length = i + 1;

    return pStringBuilder;
}

Edit: Following Nicholas Petersen's suggestion:

/// <summary>
/// Trims the end of the StringBuilder content. By default, only whitespace chars are trimmed.
/// </summary>
/// <param name="pStringBuilder">The StringBuilder to trim in place.</param>
/// <param name="pTrimChars">Set of additional chars to be trimmed. A little more efficient than using a char[].</param>
/// <returns>The same StringBuilder instance, to allow chaining.</returns>
public static StringBuilder TrimEnd(this StringBuilder pStringBuilder, HashSet<char> pTrimChars = null)
{
    if (pStringBuilder == null || pStringBuilder.Length == 0)
        return pStringBuilder;

    int i = pStringBuilder.Length - 1;

    for (; i >= 0; i--)
    {
        var lChar = pStringBuilder[i];

        if (pTrimChars == null)
        {
            if (char.IsWhiteSpace(lChar) == false)
                break;
        }
        else if ((char.IsWhiteSpace(lChar) == false) && (pTrimChars.Contains(lChar) == false))
            break;
    }

    if (i < pStringBuilder.Length - 1)
        pStringBuilder.Length = i + 1;

    return pStringBuilder;
}
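In line with the suggestion to pass the HashSet in rather than allocate it on every call, here is a usage sketch where the set is created once in a static field and reused. The class name, the TrailingPunctuation field and the JoinItems helper are hypothetical, and the method is a compact copy of the HashSet version above so that the block compiles on its own:

```csharp
using System.Collections.Generic;
using System.Text;

public static class TrimCharsUsage
{
    // Allocated once and reused on every call.
    private static readonly HashSet<char> TrailingPunctuation = new HashSet<char> { ',', ';' };

    // Compact copy of the HashSet-based TrimEnd above.
    public static StringBuilder TrimEnd(this StringBuilder sb, HashSet<char> trimChars = null)
    {
        if (sb == null || sb.Length == 0) return sb;
        int i = sb.Length - 1;
        while (i >= 0 && (char.IsWhiteSpace(sb[i]) || (trimChars != null && trimChars.Contains(sb[i]))))
            i--;
        if (i < sb.Length - 1) sb.Length = i + 1;
        return sb;
    }

    // Hypothetical example: join items, then drop the trailing ", " separator.
    public static string JoinItems(IEnumerable<string> items)
    {
        var sb = new StringBuilder();
        foreach (var item in items)
            sb.Append(item).Append(", ");
        return sb.TrimEnd(TrailingPunctuation).ToString();
    }
}
```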
Moonstruck answered 4/11, 2022 at 16:15 Comment(1)
I would recommend passing in the HashSet instead of allocating and initializing it on every call. – Deathbed
W
0

If you know how many whitespace characters you want to remove, you could try StringBuilder.Remove(int startIndex, int length); there's no need to create an extension method.

Hope it will help!
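Here is a minimal sketch of that, assuming the caller really does know the count (for example, because it just appended a "\r\n" itself). The helper name RemoveLast is hypothetical, and it does not check that the removed characters are whitespace. Setting sb.Length -= count would have the same effect.

```csharp
using System.Text;

public static class KnownCountTrim
{
    // Hypothetical helper: drops exactly 'count' trailing characters,
    // trusting the caller that they are whitespace.
    public static StringBuilder RemoveLast(StringBuilder sb, int count)
    {
        // Remove(startIndex, length) returns the same instance, so calls can be chained.
        return sb.Remove(sb.Length - count, count);
    }
}
```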

Womanizer answered 19/1, 2023 at 20:2 Comment(0)
H
-1
StringBuilder myString = new StringBuilder("This is Trim test ");

if (myString[myString.Length - 1].ToString() == " ")
{              
    myString = myString.Remove(myString.Length - 1, 1);
}
Hegyera answered 16/8, 2018 at 5:41 Comment(1)
1) This does not trim multiple trailing whitespace characters, 2) the only whitespace it checks for is a space, 3) there is no need to turn the char in the if condition into a string; if you were going that route, just compare it as a char (` == ' '`), 4) I would have to check how Remove works at the end of the SB, but it's certainly not going to be faster than just changing the Length, as other answers suggest. – Deathbed

© 2022 - 2024 — McMap. All rights reserved.