What is an efficient way to trim whitespace from the end of a StringBuilder without calling ToString().Trim() and then wrapping the result in a new StringBuilder, as in new StringBuilder(sb.ToString().Trim())?
The following is an extension method, so you can call it like this:
sb.TrimEnd();
It also returns the StringBuilder instance, which lets you chain other calls (sb.TrimEnd().AppendLine()).
public static StringBuilder TrimEnd(this StringBuilder sb)
{
    if (sb == null || sb.Length == 0) return sb;

    // Walk backwards until the first non-whitespace character.
    int i = sb.Length - 1;
    for (; i >= 0; i--)
        if (!char.IsWhiteSpace(sb[i]))
            break;

    // Only touch Length when there is something to cut.
    if (i < sb.Length - 1)
        sb.Length = i + 1;

    return sb;
}
Notes:
If the StringBuilder is null or empty, the method returns immediately.
If no trim is actually needed, the return is very quick; the most expensive operation is probably the single call to char.IsWhiteSpace. So it costs practically nothing to call TrimEnd when it is not needed, as opposed to the routes that go through ToString().Trim() and back to a StringBuilder.
Otherwise, if a trim is needed, the most expensive part is the repeated calls to char.IsWhiteSpace (the loop breaks on the first non-whitespace char). The loop iterates backwards, so if every character is whitespace you end up with a Length of 0.
If whitespace was encountered, the index i is kept outside the loop, which lets us cut the Length accordingly. In StringBuilder this is incredibly performant: it simply sets an internal length integer (the internal char[] keeps the same length).
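As a quick check of the cases described in the notes, here is a minimal sketch. It repeats the TrimEnd method so that it compiles on its own; the class name StringBuilderTrimEndDemo and the sample strings are only illustrative.

```csharp
using System;
using System.Text;

public static class StringBuilderTrimEndDemo
{
    // Same logic as the TrimEnd extension above, repeated so the sketch is self-contained.
    public static StringBuilder TrimEnd(this StringBuilder sb)
    {
        if (sb == null || sb.Length == 0) return sb;
        int i = sb.Length - 1;
        for (; i >= 0; i--)
            if (!char.IsWhiteSpace(sb[i]))
                break;
        if (i < sb.Length - 1)
            sb.Length = i + 1;
        return sb;
    }

    public static void Main()
    {
        StringBuilder nothing = null;

        // Extension methods can be invoked on a null reference; TrimEnd hands it back.
        Console.WriteLine(nothing.TrimEnd() == null);                         // True
        Console.WriteLine(new StringBuilder().TrimEnd().Length);              // 0
        Console.WriteLine(new StringBuilder("abc").TrimEnd().Length);         // 3
        Console.WriteLine(new StringBuilder(" \t\r\n").TrimEnd().Length);     // 0

        // The returned instance can be chained.
        Console.WriteLine(new StringBuilder("abc  ").TrimEnd().Append('!'));  // abc!
    }
}
```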
Update: See the excellent notes by Ryan Emerle below, which correct some of my misunderstandings (the internal workings of StringBuilder are a little more complicated than I made them out to be):
The StringBuilder is technically a linked list of blocks of char[] so we don't end up in the LOH. Adjusting the length isn't technically as simple as changing the end index, because if you move into a different chunk the Capacity must be maintained, so a new chunk may need to be allocated. Nevertheless, you only set the Length property at the end, so this seems like a great solution. Relevant details from Eric Lippert: https://mcmap.net/q/427772/-how-does-stringbuilder-work-internally-in-c
Also, see this nice article discussing the new StringBuilder implementation in .NET 4.0: http://1024strongoxen.blogspot.com/2010/02/net-40-stringbuilder-implementation.html
Update: The following illustrates what happens when a StringBuilder's Length is altered (the only real operation performed on the StringBuilder here, and only when needed):
StringBuilder sb = new StringBuilder("cool \t \r\n ");
Console.WriteLine(sb.Capacity); // 16
Console.WriteLine(sb.Length);   // 11
sb.TrimEnd();
Console.WriteLine(sb.Capacity); // 16
Console.WriteLine(sb.Length);   // 4
You can see that the internal array (m_ChunkChars) stays the same size after the Length changes, and the debugger shows that the trimmed characters (whitespace in this case) are not even overwritten. They are simply orphaned.
The StringBuilder is technically a linked list of blocks of char[] so we don't end up in the LOH. Adjusting the length isn't technically as simple as changing the end index, because if you move into a different chunk the Capacity must be maintained, so a new chunk may need to be allocated. Nevertheless, you only set the Length property at the end, so this seems like a great solution. – Cosimo
Length isn't a huge deal because conceptually there's no real way of avoiding actually removing those characters no matter what you do. The somewhat more concerning issue is the indexer, which is used an unknown number of times, and that isn't as easy as just getting the item from the array: it needs to find the correct chunk to index, which makes it somewhat more work. – Illimitable
sb.Remove is performant, though I could stand to be corrected. It seems a better way is just to wait until the sb has to be serialized to a string and to trim it at that time, i.e. when sb.ToString is called, as it allows a start index to be passed in. I wrote an extension method for this here: github.com/copernicus365/DotNetXtensions/blob/master/… – Deathbed
You can try this:
StringBuilder b = new StringBuilder();
b.Append("some words");
b.Append(" to test ");
int count = 0;
// Count trailing spaces, walking backwards from the end.
for (int i = b.Length - 1; i >= 0; i--)
{
    if (b[i] == ' ')
        count++;
    else
        break;
}
b.Remove(b.Length - count, count);
string result = b.ToString();
It iterates backwards from the end for as long as it finds spaces, then breaks out of the loop and removes them in a single call.
Or even like this:
StringBuilder b = new StringBuilder();
b.Append("some words");
b.Append(" to test ");
// A while loop checks the length first, so an empty or all-whitespace builder does not throw.
while (b.Length > 0 && char.IsWhiteSpace(b[b.Length - 1]))
{
    b.Remove(b.Length - 1, 1);
}
string get = b.ToString();
public static class StringBuilderExtensions
{
    public static StringBuilder Trim(this StringBuilder builder)
    {
        if (builder.Length == 0)
            return builder;

        // Count leading whitespace and remove it.
        var count = 0;
        for (var i = 0; i < builder.Length; i++)
        {
            if (!char.IsWhiteSpace(builder[i]))
                break;
            count++;
        }
        if (count > 0)
        {
            builder.Remove(0, count);
            count = 0;
        }

        // Count trailing whitespace and remove it.
        for (var i = builder.Length - 1; i >= 0; i--)
        {
            if (!char.IsWhiteSpace(builder[i]))
                break;
            count++;
        }
        if (count > 0)
            builder.Remove(builder.Length - count, count);

        return builder;
    }
}
TrimToString: if the beginning needs trimming, you use the ToString overload to set the index to start getting the string from (and trim the end first the normal way). I've been using this for a while; see the new post with it in a minute. – Deathbed
To do a full trim, it's not performant or advisable to do it at the StringBuilder level, but rather at ToString time, as with this TrimToString implementation:
public static string TrimToString(this StringBuilder sb)
{
    if (sb == null) return null;

    sb.TrimEnd(); // handles null and is very inexpensive, unlike a trim at the start

    if (sb.Length > 0 && char.IsWhiteSpace(sb[0])) {
        for (int i = 0; i < sb.Length; i++)
            if (!char.IsWhiteSpace(sb[i]))
                // StringBuilder.ToString takes (startIndex, length); there is no single-int overload.
                return sb.ToString(i, sb.Length - i);
        return ""; // shouldn't reach here, because TrimEnd should have caught all-whitespace strings, but ...
    }
    return sb.ToString();
}
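The idea of trimming at ToString time can also be done without modifying the builder at all. The following is a rough sketch, not taken from any of the answers: the name TrimmedToString is hypothetical, and it relies only on the StringBuilder indexer and the two-argument ToString(int startIndex, int length) overload.

```csharp
using System;
using System.Text;

public static class StringBuilderTrimSketch
{
    // Hypothetical helper: returns the trimmed text and leaves the builder untouched.
    public static string TrimmedToString(this StringBuilder sb)
    {
        if (sb == null) return null;

        // Find the last non-whitespace character.
        int end = sb.Length - 1;
        while (end >= 0 && char.IsWhiteSpace(sb[end])) end--;
        if (end < 0) return ""; // empty or all whitespace

        // Find the first non-whitespace character (sb[end] is one, so start stops at or before end).
        int start = 0;
        while (start < end && char.IsWhiteSpace(sb[start])) start++;

        return sb.ToString(start, end - start + 1);
    }

    public static void Main()
    {
        var sb = new StringBuilder("  some words \r\n");
        Console.WriteLine("[" + sb.TrimmedToString() + "]"); // [some words]
    }
}
```

The trade-off is that the whitespace stays in the builder, so this only fits when the builder is about to be discarded or the trim is needed for output only.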
ToString: return sb.ToString(i, sb.Length - i) – Unpile
ToString with one argument. Only 2 arguments. My example from the first comment allows using your good TrimToString on .NET Core 2.1. – Unpile
I extended Nicholas Petersen's version to accept optional additional chars:
/// <summary>
/// Trims the end of the StringBuilder content. By default only whitespace is trimmed.
/// </summary>
/// <param name="pTrimChars">Array of additional chars to be trimmed.</param>
/// <returns>The same StringBuilder instance.</returns>
public static StringBuilder TrimEnd(this StringBuilder pStringBuilder, char[] pTrimChars = null)
{
    if (pStringBuilder == null || pStringBuilder.Length == 0)
        return pStringBuilder;

    int i = pStringBuilder.Length - 1;

    // Copy the extra chars into a set for fast lookups (needs System.Collections.Generic).
    var lTrimChars = new HashSet<char>();
    if (pTrimChars != null)
        lTrimChars = new HashSet<char>(pTrimChars);

    for (; i >= 0; i--)
    {
        var lChar = pStringBuilder[i];
        if ((char.IsWhiteSpace(lChar) == false) && (lTrimChars.Contains(lChar) == false))
            break;
    }

    if (i < pStringBuilder.Length - 1)
        pStringBuilder.Length = i + 1;

    return pStringBuilder;
}
Edit: After Nicholas Petersen's suggestion:
/// <summary>
/// Trims the end of the StringBuilder content. By default only whitespace is trimmed.
/// </summary>
/// <param name="pTrimChars">Set of additional chars to be trimmed. A little more efficient than passing a char[].</param>
/// <returns>The same StringBuilder instance.</returns>
public static StringBuilder TrimEnd(this StringBuilder pStringBuilder, HashSet<char> pTrimChars = null)
{
    if (pStringBuilder == null || pStringBuilder.Length == 0)
        return pStringBuilder;

    int i = pStringBuilder.Length - 1;

    for (; i >= 0; i--)
    {
        var lChar = pStringBuilder[i];

        if (pTrimChars == null)
        {
            if (char.IsWhiteSpace(lChar) == false)
                break;
        }
        else if ((char.IsWhiteSpace(lChar) == false) && (pTrimChars.Contains(lChar) == false))
            break;
    }

    if (i < pStringBuilder.Length - 1)
        pStringBuilder.Length = i + 1;

    return pStringBuilder;
}
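A typical use for the extra trim chars is cutting a trailing separator after building a list. The sketch below carries a condensed copy of the HashSet-based TrimEnd so that it compiles on its own; the class name TrimEndCharsDemo, the shortened parameter names, and the sample data are only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TrimEndCharsDemo
{
    // Condensed copy of the HashSet-based TrimEnd above, repeated so the sketch is self-contained.
    public static StringBuilder TrimEnd(this StringBuilder sb, HashSet<char> trimChars = null)
    {
        if (sb == null || sb.Length == 0) return sb;

        int i = sb.Length - 1;
        for (; i >= 0; i--)
        {
            char c = sb[i];
            bool isExtra = trimChars != null && trimChars.Contains(c);
            if (!char.IsWhiteSpace(c) && !isExtra) break;
        }

        if (i < sb.Length - 1)
            sb.Length = i + 1;
        return sb;
    }

    public static void Main()
    {
        var sb = new StringBuilder();
        foreach (var item in new[] { "a", "b", "c" })
            sb.Append(item).Append(", ");   // leaves a trailing ", "

        sb.TrimEnd(new HashSet<char> { ',' });
        Console.WriteLine(sb); // a, b, c
    }
}
```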
If you know how many whitespace characters you want to remove, you could use StringBuilder.Remove(int startIndex, int length) directly; there is no need to create an extension method.
Hope it helps!
StringBuilder myString = new StringBuilder("This is Trim test ");
// Check the length first so an empty builder does not throw; compare chars directly.
if (myString.Length > 0 && myString[myString.Length - 1] == ' ')
{
    myString.Remove(myString.Length - 1, 1);
}
StringBuilder. If you intend on discarding it and using the resultant string prior to needing to trim, then a TrimEnd() on that string will be faster. I'd be interested to see a case where your code is thousands of times faster than any other implementation. – Cosimo
using blocks, etc. Depending on your needs, it might be relevant to optimize such behaviour. I agree with @RyanEmerle. It would be interesting to see the code of two different unit tests which demonstrate the delta between the time required for both tries. Other community users could then try it at home! ;) – Reine
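For anyone who wants to try the comparison the comments ask for, here is a rough timing sketch using Stopwatch. Everything in it is an assumption made for illustration: the class and method names, the iteration count, and the input size are arbitrary, and both loops include the cost of constructing the StringBuilder. No results are claimed here; a dedicated benchmarking harness would give more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;
using System.Text;

public static class TrimEndTiming
{
    // In-place trim: same idea as the TrimEnd extension earlier on this page.
    public static StringBuilder TrimEndInPlace(StringBuilder sb)
    {
        int i = sb.Length - 1;
        while (i >= 0 && char.IsWhiteSpace(sb[i])) i--;
        if (i < sb.Length - 1)
            sb.Length = i + 1;
        return sb;
    }

    // Round trip: materialize a string, trim it, and build a new StringBuilder.
    public static StringBuilder TrimEndRoundTrip(StringBuilder sb)
    {
        return new StringBuilder(sb.ToString().TrimEnd());
    }

    public static void Main()
    {
        const int iterations = 100000;                      // arbitrary
        string text = new string('x', 1000) + " \t\r\n";    // arbitrary sample input

        var watch = Stopwatch.StartNew();
        for (int n = 0; n < iterations; n++)
            TrimEndInPlace(new StringBuilder(text));
        watch.Stop();
        Console.WriteLine("in place:   " + watch.ElapsedMilliseconds + " ms");

        watch.Restart();
        for (int n = 0; n < iterations; n++)
            TrimEndRoundTrip(new StringBuilder(text));
        watch.Stop();
        Console.WriteLine("round trip: " + watch.ElapsedMilliseconds + " ms");
    }
}
```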