Fastest way to remove chars from string

7

25

I have a string from which I have to remove the following chars: '\r', '\n', and '\t'. I have tried three different ways of removing these chars and benchmarked them to find the fastest solution.

Following are the methods and their execution times when I ran each of them 1000000 times:

String.Replace should be the fastest solution if I have only 1 or 2 chars to remove, but as I add more chars it starts to take more time:

str = str.Replace("\r", string.Empty).Replace("\n", string.Empty).Replace("\t", string.Empty);

Execution time = 1695

For 1 or 2 chars, splitting and re-joining was slower than String.Replace, but for 3 chars it performed better:

string[] split = str.Split(new char[] { '\t', '\r', '\n' }, StringSplitOptions.None);
str = split.Aggregate<string>((str1, str2) => str1 + str2);

Execution time = 1030

Regex.Replace was the slowest of all, even with 1 char. Maybe my regular expression is not the best:

str = Regex.Replace(str, "[\r\n\t]", string.Empty, RegexOptions.Compiled);

Execution time = 3500
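
One possible tweak to the Regex.Replace call above, offered as a sketch rather than a measured improvement: build the Regex once as a static field and add a `+` quantifier so that a run of adjacent chars is removed as a single match. The class name `RegexStrip` and method name `StripTabsAndNewlines` are hypothetical, and no timing is claimed for this variant.

```csharp
using System.Text.RegularExpressions;

public static class RegexStrip
{
    // Built once and reused; the pattern matches one or more of \r, \n, \t in a row.
    private static readonly Regex ControlChars =
        new Regex(@"[\r\n\t]+", RegexOptions.Compiled);

    // Hypothetical wrapper: returns s with every \r, \n and \t removed.
    public static string StripTabsAndNewlines(string s)
    {
        return ControlChars.Replace(s, string.Empty);
    }
}
```

It would be called as `str = RegexStrip.StripTabsAndNewlines(str);`. Whether this closes the gap to the other approaches would have to be benchmarked.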

These are the three solutions I came up with. Does anyone here know a better and faster solution, or any improvements I can make to this code?

String that I used for benchmarking:

StringBuilder builder = new StringBuilder();
builder.AppendFormat("{0}\r\n{1}\t\t\t\r\n{2}\t\r\n{3}\r\n{4}\t\t\r\n{5}\r\n{6}\r\n{7}\r\n{8}\r\n{9}",
    "SELECT ",
    "[Extent1].[CustomerID] AS [CustomerID], ",
    "[Extent1].[NameStyle] AS [NameStyle], ",
    "[Extent1].[Title] AS [Title], ",
    "[Extent1].[FirstName] AS [FirstName], ",
    "[Extent1].[MiddleName] AS [MiddleName], ",
    "[Extent1].[LastName] AS [LastName], ",
    "[Extent1].[Suffix] AS [Suffix], ",
    "[Extent1].[CompanyName] AS [CompanyName], ",
    "[Extent1].[SalesPerson] AS [SalesPerson], ");
string str = builder.ToString();
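
The question does not show how the timings were taken, so the following is only a minimal sketch of one way to measure them, assuming a Stopwatch around a loop. `StripBenchmark`, `TimeMs` and `ReplaceChained` are hypothetical names; `ReplaceChained` simply wraps the String.Replace variant from above.

```csharp
using System;
using System.Diagnostics;

public static class StripBenchmark
{
    // Hypothetical helper: runs strip(input) `iterations` times
    // and returns the elapsed wall-clock time in milliseconds.
    public static long TimeMs(Func<string, string> strip, string input, int iterations)
    {
        strip(input); // warm-up call so JIT compilation is not counted

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            strip(input);
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }

    // The chained String.Replace approach from the question, wrapped as a method.
    public static string ReplaceChained(string s)
    {
        return s.Replace("\r", string.Empty)
                .Replace("\n", string.Empty)
                .Replace("\t", string.Empty);
    }
}
```

A run over the benchmark string would then look like `long ms = StripBenchmark.TimeMs(StripBenchmark.ReplaceChained, str, 1000000);`. Results depend heavily on the machine, the runtime and the build configuration, which may explain why the numbers reported in this thread differ so much.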
Sommers answered 2/2, 2010 at 7:30 Comment(0)
22

Here's the uber-fast unsafe version, version 2.

    public static unsafe string StripTabsAndNewlines(string s)
    {
        int len = s.Length;
        char* newChars = stackalloc char[len];
        char* currentChar = newChars;

        for (int i = 0; i < len; ++i)
        {
            char c = s[i];
            switch (c)
            {
                case '\r':
                case '\n':
                case '\t':
                    continue;
                default:
                    *currentChar++ = c;
                    break;
            }
        }
        return new string(newChars, 0, (int)(currentChar - newChars));
    }

And here are the benchmarks (time to strip 1000000 strings, in ms):

    cornerback84's String.Replace:         9433
    Andy West's String.Concat:             4756
    AviJ's char array:                     1374
    Matt Howells' char pointers:           1163
Hemipode answered 2/2, 2010 at 11:1 Comment(4)
Yes it is. Execution time = 195 – Sommers
btw, you need a new machine :P – Sommers
It's a recent Xeon - probably our benchmarks are just set up differently. – Hemipode
I am surprised how long this has sat here without someone mentioning that you can easily get a stack overflow exception when using this on a large string. I really like that it does not allocate on the heap for something that could get called a lot (people looking for the fastest way are probably calling it a lot...), and if you consider the heap cleanup, the performance difference is probably a bit greater than the benchmarks show. This needs to conditionally use the heap for large strings so that it can be both fast and reliable. – Apotheosize
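
A minimal sketch of the conditional heap fallback suggested in the last comment, assuming C# 7.2 or later with `Span<char>` available (so no `unsafe` block is needed). The class name `StripHelper` and the 256-char cutoff `StackLimit` are assumptions for illustration, not values from this thread, and this variant has not been benchmarked here.

```csharp
using System;

public static class StripHelper
{
    // Assumed cutoff (in chars): inputs up to this length use a stack buffer,
    // longer inputs fall back to a heap-allocated array.
    private const int StackLimit = 256;

    public static string StripTabsAndNewlines(string s)
    {
        int len = s.Length;
        Span<char> buffer = len <= StackLimit
            ? stackalloc char[StackLimit]
            : new char[len];

        int count = 0; // number of chars kept so far
        for (int i = 0; i < len; i++)
        {
            char c = s[i];
            if (c != '\r' && c != '\n' && c != '\t')
            {
                buffer[count++] = c;
            }
        }

        // For Span<char>, ToString() builds a string from the chars in the span.
        return buffer.Slice(0, count).ToString();
    }
}
```

Short strings keep the allocation-free buffer, while long strings can no longer overflow the stack. The right cutoff is a judgment call and would need to be tuned for the workload.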
10

I believe you'll get the best possible performance by composing the new string as a char array and converting it to a string only when you're done, like so:

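// The method name is illustrative; the body is the loop described above.
static string StripTabsAndNewlines(string s)
{
    int len = s.Length;
    char[] s2 = new char[len]; // worst case: nothing is removed
    int i2 = 0;                // number of chars kept so far
    for (int i = 0; i < len; i++)
    {
        char c = s[i];
        if (c != '\r' && c != '\n' && c != '\t')
            s2[i2++] = c;
    }
    return new String(s2, 0, i2);
}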

EDIT: using String(s2, 0, i2) instead of Trim(), per suggestion

Vitrify answered 2/2, 2010 at 7:55 Comment(2)
One correction: you have to do return new String(s2).TrimEnd('\0'); and the execution time = 309. Great – Sommers
In fact I made a little modification. You are already keeping the length of the new array, i.e. i2. So rather than trimming, you can use return new String(s2, 0, i2); That brings the execution time to 255 – Sommers
6
String.Join(null, str.Split(new char[] { '\t', '\r', '\n' },
    StringSplitOptions.None));

might give you a performance increase over using Aggregate() since Join() is designed for strings.

EDIT:

Actually, this might be even better:

String.Concat(str.Split(new char[] { '\t', '\r', '\n' },
    StringSplitOptions.None));
Hovel answered 2/2, 2010 at 7:45 Comment(2)
Nice! I updated my answer to use Concat() instead. Might be worth a try. – Hovel
There was a slight improvement when using String.Concat. Now, execution time = 734 – Sommers
2

Looping through the string and using (just one) StringBuilder (with the proper constructor argument, to avoid unnecessary memory allocations) to create a new string could be faster.
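
A minimal sketch of that idea, assuming the "proper constructor argument" means an initial capacity equal to the input length. The class name `StringBuilderStrip` and method name `StripTabsAndNewlines` are hypothetical, and this variant was not part of the benchmarks quoted in this thread.

```csharp
using System.Text;

public static class StringBuilderStrip
{
    public static string StripTabsAndNewlines(string s)
    {
        // s.Length is an upper bound on the result length,
        // so the builder never has to grow its internal buffer.
        StringBuilder builder = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (c != '\r' && c != '\n' && c != '\t')
            {
                builder.Append(c);
            }
        }
        return builder.ToString();
    }
}
```

It would be called as `str = StringBuilderStrip.StripTabsAndNewlines(str);`.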

Talmudist answered 2/2, 2010 at 7:36 Comment(0)
2

Even faster:

public static string RemoveMultipleWhiteSpaces(string s)
{
    char[] sResultChars = new char[s.Length];

    bool isWhiteSpace = false;
    int sResultCharsIndex = 0;

    for (int i = 0; i < s.Length; i++)
    {
        if (s[i] == ' ')
        {
            if (!isWhiteSpace)
            {
                sResultChars[sResultCharsIndex] = s[i];
                sResultCharsIndex++;
                isWhiteSpace = true;
            }
        }
        else
        {
            sResultChars[sResultCharsIndex] = s[i];
            sResultCharsIndex++;
            isWhiteSpace = false;
        }
    }

    return new string(sResultChars, 0, sResultCharsIndex);
}
Earwitness answered 25/11, 2012 at 16:26 Comment(0)
1

Try this:

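// Requires: using System.Linq;
string str = "something \tis \nbetter than nothing";
// '\r' added so that all three chars from the question are removed.
string removeChars = new String(new Char[] { '\r', '\n', '\t' });
string newStr = new string(str.ToCharArray().Where(c => !removeChars.Contains(c)).ToArray());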
Meteor answered 2/2, 2010 at 11:12 Comment(0)
0
string str;
str = str.Replace(Environment.NewLine, string.Empty).Replace("\t", string.Empty);
Alula answered 4/5, 2012 at 6:1 Comment(1)
This is no different than the SLOW version in the accepted answer. The OP is asking for the fastest. – Meld