Efficient way to remove ALL whitespace from String?
S

19

504

I'm calling a REST API and am receiving an XML response back. It returns a list of workspace names, and I'm writing a quick IsExistingWorkspace() method. Since all workspaces consist of contiguous characters with no whitespace, I'm assuming the easiest way to find out if a particular workspace is in the list is to remove all whitespace (including newlines) and do this (XML is the string received from the web request):

XML.Contains("<name>" + workspaceName + "</name>");

I know it's case-sensitive, and I'm relying on that. I just need a way to remove all whitespace in a string efficiently. I know RegEx and LINQ can do it, but I'm open to other ideas. I am mostly just concerned about speed.

Screwed answered 2/6, 2011 at 19:35 Comment(3)
Parsing XML with regex is almost as bad as parsing HTML with regex.Nicknack
@henk holterman; See my answer below, regexp doesn't seem to be the fastest in all cases.Hawkes
Regex doesn't seem to be the fastest at all. I have summarized the results from many different ways to remove whitespace from a string. The summary is in an answer below - https://mcmap.net/q/73956/-efficient-way-to-remove-all-whitespace-from-stringUnquestionable
M
827

This is the fastest way I know of, even though you said you didn't want to use regular expressions:

Regex.Replace(XML, @"\s+", "");

Crediting @hypehuman in the comments, if you plan to do this more than once, create and store a Regex instance. This will save the overhead of constructing it every time, which is more expensive than you might think.

private static readonly Regex sWhitespace = new Regex(@"\s+");
public static string ReplaceWhitespace(string input, string replacement) 
{
    return sWhitespace.Replace(input, replacement);
}
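To make the pattern choice concrete, here is a small sketch comparing `\s+` with the `\p{Zs}` category suggested in the comments. The class and method names are made up for illustration. In .NET, `\s` covers tabs and line breaks as well as Unicode separators, whereas `\p{Zs}` matches only "space separator" characters, so it would leave tabs and newlines in place.

```csharp
using System.Text.RegularExpressions;

public static class RegexWhitespaceDemo
{
    // Cached instances, as recommended above; the names are illustrative only.
    private static readonly Regex AnyWhitespace = new Regex(@"\s+");
    private static readonly Regex SpaceSeparators = new Regex(@"\p{Zs}");

    // Removes every character that \s matches (spaces, tabs, line breaks, ...).
    public static string RemoveAll(string input) =>
        AnyWhitespace.Replace(input, "");

    // Removes only Unicode space separators (e.g. U+0020, U+00A0);
    // tabs and newlines are not in that category and are kept.
    public static string RemoveSpaceSeparatorsOnly(string input) =>
        SpaceSeparators.Replace(input, "");
}
```

For the question's use case (newlines must go too), `\s` looks like the safer choice of the two.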
Monjan answered 2/6, 2011 at 19:38 Comment(9)
I could use a regular expression, I'm just not sure if it's the fastest way.Screwed
I'm pretty sure it is. At the very least behind the scenes you have to check every character, and this is just doing a linear search.Monjan
There isn't a faster way, the only "other" way is to do @"string".Replace(" ", string.Empty) for a million different combinations. Regex will do it all with just that.Theran
Side comment: in general Regex is no faster, just neater - it decomposes your expression and also linearly does what you asked it to.Calefactory
If you plan to do this more than once, create and store a Regex instance. This will save the overhead of constructing it every time, which is more expensive than you might think. private static readonly Regex sWhitespace = new Regex(@"\s+"); public static string ReplaceWhitespace(string input, string replacement) { return sWhitespace.Replace(input, replacement); }Fishing
use split/join combination as tested to be the fastest so far, see KernowCode answer below.Floppy
Regex.Replace(XML, @"\p{Zs}", string.Empty) also cleans whitespace characters that are not code 32.Wester
For those new to RegEx and looking for an explanation as to what this expression means, \s means "match any whitespace token", and + means "match one or more of the preceding token". Also RegExr is a nice website to practice writing RegEx expressions with, if you want to experiment.Offend
This method could default to removing the whitespace if the signature defaults the replacement value. public static string ReplaceWhitespace(string input, string replacement = "")Unstep
H
258

I have an alternative way without regexp, and it seems to perform pretty well. It is a continuation of Brandon Moretz's answer:

 public static string RemoveWhitespace(this string input)
 {
    return new string(input.ToCharArray()
        .Where(c => !Char.IsWhiteSpace(c))
        .ToArray());
 }

I tested it in a simple unit test:

[Test]
[TestCase("123 123 1adc \n 222", "1231231adc222")]
public void RemoveWhiteSpace1(string input, string expected)
{
    string s = null;
    for (int i = 0; i < 1000000; i++)
    {
        s = input.RemoveWhitespace();
    }
    Assert.AreEqual(expected, s);
}

[Test]
[TestCase("123 123 1adc \n 222", "1231231adc222")]
public void RemoveWhiteSpace2(string input, string expected)
{
    string s = null;
    for (int i = 0; i < 1000000; i++)
    {
        s = Regex.Replace(input, @"\s+", "");
    }
    Assert.AreEqual(expected, s);
}

For 1,000,000 attempts the first option (without regexp) runs in less than a second (700 ms on my machine), and the second takes 3.5 seconds.

Hawkes answered 29/1, 2013 at 19:58 Comment(8)
.ToCharArray() is not necessary; you can use .Where() directly on a string.Pierro
Just to note here. Regex is slower... on small strings! If you say you had a digitized version of a Volume on US Tax Law (~million words?), with a handful of iterations, Regex is king, by far! Its not what is faster, but what should be used in which circumstance. You only proved half the equation here. -1 until you prove the second half of the test so that the answer provides more insight to when what should be used.Stylopodium
@ppumkin He asked for a single pass removal of whitespace. Not multiple iterations of other processing. I'm not going to make this single pass whitespace removal into an extended post about benchmarking text processing.Hawkes
You said it's preferred not to use regex this time but didn't say why.Stylopodium
@ProgramFOX, in a different question (can't readily find it) I noticed that at least in some queries, using ToCharArray is faster than using .Where() directly on the string. This has something to do with the overhead into the IEnumerable<> in each iteration step, and the ToCharArray being very efficient (block-copy) and the compiler optimizes iteration over arrays. Why this difference exists, no-one has been able to explain me, but measure before you remove ToCharArray().Reply
@Reply Oh, that's interesting. Thanks for the comment!Pierro
For other newbies like me: Get rid of "char[] does not contain a definition for 'Where'" : "using System.Linq;"Copyholder
Using .Where() without .ToCharArray() working slower on my machine. Be careful with thatKrummhorn
P
127

Try the replace method of the string in C#.

XML.Replace(" ", string.Empty);
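As a quick illustration of what this does and does not cover, the sketch below (class and method names are hypothetical) contrasts replacing the space character with filtering on `char.IsWhiteSpace`. Only the second removes tabs and line breaks.

```csharp
using System.Linq;

public static class ReplaceSpaceDemo
{
    // Removes only U+0020; tabs, newlines and other whitespace are left in place.
    public static string RemoveSpacesOnly(string input) =>
        input.Replace(" ", string.Empty);

    // Removes every character that char.IsWhiteSpace reports as white space.
    public static string RemoveAllWhitespace(string input) =>
        string.Concat(input.Where(c => !char.IsWhiteSpace(c)));
}
```

If the input is known to contain nothing but plain spaces, the single `Replace` call is enough; otherwise one of the other answers is needed.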
Partida answered 2/6, 2011 at 19:43 Comment(4)
Doesn't remove tabs or newlines. If I do multiple removes now I'm making multiple passes over the string.Screwed
@MattSach why does it not remove ALL whitespace?Meredi
@Meredi It's only replacing space characters. The OP asked for replacement of newlines as well (which are "whitespace" characters, even though they're not a space character).Suspend
Regex.Replace(XML, @"\s+", string.Empty) removes all whitespaces.Harvell
F
111

My solution is to use Split and Join and it is surprisingly fast, in fact the fastest of the top answers here.

str = string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));

Timings for 10,000 iterations on a simple string with whitespace, including newlines and tabs:

  • split/join = 60 milliseconds
  • linq chararray = 94 milliseconds
  • regex = 437 milliseconds

Improve this by wrapping it up in method to give it meaning, and also make it an extension method while we are at it ...

public static string RemoveWhitespace(this string str) {
    return string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
}

Note this relies on behavior of Split described in the remarks:

You can use either a character array or a string array to specify zero or more delimiting characters or strings. If no delimiting characters are specified, the string is split at white-space characters.
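The sketch below shows two equivalent ways of asking for that "no delimiters specified" behavior. The class and method names are hypothetical. Both `default(string[])` and `(char[])null` evaluate to null; their only job is to tell the compiler which overload to bind to.

```csharp
using System;

public static class SplitJoinDemo
{
    // default(string[]) is just a typed null, selecting the string[] overload.
    public static string ViaDefault(string str) =>
        string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));

    // An explicit cast does the same job, here selecting the char[] overload.
    public static string ViaCast(string str) =>
        string.Join("", str.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));

    // Note: str.Split(null, StringSplitOptions.RemoveEmptyEntries) does not compile,
    // because a bare null is ambiguous between the char[] and string[] overloads.
}
```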

Floppy answered 9/6, 2015 at 13:2 Comment(11)
I really like this solution, I've been using a similar one since pre-LINQ days. I'm actually impressed with LINQs performance, and somewhat surprised with regex. Maybe the code was not as optimal as it could have been for regex (you'll have to cache the regex object for example). But the crux of the problem is that the "quality" of the data will matter a lot. Maybe with long strings the regex will outperform the other options. It will be a fun benchmark to perform... :-)Ganger
How does default(string[]) == a list of all whitespace characters? I see it working, but I am not understanding how?Source
Split needs a valid array and null will not do so default(type) where in this case is a string[] returns the correct default for the function.Floppy
Actually default(string[]) returns null. This is just a special mode of this String.Split(...) overload that splits by whitespaces if you pass null in.Butanol
but there is a difference between just 'null' and the default. It will not work with null, hence the use of default.Floppy
@kernowcode You mean the ambiguity between the 2 overloads with string[] and char[]? you just have to specify which one you want e.g.: string.Join("", str.Split((string[])null, StringSplitOptions.RemoveEmptyEntries));. That is actually what your call to default does in this case since it returns null as well: it helps the compiler to decide which overload to pick. Hence my comment because the statement in your comment "Split needs a valid array and null will not do ..." is false. No big deal, just thought worth mentioning since Jake Drew asked how this worked. +1 for your answerButanol
Cool idea ... but i would do it as follows: string.Concat("H \ne llo Wor ld".Split())Stumper
michaelkrisper solution is very readable. I did a test and 'split/join' (162 milliseconds) performed better than 'split/concat' (180 milliseconds) for 10,000 iterations of the same string.Floppy
Just a heads up, using only the Split part will only remove whitespace from the first matchIntemperate
It's fascinating to me that this doesn't work if you use empty string "" instead of default(string[]). Using default(char[]) works equally well, as does (char[])null, since they both lead to the compiler using the versions of the function which accept string[]/ char[] as their first parameter... even "".ToCharArray() works like this!! But alas, not empty string "" directly, for some reason. Would love to know why the decision was made to make the implementation of the String version of the function different from the rest.Darlenedarline
Oh!! str.Split(null) splits on whitespace apparently (learned from this answer)! That means you don't even need to use the StringSplitOptions.RemoveEmptyEntries option, because the whitespace characters will be removed when splitting on them.Darlenedarline
U
65

Edit: This answer was made with older .NET versions. I bet that .NET 8 now has more efficient ways of handling this. Just a heads up :)
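As one possible direction on newer runtimes, here is a minimal sketch that uses `string.Create`, assuming .NET Core 2.1 or later where that API is available. It is not part of the benchmark below and has not been timed; the class and method names are hypothetical. The idea is to count the surviving characters first and then write them straight into the new string, avoiding the intermediate char array.

```csharp
using System;

public static class ModernWhitespace
{
    public static string RemoveWhitespace(string input)
    {
        if (string.IsNullOrEmpty(input)) return input;

        // First pass: count the characters that will be kept.
        int count = 0;
        foreach (char c in input)
        {
            if (!char.IsWhiteSpace(c)) count++;
        }

        // Nothing to remove, so the original instance can be returned as-is.
        if (count == input.Length) return input;

        // Second pass: fill the new string's buffer directly.
        return string.Create(count, input, (span, src) =>
        {
            int j = 0;
            foreach (char c in src)
            {
                if (!char.IsWhiteSpace(c)) span[j++] = c;
            }
        });
    }
}
```

Whether the two passes pay off compared with the in-place char array approach would need measuring on the target runtime.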

Building on Henk's answer, I have created some test methods with his answer and some added, more optimized, methods. I found the results differ based on the size of the input string. Therefore, I have tested with two result sets. For the fastest method, the linked source has an even faster way. But, since it is characterized as unsafe, I have left it out.

Long input string results:

  1. InPlaceCharArray: 2021 ms (Sunsetquest's answer) - (Original source)
  2. String split then join: 4277 ms (Kernowcode's answer)
  3. String reader: 6082 ms
  4. LINQ using native char.IsWhitespace: 7357 ms
  5. LINQ: 7746 ms (Henk's answer)
  6. ForLoop: 32320 ms
  7. RegexCompiled: 37157 ms
  8. Regex: 42940 ms

Short input string results:

  1. InPlaceCharArray: 108 ms (Sunsetquest's answer) - (Original source)
  2. String split then join: 294 ms (Kernowcode's answer)
  3. String reader: 327 ms
  4. ForLoop: 343 ms
  5. LINQ using native char.IsWhitespace: 624 ms
  6. LINQ: 645 ms (Henk's answer)
  7. RegexCompiled: 1671 ms
  8. Regex: 2599 ms

Code:

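using System;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Must be static because StringSplitThenJoin below is an extension method.
public static class RemoveWhitespace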
{
    public static string RemoveStringReader(string input)
    {
        var s = new StringBuilder(input.Length);
        using (var reader = new StringReader(input))
        {
            int i = 0;
            char c;
            for (; i < input.Length; i++)
            {
                c = (char)reader.Read();
                if (!char.IsWhiteSpace(c))
                {
                    s.Append(c);
                }
            }
        }

        return s.ToString();
    }

    public static string RemoveLinqNativeCharIsWhitespace(string input)
    {
        return new string(input.ToCharArray()
            .Where(c => !char.IsWhiteSpace(c))
            .ToArray());
    }

    public static string RemoveLinq(string input)
    {
        return new string(input.ToCharArray()
            .Where(c => !Char.IsWhiteSpace(c))
            .ToArray());
    }

    public static string RemoveRegex(string input)
    {
        return Regex.Replace(input, @"\s+", "");
    }

    private static Regex compiled = new Regex(@"\s+", RegexOptions.Compiled);
    public static string RemoveRegexCompiled(string input)
    {
        return compiled.Replace(input, "");
    }

    public static string RemoveForLoop(string input)
    {
        for (int i = input.Length - 1; i >= 0; i--)
        {
            if (char.IsWhiteSpace(input[i]))
            {
                input = input.Remove(i, 1);
            }
        }
        return input;
    }

    public static string StringSplitThenJoin(this string str)
    {
        return string.Join("", str.Split(default(string[]), StringSplitOptions.RemoveEmptyEntries));
    }

    public static string RemoveInPlaceCharArray(string input)
    {
        var len = input.Length;
        var src = input.ToCharArray();
        int dstIdx = 0;
        for (int i = 0; i < len; i++)
        {
            var ch = src[i];
            switch (ch)
            {
                case '\u0020':
                case '\u00A0':
                case '\u1680':
                case '\u2000':
                case '\u2001':
                case '\u2002':
                case '\u2003':
                case '\u2004':
                case '\u2005':
                case '\u2006':
                case '\u2007':
                case '\u2008':
                case '\u2009':
                case '\u200A':
                case '\u202F':
                case '\u205F':
                case '\u3000':
                case '\u2028':
                case '\u2029':
                case '\u0009':
                case '\u000A':
                case '\u000B':
                case '\u000C':
                case '\u000D':
                case '\u0085':
                    continue;
                default:
                    src[dstIdx++] = ch;
                    break;
            }
        }
        return new string(src, 0, dstIdx);
    }
}

Tests:

[TestFixture]
public class Test
{
    // Short input
    //private const string input = "123 123 \t 1adc \n 222";
    //private const string expected = "1231231adc222";

    // Long input
    private const string input = "123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222123 123 \t 1adc \n 222";
    private const string expected = "1231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc2221231231adc222";

    private const int iterations = 1000000;

    [Test]
    public void RemoveInPlaceCharArray()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveInPlaceCharArray(input);
        }

        stopwatch.Stop();
        Console.WriteLine("InPlaceCharArray: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveStringReader()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveStringReader(input);
        }

        stopwatch.Stop();
        Console.WriteLine("String reader: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveLinqNativeCharIsWhitespace()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveLinqNativeCharIsWhitespace(input);
        }

        stopwatch.Stop();
        Console.WriteLine("LINQ using native char.IsWhitespace: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveLinq()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveLinq(input);
        }

        stopwatch.Stop();
        Console.WriteLine("LINQ: " + stopwatch.ElapsedMilliseconds + " ms");
        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveRegex()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveRegex(input);
        }

        stopwatch.Stop();
        Console.WriteLine("Regex: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveRegexCompiled()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveRegexCompiled(input);
        }

        stopwatch.Stop();
        Console.WriteLine("RegexCompiled: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }

    [Test]
    public void RemoveForLoop()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.RemoveForLoop(input);
        }

        stopwatch.Stop();
        Console.WriteLine("ForLoop: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }

    [Test]
    public void StringSplitThenJoin()
    {
        string s = null;
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            s = RemoveWhitespace.StringSplitThenJoin(input);
        }

        stopwatch.Stop();
        Console.WriteLine("StringSplitThenJoin: " + stopwatch.ElapsedMilliseconds + " ms");

        Assert.AreEqual(expected, s);
    }
}

Edit: Tested a nice one liner from Kernowcode.

Unquestionable answered 20/5, 2016 at 13:23 Comment(0)
C
31

Just an alternative because it looks quite nice :) - NOTE: Henk's answer is the quickest of these.

input.ToCharArray()
 .Where(c => !Char.IsWhiteSpace(c))
 .Select(c => c.ToString())
 .Aggregate((a, b) => a + b);

Testing 1,000,000 loops on "This is a simple Test"

This method = 1.74 seconds
Regex = 2.58 seconds
new String (Henks) = 0.82 seconds
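
One caveat with the snippet above: `Aggregate` without a seed throws `InvalidOperationException` on an empty sequence, which is what you get when the input is empty or consists only of whitespace. A minimal sketch of a seeded variant that avoids this (the class name is hypothetical):

```csharp
using System.Linq;

public static class AggregateDemo
{
    // Seeding with string.Empty means an empty or all-whitespace input
    // yields "" instead of throwing.
    public static string RemoveWhitespace(string input) =>
        input.Where(c => !char.IsWhiteSpace(c))
             .Aggregate(string.Empty, (acc, c) => acc + c);
}
```

Each `acc + c` still allocates a new string, which is probably why this style trails the `new string(...)` approach in the timings above.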

Colostrum answered 28/11, 2013 at 5:28 Comment(4)
Why was this downvoted? It's perfectly acceptable, meets the requirements, works faster than the RegEx option and is very readable?Colostrum
because it can be written a lot shorter: new string(input.Where(c => !Char.IsWhiteSpace(c)).ToArray());Bedfellow
Might be true - but the answer still stands, is readable, faster than regex and produces the desired result. Many of the other answers are AFTER this one...therefore a downvote does not make sense.Colostrum
Is there a unit for "0.82"? Or is it a relative measure (82%)? Can you edit your answer to make it more clear?Alverson
C
26

I found a nice write-up on this on CodeProject by Felipe Machado (with help by Richard Robertson)

He tested ten different methods. This one is the fastest safe version...

public static string TrimAllWithInplaceCharArray(string str) {

    var len = str.Length;
    var src = str.ToCharArray();
    int dstIdx = 0;

    for (int i = 0; i < len; i++) {
        var ch = src[i];

        switch (ch) {

            case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':
            case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':
            case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':
            case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':
            case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
                continue;

            default:
                src[dstIdx++] = ch;
                break;
        }
    }
    return new string(src, 0, dstIdx);
}

And the fastest unsafe version... (some improvements by Sunsetquest 5/26/2021)

public static unsafe void RemoveAllWhitespace(ref string str)
{
    fixed (char* pfixed = str)
    {
        char* dst = pfixed;
        for (char* p = pfixed; *p != 0; p++)
        {
            switch (*p)
            {
                case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':
                case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':
                case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':
                case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':
                case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
                    continue;

                default:
                    *dst++ = *p;
                    break;
            }
        }

        uint* pi = (uint*)pfixed;
        ulong len = ((ulong)dst - (ulong)pfixed) >> 1;
        pi[-1] = (uint)len;
        pfixed[len] = '\0';
    }
}

There are also some nice independent benchmarks on Stack Overflow by Stian Standahl that also show how Felipe's function is about 300% faster than the next fastest function. Also, for the one I modified, I used this trick.

Cookout answered 21/5, 2016 at 21:25 Comment(9)
I've tried translating this to C++ but am a little stuck. Any ideas why my port might be failing? #42136422Audraaudras
I can't resist. Look in the comments section of the article you refer to. You will find me as "Basketcase Software". He and I worked on this together for a while. I had completely forgotten about this when this problem came back up again. Thanks for good memories. :)Disturbed
And what if you want to remove extra WS only ? What about this #17770702 mod ?Candice
Fastest is a bit slower ;-) String as container performs better here (in app 4:15 to 3:55 => 8.5% less, but when left string 3:30 => 21.4% less and profiler shows around 50% spent in this method). So in real life string should be around 40% faster compared to (slow) array conversion used here.Candice
The original string will be changed by the unsafe version!Hangdog
@Motlicek Petr - nice catch on the original value being changed. I changed it up a bit so it is clear to the programmer that it is changed. Also, I think it is even faster now because no "new string".Cookout
So many years later I find this reference to that article again :-) By the way Loudenvier (me) is Felipe Machado :-)Ganger
Hi @Ganger - I often run into my own code also. Except this is your code =\ (That I feel like you should get StackOverflow credit for by the way) Hopefully, a lot of people upvoted your codeproject from here. BTW - I have posted a lot on Codeproject also. (see Sunsetquest)Cookout
@Cookout I was really flattered to see your post here referencing and using "my" code from the Codeproject article . I do post it to SHARE it! In fact seeing the code posted here from someone else is even more satisfying! Thank you! (BTW, sometimes I start reading an answer for a question only to realize I'm reading one of my own answers from years ago. It's so funny)Ganger
K
15

If you need superb performance, you should avoid LINQ and regular expressions in this case. I did some performance benchmarking, and it seems that if you want to strip white space from beginning and end of the string, string.Trim() is your ultimate function.

If you need to strip all white space from a string, the following method is the fastest of all that have been posted here:

    public static string RemoveWhitespace(this string input)
    {
        int j = 0, inputlen = input.Length;
        char[] newarr = new char[inputlen];

        for (int i = 0; i < inputlen; ++i)
        {
            char tmp = input[i];

            if (!char.IsWhiteSpace(tmp))
            {
                newarr[j] = tmp;
                ++j;
            }
        }
        return new String(newarr, 0, j);
    }
Keniakenilworth answered 31/12, 2013 at 13:8 Comment(2)
I'd be curious to know the details of your benchmarkings--not that I am skeptical, but I'm curious about the overhead involved with Linq. How bad was it?Marquisette
I haven't re-run all the tests, but I can remember this much: Everything that involved Linq was a lot slower than anything without it. All the clever usage of string/char functions and constructors made no percentual difference if Linq was used.Keniakenilworth
C
12

Regex is overkill; just use extension on string (thanks Henk). This is trivial and should have been part of the framework. Anyhow, here's my implementation:

public static partial class Extension
{
    public static string RemoveWhiteSpace(this string self)
    {
        return new string(self.Where(c => !Char.IsWhiteSpace(c)).ToArray());
    }
}
Caracalla answered 18/10, 2014 at 0:11 Comment(4)
this is basically an unnecessary answer (regex is overkill, but is a quicker solution than given one - and it is already accepted?)Unrivaled
How can you use Linq extension methods on a string? Can't figure out which using I am missing others than System.LinqVarioloid
Ok looks like this is not available in PCL, IEnumerable<char> is conditional in Microsoft String implementation... And I am using Profile259 which does not support this :)Varioloid
@Varioloid strings are collections of char, so linq should work by default.Renwick
K
5

I think a lot of people come here to remove spaces:

string s = "my string is nice";
s = s.Replace(" ", "");
Kalindi answered 12/1, 2022 at 11:5 Comment(3)
The problem with this, is that a space can be written in many different ways as mentioned in other answers. This replace will work for 90%~ of the cases or so.Legal
its s.Replace()Kermie
I don't know how you'd have any way of knowing of how people find this question, but that's not important because all that matters is the question specifically asks how to remove all whitespace, which this answer fails to do.Chesterchesterfield
S
4

Here is a simple linear alternative to the RegEx solution. I am not sure which is faster; you'd have to benchmark it.

static string RemoveWhitespace(string input)
{
    StringBuilder output = new StringBuilder(input.Length);

    for (int index = 0; index < input.Length; index++)
    {
        if (!Char.IsWhiteSpace(input, index))
        {
            output.Append(input[index]);
        }
    }
    return output.ToString();
}
Skimp answered 2/6, 2011 at 19:50 Comment(0)
F
4

I needed to replace white space in a string with spaces, but not duplicate spaces. e.g., I needed to convert something like the following:

"a b   c\r\n d\t\t\t e"

to

"a b c d e"

I used the following method

private static string RemoveWhiteSpace(string value)
{
    if (value == null) { return null; }
    var sb = new StringBuilder();

    var lastCharWs = false;
    foreach (var c in value)
    {
        if (char.IsWhiteSpace(c))
        {
            if (lastCharWs) { continue; }
            sb.Append(' ');
            lastCharWs = true;
        }
        else
        {
            sb.Append(c);
            lastCharWs = false;
        }
    }
    return sb.ToString();
}
Fitment answered 5/10, 2014 at 0:42 Comment(0)
N
3

We can use:

public static string RemoveWhitespace(this string input)
{
    if (input == null)
        return null;

    return new string(input.ToCharArray()
        .Where(c => !Char.IsWhiteSpace(c))
        .ToArray());
}
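
To illustrate why the null check matters: an extension method is compiled as an ordinary static call, so invoking it on a null reference reaches the method body instead of throwing at the call site. A minimal sketch, with hypothetical class and method names:

```csharp
using System.Linq;

public static class NullSafeExtensions
{
    public static string RemoveWhitespaceOrNull(this string input)
    {
        // Without this guard, input.Where(...) would throw ArgumentNullException.
        if (input == null) return null;

        return new string(input.Where(c => !char.IsWhiteSpace(c)).ToArray());
    }
}
```

With this in place, `((string)null).RemoveWhitespaceOrNull()` simply returns null, which may or may not be what callers want; throwing an `ArgumentNullException` explicitly is the other common design choice.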
Nazario answered 20/6, 2016 at 14:59 Comment(3)
This is almost exactly the same as Henk's answer above. The only difference is that you check for null.Screwed
Yes, the check for null is important.Nazario
Maybe this should have just been a comment on his answer. I am glad you brought it up though. I didn't know extension methods could be called on null objects.Screwed
N
2

I assume your XML response looks like this:

var xml = @"<names>
                <name>
                    foo
                </name>
                <name>
                    bar
                </name>
            </names>";

The best way to process XML is to use an XML parser, such as LINQ to XML:

var doc = XDocument.Parse(xml);

var containsFoo = doc.Root
                     .Elements("name")
                     .Any(e => ((string)e).Trim() == "foo");
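
Tying this back to the question, a sketch of the `IsExistingWorkspace()` method built on the same idea could look like the following. It assumes each workspace appears as a `<name>` element somewhere in the response; the real schema is not shown in the question, and the class name is hypothetical.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class WorkspaceLookup
{
    // Assumption: workspace names are the text content of <name> elements.
    public static bool IsExistingWorkspace(string xml, string workspaceName)
    {
        return XDocument.Parse(xml)
            .Descendants("name")
            // Trim handles indentation/newlines around the name;
            // Ordinal keeps the comparison case-sensitive, as the question requires.
            .Any(e => string.Equals(e.Value.Trim(), workspaceName, StringComparison.Ordinal));
    }
}
```

Unlike the `Contains("<name>...")` approach, this does not need the whitespace stripped first and will not match a name that merely appears inside some other element.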
Nicknack answered 2/6, 2011 at 19:40 Comment(4)
Once I verify that a particular <name> tag has the proper value, I'm done. Wouldn't parsing the document have some overhead?Screwed
Sure, it has some overhead. But it has the benefit of being correct. A solution based e.g. on regex is much more difficult to get right. If you determine that a LINQ to XML solution is too slow, you can always replace it with something faster. But you should avoid hunting for the most efficient implementation before you know that the correct one is too slow.Nicknack
This is going to be running in my employer's backend servers. Lightweight is what I'm looking for. I don't want something that "just works" but is optimal.Screwed
LINQ to XML is one of the most lightweight ways to correctly work with XML in .NETNicknack
P
2

Using LINQ, you can write a readable method this way:

public static string RemoveAllWhitespaces(this string source)
{
    return string.IsNullOrEmpty(source) 
        ? source 
        : new string(source.Where(x => !char.IsWhiteSpace(x)).ToArray());
}
Plinth answered 4/8, 2020 at 11:7 Comment(0)
L
1

Here is yet another variant:

public static string RemoveAllWhitespace(string aString)
{
  return String.Join(String.Empty, aString.Where(aChar => !Char.IsWhiteSpace(aChar)));
}

As with most of the other solutions, I haven't performed exhaustive benchmark tests, but this works well enough for my purposes.

Lascivious answered 18/6, 2015 at 19:49 Comment(0)
Z
1

The straightforward way to remove all whitespace from a string, where "example" is your initial string:

String.Concat(example.Where(c => !Char.IsWhiteSpace(c)))
Zinn answered 2/8, 2022 at 23:6 Comment(0)
B
0

I have found different results to be true. I am trying to replace all whitespace with a single space and the regex was extremely slow.

return( Regex::Replace( text, L"\\s+", L" " ) );

What worked the most optimally for me (in C++ cli) was:

String^ ReduceWhitespace( String^ text )
{
  String^ newText;
  bool    inWhitespace = false;
  Int32   posStart = 0;
  Int32   pos      = 0;
  for( pos = 0; pos < text->Length; ++pos )
  {
    wchar_t cc = text[pos];
    if( Char::IsWhiteSpace( cc ) )
    {
      if( !inWhitespace )
      {
        if( pos > posStart ) newText += text->Substring( posStart, pos - posStart );
        inWhitespace = true;
        newText += L' ';
      }
      posStart = pos + 1;
    }
    else
    {
      if( inWhitespace )
      {
        inWhitespace = false;
        posStart = pos;
      }
    }
  }

  if( pos > posStart ) newText += text->Substring( posStart, pos - posStart );

  return( newText );
}

I tried the above routine first by replacing each character separately, but had to switch to doing substrings for the non-space sections. When applying to a 1,200,000 character string:

  • the above routine gets it done in 25 seconds
  • the above routine + separate character replacement in 95 seconds
  • the regex aborted after 15 minutes.
Borrego answered 2/2, 2015 at 21:53 Comment(0)
D
0

It's arguably not as inherently expressive as using Regex or using Char.IsWhiteSpace, but using String.Split(null) with String.Concat() might be the most concise version of this:

public static string RemoveWhitespace(this string input)
{
   if (input == null) return null; // String instances can contain a null value!

   return string.Concat(input.Split(null));
}

This leverages the Split(Char[]) overload of Split(), which accepts null for its only parameter, and interprets that value as "split on all whitespace" (the same outcome as if you used an empty char array or default(char[]) instead).

Internally, it uses Char.IsWhiteSpace to make the determination of whether it should split on a given character:

If the separator argument is null or contains no characters, the method treats white-space characters as the delimiters. White-space characters are defined by the Unicode standard, and the Char.IsWhiteSpace method returns true if a white-space character is passed to it.
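
As a small check of that equivalence, the sketch below (hypothetical class and method names) runs the same input through a typed null, an empty array, and the parameterless call, and compares them with a plain `char.IsWhiteSpace` filter. Consecutive whitespace produces empty entries in the split result, but those contribute nothing when concatenated, which is why `RemoveEmptyEntries` is not needed here.

```csharp
using System.Linq;

public static class SplitConcatDemo
{
    // A typed null selects the char[] overload and means "split on white space".
    public static string ViaTypedNull(string input) =>
        string.Concat(input.Split((char[])null));

    // An empty separator array is treated the same way.
    public static string ViaEmptyArray(string input) =>
        string.Concat(input.Split(new char[0]));

    // Calling Split() with no arguments passes an empty params array.
    public static string ViaNoArguments(string input) =>
        string.Concat(input.Split());

    // Reference implementation for comparison.
    public static string ViaIsWhiteSpace(string input) =>
        new string(input.Where(c => !char.IsWhiteSpace(c)).ToArray());
}
```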

Darlenedarline answered 26/5, 2023 at 23:40 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.