What is the fastest way to replace extra white spaces with one white space?
e.g.
from
foo    bar
to
foo bar
The fastest way? Iterate over the string and build a second copy in a StringBuilder character by character, only copying one space for each group of spaces.
The easier-to-type Replace variants will create a bucket load of extra strings (or waste time building the regex DFA).
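A minimal sketch of the StringBuilder approach described above, assuming "white space" means only the ASCII space ' '. The ideone code is not reproduced here; the class and method names (SpaceCollapser.CollapseSpaces) are hypothetical.

```csharp
using System.Text;

public static class SpaceCollapser
{
    // Copies the input character by character, emitting only one ' '
    // for each run of consecutive spaces. All other characters are copied as-is.
    public static string CollapseSpaces(string input)
    {
        // The result can never be longer than the input, so pre-size the builder.
        var sb = new StringBuilder(input.Length);
        bool previousWasSpace = false;
        foreach (char c in input)
        {
            if (c == ' ')
            {
                if (previousWasSpace)
                    continue; // skip the 2nd, 3rd, ... space of a run
                previousWasSpace = true;
            }
            else
            {
                previousWasSpace = false;
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Replacing c == ' ' with char.IsWhiteSpace(c) would also collapse tabs and newlines, although the first character of each run would then be kept as-is rather than turned into a space.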
Edit with comparison results:
Using http://ideone.com/NV6EzU, with n=50 (had to reduce it on ideone because it took so long they had to kill my process), I get:
Regex: 7771ms.
Stringbuilder: 894ms.
Which is indeed as expected: Regex is horribly inefficient for something this simple.
Compiled regex will execute as fast as anything you can write yourself.
string resultString = string.Join(" ", sourceString.Split(' ').Where(s => s != ""));?
Split is also ridiculously slow; you're putting pressure on the heap for no reason by allocating that array.
ifs, which seems to be measurably faster than the sb version.
new StringBuilder() in it, try moving it outside so it's only created once, and use Clear() between runs. I have a feeling most of that difference is the repeated creation of the object.
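The suggestion above (create the builder once, reset it with Clear()) can be sketched like this. The class name is hypothetical, and the [ThreadStatic] cache is an assumption added for illustration: a single shared StringBuilder is not thread-safe, so this version keeps one per thread.

```csharp
using System;
using System.Text;

public static class ReusableCollapser
{
    // One builder per thread, created lazily on first use and reused afterwards.
    [ThreadStatic] private static StringBuilder _sb;

    public static string CollapseSpaces(string input)
    {
        StringBuilder sb = _sb ??= new StringBuilder();
        sb.Clear(); // resets Length to 0 but keeps the already-allocated capacity
        bool previousWasSpace = false;
        foreach (char c in input)
        {
            bool isSpace = c == ' ';
            if (!(isSpace && previousWasSpace))
                sb.Append(c); // append unless this is a repeated space
            previousWasSpace = isSpace;
        }
        return sb.ToString();
    }
}
```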
You can use a regex:
static readonly Regex trimmer = new Regex(@"\s\s+");
s = trimmer.Replace(s, " ");
For added performance, pass RegexOptions.Compiled.
\s with a space: new Regex(@"  +") (two space characters)
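Putting those suggestions together, here is a sketch that builds each regex once as a static readonly field with RegexOptions.Compiled, plus a spaces-only pattern for when tabs and newlines should be left alone. The class, field, and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class RegexCollapser
{
    // Two or more whitespace characters of any kind (space, tab, newline, ...).
    private static readonly Regex AnyWhitespaceRun =
        new Regex(@"\s\s+", RegexOptions.Compiled);

    // Two or more ASCII spaces only; tabs and newlines are not matched.
    private static readonly Regex SpaceRun =
        new Regex(@"  +", RegexOptions.Compiled);

    public static string CollapseWhitespace(string s) => AnyWhitespaceRun.Replace(s, " ");

    public static string CollapseSpaces(string s) => SpaceRun.Replace(s, " ");
}
```

Note that \s\s+ only matches runs of two or more, so a single tab or newline is left as it is; use \s+ if every whitespace character should end up as a plain space.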
A bit late, but I have done some benchmarking to find the fastest way to remove extra whitespace. If there are any faster answers, I would love to add them.
Results:
Code:
public class RemoveExtraWhitespaces
{
public static string WithRegex(string text)
{
return Regex.Replace(text, @"\s+", " ");
}
public static string WithRegexCompiled(Regex compiledRegex, string text)
{
return compiledRegex.Replace(text, " ");
}
public static string NormalizeWhiteSpace(string input)
{
if (string.IsNullOrEmpty(input))
return string.Empty;
int current = 0;
char[] output = new char[input.Length];
bool skipped = false;
foreach (char c in input.ToCharArray())
{
if (char.IsWhiteSpace(c))
{
if (!skipped)
{
if (current > 0)
output[current++] = ' ';
skipped = true;
}
}
else
{
skipped = false;
output[current++] = c;
}
}
return new string(output, 0, current);
}
public static string NormalizeWhiteSpaceForLoop(string input)
{
int len = input.Length,
index = 0,
i = 0;
var src = input.ToCharArray();
bool skip = false;
char ch;
for (; i < len; i++)
{
ch = src[i];
switch (ch)
{
case '\u0020':
case '\u00A0':
case '\u1680':
case '\u2000':
case '\u2001':
case '\u2002':
case '\u2003':
case '\u2004':
case '\u2005':
case '\u2006':
case '\u2007':
case '\u2008':
case '\u2009':
case '\u200A':
case '\u202F':
case '\u205F':
case '\u3000':
case '\u2028':
case '\u2029':
case '\u0009':
case '\u000A':
case '\u000B':
case '\u000C':
case '\u000D':
case '\u0085':
if (skip) continue;
src[index++] = ch;
skip = true;
continue;
default:
skip = false;
src[index++] = ch;
continue;
}
}
return new string(src, 0, index);
}
}
Tests:
[TestFixture]
public class RemoveExtraWhitespacesTest
{
private const string _text = "foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo ";
private const string _expected = "foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo foo bar foobar moo ";
private const int _iterations = 10000;
[Test]
public void Regex()
{
var result = TimeAction("Regex", () => RemoveExtraWhitespaces.WithRegex(_text));
Assert.AreEqual(_expected, result);
}
[Test]
public void RegexCompiled()
{
var compiledRegex = new Regex(@"\s+", RegexOptions.Compiled);
var result = TimeAction("RegexCompiled", () => RemoveExtraWhitespaces.WithRegexCompiled(compiledRegex, _text));
Assert.AreEqual(_expected, result);
}
[Test]
public void NormalizeWhiteSpace()
{
var result = TimeAction("NormalizeWhiteSpace", () => RemoveExtraWhitespaces.NormalizeWhiteSpace(_text));
Assert.AreEqual(_expected, result);
}
[Test]
public void NormalizeWhiteSpaceForLoop()
{
var result = TimeAction("NormalizeWhiteSpaceForLoop", () => RemoveExtraWhitespaces.NormalizeWhiteSpaceForLoop(_text));
Assert.AreEqual(_expected, result);
}
public string TimeAction(string name, Func<string> func)
{
var timer = Stopwatch.StartNew();
string result = string.Empty;
for (int i = 0; i < _iterations; i++)
{
result = func();
}
timer.Stop();
Console.WriteLine(string.Format("{0}: {1} ms", name, timer.ElapsedMilliseconds));
return result;
}
}
switch, no if) that does only space, and seems to be faster than NormalizeWhiteSpaceForLoop.
string q = "   Hello   how are  you doing?";
string a = String.Join(" ", q.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries));
I use the methods below: they handle all whitespace chars, not only spaces, trim both leading and trailing whitespace, remove extra whitespace, and replace all whitespace with the space char (so we have a uniform space separator). And these methods are fast.
public static String CompactWhitespaces( String s )
{
StringBuilder sb = new StringBuilder( s );
CompactWhitespaces( sb );
return sb.ToString();
}
public static void CompactWhitespaces( StringBuilder sb )
{
if( sb.Length == 0 )
return;
// set [start] to first not-whitespace char or to sb.Length
int start = 0;
while( start < sb.Length )
{
if( Char.IsWhiteSpace( sb[ start ] ) )
start++;
else
break;
}
// if [sb] has only whitespaces, then return empty string
if( start == sb.Length )
{
sb.Length = 0;
return;
}
// set [end] to last not-whitespace char
int end = sb.Length - 1;
while( end >= 0 )
{
if( Char.IsWhiteSpace( sb[ end ] ) )
end--;
else
break;
}
// compact string
int dest = 0;
bool previousIsWhitespace = false;
for( int i = start; i <= end; i++ )
{
if( Char.IsWhiteSpace( sb[ i ] ) )
{
if( !previousIsWhitespace )
{
previousIsWhitespace = true;
sb[ dest ] = ' ';
dest++;
}
}
else
{
previousIsWhitespace = false;
sb[ dest ] = sb[ i ];
dest++;
}
}
sb.Length = dest;
}
'\n', will be replaced with a space, ' '.
string text = "foo    bar";
text = Regex.Replace(text, @"\s+", " ");
// text = "foo bar"
This solution works with spaces, tabs, and newlines. If you want just spaces, replace '\s' with ' '.
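The choice of pattern matters for multi-line text: \s+ also turns line breaks into spaces, so separate lines get joined into one, while " +" leaves tabs and line breaks alone. A small comparison sketch (class and method names are hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceRegexDemo
{
    // \s+ : any run of whitespace, including tabs and line breaks, becomes one space.
    public static string CollapseAll(string text) => Regex.Replace(text, @"\s+", " ");

    // " +" : only runs of ASCII spaces are collapsed; tabs and line breaks survive.
    public static string CollapseSpacesOnly(string text) => Regex.Replace(text, " +", " ");
}
```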
I needed one of these for larger strings and came up with the routine below.
Any consecutive white-space (including tabs and newlines) is replaced with whatever is in normalizeTo.
Leading/trailing white-space is removed.
It's around 8 times faster than a Regex with my 5k to 5 million char strings.
internal static string NormalizeWhiteSpace(string input, char normalizeTo = ' ')
{
if (string.IsNullOrEmpty(input))
return string.Empty;
int current = 0;
char[] output = new char[input.Length];
bool skipped = false;
foreach (char c in input.ToCharArray())
{
if (char.IsWhiteSpace(c))
{
if (!skipped)
{
if (current > 0)
output[current++] = normalizeTo;
skipped = true;
}
}
else
{
skipped = false;
output[current++] = c;
}
}
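// Drop the trailing separator, if one was written (none is written for all-whitespace input)
return new string(output, 0, skipped && current > 0 ? current - 1 : current);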
}
string yourWord = "beep   boop  baap    beep   boop baap beep";
// Assumes the input contains no '|' characters
yourWord = yourWord.Replace(" ", " |").Replace("| ", "").Replace("|", "");
I've tried using StringBuilder to:
Here's the best balance of performance & readability I've found (using 100,000 iteration timing runs). Sometimes this tests faster than a less-legible version, at most 5% slower. On my small test string, regex takes 4.24x as much time.
public static string RemoveExtraWhitespace(string str)
{
var sb = new StringBuilder();
var prevIsWhitespace = false;
foreach (var ch in str)
{
var isWhitespace = char.IsWhiteSpace(ch);
if (prevIsWhitespace && isWhitespace)
{
continue;
}
sb.Append(ch);
prevIsWhitespace = isWhitespace;
}
return sb.ToString();
}
StringBuilder initial capacity.
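The comment above points at the StringBuilder's initial capacity. Here is a sketch of the previous method with that change, assuming the output can never be longer than the input (true here, since characters are only ever dropped), so str.Length is a safe upper bound. The class name CapacityDemo is hypothetical.

```csharp
using System.Text;

public static class CapacityDemo
{
    public static string RemoveExtraWhitespace(string str)
    {
        // Pre-size to the input length: the result is at most that long,
        // so the builder should not need to reallocate while appending.
        var sb = new StringBuilder(str.Length);
        var prevIsWhitespace = false;
        foreach (var ch in str)
        {
            var isWhitespace = char.IsWhiteSpace(ch);
            if (prevIsWhitespace && isWhitespace)
                continue; // drop every whitespace char after the first one in a run
            sb.Append(ch);
            prevIsWhitespace = isWhitespace;
        }
        return sb.ToString();
    }
}
```

As in the original, the first character of each whitespace run is kept as-is, so a tab stays a tab.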
It's not fast, but if simplicity helps, this works:
while (text.Contains("  ")) text = text.Replace("  ", " ");
This piece of code works well. I have not measured the performance.
string text = "   hello -  world,   here we go !!!    a  bc   ";
string result = string.Join(" ", text.Split().Where(x => x != ""));
// Output
// "hello - world, here we go !!! a bc"
I've tried with an array and with no if.
PS C:\dev\Spaces> dotnet run -c release
// .NETCoreApp,Version=v3.0
Seed=7, n=20, s.Length=2828670
Regex by SLaks 1407ms, len=996757
StringBuilder by Blindy 154ms, len=996757
Array 130ms, len=996757
NoIf 91ms, len=996757
All match!
private static string WithNoIf(string s)
{
var dst = new char[s.Length];
uint end = 0;
char prev = char.MinValue;
for (int k = 0; k < s.Length; ++k)
{
var c = s[k];
dst[end] = c;
// We'll move forward if the current character is not ' ' or if prev char is not ' '
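// To avoid 'if', XOR c and prev with ' ' (each XOR is 0 only for a space), OR the results,
// and then use bitwise operations to get 0 if x is 0 or 1 if x is non-zero
// (summing the differences instead can wrap around to 0, e.g. for a leading '@')
uint x = (uint)((c ^ ' ') | (prev ^ ' ')); // non-zero unless both c and prev are ' '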
end += ((x | (~x + 1)) >> 31) & 1; // https://mcmap.net/q/75504/-check-if-a-number-is-non-zero-using-bitwise-operators-in-c by ruslik
prev = c;
}
return new string(dst, 0, (int)end);
}
private static string WithArray(string s)
{
var dst = new char[s.Length];
int end = 0;
char prev = char.MinValue;
for (int k = 0; k < s.Length; ++k)
{
char c = s[k];
if (c != ' ' || prev != ' ') dst[end++] = c;
prev = c;
}
return new string(dst, 0, end);
}
public static void Main()
{
const int n = 20;
const int seed = 7;
string s = GetTestString(seed);
var fs = new (string Name, Func<string, string> Func)[]{
("Regex by SLaks", WithRegex),
("StringBuilder by Blindy", WithSb),
("Array", WithArray),
("NoIf", WithNoIf),
};
Console.WriteLine($"Seed={seed}, n={n}, s.Length={s.Length}");
var d = new Dictionary<string, string>(); // method, result
var sw = new Stopwatch();
foreach (var f in fs)
{
sw.Restart();
var r = "";
for( int i = 0; i < n; i++) r = f.Func(s);
sw.Stop();
d[f.Name] = r;
Console.WriteLine($"{f.Name,-25} {sw.ElapsedMilliseconds,4}ms, len={r.Length}");
}
Console.WriteLine(d.Values.All( v => v == d.Values.First()) ? "All match!" : "Not all match! BAD");
}
private static string GetTestString(int seed)
{
// by blindy from https://mcmap.net/q/74235/-c-fastest-way-to-remove-extra-white-spaces
var rng = new Random(seed);
// random 1mb+ string (it's slow enough...)
StringBuilder ssb = new StringBuilder(1 * 1024 * 1024);
for (int i = 0; i < 1 * 1024 * 1024; ++i)
if (rng.Next(5) == 0)
ssb.Append(new string(' ', rng.Next(20)));
else
ssb.Append((char)(rng.Next(128 - 32) + 32));
string s = ssb.ToString();
return s;
}
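The branch-free step in WithNoIf relies on the fact that, for a 32-bit unsigned x, x | (~x + 1) (that is, x | -x) has its top bit set exactly when x is non-zero. A small standalone check of that trick; the helper name NonZeroToOne is hypothetical.

```csharp
public static class BitTricks
{
    // Returns 1 if x != 0 and 0 if x == 0, without a branch.
    // ~x + 1 is the two's-complement negation of x (wrapping in the default unchecked context).
    // For x != 0, x | -x sets every bit from the lowest set bit of x upward,
    // so bit 31 is always set; for x == 0 the result is 0.
    public static uint NonZeroToOne(uint x) => ((x | (~x + 1)) >> 31) & 1;
}
```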
Try this:
System.Text.RegularExpressions.Regex.Replace(input, @"\s+", " ");
A few requirements are not clear in this question which deserve some thought.
This is a very efficient version which replaces all white space with a single space and removes any leading and trailing white space prior to the for loop.
public static string WhiteSpaceToSingleSpaces(string input)
{
// Trim first so the length check and input[0] see the trimmed string
input = input.Trim();
if (input.Length < 2)
return input;
StringBuilder sb = new StringBuilder(input.Length);
// After Trim, the first character is never white space, so copy it as-is
sb.Append(input[0]);
bool lastCharWhiteSpace = false;
for (int i = 1; i < input.Length; i++)
{
bool whiteSpace = char.IsWhiteSpace(input[i]);
//Skip duplicate whitespace characters
if (whiteSpace && lastCharWhiteSpace)
continue;
//Replace all whitespace with a single space.
if (whiteSpace)
sb.Append(' ');
else
sb.Append(input[i]);
//Keep track of the last character's whitespace status
lastCharWhiteSpace = whiteSpace;
}
return sb.ToString();
}
I don't know if it's the fastest way, but I use this and it works for me:
/// <summary>
/// Remove all extra spaces and tabs between words in the specified string!
/// </summary>
/// <param name="str">The specified string.</param>
public static string RemoveExtraSpaces(string str)
{
str = str.Trim();
StringBuilder sb = new StringBuilder();
bool space = false;
foreach (char c in str)
{
if (char.IsWhiteSpace(c)) { space = true; } // char.IsWhiteSpace already covers tab ((char)9)
else { if (space) { sb.Append(' '); }; sb.Append(c); space = false; };
}
return sb.ToString();
}
This is funny, but on my PC the method below is just as fast as Sergey Povalyaev's StringBuilder approach (~282 ms for 1000 reps, 10k src strings). Not sure about memory usage, though.
string RemoveExtraWhiteSpace(string src, char[] wsChars){
return string.Join(" ",src.Split(wsChars, StringSplitOptions.RemoveEmptyEntries));
}
Obviously it works okay with any chars - not just spaces.
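If you don't want to maintain the wsChars list yourself, the String.Split(char[], StringSplitOptions) overload treats a null (or empty) separator array as "split on white-space characters" (those for which char.IsWhiteSpace returns true). A sketch relying on that documented behavior; the class name is hypothetical.

```csharp
using System;

public static class SplitJoinDemo
{
    // A null separator array means "split on any white-space character".
    // RemoveEmptyEntries drops the empty strings produced by consecutive,
    // leading, or trailing whitespace, so the result is also trimmed.
    public static string RemoveExtraWhiteSpace(string src) =>
        string.Join(" ", src.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
}
```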
Though this is not what the OP asked for, if what you really need is to replace specific consecutive characters in a string with only one instance, you can use this relatively efficient method:
string RemoveDuplicateChars(string src, char[] dupes){
var sd = (char[])dupes.Clone();
Array.Sort(sd);
var res = new StringBuilder(src.Length);
for(int i = 0; i<src.Length; i++){
if( i==0 || src[i]!=src[i-1] || Array.BinarySearch(sd,src[i])<0){
res.Append(src[i]);
}
}
return res.ToString();
}
public string GetCorrectString(string IncorrectString)
{
string[] strarray = IncorrectString.Split(' ');
var sb = new StringBuilder();
foreach (var str in strarray)
{
if (str != string.Empty)
{
sb.Append(str).Append(' ');
}
}
return sb.ToString().Trim();
}
I just whipped this up, haven't tested it yet though. But I felt this was elegant, and avoids regex:
/// <summary>
/// Removes extra white space.
/// </summary>
/// <param name="s">
/// The string
/// </param>
/// <returns>
/// The string, with only single white-space groupings.
/// </returns>
public static string RemoveExtraWhiteSpace(this string s)
{
if (s.Length == 0)
{
return string.Empty;
}
var stringBuilder = new StringBuilder();
var whiteSpaceCount = 0;
foreach (var character in s)
{
if (char.IsWhiteSpace(character))
{
whiteSpaceCount++;
}
else
{
whiteSpaceCount = 0;
}
if (whiteSpaceCount > 1)
{
continue;
}
stringBuilder.Append(character);
}
return stringBuilder.ToString();
}
Am I missing something here? I came up with this:
// Input: "HELLO   BEAUTIFUL   WORLD!"
private string NormalizeWhitespace(string inputStr)
{
// First split the string on the spaces but exclude the spaces themselves
// Using the input string the length of the array will be 3. If the empty
// entries were not filtered out they would be included in the array
var splitParts = inputStr.Split(' ').Where(x => x != "").ToArray();
// Now iterate over the parts in the array and add them to the return
// string. If the current part is not the last part, add a space after.
string retVal = "";
for (int i = 0; i < splitParts.Length; i++)
{
retVal += splitParts[i];
if (i != splitParts.Length - 1)
{
retVal += " ";
}
}
return retVal;
}
// Would return "HELLO BEAUTIFUL WORLD!"
I know I am creating a second string here to return, as well as creating the splitParts array. I just figured this is pretty straightforward. Maybe I am not taking into account some of the potential scenarios.
I know this is really old, but the easiest way to compact whitespace (replace any recurring whitespace character with a single "space" character) is as follows:
public static string CompactWhitespace(string astring)
{
if (!string.IsNullOrEmpty(astring))
{
bool found = false;
StringBuilder buff = new StringBuilder();
foreach (char chr in astring.Trim())
{
if (char.IsWhiteSpace(chr))
{
if (found)
{
continue;
}
found = true;
buff.Append(' ');
}
else
{
found = false;
buff.Append(chr);
}
}
return buff.ToString();
}
return string.Empty;
}
I'm not very familiar with C#, hence my code is not the most elegant or efficient. I came here to find an answer that fits my use case, but I couldn't find one (or I couldn't figure one out).
For my use case, I needed to normalize all the white space (WS: space, tab, CR LF) with the following conditions:
tab needs to be retained in some cases (for example, in a tab-separated file, where repeated tabs also need to be preserved), but in most cases tabs have to be converted into spaces.
So here's a sample input and the expected output (disclaimer: my code is tested only on this example):
Every night in my dreams I see you, I feel you
That's how I know you go on
Far across the distance and places between us
You have come to show you go on
to be converted into
Every night in my dreams I see you, I feel you
That's how I know you go on
Far across the distance and places between us
You have come to show you go on
Here's my code
using System;
using System.Text.RegularExpressions;
public class Program
{
public static string NormalizeWhiteSpace(string text, bool preserveTabs = false)
{
//[Step 1]: Clean up white spaces around the text
text = text.Trim();
//[Step 2]: Reduce repeated spaces to single space.
text = Regex.Replace(text, @" +", " ");
//[Step 3]: Handle tabs. Tabs need to be treated with care because
//in some files tabs have special meaning (e.g. tab-separated files)
if (preserveTabs)
{
text = Regex.Replace(text, @" *\t *", "\t");
}
else
{
text = Regex.Replace(text, @"[ \t]+", " ");
}
//[Step 4]: Reduce repeated new lines (and other white spaces around them)
//into a single new line.
text = Regex.Replace(text, @"([\t ]*(\n)+[\t ]*)+", "\n");
return text;
}
public static void Main()
{
// Main cannot take a single string parameter, so the sample input is defined here
string text = "  Every night   in my dreams \t I see you\n\n\n   That's how I know you go on  ";
Console.Write("\nClean New Lines\n======\n" + NormalizeWhiteSpace(text));
}
}
See this code in action here: https://dotnetfiddle.net/eupjIU
What if you adapt the famous algorithm? In this case it compares "similar" strings: case-insensitive, ignoring runs of multiple spaces, and tolerating NULLs. Do not trust benchmarks alone: this one was put into a data-comparison-intensive task (approx. 1/4 GB of data), and the speed-up on the whole action was around 100% (this algorithm vs. the commented-out version: 5 vs. 10 minutes). Some of the other answers here differed by less, around 30%. I would say that building the best algorithm requires going down to the disassembly and checking what the compiler does in both release and debug builds. There is also a simpler full-trim version in my answer to a similar C question (case-sensitive, though).
public static bool Differs(string srcA, string srcB)
{
//return string.Join(" ", (a?.ToString()??String.Empty).ToUpperInvariant().Split(new char[0], StringSplitOptions.RemoveEmptyEntries).ToList().Select(x => x.Trim()))
// != string.Join(" ", (b?.ToString()??String.Empty).ToUpperInvariant().Split(new char[0], StringSplitOptions.RemoveEmptyEntries).ToList().Select(x => x.Trim()));
if (srcA == null) { if (srcB == null) return false; else srcA = String.Empty; } // A == null + B == null same or change A to empty string
if (srcB == null) { if (srcA == null) return false; else srcB = String.Empty; }
int dstIdxA = srcA.Length, dstIdxB = srcB.Length; // are there any remaining (front) chars in a string ?
int planSpaceA = 0, planSpaceB = 0; // state automaton 1 after non-WS, 2 after WS
bool validA, validB; // are there any remaining (front) chars in a array ?
char chA = '\0', chB = '\0';
spaceLoopA:
if (validA = (dstIdxA > 0)) {
chA = srcA[--dstIdxA];
switch (chA) {
case '!': case '"': case '#': case '$': case '%': case '&': case '\'': case '(': case ')': case '*': case '+': case ',': case '-':
case '.': case '/': case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': case ':':
case ';': case '<': case '=': case '>': case '?': case '@': case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G':
case 'H': case 'I': case 'J': case 'K': case 'L': case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R': case 'S': case 'T':
case 'U': case 'V': case 'W': case 'X': case 'Y': case 'Z': case '[': case '\\': case ']': case '^': case '_': case '`': // a-z are made upper case below with & ~0x20
case '{': case '|': case '}': case '~':
break; // ASCII except lowercase
case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g': case 'h': case 'i':
case 'j': case 'k': case 'l': case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
case 's': case 't': case 'u': case 'v': case 'w': case 'x': case 'y': case 'z':
chA = (Char)(chA & ~0x20);
break;
case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':
case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':
case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':
case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':
case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
if (planSpaceA == 1) planSpaceA = 2; // cycle here to address multiple WS before non-WS part
goto spaceLoopA;
default:
chA = Char.ToUpper(chA);
break;
}}
spaceLoopB:
if (validB = (dstIdxB > 0)) { // 2nd string / same logic
chB = srcB[--dstIdxB];
switch (chB) {
case '!': case '"': case '#': case '$': case '%': case '&': case '\'': case '(': case ')': case '*': case '+': case ',': case '-':
case '.': case '/': case '0': case '1': case '2': case '3': case '4': case '5': case '6': case '7': case '8': case '9': case ':':
case ';': case '<': case '=': case '>': case '?': case '@': case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G':
case 'H': case 'I': case 'J': case 'K': case 'L': case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R': case 'S': case 'T':
case 'U': case 'V': case 'W': case 'X': case 'Y': case 'Z': case '[': case '\\': case ']': case '^': case '_': case '`': // a-z are made upper case below with & ~0x20
break;
case '{': case '|': case '}': case '~':
break; // ASCII except lowercase
case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g': case 'h': case 'i':
case 'j': case 'k': case 'l': case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
case 's': case 't': case 'u': case 'v': case 'w': case 'x': case 'y': case 'z':
chB = (Char)(chB & ~0x20);
break;
case '\u0020': case '\u00A0': case '\u1680': case '\u2000': case '\u2001':
case '\u2002': case '\u2003': case '\u2004': case '\u2005': case '\u2006':
case '\u2007': case '\u2008': case '\u2009': case '\u200A': case '\u202F':
case '\u205F': case '\u3000': case '\u2028': case '\u2029': case '\u0009':
case '\u000A': case '\u000B': case '\u000C': case '\u000D': case '\u0085':
if (planSpaceB == 1) planSpaceB = 2;
goto spaceLoopB;
default:
chB = Char.ToUpper(chB);
break;
}}
if (planSpaceA != planSpaceB) return true; // both should/not have space now (0 init / 1 last non-WS / 2 last was WS)
if (validA) { // some (non-WS) in A still
if (validB) {
if (chA != chB) return true; // both have another char to compare, are they different ?
} else return true; // not in B not - they are different
} else { // A done, current last pair equal => continue 2 never ending loop till B end (by WS only to be same)
if (!validB) return false; // done and end-up here without leaving by difference => both are same except some WSs arround
else return true; // A done, but non-WS remains in B - different
} // A done, B had no non-WS or non + WS last follow - never ending loop continue
planSpaceA = 1; planSpaceB = 1;
goto spaceLoopA; // performs better
}
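For checking the hand-tuned Differs above, here is a much simpler (and slower) reference version along the lines of its commented-out first attempt: null becomes empty, whitespace runs are collapsed via Split/Join, and the comparison ignores case. It uses StringComparison.OrdinalIgnoreCase rather than Char.ToUpper, so results may differ for some non-ASCII characters; treat it as a cross-check under that assumption, not an exact equivalent. The class name is hypothetical.

```csharp
using System;

public static class DiffersReference
{
    // Collapse every whitespace run to one space and drop leading/trailing whitespace;
    // a null separator array makes Split use all white-space characters.
    private static string Normalize(string s) =>
        string.Join(" ", (s ?? string.Empty).Split((char[])null, StringSplitOptions.RemoveEmptyEntries));

    // true if the strings still differ after whitespace normalization, ignoring case.
    public static bool Differs(string a, string b) =>
        !string.Equals(Normalize(a), Normalize(b), StringComparison.OrdinalIgnoreCase);
}
```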
You could use IndexOf to first find where the whitespace sequences start, then use the Replace method to change the white space to "". From there, you can use the index you grabbed and place one whitespace character in that spot.
String instances.
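A rough sketch of the IndexOf idea, assuming only ASCII spaces count as white space and the input is non-null: IndexOf("  ") finds the start of a run of two or more spaces, the whole run is removed, and one space is inserted back at the same index. Every Remove/Insert allocates a new string, so this is simple but O(n²) in the worst case. The class and method names are hypothetical.

```csharp
using System;

public static class IndexOfDemo
{
    public static string CollapseSpaces(string s)
    {
        // Start of the first run of two or more spaces (ordinal, not culture-sensitive).
        int start = s.IndexOf("  ", StringComparison.Ordinal);
        while (start >= 0)
        {
            int end = start;
            while (end < s.Length && s[end] == ' ')
                end++; // end is now one past the last space of this run
            // Replace the whole run with a single space at the same index.
            s = s.Remove(start, end - start).Insert(start, " ");
            // Continue searching after the space we kept.
            start = s.IndexOf("  ", start + 1, StringComparison.Ordinal);
        }
        return s;
    }
}
```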
For those who just want to copy-paste and go on:
private string RemoveExcessiveWhitespace(string value)
{
if (value == null) { return null; }
var builder = new StringBuilder();
var ignoreWhitespace = false;
foreach (var c in value)
{
if (!ignoreWhitespace || c != ' ')
{
builder.Append(c);
}
ignoreWhitespace = c == ' ';
}
return builder.ToString();
}
My version (improved from Stian's answer). Should be very fast.
public static string TrimAllExtraWhiteSpaces(this string input)
{
if (string.IsNullOrEmpty(input))
{
return input;
}
var current = 0;
char[] output = new char[input.Length];
var charArray = input.ToCharArray();
for (var i = 0; i < charArray.Length; i++)
{
if (!char.IsWhiteSpace(charArray[i]))
{
if (current > 0 && i > 0 && char.IsWhiteSpace(charArray[i - 1]))
{
output[current++] = ' ';
}
output[current++] = charArray[i];
}
}
return new string(output, 0, current);
}
There is no need for complex code! Here is simple code that collapses consecutive duplicates of the given characters:
public static String RemoveCharOccurence(String s, char[] remove)
{
String s1 = s;
foreach(char c in remove)
{
s1 = RemoveCharOccurence(s1, c);
}
return s1;
}
public static String RemoveCharOccurence(String s, char remove)
{
StringBuilder sb = new StringBuilder(s.Length);
Boolean removeNextIfMatch = false;
foreach(char c in s)
{
if(c == remove)
{
if(removeNextIfMatch)
continue;
else
removeNextIfMatch = true;
}
else
removeNextIfMatch = false;
sb.Append(c);
}
return sb.ToString();
}
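It's very simple: use Regex.Replace() (plain string.Replace() treats "\\s+" as literal text, not as a pattern, so it would leave this string unchanged):
string words = "Hello     world!";
words = Regex.Replace(words, @"\s+", " ");
Output >>> "Hello world!"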
Simplest way I can think of:
Text = Text.Replace("  ", " ").Replace("  ", " ");
// Replace 2 spaces with 1 space, twice (this only fully collapses runs of up to four spaces)
© 2022 - 2024 — McMap. All rights reserved.