Replace multiple characters in a C# string
Asked Answered
P

16

228

Is there a better way to replace strings?

I am surprised that Replace does not take in a character array or string array. I guess that I could write my own extension but I was curious if there is a better built in way to do the following? Notice the last Replace is a string not a character.

myString.Replace(';', '\n').Replace(',', '\n').Replace('\r', '\n').Replace('\t', '\n').Replace(' ', '\n').Replace("\n\n", "\n");
Penance answered 1/9, 2011 at 1:40 Comment(1)
Believe it or not, this is actually one of the fastest solution, because the replace uses vectorization (CPU SIMD) and is hard to beat. It can be improved in term of allocations however, but only if your function often replaces. If you code is more a "security filter", keep it as is.Brock
A
261

You can use a replace regular expression.

s/[;,\t\r ]|[\n]{2}/\n/g
  • s/ at the beginning means a search
  • The characters between [ and ] are the characters to search for (in any order)
  • The second / delimits the search-for text and the replace text

In English, this reads:

"Search for ; or , or \t or \r or (space) or exactly two sequential \n and replace it with \n"

In C#, you could do the following: (after importing System.Text.RegularExpressions)

Regex pattern = new Regex("[;,\t\r ]|[\n]{2}");
pattern.Replace(myString, "\n");
Animated answered 1/9, 2011 at 1:42 Comment(5)
\t and \r are included in \s. So your regex is equivalent to [;,\s].Anticathode
And \s is actually equivalent to [ \f\n\r\t\v] so you are including some stuff there that wasn't in the original question. Addtionally, the original question asks for Replace("\n\n", "\n") which your regex doesn't handle.Anticathode
Please consider that for simple replace operations which are not configurable by a user, using regular expressions is not optimal as it's very slow compared to regular string operations, according to a first benchmark article I found when searching "c# regex performance replace" it's about 13 times slower.Allowedly
Ah regex, the hieroglyphs of power! The only issue I can see here is human readability of the regular expressions; many refuse to understand them. I've recently added a solution below for those looking for a less complex alternative.Conwell
So how do we write if we want to replace multiple characters with multiple characters?Christen
U
149

If you are feeling particularly clever and don't want to use Regex:

char[] separators = new char[]{' ',';',',','\r','\t','\n'};

string s = "this;is,\ra\t\n\n\ntest";
string[] temp = s.Split(separators, StringSplitOptions.RemoveEmptyEntries);
s = String.Join("\n", temp);

You could wrap this in an extension method with little effort as well.

Edit: Or just wait 2 minutes and I'll end up writing it anyway :)

public static class ExtensionMethods
{
   public static string Replace(this string s, char[] separators, string newVal)
   {
       string[] temp;

       temp = s.Split(separators, StringSplitOptions.RemoveEmptyEntries);
       return String.Join( newVal, temp );
   }
}

And voila...

char[] separators = new char[]{' ',';',',','\r','\t','\n'};
string s = "this;is,\ra\t\n\n\ntest";

s = s.Replace(separators, "\n");
Underset answered 1/9, 2011 at 3:15 Comment(4)
Very memory-inefficient, especially for larger strings.Cremator
@Cremator Lol... That's probably the first time I have ever heard anyone claim that the built-in string methods are less memory efficient than regular expressions.Underset
You're right. I should have measured before I posted that. I run benchmark and Regex.Replace is over 8x slower than multiple string.Replace calls in a row. and 4x slower than Split+Join. See gist.github.com/MarcinJuraszek/c1437d925548561ba210a1c6ed144452Cremator
Nice solution! just a small addon. Unfortunately, this will not work if you want the first character(s) to be replaced as well. Say you want to replace the 't' character in the example string. The Split method will just drop that 't' of the first word 'this' because it is a EmptyEntry. If you use the StringSplitOptions.None instead of RemoveEmptyEntries, Split will leave the entry and the Join method will add the separator character instead. Hope this helpsRiess
S
80

You could use Linq's Aggregate function:

string s = "the\nquick\tbrown\rdog,jumped;over the lazy fox.";
char[] chars = new char[] { ' ', ';', ',', '\r', '\t', '\n' };
string snew = chars.Aggregate(s, (c1, c2) => c1.Replace(c2, '\n'));

Here's the extension method:

public static string ReplaceAll(this string seed, char[] chars, char replacementCharacter)
{
    return chars.Aggregate(seed, (str, cItem) => str.Replace(cItem, replacementCharacter));
}

Extension method usage example:

string snew = s.ReplaceAll(chars, '\n');
Serb answered 1/9, 2011 at 3:50 Comment(0)
B
26

This is the shortest way:

myString = Regex.Replace(myString, @"[;,\t\r ]|[\n]{2}", "\n");
Bunk answered 5/8, 2012 at 6:36 Comment(1)
This one liner also helps when you need this in initializers.Chaplet
L
12

Ohhh, the performance horror! The answer is a bit outdated, but still...

public static class StringUtils
{
    #region Private members

    [ThreadStatic]
    private static StringBuilder m_ReplaceSB;

    private static StringBuilder GetReplaceSB(int capacity)
    {
        var result = m_ReplaceSB;

        if (null == result)
        {
            result = new StringBuilder(capacity);
            m_ReplaceSB = result;
        }
        else
        {
            result.Clear();
            result.EnsureCapacity(capacity);
        }

        return result;
    }


    public static string ReplaceAny(this string s, char replaceWith, params char[] chars)
    {
        if (null == chars)
            return s;

        if (null == s)
            return null;

        StringBuilder sb = null;

        for (int i = 0, count = s.Length; i < count; i++)
        {
            var temp = s[i];
            var replace = false;

            for (int j = 0, cc = chars.Length; j < cc; j++)
                if (temp == chars[j])
                {
                    if (null == sb)
                    {
                        sb = GetReplaceSB(count);
                        if (i > 0)
                            sb.Append(s, 0, i);
                    }

                    replace = true;
                    break;
                }

            if (replace)
                sb.Append(replaceWith);
            else
                if (null != sb)
                    sb.Append(temp);
        }

        return null == sb ? s : sb.ToString();
    }
}
Lassitude answered 9/6, 2015 at 10:34 Comment(0)
C
12

Strings are just immutable char arrays

You just need to make it mutable:

  • either by using StringBuilder
  • go in the unsafe world and play with pointers (dangerous though)

and try to iterate through the array of characters the least amount of times. Note the HashSet here, as it avoids to traverse the character sequence inside the loop. Should you need an even faster lookup, you can replace HashSet by an optimized lookup for char (based on an array[256]).

Example with StringBuilder

public static void MultiReplace(this StringBuilder builder, 
    char[] toReplace, 
    char replacement)
{
    HashSet<char> set = new HashSet<char>(toReplace);
    for (int i = 0; i < builder.Length; ++i)
    {
        var currentCharacter = builder[i];
        if (set.Contains(currentCharacter))
        {
            builder[i] = replacement;
        }
    }
}

Edit - Optimized version (only valid for ASCII)

public static void MultiReplace(this StringBuilder builder, 
    char[] toReplace,
    char replacement)
{
    var set = new bool[256];
    foreach (var charToReplace in toReplace)
    {
        set[charToReplace] = true;
    }
    for (int i = 0; i < builder.Length; ++i)
    {
        var currentCharacter = builder[i];
        if (set[currentCharacter])
        {
            builder[i] = replacement;
        }
    }
}

Then you just use it like this:

var builder = new StringBuilder("my bad,url&slugs");
builder.MultiReplace(new []{' ', '&', ','}, '-');
var result = builder.ToString();
Crosstie answered 12/8, 2017 at 12:59 Comment(2)
Remember that Strings are wchar_t in .net, you are replacing only a subset of all the possible characters (and you'll need 65536 bools to optimize that...)Matilda
More precisely, a char is a UTF-16 code unit, and a string "Represents text as a sequence of UTF-16 code units.". What if you want to replace 'x' with emojis? ;-)Faggot
L
9

I know this question is super old, but I want to offer 2 options that are more efficient:

1st off, the extension method posted by Paul Walls is good but can be made more efficient by using the StringBuilder class, which is like the string data type but made especially for situations where you will be changing string values more than once. Here is a version I made of the extension method using StringBuilder:

public static string ReplaceChars(this string s, char[] separators, char newVal)
{
    StringBuilder sb = new StringBuilder(s);
    foreach (var c in separators) { sb.Replace(c, newVal); }
    return sb.ToString();
}

I ran this operation 100,000 times and using StringBuilder took 73ms compared to 81ms using string. So the difference is typically negligible, unless you're running many operations or using a huge string.

Secondly, here is a 1 liner loop you can use:

foreach (char c in separators) { s = s.Replace(c, '\n'); }

I personally think this is the best option. It is highly efficient and doesn't require writing an extension method. In my testing this ran the 100k iterations in only 63ms, making it the most efficient. Here is an example in context:

string s = "this;is,\ra\t\n\n\ntest";
char[] separators = new char[] { ' ', ';', ',', '\r', '\t', '\n' };
foreach (char c in separators) { s = s.Replace(c, '\n'); }

Credit to Paul Walls for the first 2 lines in this example.

Lightweight answered 15/9, 2020 at 5:14 Comment(0)
B
6

You may also simply write these string extension methods, and put them somewhere in your solution:

using System.Text;

public static class StringExtensions
{
    public static string ReplaceAll(this string original, string toBeReplaced, string newValue)
    {
        if (string.IsNullOrEmpty(original) || string.IsNullOrEmpty(toBeReplaced)) return original;
        if (newValue == null) newValue = string.Empty;
        StringBuilder sb = new StringBuilder();
        foreach (char ch in original)
        {
            if (toBeReplaced.IndexOf(ch) < 0) sb.Append(ch);
            else sb.Append(newValue);
        }
        return sb.ToString();
    }

    public static string ReplaceAll(this string original, string[] toBeReplaced, string newValue)
    {
        if (string.IsNullOrEmpty(original) || toBeReplaced == null || toBeReplaced.Length <= 0) return original;
        if (newValue == null) newValue = string.Empty;
        foreach (string str in toBeReplaced)
            if (!string.IsNullOrEmpty(str))
                original = original.Replace(str, newValue);
        return original;
    }
}


Call them like this:

"ABCDE".ReplaceAll("ACE", "xy");

xyBxyDxy


And this:

"ABCDEF".ReplaceAll(new string[] { "AB", "DE", "EF" }, "xy");

xyCxyF

Buckjump answered 17/5, 2018 at 9:59 Comment(0)
M
5

A .NET Core version for replacing a defined set of string chars to a specific char. It leverages the recently introduced Span type and string.Create method.

The idea is to prepare a replacement array, so no actual comparison operations would be required for the each string char. Thus, the replacement process reminds the way a state machine works. In order to avoid initialization of all items of the replacement array, let's store oldChar ^ newChar (XOR'ed) values there, what gives the following benefits:

  • If a char is not changing: ch ^ ch = 0 - no need to initialize non-changing items
  • The final char can be found by XOR'ing: ch ^ repl[ch]:
    • ch ^ 0 = ch - not changed chars case
    • ch ^ (ch ^ newChar) = newChar - replaced char

So the only requirement would be to ensure that the replacement array is zero-ed when initialized. We'll be using ArrayPool<char> to avoid allocations each time the ReplaceAll method is called. And, in order to ensure that the arrays are zero-ed without expensive call to Array.Clear method, we'll be maintaining a pool dedicated for the ReplaceAll method. We'll be clearing the replacement array (exact items only) before returning it to the pool.

public static class StringExtensions
{
    private static readonly ArrayPool<char> _replacementPool = ArrayPool<char>.Create();

    public static string ReplaceAll(this string str, char newChar, params char[] oldChars)
    {
        // If nothing to do, return the original string.
        if (string.IsNullOrEmpty(str) ||
            oldChars is null ||
            oldChars.Length == 0)
        {
            return str;
        }

        // If only one character needs to be replaced,
        // use the more efficient `string.Replace`.
        if (oldChars.Length == 1)
        {
            return str.Replace(oldChars[0], newChar);
        }

        // Get a replacement array from the pool.
        var replacements = _replacementPool.Rent(char.MaxValue + 1);

        try
        {
            // Intialize the replacement array in the way that
            // all elements represent `oldChar ^ newChar`.
            foreach (var oldCh in oldChars)
            {
                replacements[oldCh] = (char)(newChar ^ oldCh);
            }

            // Create a string with replaced characters.
            return string.Create(str.Length, (str, replacements), (dst, args) =>
            {
                var repl = args.replacements;

                foreach (var ch in args.str)
                {
                    dst[0] = (char)(repl[ch] ^ ch);
                    dst = dst.Slice(1);
                }
            });
        }
        finally
        {
            // Clear the replacement array.
            foreach (var oldCh in oldChars)
            {
                replacements[oldCh] = char.MinValue;
            }

            // Return the replacement array back to the pool.
            _replacementPool.Return(replacements);
        }
    }
}
Maxwell answered 19/8, 2020 at 14:14 Comment(0)
V
4

Use RegEx.Replace, something like this:

  string input = "This is   text with   far  too   much   " + 
                 "whitespace.";
  string pattern = "[;,]";
  string replacement = "\n";
  Regex rgx = new Regex(pattern);
  string result = rgx.Replace(input, replacement);

Here's more info on this MSDN documentation for RegEx.Replace

Variety answered 1/9, 2011 at 1:51 Comment(0)
S
3

Performance-Wise this probably might not be the best solution but it works.

var str = "filename:with&bad$separators.txt";
char[] charArray = new char[] { '#', '%', '&', '{', '}', '\\', '<', '>', '*', '?', '/', ' ', '$', '!', '\'', '"', ':', '@' };
foreach (var singleChar in charArray)
{
   str = str.Replace(singleChar, '_');
}
Subinfeudate answered 18/5, 2017 at 9:24 Comment(0)
E
2
string ToBeReplaceCharacters = @"~()@#$%&amp;+,'&quot;&lt;&gt;|;\/*?";
string fileName = "filename;with<bad:separators?";

foreach (var RepChar in ToBeReplaceCharacters)
{
    fileName = fileName.Replace(RepChar.ToString(), "");
}
Edmondo answered 24/8, 2018 at 14:28 Comment(0)
F
1

I also fiddled around with that problem, and found that most of the solutions here are very slow. The fastest one was actually the LINQ + Aggregate method that dodgy_coder posted.

But I thought, well that might be also quite heavy in memory allocations depending upon how many old characters there are. So I came out with this:

The idea here is to have a cached replacement map of the old characters for the current thread, to safe allocations. And other than that just working with a character array of the input that later on is returned as string again. Whereas the character array is modified as less as possible.

[ThreadStatic]
private static bool[] replaceMap;
public static string Replace(this string input, char[] oldChars, char newChar)
{
    if (input == null) throw new ArgumentNullException(nameof(input));
    if (oldChars == null) throw new ArgumentNullException(nameof(oldChars));
    if (oldChars.Length == 1) return input.Replace(oldChars[0], newChar);
    if (oldChars.Length == 0) return input;

    replaceMap = replaceMap ?? new bool[char.MaxValue + 1];
    foreach (var oldChar in oldChars)
    {
        replaceMap[oldChar] = true;
    }

    try
    {
        var count = input.Length;
        var output = input.ToCharArray();
        for (var i = 0; i < count; i++)
        {
            if (replaceMap[input[i]])
            {
                output[i] = newChar;
            }
        }

        return new string(output);
    }
    finally
    {
        foreach (var oldChar in oldChars)
        {
            replaceMap[oldChar] = false;
        }
    }
}

For me this is at most two allocations for the actual input string to work on. A StringBuilder turned out to be much slower for me for some reasons. And it is 2 times faster than the LINQ variant.

From answered 13/7, 2021 at 15:12 Comment(0)
B
1

No "Replace" (Linq only):

    string myString = ";,\r\t \n\n=1;;2,,3\r\r4\t\t5  6\n\n\n\n7=";
    char NoRepeat = '\n';
    string ByeBye = ";,\r\t ";
    string myResult = myString.ToCharArray().Where(t => !"STOP-OUTSIDER".Contains(t))
                 .Select(t => "" + ( ByeBye.Contains(t) ? '\n' : t))
                  .Aggregate((all, next) => (
                      next == "" + NoRepeat && all.Substring(all.Length - 1) == "" + NoRepeat
                      ? all : all  + next ) );
Bangka answered 29/10, 2021 at 5:41 Comment(1)
I'm surprised that many of the posted solutions returned the incorrect results for input string ";,\r\t \n\n=1;;2,,3\r\r4\t\t5 6\n\n\n\n7=". For some reason, \n is repeated in the result, and/or the first character is not \n. (from the author .Replace("\n\n", "\n") )Bangka
S
0

Having built my own solution, and looking at the solution used here, I leveraged an answer that isn't using complex code and is generally efficient for most parameters.

  1. Cover base cases where other methods are more appropriate. If there are no chars to replacement, return the original string. If there is only one, just use the Replace method.
  2. Use a StringBuilder and initialize the capacity to the length of the original string. After all, the new string being built will have the same length of the original string if its just chars being replaced. This ensure only 1 memory allocation is used for the new string.
  3. Assuming that the 'char' length could be small or large will impact performance. Large collections are better with hashsets, while smaller collections are not. This is a near-perfect use case for Hybrid Dictionaries. They switch to using a Hash based lookup once the collection gets too large. However, we don't care about the value of the dictionary, so I just set it to "true".
  4. Have different methods for StringBuilder verse just a string will prevent unnecessary memory allocation. If its just a string, don't instantiate a StringBuilder unless the base cases were checked. If its already a StringBuilder, then perform the replacements and return the StringBuilder itself (as other StringBuilder methods like Append do).
  5. I put the replacement char first, and the chars to check at the end. This way, I can leverage the params keyword for easily passing additional strings. However, you don't have to do this if you prefer the other order.
namespace Test.Extensions
{
    public static class StringExtensions
    {
        public static string ReplaceAll(this string str, char replacementCharacter, params char[] chars)
        {
            if (chars.Length == 0)
                return str;

            if (chars.Length == 1)
                return str.Replace(chars[0], replacementCharacter);

            StringBuilder sb = new StringBuilder(str.Length);

            var searcher = new HybridDictionary(chars.Length);
            for (int i = 0; i < chars.Length; i++)
                searcher[chars[i]] = true;

            foreach (var c in str)
            {
                if (searcher.Contains(c))
                    sb.Append(replacementCharacter);
                else
                    sb.Append(c);
            }

            return sb.ToString();
        }

        public static StringBuilder ReplaceAll(this StringBuilder sb, char replacementCharacter, params char[] chars)
        {
            if (chars.Length == 0)
                return sb;

            if (chars.Length == 1)
                return sb.Replace(chars[0], replacementCharacter);

            var searcher = new HybridDictionary(chars.Length);
            for (int i = 0; i < chars.Length; i++)
                searcher[chars[i]] = true;

            for (int i = 0; i < sb.Length; i++)
            {
                var val = sb[i];
                if (searcher.Contains(val))
                    sb[i] = replacementCharacter;
            }

            return sb;
        }
    }
}
Sarinasarine answered 7/6, 2022 at 16:36 Comment(0)
I
0

Single pass replacement, of single or multiple values

public static class StringExtensions
{
    public static string ReplaceTokens(this string input, Dictionary<char, string> tokenReplacement)
    {
        if (string.IsNullOrEmpty(input) ||
            tokenReplacement is null ||
            tokenReplacement.Count == 0)
        {
            return input;
        }
        if (tokenReplacement.Count == 1)
        {
            return input.Replace(tokenReplacement.Keys.First().ToString(), tokenReplacement.Values.First());
        }
        StringBuilder output = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (tokenReplacement.TryGetValue(c, out string token))
            {
                output.Append(token);
            }
            else
            {
                output.Append(c);
            }
        }

        return output.ToString();
    }
    public static string ReplaceTokens(this string input, char[] tokenReplacement, string replaceWith)
    {
        if (string.IsNullOrEmpty(input) ||
            tokenReplacement is null ||
            tokenReplacement.Length == 0)
        {
            return input;
        }

        if (tokenReplacement.Length == 1)
        {
            return input.Replace(tokenReplacement[0].ToString(), replaceWith ?? String.Empty);
        }
        var splits = input.Split(tokenReplacement);
        return string.Join(replaceWith ?? String.Empty, splits);
    }
}
Indemonstrable answered 17/7, 2023 at 13:33 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.