Replace Multiple String Elements in C#
Asked Answered
D

10

119

Is there a better way of doing this...

MyString.Trim().Replace("&", "and").Replace(",", "").Replace("  ", " ")
         .Replace(" ", "-").Replace("'", "").Replace("/", "").ToLower();

I've extended the string class to keep it down to one job but is there a quicker way?

public static class StringExtension
{
    public static string clean(this string s)
    {
        return s.Replace("&", "and").Replace(",", "").Replace("  ", " ")
                .Replace(" ", "-").Replace("'", "").Replace(".", "")
                .Replace("eacute;", "é").ToLower();
    }
}

Just for fun (and to stop the arguments in the comments) I've shoved a gist up benchmarking the various examples below.

https://gist.github.com/ChrisMcKee/5937656

The regex option scores terribly; the dictionary option comes up the fastest; the long winded version of the stringbuilder replace is slightly faster than the short hand.

Dichasium answered 24/8, 2009 at 9:25 Comment(6)
Based on what you have in your benchmarks it looks like the dictionary version isn't doing all of the replacements which I suspect is what is making it faster than the StringBuilder solutions.Hypopituitarism
@Hypopituitarism Hi from 2009; I added a comment below in April about that glaring mistake. The gist is updated though I skipped over D. The dictionary version is still faster.Dichasium
Possible duplicate of Alternative to String.Replace multiple times?Together
@TotZam at least check the dates before flagging things; this is from 2009 thats from 2012Dichasium
Since many answers here seem concerned with performance, I believe it should be pointed out Andrej Adamanko's answer is likely to be the fastest for many replacements; certainly faster than chaining .Replace() especially on a large input string as stated in his answer.Vidrine
Note that this and many answers may be inappropriate for arbitrary replacement strings. "abc".Replace("a", "c").Replace("c", "d") will produce "dbd" as opposed to the expected "cbd".Therianthropic
M
165

Quicker - no. More effective - yes, if you will use the StringBuilder class. With your implementation each operation generates a copy of a string which under circumstances may impair performance. Strings are immutable objects so each operation just returns a modified copy.

If you expect this method to be actively called on multiple Strings of significant length, it might be better to "migrate" its implementation onto the StringBuilder class. With it any modification is performed directly on that instance, so you spare unnecessary copy operations.

public static class StringExtention
{
    public static string clean(this string s)
    {
        StringBuilder sb = new StringBuilder (s);

        sb.Replace("&", "and");
        sb.Replace(",", "");
        sb.Replace("  ", " ");
        sb.Replace(" ", "-");
        sb.Replace("'", "");
        sb.Replace(".", "");
        sb.Replace("eacute;", "é");

        return sb.ToString().ToLower();
    }
}
Monosome answered 24/8, 2009 at 9:27 Comment(6)
For clarity the dictionary answer is the fastest https://mcmap.net/q/186457/-replace-multiple-string-elements-in-cDichasium
In your benchmark on gist.github.com/ChrisMcKee/5937656 the dictionary test is not complete: it does not do all replacements and " " replaces " ", not " ". Not doing all replacements could be the reason, why it's fastest in the benchmark. The regex replacement is not complete, either. But most importantly your string TestData is very short. Like the accepted answer states, the string has to be of significant length for the StringBuilder to be of advantage. Could you please repeat the benchmark with strings of 10kB, 100kB and 1MB?Karate
Its a good point; as it stands it was being used for url cleansing so testings at 100kb - 1mb would have been unrealistic. I will update the benchmark so its using the whole thing though, that was a mistake.Dichasium
For best performance, loop over the characters and replace them yourself. However that can be tedious if you have more than single characters strings (find them enforces you to compare multiple characters at once, while replacing them requires allocating more memory and moving the rest of the string).Mazurka
When none of the characters or strings to be replaced occur in the input string, this will be a very bad solution. In that case String.Replace would just return the original reference and be dirt cheap compared to the StringBuilder solution.Beguin
The isolated ToLower call at the end will force another allocation of the whole string. Any suggestions to prevent that memory waste as well that still looks readable?Bentonbentonite
F
32

If you are simply after a pretty solution and don't need to save a few nanoseconds, how about some LINQ sugar?

var input = "test1test2test3";
var replacements = new Dictionary<string, string> { { "1", "*" }, { "2", "_" }, { "3", "&" } };

var output = replacements.Aggregate(input, (current, replacement) => current.Replace(replacement.Key, replacement.Value));
Fides answered 6/5, 2014 at 2:21 Comment(3)
Similar to example C in the Gist (if you look above it the uglier linq statement is in the comment)Dichasium
Interesting that you define a functional statment as "Uglier" than a procedural one.Fides
not going to argue about it; its merely preference. As you say, linq is simply syntactic sugar; and as I said I'd already put the equivalent above the code :)Dichasium
M
18

this will be more efficient:

public static class StringExtension
{
    public static string clean(this string s)
    {
        return new StringBuilder(s)
              .Replace("&", "and")
              .Replace(",", "")
              .Replace("  ", " ")
              .Replace(" ", "-")
              .Replace("'", "")
              .Replace(".", "")
              .Replace("eacute;", "é")
              .ToString()
              .ToLower();
    }
}
Mandate answered 24/8, 2009 at 9:31 Comment(2)
Really hard to read. I am sure you know what it does but a Junior Dev will scratch his head at what actually goes on. I agree- I also always look for the shortes hand of writting something- But it was only for my own satisfaction. Other people were freaking out at the pile of mess.Crabby
This is actually slower. BenchmarkOverhead... 13ms StringClean-user151323... 2843ms StringClean-TheVillageIdiot... 2921ms Varies on reruns but the answer wins gist.github.com/anonymous/5937596Dichasium
S
11

Maybe a little more readable?

    public static class StringExtension {

        private static Dictionary<string, string> _replacements = new Dictionary<string, string>();

        static StringExtension() {
            _replacements["&"] = "and";
            _replacements[","] = "";
            _replacements["  "] = " ";
            // etc...
        }

        public static string clean(this string s) {
            foreach (string to_replace in _replacements.Keys) {
                s = s.Replace(to_replace, _replacements[to_replace]);
            }
            return s;
        }
    }

Also add New In Town's suggestion about StringBuilder...

Strophic answered 24/8, 2009 at 9:33 Comment(3)
It would be more readable like this: private static Dictionary<string, string> _replacements = new Dictionary<string, string>() { {"&", "and"}, {",", ""}, {" ", " "} /* etc */ };Windswept
or of course... private static readonly Dictionary<string, string> Replacements = new Dictionary<string, string>() { { "&", "and" }, { ",", "" }, { " ", " " } /* etc */ }; public static string Clean(this string s) { return Replacements.Keys.Aggregate(s, (current, toReplace) => current.Replace(toReplace, Replacements[toReplace])); }Dichasium
-1 : Using a Dictionary doesen't make any sence here. Just use a List<Tuple<string,string>>. This also changes the order of the replacings is taken AND is not as fast as e.g. s.Replace("a").Replace("b").Replace("c"). Don't use this!Teamster
M
10

There is one thing that may be optimized in the suggested solutions. Having many calls to Replace() makes the code to do multiple passes over the same string. With very long strings the solutions may be slow because of CPU cache capacity misses. May be one should consider replacing multiple strings in a single pass.

The essential content from that link:

static string MultipleReplace(string text, Dictionary<string, string> replacements) {
    return Regex.Replace(text,
                 "(" + String.Join("|", replacements.Keys) + ")",
                 delegate(Match m) { return replacements[m.Value]; });
}
    // somewhere else in code
    string temp = "Jonathan Smith is a developer";

    var adict = new Dictionary<string, string>();
    adict.Add("Jonathan", "David");
    adict.Add("Smith", "Seruyange");

    string rep = MultipleReplace(temp, adict);
Middleoftheroad answered 21/5, 2014 at 13:21 Comment(3)
A lot of answers seem concerned about performance, in which case this is the best. And it's simple because it's just a documented overload of String.Replace where you return an expected value based on the match, in this example, using a dictionary to match them up. Should be simple to understand.Vidrine
Added code from linked page to prevent this answer becoming useless if linked page diesTwopiece
Please correct your answer so the code actually works. In your MultipleReplace method there is no variable called 'adict', change it to 'replacements and remove the .ToArray(). Indent code correctly to make it more readable. Thank you for the solution.Reconnoiter
B
4

Another option using linq is

[TestMethod]
public void Test()
{
  var input = "it's worth a lot of money, if you can find a buyer.";
  var expected = "its worth a lot of money if you can find a buyer";
  var removeList = new string[] { ".", ",", "'" };
  var result = input;

  removeList.ToList().ForEach(o => result = result.Replace(o, string.Empty));

  Assert.AreEqual(expected, result);
}
Braw answered 9/11, 2017 at 11:15 Comment(3)
You may declare var removeList = new List<string> { /*...*/ }; then just call removeList.ForEach( /*...*/ ); and simplify your code. Note also that it doesn't fully answer the question because all found strings are replaced with String.Empty.Diplegia
Where precisely is Linq used? This wastefully converts removeList to a List, for the unnecessary goal of making it a single line. But Lamdas and Linq aren't synonymous.Bessie
Note, List.ForEach is not a LINQ thing, it's a List thingTwopiece
T
0

I'm doing something similar, but in my case I'm doing serialization/De-serialization so I need to be able to go both directions. I find using a string[][] works nearly identically to the dictionary, including initialization, but you can go the other direction too, returning the substitutes to their original values, something that the dictionary really isn't set up to do.

Edit: You can use Dictionary<Key,List<Values>> in order to obtain same result as string[][]

Tenon answered 16/9, 2011 at 22:57 Comment(1)
This doesn't appear to provide an answer to the questionTwopiece
S
0

Regular Expression with MatchEvaluator could also be used:

    var pattern = new Regex(@"These|words|are|placed|in|parentheses");
    var input = "The matching words in this text are being placed inside parentheses.";
    var result = pattern.Replace(input , match=> $"({match.Value})");

Note:

  • Obviously different expression (like: \b(\w*test\w*)\b) could be used for words matching.
  • I was hoping it to be more optimized to find the pattern in expression and do the replacements
  • The advantage is the ability to process the matching elements while doing the replacements
Spencerianism answered 23/9, 2021 at 22:20 Comment(1)
This answer would be improved by showing a better use of the match delegate than simply supplying the same value that was matched; it's a non opTwopiece
W
0

This is essentially Paolo Tedesco's answer, but I wanted to make it re-usable.

    public class StringMultipleReplaceHelper
    {
        private readonly Dictionary<string, string> _replacements;

        public StringMultipleReplaceHelper(Dictionary<string, string> replacements)
        {
            _replacements = replacements;
        }

        public string clean(string s)
        {
            foreach (string to_replace in _replacements.Keys)
            {
                s = s.Replace(to_replace, _replacements[to_replace]);
            }
            return s;
        }
    }

One thing to note that I had to stop it being an extension, remove the static modifiers, and remove this from clean(this string s). I'm open to suggestions as to how to implement this better.

Waiter answered 22/2, 2022 at 11:27 Comment(0)
M
-1
string input = "it's worth a lot of money, if you can find a buyer.";
for (dynamic i = 0, repl = new string[,] { { "'", "''" }, { "money", "$" }, { "find", "locate" } }; i < repl.Length / 2; i++) {
    input = input.Replace(repl[i, 0], repl[i, 1]);
}
Microbalance answered 16/3, 2017 at 0:23 Comment(1)
You should consider adding context to your answers. Like a brief explanation of what's it doing And, if relevant, why you wrote it the way you did.Levalloisian

© 2022 - 2024 — McMap. All rights reserved.