Google Sheets multiple search and replace from a list
Asked Answered
W

9

16

I am looking for a solution to search for certain strings in a Google Sheet and, when found, replace them with another string from a list in another sheet.

For better understanding, I prepared a Sheet for you:

https://docs.google.com/a/vicampo.de/spreadsheets/d/1mETtAY72K6ST-hg1qOU9651265nGq0qvcgvzMRqHDO8/edit?usp=sharing

So here's the exact task I want to achieve:

In every single cell in column A of sheet "Text", look for the strings given in column A in sheet "List" and, when found, replace it with the corresponding string in column B of the sheet "List".

See my Example: Look in cell A1 for the string "Lorem" and replace it with "Xlorem", then look for the string "Ipsum" and replace it with "Xipsum", then look for the string "amet" and replace it with "Xamet" then move on to cell B1 and start again looking for the strings...

I have tried different functions and managed to do this with a function for one cell. But how to do it in a loop?

Thanks everyone who is interested in helping out with this problem!

Whittier answered 2/4, 2015 at 10:50 Comment(2)
Your spreadsheet is currently not accessible....Hadhramaut
Thanks JPV, it should be working now. Sorry!Whittier
H
11

Although there must be 'nicer' solutions, a quick solution (as long is the number of cells with the words you want replaced is not too long), would be:

=ArrayFormula(regexreplace(regexreplace(regexreplace(A1:A; List!A1; List!B1); List!A2; List!B2); List!A3; List!B3))
Hadhramaut answered 2/4, 2015 at 12:17 Comment(6)
Hello JPV, first - sorry for the easter delay :) Thank you very much for your solution, it works fine. It used it with up to six pairs of words to replace now, performance is still great. So thanks a lot, that saved me tons of work!Whittier
Glad to hear, Christoph.Hadhramaut
I had the same problem, and this was a great answer. Thanks!Naldo
Brute force, but it works. I was able to search & replace 15,000 cells from an 85-length list in about a minute. (The formula was assembled using a quick script.) The only problem is that the list can't contain any regex-interfering symbols such as parentheses.Psalm
Turns out that SUBSTITUTE is way faster and safer for this task. I've added my own answer.Psalm
This works. Have my upvote. If you want to pick up an easy 100 points, claim the bounty I've put on this question here: webapps.stackexchange.com/questions/137244/…. To be awarding the bounty you need to expand your answer to show why it works. (It's Webapps...) You may want to revise to Substitute instead of Regexreplace for straight forward use.Obara
C
11

Probably the best for you, in this case, should be creating a new function to your Google Spreadsheet. It tends to be, in the general case, more simple, clear and powerfull than that kind of complex formulas that should do the same.

In this particular case, I have the same problem, so you can use the same function:

Click on "Tools" menu, then click on the "Script Editor" option. Into the script editor, erase the draft and paste this function:

function preg_quote( str ) {
  // http://kevin.vanzonneveld.net
  // +   original by: booeyOH
  // +   improved by: Ates Goral (http://magnetiq.com)
  // +   improved by: Kevin van Zonneveld (http://kevin.vanzonneveld.net)
  // +   bugfixed by: Onno Marsman
  // *     example 1: preg_quote("$40");
  // *     returns 1: '\$40'
  // *     example 2: preg_quote("*RRRING* Hello?");
  // *     returns 2: '\*RRRING\* Hello\?'
  // *     example 3: preg_quote("\\.+*?[^]$(){}=!<>|:");
  // *     returns 3: '\\\.\+\*\?\[\^\]\$\(\)\{\}\=\!\<\>\|\:'

  return (str+'').replace(/([\\\.\+\*\?\[\^\]\$\(\)\{\}\=\!\<\>\|\:])/g, "\\$1");
}

function ARRAYREPLACE(input,fromList,toList,caseSensitive){
  /* default behavior it is not case sensitive */
  if( caseSensitive === undefined ){
    caseSensitive = false;
  }
  /* if the from list it is not a list, become a list */
  if( typeof fromList != "object" ) {
    fromList = [ fromList ];
  }
  /* if the to list it is not a list, become a list */
  if( typeof toList != "object" ) {
    toList = [ toList ];
  }
  /* force the input be a string */
  var result = input.toString();

  /* iterates using the max size */
  var bigger  = Math.max( fromList.length, toList.length) ;

  /* defines the words separators */
  var arrWordSeparator = [ ".", ",", ";", " " ];

  /* interate into the lists */
  for(var i = 0; i < bigger; i++ ) {
    /* get the word that should be replaced */
    var fromValue = fromList[ ( i % ( fromList.length ) ) ]
    /* get the new word that should replace */
    var toValue = toList[ ( i % ( toList.length ) ) ]

    /* do not replace undefined */
    if ( fromValue === undefined ) {
      continue;
    }
    if ( toValue == undefined ) {
      toValue = "";
    }

    /* apply case sensitive rule */
    var caseRule = "g";
    if( !caseSensitive ) {
      /* make the regex case insensitive */
      caseRule = "gi";
    }

    /* for each end word char, make the replacement and update the result */
    for ( var j = 0; j < arrWordSeparator.length; j++ ) {
      
      /* from value being the first word of the string */
      result =  result.replace( new RegExp( "^(" + preg_quote( fromValue + arrWordSeparator[ j ] ) + ")" , caseRule ), toValue + arrWordSeparator[ j ] );
      
      /* from value being the last word of the string */
      result =  result.replace( new RegExp( "(" + preg_quote( arrWordSeparator[ j ] + fromValue ) + ")$" , caseRule ), arrWordSeparator[ j ] + toValue );
      
      /* from value in the middle of the string between two word separators */
      for ( var k = 0; k < arrWordSeparator.length; k++ ) {
        result =  result.replace( 
          new RegExp( 
            "(" + preg_quote( arrWordSeparator[ j ] + fromValue + arrWordSeparator[ k ] ) + ")" , 
            caseRule 
          ), 
          /* need to keep the same word separators */
          arrWordSeparator[ j ] + toValue + arrWordSeparator[ k ] 
        );
      }
    }
    
    /* from value it is the only thing in the string */
    result =  result.replace( new RegExp( "^(" + preg_quote( fromValue ) + ")$" , caseRule ), toValue );
  }
  /* return the new result */
  return result;
}

Just save your script and the new function it will be available to you. Now, you have the function that replaces all the first values list by the second value list.

=ARRAYREPLACE(C2;A1:A4;B1:B4)

for example, takes the C2 text and replaces all the elements found in the A1:A4 list by the equivalent into the B1:B4 list.

example of the use of the function into Google Spreadsheet

Caveat answered 7/12, 2017 at 0:55 Comment(7)
Thank you so much! Just want to add that to make this work I had to replace the comma by semicolons, as it didn't work (took me an hour to figure out!). So the formula would be like this: =ARRAYREPLACE(C2;A1:A4;B1:B4)Praefect
@Thiago Mata should not the undefined by written like this instead? if ( fromValue === "undefined" ) {Fiji
Probably. I would check first if the Javascript inside of Google Docs accepts that. But, if it does, it seems a good improvement.Caveat
Sorry, looking this closely, did you mean fromValue === undefined? If so, it makes sense. We use a more restricted comparator with the undefined constant value. But using the '===' with the string value seems wrong. Because for sure it will not match the string "undefined", fromValue === "undefined" will always be false.Caveat
Here is a running example docs.google.com/spreadsheets/d/…Caveat
Script updated with the suggested improvements.Caveat
Thanks! As of 2022, Google has changed the menu layout to create scripts, and in particular there is no longer a "Script Editor" under "Tools". To get into the editor mode, I recorded a macro (=a script; see "Extensions" menu), edited it to replace its code by yours instead, and then used "Extensions"->"Import macro" to activate the function. There might be more direct ways to get to the macro editing UI, but if you have never made any macro, the "Manage macros" menu entry is inactive.Contravention
A
8

Copy Sample File With Explanation

enter image description here

Problem

enter image description here

The challenge is: Find & Replace multiple values in the input of multiple cells.

ArrayFormula's

Solutions which I account as Array-Solution must be:

  1. based on open ranges
  2. no need to drag the formula down
  3. no need to modify the formula when new items in lists appear

These tests must be passed:

  • Is ArrayFormula
  • User can set Case Sensitivity
  • Replaces Emojis
  • Replaces Special Chars $\[]. etc.
  • CrashTest. Works for 10K rows of data
  • CrashTest. Works for 2K replacements

Script

I recommend using the not-regex-based script in this case. This algorithm finds and replaces text by chars:

Usage

Use as a regular formula from sheet:

=substitutes(A12:A;List!A1:B)

enter image description here

Code

Save this code to use the formula above:

/**
 * Substitutes in every entry in array
 * Text from prefilled array
 *
 * @param {array} input The array of strings.
 * @param {array} subTable The array of string pairs: search texts / replace texts.
 * @param {boolean} caseSensitive [optional=false] 
 * TRUE to match Apple and apple as different words
 * @return The input with all replacement made
 * @customfunction
 */
function substitutes(input, subTable,caseSensitive) {
  //  default behavior it is not case sensitive
  caseSensitive = caseSensitive || false;
  // if the input is not a list, become a list */
  if( typeof input != "object" ) {
    input = [ input ];
  }
  var res = [], text;
  for (var i = 0; i < input.length; i++) {
    // force each array element in the input be a string
    text = input[i].toString();
    for (var ii = 0; ii < subTable.length; ii++) {
      text = replaceAll_(
        text, 
        subTable[ii][0], 
        subTable[ii][1], 
        caseSensitive);
    }
    res.push(text);
  }
  return res;
}


/***
 * JavaScript Non-regex Replace
 * 
 * Original code sourse:
 * https://mcmap.net/q/593774/-javascript-non-regex-replace
 */
function replaceAll_(str, find, newToken, caseSensitive) {
    var i = -1;
    // sanity check & defaults
    if (!str) {
        // Instead of throwing, act as 
        // COALESCE if find == null/empty and str == null
        if ((str == null) && (find == null))
            return newToken;
        return str;
    }
    if (!find || find === ''){ return str; }
    if (find === newToken) { return str; }
    caseSensitive = caseSensitive || false;
    find = !caseSensitive ? find.toLowerCase() : find;
    // search process, search by char
    while ((
        i = (!caseSensitive ? str.toLowerCase() : str).indexOf(
            find, i >= 0 ? i + newToken.length : 0
        )) !== -1
    ) {
        str = str.substring(0, i) +
            newToken +
            str.substring(i + find.length);
    } 
    return str;
}

Monster Formula

I've used the RegEx algorithm to solve it with native functions. This method is not recommended as it slows down your Worksheet.

The formula is:

=INDEX(SUBSTITUTE(REGEXREPLACE(TRANSPOSE(QUERY(TRANSPOSE(IFERROR(SPLIT(SUBSTITUTE(TRANSPOSE(QUERY(TRANSPOSE(IFERROR(VLOOKUP(SPLIT(REGEXREPLACE(A12:A;SUBSTITUTE(REGEXREPLACE(REGEXREPLACE(A12:A;"(?i)"&SUBSTITUTE(SUBSTITUTE(QUERY(FILTER(REGEXREPLACE(List!A1:A;"(\\|\+|\*|\?|\[|\^|\]|\$|\(|\)|\{|\}|\=|\!|\<|\>|\||\:|\-)";"\\$1")&"𑇦";List!A1:A<>"");;2^99);"𑇦 ";"|");"𑇦";"");"𑇡");"(\\|\+|\*|\?|\[|\^|\]|\$|\(|\)|\{|\}|\=|\!|\<|\>|\||\:|\-)";"\\$1");"𑇡";"(.*)");INDEX(REGEXREPLACE(TRIM(TRANSPOSE(QUERY(TRANSPOSE(IF(SEQUENCE(COUNTA(INDEX(LEN(REGEXREPLACE(REGEXREPLACE(A12:A;"(?i)"&SUBSTITUTE(SUBSTITUTE(QUERY(FILTER(REGEXREPLACE(List!A1:A;"(\\|\+|\*|\?|\[|\^|\]|\$|\(|\)|\{|\}|\=|\!|\<|\>|\||\:|\-)";"\\$1")&"𑇦";List!A1:A<>"");;2^99);"𑇦 ";"|");"𑇦";"");"𑇡");"[^𑇡]";""))/2));MAX(INDEX(LEN(REGEXREPLACE(REGEXREPLACE(A12:A;"(?i)"&SUBSTITUTE(SUBSTITUTE(QUERY(FILTER(REGEXREPLACE(List!A1:A;"(\\|\+|\*|\?|\[|\^|\]|\$|\(|\)|\{|\}|\=|\!|\<|\>|\||\:|\-)";"\\$1")&"𑇦";List!A1:A<>"");;2^99);"𑇦 ";"|");"𑇦";"");"𑇡");"[^𑇡]";""))/2)))-(SEQUENCE(COUNTA(INDEX(LEN(REGEXREPLACE(REGEXREPLACE(A12:A;"(?i)"&SUBSTITUTE(SUBSTITUTE(QUERY(FILTER(REGEXREPLACE(List!A1:A;"(\\|\+|\*|\?|\[|\^|\]|\$|\(|\)|\{|\}|\=|\!|\<|\>|\||\:|\-)";"\\$1")&"𑇦";List!A1:A<>"");;2^99);"𑇦 ";"|");"𑇦";"");"𑇡");"[^𑇡]";""))/2)))-1)*MAX(INDEX(LEN(REGEXREPLACE(REGEXREPLACE(A12:A;"(?i)"&SUBSTITUTE(SUBSTITUTE(QUERY(FILTER(REGEXREPLACE(List!A1:A;"(\\|\+|\*|\?|\[|\^|\]|\$|\(|\)|\{|\}|\=|\!|\<|\>|\||\:|\-)";"\\$1")&"𑇦";List!A1:A<>"");;2^99);"𑇦 ";"|");"𑇦";"");"𑇡");"[^𑇡]";""))/2))<=INDEX(LEN(REGEXREPLACE(REGEXREPLACE(A12:A;"(?i)"&SUBSTITUTE(SUBSTITUTE(QUERY(FILTER(REGEXREPLACE(List!A1:A;"(\\|\+|\*|\?|\[|\^|\]|\$|\(|\)|\{|\}|\=|\!|\<|\>|\||\:|\-)";"\\$1")&"𑇦";List!A1:A<>"");;2^99);"𑇦 ";"|");"𑇦";"");"𑇡");"[^𑇡]";""))/2);"𑇣"&SEQUENCE(COUNTA(INDEX(LEN(REGEXREPLACE(REGEXREPLACE(A12:A;"(?i)"&SUBSTITUTE(SUBSTITUTE(QUERY(FILTER(REGEXREPLACE(List!A1:A;"(\\|\+|\*|\?|\[|\^|\]|\$|\(|\)|\{|\}|\=|\!|\<|\>|\||\:|\-)";"\\$1")&"𑇦";List!A1:A<>"");;2^99);"𑇦 ";"|");"𑇦";"");"𑇡");"[^𑇡]";""))/2));MAX(INDEX(LEN(REGEXREPLACE(REGEXREPLACE(A12:A;"(?i)"&SUBSTITUTE(SUBSTITUTE(QUERY(FILTER(REGEXREPLACE(List!A1:A;"(\\|\+|\*|\?|\[|\^|\]|\$|\(|\)|\{|\}|\=|\!|\<|\>|\||\:|\-)";"\\$1")&"𑇦";List!A1:A<>"");;2^99);"𑇦 ";"|");"𑇦";"");"𑇡");"[^𑇡]";""))/2)))-(SEQUENCE(COUNTA(INDEX(LEN(REGEXREPLACE(REGEXREPLACE(A12:A;"(?i)"&SUBSTITUTE(SUBSTITUTE(QUERY(FILTER(REGEXREPLACE(List!A1:A;"(\\|\+|\*|\?|\[|\^|\]|\$|\(|\)|\{|\}|\=|\!|\<|\>|\||\:|\-)";"\\$1")&"𑇦";List!A1:A<>"");;2^99);"𑇦 ";"|");"𑇦";"");"𑇡");"[^𑇡]";""))/2)))-1)*MAX(INDEX(LEN(REGEXREPLACE(REGEXREPLACE(A12:A;"(?i)"&SUBSTITUTE(SUBSTITUTE(QUERY(FILTER(REGEXREPLACE(List!A1:A;"(\\|\+|\*|\?|\[|\^|\]|\$|\(|\)|\{|\}|\=|\!|\<|\>|\||\:|\-)";"\\$1")&"𑇦";List!A1:A<>"");;2^99);"𑇦 ";"|");"𑇦";"");"𑇡");"[^𑇡]";""))/2))&"𑇤";));;2^99)));" ?𑇣";"$")));"𑇤");{List!A1:A\List!B1:B};2;)&"𑇩"));;2^99));"𑇩 ";"𑇩")&"𝅘";"𑇩")&SPLIT(REGEXREPLACE(A12:A;"(?i)"&SUBSTITUTE(SUBSTITUTE(QUERY(FILTER(REGEXREPLACE(List!A1:A;"(\\|\+|\*|\?|\[|\^|\]|\$|\(|\)|\{|\}|\=|\!|\<|\>|\||\:|\-)";"\\$1")&"𑇦";List!A1:A<>"");;2^99);"𑇦 ";"|");"𑇦";"");"𑇡")&"𝅘";"𑇡")))&"𝅗";;2^99));"𝅗 *";"");"𝅘";""))

Other Solutions

Nested formulas

Nested SUBSTITUTE or REGEXREPLACE formulas as was noted in other answers.

Formulas you need to drag down for the result

Here's a sample formula. Basic logic - split the text into parts → modify parts individually → to join the new result.

This formula must be copied down:

=JOIN(" ";
ArrayFormula(
IFERROR(VLOOKUP(TRANSPOSE(SPLIT(A1;" "));List!A:B;2;0);TRANSPOSE(SPLIT(A1;" ")))))
Auer answered 26/2, 2016 at 12:53 Comment(2)
this is such a thorough detailed answer, it should be accepted and upvoted much more! the script works perfectly for me!Illogic
I'm in an unfortunate situation where I cannot use your amazing LAMBDA/REGEX solution due to non-escaped characters and reaching the time limit AND I cannot use the above NON-REGEX solution as it doesn't support whole-word replacement, which a REGEX solution would allow for. How can the above NON-REGEX solution be modified to support whole-word replacements? For example, your [Test]10 (cell F20) string replacement would be invalid with whole-word replacement support. I assume if the next character is alphanumeric, then it shouldn't be replaced???Demythologize
P
5

An improvement on JPV's answer, which is orders of magnitude faster and works with arbitrary query and replacement strings:

=ArrayFormula(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1:A, List!A1, List!B1), List!A2, List!B2), List!A3, List!B3))

Using this format, a 15,000 cell spreadsheet with an 85-length replacement list will update in just a few seconds. Simply assemble the formula string using your scripting language of choice and you're good to go!

Psalm answered 11/6, 2018 at 21:56 Comment(0)
A
5

With new Labmda and Friends:

=LAMBDA(data,re,with,BYROW(data,LAMBDA(r,if(r="","",REDUCE(r,SEQUENCE(counta(re)),LAMBDA(ini,v,REGEXREPLACE(ini,INDEX(re,v),INDEX(with,v))))))))(C5:C6,E5:E7,F5:F7)

=> Named function

=SUBSTITUTES_RE(list0,list_re,list_with)

↑ This will substitute using regular expressions


substututes

Definition is the same, but REGEXREPLACE is replaced with SUBSTITUTE


Other examples here: https://docs.google.com/spreadsheets/d/1IMymPZlibT6DX4yzDX4OXj2XBZ48zEl6vBUzIHJIzVE/edit#gid=0

enter image description here


enter image description here


enter image description here

Auer answered 17/10, 2022 at 15:25 Comment(0)
F
1

Here is a bit simpler of a script than Thiago Mata's. I modified the script from https://webapps.stackexchange.com/a/46895 to support either single cell or range input

function MSUBSTITUTE(input, subTable)
{
  var searchArray = [], subArray = [], outputArray = [];
  for (var i = 0, length = subTable.length; i < length; i++)
  {
    if (subTable[i][0])
    {
      searchArray.push(subTable[i][0]);
      subArray.push(subTable[i][1]);
    }
  }
  var re = new RegExp(searchArray.join('|'), 'g');
  
  /* Check if we got just a single string */
  if (typeof( input ) == "string")
  {
    outputArray.push(input.replace(re, function (match) {return subArray[searchArray.indexOf(match)];}));
  } 
  else /* we got an array of strings */
  {
    for (var i = 0; i < input.length; i++)
    {
      /* force each array element in the input be a string */
      var text = input[i].toString();
      outputArray.push(text.replace(re, function (match) {return subArray[searchArray.indexOf(match)];}))
    }
  }
  return outputArray;
}
Fluxion answered 27/1, 2021 at 19:12 Comment(3)
Thanks for the solution! The script looks simple and clear. There's a side effect: the script use regex to replace means some pairs of text cannot be replaced: try to replace [hello] to hi and it won't work.Auer
I keep getting undefined as the replaced phrases. It appears to be matching correctly, but the replacement is failing. I used word boundary for the search termDemythologize
It only occurs when I add word boundary. Otherwise, it incorrectly replaces words I don't want to be replacedDemythologize
D
0

Got it

Lorem ipsum dolor sit xamet Lorem ipsum

= textjoin("";true;ARRAYFORMULA(ifna(vlookup(REGEXEXTRACT(A1;"("&REGEXREPLACE(A1;"("&(textJOIN("|";true;lookuprange))&")";")($1)(")&")");lookuprange;2;false);REGEXEXTRACT(A1;"("&REGEXREPLACE(A1;"("&(textJOIN("|";true;lookuprange))&")";")($1)(")&")"))))

Xlorem ipsum dolor sit Xamet Xlorem ipsum

Delftware answered 14/12, 2021 at 12:11 Comment(0)
P
0

I've found a simple way to do this with "ARRAYFORMULA"

You must have one list with the text to find and in a contiguos column, the list you want to replace de data, for example:

# D E
1 ToFind ToReplace
2 Avoc4do Avocado
3 Tomat3 Tomate
4 On1on Onion
5 Sug4r Sugar

then use this formula

=ARRAYFORMULA(FIND(A1:A1000,D1:D5,E1:E5))

A1:A1000 is the original column where you have multiple rows with the word "Avoc4do, Tomat3, On1on, Sugar", ArrayFormula works with a matrix where others formulas can't (formula FIND can't work finding in a matrix, so we use ArrayFormula) Then you will have a colum with the 1000 rows but now with the "ToReplace" text in order, so now cut and copy in the column A, that's it.

Photomultiplier answered 16/3, 2022 at 2:58 Comment(0)
G
0

Max Makhrov's lambda solution is definitely the best non-script one here!

To build on it, you can do iterative substitutions for any 2D array with the following formula: =ArrayFormula( IF(ISTEXT(data), REDUCE(data,SEQUENCE(COUNTA(re)), LAMBDA(ini,v, SUBSTITUTE(ini,INDEX(re,v),INDEX(with,v)) )), data) )

Swap in the ranges as follows: date= range of cells you want to check and swap text re= range of texts or characters you want to replace with= range of texts or characters to what to replace with. Must be same size 1:1 as re

This function uses the REDUCE lambda function to iterate through each text in re for each cell in the data. It has an IF check to skip any cells that are not text. This avoids the potential issue of numbers being converted to text.

This can be easily made into a custom function for regular use.

Hope this helps!

Gust answered 26/1 at 15:0 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.