Finding and Deleting All Emojis in a Google Spreadsheet
Asked Answered
H

4

4

I have a Google Spreadsheet with thousands of cells with each cell being populated with strings with many different emojis.

Example of entries:

"Lol ๐Ÿ˜Š","Haha ๐Ÿ˜Š","Fire ๐Ÿ”ฅ","๐Ÿ‘๐Ÿ‘๐Ÿ‘Awesome!","Nice๐Ÿ‘ See you tomorrow!๐Ÿ˜€", "ใ“ใ‚“ใซใกใฏ๐Ÿ˜Š", "ไฝ ๅฅฝ๐Ÿ˜€"

But I want to delete all of the emojis, is there a search function I can run/piece of Spreadsheet code I can run to make the document devoid of emojis?

Hakenkreuz answered 9/3, 2017 at 22:43 Comment(2)
Are the emoji's supposed to be there or are they a product of data conversion? โ€“ Fructose
The emoji's are supposed to be there, it's a spreadsheet filled with chat like text. โ€“ Hakenkreuz
W
3

Cleaning Up with Regular Expressions

I don't have the time to do the whole thing but this will give you a start. I cleaned everything in one cell with this.

var sht = SpreadsheetApp.getActiveSheet();
  var text = sht.getActiveCell().getValue();
  var cleantext = text.replace(/[^\s\w]/g,'');//replace everything that's not whitespace or word characters with null
  sht.getActiveCell().setValue(cleantext);

I used the line you provided as test data. Admittedly it needs a little tweaking because it's getting rid of some punctuation.

This is a little better.

function test()
{
  var sht = SpreadsheetApp.getActiveSheet();
  var text = sht.getActiveCell().getValue();
  var cleantext = text.replace(/[^\s\w"!,]/g,'');//added "!,
  sht.getActiveCell().setValue(cleantext);

}

So as you run it you may want to add a few more characters to don't replace list. That's it.

I have an expense report that I use to collect my expenses in different categories and I like to produce pie charts to help me get a big picture view of where my money is going. I use this Array Formula to help me gather the information into useful categories for me.

=ArrayFormula(IF(Row(C:C)=1,"Title",IF(LEN(C:C),IF(REGEXMATCH(C:C,"(?i)(string1|string2|string3|string4)"),D:D,""),)))

The regular expression provides an or function for adding additional matching for unexpected item appearing on my expense lists that I want to gather into these categories. If you need another matching term you just go into that formula and add another term as shown below

(string1|string2|string3|string4||string5)

The strings are replaced with real terms with no quotes unless they have quotes around them in the search target.

Woodhead answered 10/3, 2017 at 0:44 Comment(6)
So to avoid clearing Japanese punctuation marks (ใ€‚andใ€) and question I wrote: var cleantext = text.replace(/[^\s\w"!,ใ€‚ใ€.?]g,' '); and it worked perfectly! โ€“ Hakenkreuz
That's great. There's nothing really regular about regular expressions but they're hard to beat when it comes to finding stuff in text. And in this case it's probably easier to figure out what you want to keep and then delete the rest. Here's a link you might like to take a look at check it out. โ€“ Woodhead
I have a question, in for var cleantext = text.replace(/[^\s\w"!,]/g,''); I'm not certain how that gets rid of emojis. The text is being passed on the the replace function, and I'd understand if all the things within the brackets got deleted, but it's the opposite that's happening here. โ€“ Hakenkreuz
Well the idea is that you will replace everything except those things that are inside the set because the set starts with a negate operator so it's saying replace everything that's not in this set. \s is white space \w is word characters and I had ! and , and " and it looks like you added and few more things. โ€“ Woodhead
Hey Cooper, I asked another question in a different thread, but I was wondering if you can help concerning the specific code you wrote. How would I leave [ and ] (square brackets) within the text? I've tried escaping with backspace, double backspace, as well as wrapping those in quotes but to no avail. โ€“ Hakenkreuz
If you're referring to these square brackets "[^\s\w"!,]" they are just delimiters for a regular expression set. If you wanted them to not be removed you could add them like this "[^\s\w"!\[\],]" โ€“ Woodhead
I
4

Replace emojis from text

I've found, you may use a REGEXREPLACE for that.

To replace all emojis from [A1] please try:

=REGEXREPLACE($A$1,"[๐Ÿป๐Ÿผ๐Ÿฝ๐Ÿพ๐Ÿฟยฉยฎโ€ผโ‰โ„ขโ„นโ†”-โ†™โ†ฉ-โ†ชโŒš-โŒ›โŒจโโฉ-โณโธ-โบโ“‚โ–ช-โ–ซโ–ถโ—€โ—ป-โ—พโ˜€-โ˜„โ˜Žโ˜‘โ˜”-โ˜•โ˜˜โ˜โ˜ โ˜ข-โ˜ฃโ˜ฆโ˜ชโ˜ฎ-โ˜ฏโ˜ธ-โ˜บโ™€โ™‚โ™ˆ-โ™“โ™Ÿ-โ™ โ™ฃโ™ฅ-โ™ฆโ™จโ™ปโ™พ-โ™ฟโš’-โš—โš™โš›-โšœโš -โšกโšงโšช-โšซโšฐ-โšฑโšฝ-โšพโ›„-โ›…โ›ˆโ›Ž-โ›โ›‘โ›“-โ›”โ›ฉ-โ›ชโ›ฐ-โ›ตโ›ท-โ›บโ›ฝโœ‚โœ…โœˆ-โœโœโœ’โœ”โœ–โœโœกโœจโœณ-โœดโ„โ‡โŒโŽโ“-โ•โ—โฃ-โคโž•-โž—โžกโžฐโžฟโคด-โคตโฌ…-โฌ‡โฌ›-โฌœโญโญ•ใ€ฐใ€ฝใŠ—ใŠ™๐Ÿ€„๐Ÿƒ๐Ÿ…ฐ-๐Ÿ…ฑ๐Ÿ…พ-๐Ÿ…ฟ๐Ÿ†Ž๐Ÿ†‘-๐Ÿ†š๐Ÿˆ-๐Ÿˆ‚๐Ÿˆš๐Ÿˆฏ๐Ÿˆฒ-๐Ÿˆบ๐Ÿ‰-๐Ÿ‰‘๐ŸŒ€-๐ŸŒก๐ŸŒค-๐ŸŽ“๐ŸŽ–-๐ŸŽ—๐ŸŽ™-๐ŸŽ›๐ŸŽž-๐Ÿฐ๐Ÿณ-๐Ÿต๐Ÿท-๐Ÿบ๐Ÿ€-๐Ÿ“ฝ๐Ÿ“ฟ-๐Ÿ”ฝ๐Ÿ•‰-๐Ÿ•Ž๐Ÿ•-๐Ÿ•ง๐Ÿ•ฏ-๐Ÿ•ฐ๐Ÿ•ณ-๐Ÿ•บ๐Ÿ–‡๐Ÿ–Š-๐Ÿ–๐Ÿ–๐Ÿ–•-๐Ÿ––๐Ÿ–ค-๐Ÿ–ฅ๐Ÿ–จ๐Ÿ–ฑ-๐Ÿ–ฒ๐Ÿ–ผ๐Ÿ—‚-๐Ÿ—„๐Ÿ—‘-๐Ÿ—“๐Ÿ—œ-๐Ÿ—ž๐Ÿ—ก๐Ÿ—ฃ๐Ÿ—จ๐Ÿ—ฏ๐Ÿ—ณ๐Ÿ—บ-๐Ÿ™๐Ÿš€-๐Ÿ›…๐Ÿ›‹-๐Ÿ›’๐Ÿ›•-๐Ÿ›—๐Ÿ›-๐Ÿ›ฅ๐Ÿ›ฉ๐Ÿ›ซ-๐Ÿ›ฌ๐Ÿ›ฐ๐Ÿ›ณ-๐Ÿ›ผ๐ŸŸ -๐ŸŸซ๐ŸŸฐ๐ŸคŒ-๐Ÿคบ๐Ÿคผ-๐Ÿฅ…๐Ÿฅ‡-๐Ÿงฟ๐Ÿฉฐ-๐Ÿฉด๐Ÿฉธ-๐Ÿฉผ๐Ÿช€-๐Ÿช†๐Ÿช-๐Ÿชฌ๐Ÿชฐ-๐Ÿชบ๐Ÿซ€-๐Ÿซ…๐Ÿซ-๐Ÿซ™๐Ÿซ -๐Ÿซง๐Ÿซฐ-๐Ÿซถ๐Ÿ‡ฆ-๐Ÿ‡ฟ#๏ธโƒฃ*๏ธโƒฃ0๏ธโƒฃ1๏ธโƒฃ2๏ธโƒฃ3๏ธโƒฃ4๏ธโƒฃ5๏ธโƒฃ6๏ธโƒฃ7๏ธโƒฃ8๏ธโƒฃ9๏ธโƒฃ]","")

I believe this regex will find all current emojis from your text.

Notes:

  • some emojis are compound for instance, an astronaut is ๐Ÿง‘๐Ÿผโ€๐Ÿš€. Regex needs to find only solid chars, so all compound emojis will be included.

  • I've tried to shorten the solution, and used actual emojis in RegEx. You may also see more "computer-like" solutions: [\u1F60-\u1F64]|[\u2702-\u27B0].... Those solutions use codes of emojis instead.

  • Another interesting option is given here. Remove all not printable chars: =REGEXREPLACE(A1,"[[:print:]]","")

  • skins are included:

enter image description here

Ish answered 26/11, 2021 at 13:30 Comment(0)
W
3

Cleaning Up with Regular Expressions

I don't have the time to do the whole thing but this will give you a start. I cleaned everything in one cell with this.

var sht = SpreadsheetApp.getActiveSheet();
  var text = sht.getActiveCell().getValue();
  var cleantext = text.replace(/[^\s\w]/g,'');//replace everything that's not whitespace or word characters with null
  sht.getActiveCell().setValue(cleantext);

I used the line you provided as test data. Admittedly it needs a little tweaking because it's getting rid of some punctuation.

This is a little better.

function test()
{
  var sht = SpreadsheetApp.getActiveSheet();
  var text = sht.getActiveCell().getValue();
  var cleantext = text.replace(/[^\s\w"!,]/g,'');//added "!,
  sht.getActiveCell().setValue(cleantext);

}

So as you run it you may want to add a few more characters to don't replace list. That's it.

I have an expense report that I use to collect my expenses in different categories and I like to produce pie charts to help me get a big picture view of where my money is going. I use this Array Formula to help me gather the information into useful categories for me.

=ArrayFormula(IF(Row(C:C)=1,"Title",IF(LEN(C:C),IF(REGEXMATCH(C:C,"(?i)(string1|string2|string3|string4)"),D:D,""),)))

The regular expression provides an or function for adding additional matching for unexpected item appearing on my expense lists that I want to gather into these categories. If you need another matching term you just go into that formula and add another term as shown below

(string1|string2|string3|string4||string5)

The strings are replaced with real terms with no quotes unless they have quotes around them in the search target.

Woodhead answered 10/3, 2017 at 0:44 Comment(6)
So to avoid clearing Japanese punctuation marks (ใ€‚andใ€) and question I wrote: var cleantext = text.replace(/[^\s\w"!,ใ€‚ใ€.?]g,' '); and it worked perfectly! โ€“ Hakenkreuz
That's great. There's nothing really regular about regular expressions but they're hard to beat when it comes to finding stuff in text. And in this case it's probably easier to figure out what you want to keep and then delete the rest. Here's a link you might like to take a look at check it out. โ€“ Woodhead
I have a question, in for var cleantext = text.replace(/[^\s\w"!,]/g,''); I'm not certain how that gets rid of emojis. The text is being passed on the the replace function, and I'd understand if all the things within the brackets got deleted, but it's the opposite that's happening here. โ€“ Hakenkreuz
Well the idea is that you will replace everything except those things that are inside the set because the set starts with a negate operator so it's saying replace everything that's not in this set. \s is white space \w is word characters and I had ! and , and " and it looks like you added and few more things. โ€“ Woodhead
Hey Cooper, I asked another question in a different thread, but I was wondering if you can help concerning the specific code you wrote. How would I leave [ and ] (square brackets) within the text? I've tried escaping with backspace, double backspace, as well as wrapping those in quotes but to no avail. โ€“ Hakenkreuz
If you're referring to these square brackets "[^\s\w"!,]" they are just delimiters for a regular expression set. If you wanted them to not be removed you could add them like this "[^\s\w"!\[\],]" โ€“ Woodhead
W
2

Here is some code that goes through one column of data and removes emojis from each cell.

You must replace Your Sheet Tab Name with the sheet tab name that the code should work on. This code currently only processes one column of data. The entire column of values is written back to the sheet in one write operation. Any character codes that are 5 characters or more are assumed to be emojis.

Test it on a few rows of data first.

function killEmojies() {
  var arrayThisRow,columnOfValues,columnToRemoveEmojiesFrom,firstTwoChar,
      i,innerArray,j,L,newCellContent,outerArray,
      ss,sh,
      targetSheet,thisCell,thisCellChar,thisCellVal,thisCharCode,thisCharCodeLength;

  columnToRemoveEmojiesFrom = 1;
  outerArray = [];

  ss = SpreadsheetApp.getActiveSpreadsheet()
  sh = ss.getSheetByName("Your Sheet Tab Name Here");
  targetSheet = ss.getSheetByName("Your Sheet Tab Name Here");

  columnOfValues = sh.getRange(1, columnToRemoveEmojiesFrom,sh.getLastRow(),1).getValues(); 
  L = columnOfValues.length;
  Logger.log('L: ' + L);

  for (i=0;i<L;i++) {

    thisCell = columnOfValues[i];//Get inner array
    thisCellVal = thisCell[0];//Get first element of inner array

    Logger.log(thisCellVal)
    Logger.log('typeof thisCellVal: ' + typeof thisCellVal)

    newCellContent = "";//Reset for every cell
    innerArray = [];//Reset for every row loop

    if (typeof thisCellVal !== 'string') {//This spreadsheet cell contains something
      //other than text
      innerArray.push(thisCellVal);
    } else {
      for (j=0;j<thisCellVal.length;j++) {//Loop through every character in the cell
        thisCellChar = thisCellVal[j];
        thisCharCode = thisCellChar.charCodeAt(0);//Character code of this character
        thisCharCodeLength = thisCharCode.toString().length;

        Logger.log('typeof thisCharCodeLength: ' + typeof thisCharCodeLength);
        Logger.log('this val: ' + thisCharCode);
        Logger.log('thisCharCodeLength: ' + thisCharCodeLength);
        Logger.log(thisCharCodeLength < 5);

        if (thisCharCodeLength === 5) {
          firstTwoChar = thisCharCode.toString().slice(0,2);
          Logger.log('firstTwoChar: ' + firstTwoChar)

        }

        if (thisCharCodeLength > 4 && (firstTwoChar === "54" || firstTwoChar === "55" || firstTwoChar === "56")) {
          continue;//exclude character codes that are 5 or more characters long
          //and start with 54 or 55
        }
        newCellContent = newCellContent + thisCellChar;

      }

      innerArray.push(newCellContent);
    }

    outerArray.push(innerArray);
  }

  targetSheet.getRange(1, columnToRemoveEmojiesFrom,outerArray.length,1).setValues(outerArray);
}
Wreck answered 10/3, 2017 at 0:58 Comment(4)
I must apologize. Thank you, this works, but it is also deleting Japanese and Chinese characters as well. I should have mentioned that those are included in the text. I will edit the question accordingly. โ€“ Hakenkreuz
I modified the code to further detect more specific ranges of character codes. I tested it and it left in Chinese and Japanese characters. โ€“ Wreck
Thank you Sandy Good! I actually ended up using your code, to loop through the column and combined it with Cooper's code to take out the emojis. โ€“ Hakenkreuz
Will definitely do! โ€“ Hakenkreuz
V
0

Assuming all your text strings are single words followed by a space and then an Emoji, you can use the formula

=LEFT(A1,(FIND(" ",A1,1)-1))

This will return the textual contents of a cell only (A1 in this example). If all your data is in a single column, you can just pull down and this will apply to all your data.

Vervain answered 9/3, 2017 at 22:53 Comment(1)
I apologize for the lack of clarity, no the emojis are dispersed randomly through out the text. โ€“ Hakenkreuz

© 2022 - 2024 โ€” McMap. All rights reserved.