PHP: writing a simple removeEmoji function

Asked 9/10, 2012 at 19:35 Answered 1/12, 2021 at 12:29

I'm looking for a simple function that would remove Emoji characters from Instagram comments. What I've tried so far (with a lot of code from examples I found on SO and other websites):

// PHP class
public static function removeEmoji($string)
{
    // split the string into UTF8 char array
    // for loop inside char array
        // if char is emoji, remove it
    // endfor
    // return newstring
}
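For reference, the loop outlined in the question can be written out directly. This is a minimal sketch, not the asker's code. It assumes PHP 7.2+ with mbstring (for mb_ord()) and splits the string with preg_split('//u', ...). It uses a hypothetical isEmoji() helper that only knows the five blocks from the first answer below, so newer emojis slip through. The class name EmojiCleaner is also made up for the example.

```php
<?php
// Hypothetical class wrapping the loop sketched in the question
class EmojiCleaner
{
    // Code point ranges treated as emoji here: the five blocks from the first answer (not exhaustive)
    private const EMOJI_RANGES = [
        [0x1F600, 0x1F64F], // Emoticons
        [0x1F300, 0x1F5FF], // Miscellaneous Symbols and Pictographs
        [0x1F680, 0x1F6FF], // Transport and Map Symbols
        [0x2600, 0x26FF],   // Miscellaneous Symbols
        [0x2700, 0x27BF],   // Dingbats
    ];

    // Hypothetical helper: true if the code point falls in one of the ranges above
    private static function isEmoji(int $codePoint): bool
    {
        foreach (self::EMOJI_RANGES as [$start, $end]) {
            if ($codePoint >= $start && $codePoint <= $end) {
                return true;
            }
        }
        return false;
    }

    public static function removeEmoji(string $string): string
    {
        // Split the string into an array of UTF-8 characters (false on invalid UTF-8)
        $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
        if ($chars === false) {
            throw new InvalidArgumentException('Input is not valid UTF-8');
        }
        $newString = '';
        // Loop over the characters and keep only the non-emoji ones
        foreach ($chars as $char) {
            if (!self::isEmoji(mb_ord($char, 'UTF-8'))) {
                $newString .= $char;
            }
        }
        return $newString;
    }
}
```

Anything outside the listed blocks, for example U+1F917 (🤗), is kept. The ranges table is therefore the part that needs maintaining.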

Any help would be appreciated

Beatnik answered 9/10, 2012 at 19:35 Comment(5)

please tell us a little more in depth what you have tried, because this doesn't really say much. Why is your code not working? What is your output and how does it compare to the expected output? – Thirdrate 9/10, 2012 at 19:38

en.wikipedia.org/wiki/Emoji read this first please – Wariness 9/10, 2012 at 19:43

@JonTaylor I've tried different solutions found on SO; none of them seems to work well. – Beatnik 9/10, 2012 at 20:43

If you are still using this function and are finding that some of the emojis in iOS 7 aren't being removed, take a look at my answer that I just posted, which expands on yours. – Kudos 26/11, 2013 at 3:46

What have you tried so far? Where are you stuck? How is your problem directly related to Instagram? – Laborer 27/5 at 17:57

Community note: removing emojis is usually unnecessary in 2020 and beyond, since modern databases and online services support them. Regarding MySQL, just make sure the utf8mb4 character set is used for both the tables and the connection.

I think the preg_replace function is the simplest solution.

As EaterOfCode suggested, I read the Wikipedia page and wrote new regexes. None of the answers on SO (or other websites) seemed to work for Instagram photo captions (in the format the API returns). Note: the /u modifier is mandatory to match \x{...} Unicode characters.

public static function removeEmoji($text) {

    $clean_text = "";
    
    // Match Emoticons
    $regexEmoticons = '/[\x{1F600}-\x{1F64F}]/u';
    $clean_text = preg_replace($regexEmoticons, '', $text);
    
    // Match Miscellaneous Symbols and Pictographs
    $regexSymbols = '/[\x{1F300}-\x{1F5FF}]/u';
    $clean_text = preg_replace($regexSymbols, '', $clean_text);

    // Match Transport And Map Symbols
    $regexTransport = '/[\x{1F680}-\x{1F6FF}]/u';
    $clean_text = preg_replace($regexTransport, '', $clean_text);

    // Match Miscellaneous Symbols
    $regexMisc = '/[\x{2600}-\x{26FF}]/u';
    $clean_text = preg_replace($regexMisc, '', $clean_text);

    // Match Dingbats
    $regexDingbats = '/[\x{2700}-\x{27BF}]/u';
    $clean_text = preg_replace($regexDingbats, '', $clean_text);

    return $clean_text;
}

The function does not remove all emojis since there are many more, but you get the point.
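Since the ranges are not exhaustive, a small diagnostic can show which non-ASCII code points survive the cleanup, so they can be looked up in the Unicode list. This is a sketch that assumes valid UTF-8 input and the mbstring extension. listNonAsciiCodePoints() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: returns the U+XXXX notation of every non-ASCII character in a string
function listNonAsciiCodePoints(string $text): array
{
    // With /u the negated class matches whole UTF-8 characters, not single bytes
    preg_match_all('/[^\x00-\x7F]/u', $text, $matches);
    return array_map(function (string $char): string {
        return sprintf('U+%04X', mb_ord($char, 'UTF-8'));
    }, $matches[0]);
}

// Example: after removing the Emoticons block, see what is left over
$cleaned = preg_replace('/[\x{1F600}-\x{1F64F}]/u', '', "hi \u{1F600}\u{1F917}");
print_r(listNonAsciiCodePoints($cleaned));
```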

Please refer to unicode.org - full emoji list (thanks Epoc)

Beatnik answered 10/10, 2012 at 16:29 Comment(9)

Good work that you fixed it yourself! I'm proud of you ;) (I know I'm a little bit late) – Wariness 31/5, 2013 at 14:50

How about removing emoji in the UTF-8 string itself? Most emoji in UTF-8 start with \xf0\x9f. – Mctyre 15/12, 2014 at 11:37
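Here is a sketch of that byte-level idea (the function name is hypothetical). In UTF-8, every code point from U+1F000 to U+1FFFF is encoded as four bytes starting with F0 9F, so the pattern can work on raw bytes without the /u modifier. It catches most pictographs, skin tone modifiers and flag letters. It misses emojis in the Basic Multilingual Plane such as ❤ (U+2764, encoded E2 9D A4).

```php
<?php
// Hypothetical function: strips every 4-byte UTF-8 sequence starting with F0 9F,
// i.e. every code point from U+1F000 to U+1FFFF, working on raw bytes (no /u modifier)
function removeEmojiBytes(string $text): string
{
    return preg_replace('/\xF0\x9F[\x80-\xBF]{2}/', '', $text);
}
```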

Quicker doing it this way

$clean_text = preg_replace('/[\x{1F600}-\x{1F64F}\x{1F300}-\x{1F5FF}\x{1F680}-\x{1F6FF}\x{2600}-\x{26FF}\x{2700}-\x{27BF}]+/u', '', $text);

– Betseybetsy 12/3, 2016 at 18:46

I've posted a verbose function for the sake of clarity :) – Beatnik 23/3, 2016 at 15:3

FYI full emoji list is available here (official list from the Unicode consortium) – Frangos 27/10, 2016 at 14:24

I'm getting an extra space character in the output where there shouldn't be one: "I went grocery 🍲shopping 😎 today 🐗" returns "I went grocery shopping today ". There are two spaces between "shopping" and "today", for example. The edited output may not show it, but it's there. Is there any way to fix that (without calling another preg_replace and fixing \s{2,})? – Signboard 28/3, 2017 at 2:7
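One way to avoid the double space with a single preg_replace treats emojis differently depending on where they sit:

- When an emoji stands on its own, remove it together with the whitespace in front of it.
- When an emoji is glued to a word, remove only the emoji.

The sketch below uses the five blocks from the combined regex in the comment above. The function name is hypothetical, and the final trim() is not a regex.

```php
<?php
// Hypothetical function: removes emojis (five blocks from the combined regex above)
// without leaving double spaces behind
function removeEmojiTidy(string $text): string
{
    $emoji = '[\x{1F600}-\x{1F64F}\x{1F300}-\x{1F5FF}\x{1F680}-\x{1F6FF}\x{2600}-\x{26FF}\x{2700}-\x{27BF}]';
    // Alternative 1: optional leading whitespace + emoji run, when followed by whitespace or the end;
    //   the whitespace after it stays as the single separator.
    // Alternative 2: an emoji run glued to other text is removed on its own.
    $text = preg_replace('/\s*' . $emoji . '+(?=\s|$)|' . $emoji . '+/u', '', $text);
    // An emoji at the very start leaves its trailing whitespace behind
    return trim($text);
}
```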

If the function is returning nothing, it is likely the case that you supplied a bad UTF8 input string and preg_replace choked on it. Make sure to check that preg_last_error() == PREG_NO_ERROR otherwise you will need to go back and figure out at what point did utf8 string become corrupted. – Karlis 28/3, 2017 at 15:31
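Here is a sketch of that check (the wrapper name is hypothetical). With the /u modifier, preg_replace() returns null on invalid UTF-8 and preg_last_error() reports PREG_BAD_UTF8_ERROR. A wrapper can therefore fail loudly instead of silently returning nothing.

```php
<?php
// Hypothetical wrapper: runs an emoji regex and throws instead of returning null
function removeEmojiChecked(string $text): string
{
    $result = preg_replace('/[\x{1F300}-\x{1F64F}\x{1F680}-\x{1F6FF}]/u', '', $text);
    // With /u, invalid UTF-8 input makes preg_replace() return null
    if ($result === null || preg_last_error() !== PREG_NO_ERROR) {
        throw new InvalidArgumentException('preg_replace failed with error code ' . preg_last_error());
    }
    return $result;
}
```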

Your function does not remove these emojis: 🇬🇧️🤗🤣️🥊 You can improve your regexp by taking ready-made intervals from the Emoji 11 specification. unicode.org/Public/emoji/11.0/emoji-data.txt – Paapanen 3/8, 2018 at 13:42
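This sketch builds the pattern from the Unicode data file instead of by hand. It makes some assumptions:

- The file contents are already loaded into a string.
- Lines have the form 1F600..1F64F ; Emoji_Presentation # ...

buildEmojiRegex() is a hypothetical name. It only handles single code points, not multi-character sequences such as flags. It defaults to Emoji_Presentation because the broader Emoji property also includes the digits 0-9, # and *, which would strip ordinary text.

```php
<?php
// Hypothetical helper: builds a PCRE character class from the text of emoji-data.txt,
// keeping only the lines for one property
function buildEmojiRegex(string $emojiData, string $property = 'Emoji_Presentation'): string
{
    $parts = [];
    foreach (preg_split('/\R/', $emojiData) as $line) {
        // Drop the trailing "# ..." comment; skip blank and comment-only lines
        $line = trim(explode('#', $line, 2)[0]);
        if ($line === '' || strpos($line, ';') === false) {
            continue;
        }
        [$codePoints, $prop] = array_map('trim', explode(';', $line, 2));
        if ($prop !== $property) {
            continue;
        }
        // "1F600..1F64F" is a range, "231A" a single code point
        $bounds = explode('..', $codePoints);
        $parts[] = count($bounds) === 2
            ? '\x{' . $bounds[0] . '}-\x{' . $bounds[1] . '}'
            : '\x{' . $bounds[0] . '}';
    }
    // Note: if no line matched, this yields an invalid empty class "[]"
    return '/[' . implode('', $parts) . ']/u';
}
```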

That character, " 🤗 " for example, won't be deleted by those replacements – Desireah 12/10, 2018 at 11:41

As Apple continues to add emojis to new versions of iOS, I will be updating and maintaining this answer.

This answer has been updated for iOS 12.1. If you have problems, please check the edit history for previous versions of this answer (including multiple regexes in this answer would exceed SO's maximum post body length).

Beta Version for iOS 12.1 (Nov, 2018)

public static function removeEmoji($string) {
    return preg_replace('/[\x{1F3F4}](?:\x{E0067}\x{E0062}\x{E0077}\x{E006C}\x{E0073}\x{E007F})|[\x{1F3F4}](?:\x{E0067}\x{E0062}\x{E0073}\x{E0063}\x{E0074}\x{E007F})|[\x{1F3F4}](?:\x{E0067}\x{E0062}\x{E0065}\x{E006E}\x{E0067}\x{E007F})|[\x{1F3F4}](?:\x{200D}\x{2620}\x{FE0F})|[\x{1F3F3}](?:\x{FE0F}\x{200D}\x{1F308})|[\x{0023}\x{002A}\x{0030}\x{0031}\x{0032}\x{0033}\x{0034}\x{0035}\x{0036}\x{0037}\x{0038}\x{0039}](?:\x{FE0F}\x{20E3})|[\x{1F415}](?:\x{200D}\x{1F9BA})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F467}\x{200D}\x{1F467})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F467}\x{200D}\x{1F466})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F467})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F466}\x{200D}\x{1F466})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F466})|[\x{1F468}](?:\x{200D}\x{1F468}\x{200D}\x{1F467}\x{200D}\x{1F467})|[\x{1F468}](?:\x{200D}\x{1F468}\x{200D}\x{1F466}\x{200D}\x{1F466})|[\x{1F468}](?:\x{200D}\x{1F468}\x{200D}\x{1F467}\x{200D}\x{1F466})|[\x{1F468}](?:\x{200D}\x{1F468}\x{200D}\x{1F467})|[\x{1F468}](?:\x{200D}\x{1F468}\x{200D}\x{1F466})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F469}\x{200D}\x{1F467}\x{200D}\x{1F467})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F469}\x{200D}\x{1F466}\x{200D}\x{1F466})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F469}\x{200D}\x{1F467}\x{200D}\x{1F466})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F469}\x{200D}\x{1F467})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F469}\x{200D}\x{1F466})|[\x{1F469}](?:\x{200D}\x{2764}\x{FE0F}\x{200D}\x{1F469})|[\x{1F469}\x{1F468}](?:\x{200D}\x{2764}\x{FE0F}\x{200D}\x{1F468})|[\x{1F469}](?:\x{200D}\x{2764}\x{FE0F}\x{200D}\x{1F48B}\x{200D}\x{1F469})|[\x{1F469}\x{1F468}](?:\x{200D}\x{2764}\x{FE0F}\x{200D}\x{1F48B}\x{200D}\x{1F468})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F9BD})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F9BC})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F9AF})|[\x{1F575}\x{1F3CC}\x{26F9}\x{1F3CB}](?:\x{FE0F}\x{200D}\x{2640}\x{FE0F})|[\x{1F575}\x{1F3CC}\x{26F9}\x{1F3CB}](?:\x{FE0F}\x{200D}\x{2642}\x{FE0F})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F692})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F680})|[\x{1F468}\x{1F469}](?:\x{200D}\x{2708}\x{FE0F})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F3A8})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F3A4})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F4BB})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F52C})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F4BC})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F3ED})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F527})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F373})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F33E})|[\x{1F468}\x{1F469}](?:\x{200D}\x{2696}\x{FE0F})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F3EB})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F393})|[\x{1F468}\x{1F469}](?:\x{200D}\x{2695}\x{FE0F})|[\x{1F471}\x{1F64D}\x{1F64E}\x{1F645}\x{1F646}\x{1F481}\x{1F64B}\x{1F9CF}\x{1F647}\x{1F926}\x{1F937}\x{1F46E}\x{1F482}\x{1F477}\x{1F473}\x{1F9B8}\x{1F9B9}\x{1F9D9}\x{1F9DA}\x{1F9DB}\x{1F9DC}\x{1F9DD}\x{1F9DE}\x{1F9DF}\x{1F486}\x{1F487}\x{1F6B6}\x{1F9CD}\x{1F9CE}\x{1F3C3}\x{1F46F}\x{1F9D6}\x{1F9D7}\x{1F3C4}\x{1F6A3}\x{1F3CA}\x{1F6B4}\x{1F6B5}\x{1F938}\x{1F93C}\x{1F93D}\x{1F93E}\x{1F939}\x{1F9D8}](?:\x{200D}\x{2640}\x{FE0F})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F9B2})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F9B3})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F9B1})|[\x{1F468}\x{1F469}](?:\x{200D}\x{1F9B0})|[\x{1F471}\x{1F64D}\x{1F64E}\x{1F645}\x{1F646}\x{1F481}\x{1F64B}\x{1F9CF}\x{1F647}\x{1F926}\x{1F937}\x{1F46E}\x{1F482}\x{1F477}\x{1F473}\x{1F9B8}\x{1F9B9}\x{1F9D9}\x{1F9DA}\x{1F9DB}\x{1F9DC}\x{1F9DD}\x{1F9DE}\x{1F9DF}\x{1F486}\x{1F487}\x{1F6B6}\x{1F9CD}\x{1F9CE}\x{1F3C3}\x{1F46F}\x{1F9D6}\x{1F9D7}\x{1F3C4}\x{1F6A3}\x{1F3CA}\x{1F6B4}\x{1F6B5}\x{1F938}\x{1F93C}\x{1F93D}\x{1F93E}\x{1F939}\x{1F9D8}](?:\x{200D}\x{2642}\x{FE0F})|[\x{1F441}](?:\x{FE0F}\x{200D}\x{1F5E8}\x{FE0F})|[\x{1F1E6}\x{1F1E7}\x{1F1E8}\x{1F1E9}\x{1F1F0}\x{1F1F2}\x{1F1F3}\x{1F1F8}\x{1F1F9}\x{1F1FA}](?:\x{1F1FF})|[\x{1F1E7}\x{1F1E8}\x{1F1EC}\x{1F1F0}\x{1F1F1}\x{1F1F2}\x{1F1F5}\x{1F1F8}\x{1F1FA}](?:\x{1F1FE})|[\x{1F1E6}\x{1F1E8}\x{1F1F2}\x{1F1F8}](?:\x{1F1FD})|[\x{1F1E6}\x{1F1E7}\x{1F1E8}\x{1F1EC}\x{1F1F0}\x{1F1F2}\x{1F1F5}\x{1F1F7}\x{1F1F9}\x{1F1FF}](?:\x{1F1FC})|[\x{1F1E7}\x{1F1E8}\x{1F1F1}\x{1F1F2}\x{1F1F8}\x{1F1F9}](?:\x{1F1FB})|[\x{1F1E6}\x{1F1E8}\x{1F1EA}\x{1F1EC}\x{1F1ED}\x{1F1F1}\x{1F1F2}\x{1F1F3}\x{1F1F7}\x{1F1FB}](?:\x{1F1FA})|[\x{1F1E6}\x{1F1E7}\x{1F1EA}\x{1F1EC}\x{1F1ED}\x{1F1EE}\x{1F1F1}\x{1F1F2}\x{1F1F5}\x{1F1F8}\x{1F1F9}\x{1F1FE}](?:\x{1F1F9})|[\x{1F1E6}\x{1F1E7}\x{1F1EA}\x{1F1EC}\x{1F1EE}\x{1F1F1}\x{1F1F2}\x{1F1F5}\x{1F1F7}\x{1F1F8}\x{1F1FA}\x{1F1FC}](?:\x{1F1F8})|[\x{1F1E6}\x{1F1E7}\x{1F1E8}\x{1F1EA}\x{1F1EB}\x{1F1EC}\x{1F1ED}\x{1F1EE}\x{1F1F0}\x{1F1F1}\x{1F1F2}\x{1F1F3}\x{1F1F5}\x{1F1F8}\x{1F1F9}](?:\x{1F1F7})|[\x{1F1E6}\x{1F1E7}\x{1F1EC}\x{1F1EE}\x{1F1F2}](?:\x{1F1F6})|[\x{1F1E8}\x{1F1EC}\x{1F1EF}\x{1F1F0}\x{1F1F2}\x{1F1F3}](?:\x{1F1F5})|[\x{1F1E6}\x{1F1E7}\x{1F1E8}\x{1F1E9}\x{1F1EB}\x{1F1EE}\x{1F1EF}\x{1F1F2}\x{1F1F3}\x{1F1F7}\x{1F1F8}\x{1F1F9}](?:\x{1F1F4})|[\x{1F1E7}\x{1F1E8}\x{1F1EC}\x{1F1ED}\x{1F1EE}\x{1F1F0}\x{1F1F2}\x{1F1F5}\x{1F1F8}\x{1F1F9}\x{1F1FA}\x{1F1FB}](?:\x{1F1F3})|[\x{1F1E6}\x{1F1E7}\x{1F1E8}\x{1F1E9}\x{1F1EB}\x{1F1EC}\x{1F1ED}\x{1F1EE}\x{1F1EF}\x{1F1F0}\x{1F1F2}\x{1F1F4}\x{1F1F5}\x{1F1F8}\x{1F1F9}\x{1F1FA}\x{1F1FF}](?:\x{1F1F2})|[\x{1F1E6}\x{1F1E7}\x{1F1E8}\x{1F1EC}\x{1F1EE}\x{1F1F2}\x{1F1F3}\x{1F1F5}\x{1F1F8}\x{1F1F9}](?:\x{1F1F1})|[\x{1F1E8}\x{1F1E9}\x{1F1EB}\x{1F1ED}\x{1F1F1}\x{1F1F2}\x{1F1F5}\x{1F1F8}\x{1F1F9}\x{1F1FD}](?:\x{1F1F0})|[\x{1F1E7}\x{1F1E9}\x{1F1EB}\x{1F1F8}\x{1F1F9}](?:\x{1F1EF})|[\x{1F1E6}\x{1F1E7}\x{1F1E8}\x{1F1EB}\x{1F1EC}\x{1F1F0}\x{1F1F1}\x{1F1F3}\x{1F1F8}\x{1F1FB}](?:\x{1F1EE})|[\x{1F1E7}\x{1F1E8}\x{1F1EA}\x{1F1EC}\x{1F1F0}\x{1F1F2}\x{1F1F5}\x{1F1F8}\x{1F1F9}](?:\x{1F1ED})|[\x{1F1E6}\x{1F1E7}\x{1F1E8}\x{1F1E9}\x{1F1EA}\x{1F1EC}\x{1F1F0}\x{1F1F2}\x{1F1F3}\x{1F1F5}\x{1F1F8}\x{1F1F9}\x{1F1FA}\x{1F1FB}](?:\x{1F1EC})|[\x{1F1E6}\x{1F1E7}\x{1F1E8}\x{1F1EC}\x{1F1F2}\x{1F1F3}\x{1F1F5}\x{1F1F9}\x{1F1FC}](?:\x{1F1EB})|[\x{1F1E6}\x{1F1E7}\x{1F1E9}\x{1F1EA}\x{1F1EC}\x{1F1EE}\x{1F1EF}\x{1F1F0}\x{1F1F2}\x{1F1F3}\x{1F1F5}\x{1F1F7}\x{1F1F8}\x{1F1FB}\x{1F1FE}](?:\x{1F1EA})|[\x{1F1E6}\x{1F1E7}\x{1F1E8}\x{1F1EC}\x{1F1EE}\x{1F1F2}\x{1F1F8}\x{1F1F9}](?:\x{1F1E9})|[\x{1F1E6}\x{1F1E8}\x{1F1EA}\x{1F1EE}\x{1F1F1}\x{1F1F2}\x{1F1F3}\x{1F1F8}\x{1F1F9}\x{1F1FB}](?:\x{1F1E8})|[\x{1F1E7}\x{1F1EC}\x{1F1F1}\x{1F1F8}](?:\x{1F1E7})|[\x{1F1E7}\x{1F1E8}\x{1F1EA}\x{1F1EC}\x{1F1F1}\x{1F1F2}\x{1F1F3}\x{1F1F5}\x{1F1F6}\x{1F1F8}\x{1F1F9}\x{1F1FA}\x{1F1FB}\x{1F1FF}](?:\x{1F1E6})|[\x{00A9}\x{00AE}\x{203C}\x{2049}\x{2122}\x{2139}\x{2194}-\x{2199}\x{21A9}-\x{21AA}\x{231A}-\x{231B}\x{2328}\x{23CF}\x{23E9}-\x{23F3}\x{23F8}-\x{23FA}\x{24C2}\x{25AA}-\x{25AB}\x{25B6}\x{25C0}\x{25FB}-\x{25FE}\x{2600}-\x{2604}\x{260E}\x{2611}\x{2614}-\x{2615}\x{2618}\x{261D}\x{2620}\x{2622}-\x{2623}\x{2626}\x{262A}\x{262E}-\x{262F}\x{2638}-\x{263A}\x{2640}\x{2642}\x{2648}-\x{2653}\x{265F}-\x{2660}\x{2663}\x{2665}-\x{2666}\x{2668}\x{267B}\x{267E}-\x{267F}\x{2692}-\x{2697}\x{2699}\x{269B}-\x{269C}\x{26A0}-\x{26A1}\x{26AA}-\x{26AB}\x{26B0}-\x{26B1}\x{26BD}-\x{26BE}\x{26C4}-\x{26C5}\x{26C8}\x{26CE}-\x{26CF}\x{26D1}\x{26D3}-\x{26D4}\x{26E9}-\x{26EA}\x{26F0}-\x{26F5}\x{26F7}-\x{26FA}\x{26FD}\x{2702}\x{2705}\x{2708}-\x{270D}\x{270F}\x{2712}\x{2714}\x{2716}\x{271D}\x{2721}\x{2728}\x{2733}-\x{2734}\x{2744}\x{2747}\x{274C}\x{274E}\x{2753}-\x{2755}\x{2757}\x{2763}-\x{2764}\x{2795}-\x{2797}\x{27A1}\x{27B0}\x{27BF}\x{2934}-\x{2935}\x{2B05}-\x{2B07}\x{2B1B}-\x{2B1C}\x{2B50}\x{2B55}\x{3030}\x{303D}\x{3297}\x{3299}\x{1F004}\x{1F0CF}\x{1F170}-\x{1F171}\x{1F17E}-\x{1F17F}\x{1F18E}\x{1F191}-\x{1F19A}\x{1F201}-\x{1F202}\x{1F21A}\x{1F22F}\x{1F232}-\x{1F23A}\x{1F250}-\x{1F251}\x{1F300}-\x{1F321}\x{1F324}-\x{1F393}\x{1F396}-\x{1F397}\x{1F399}-\x{1F39B}\x{1F39E}-\x{1F3F0}\x{1F3F3}-\x{1F3F5}\x{1F3F7}-\x{1F3FA}\x{1F400}-\x{1F4FD}\x{1F4FF}-\x{1F53D}\x{1F549}-\x{1F54E}\x{1F550}-\x{1F567}\x{1F56F}-\x{1F570}\x{1F573}-\x{1F57A}\x{1F587}\x{1F58A}-\x{1F58D}\x{1F590}\x{1F595}-\x{1F596}\x{1F5A4}-\x{1F5A5}\x{1F5A8}\x{1F5B1}-\x{1F5B2}\x{1F5BC}\x{1F5C2}-\x{1F5C4}\x{1F5D1}-\x{1F5D3}\x{1F5DC}-\x{1F5DE}\x{1F5E1}\x{1F5E3}\x{1F5E8}\x{1F5EF}\x{1F5F3}\x{1F5FA}-\x{1F64F}\x{1F680}-\x{1F6C5}\x{1F6CB}-\x{1F6D2}\x{1F6D5}\x{1F6E0}-\x{1F6E5}\x{1F6E9}\x{1F6EB}-\x{1F6EC}\x{1F6F0}\x{1F6F3}-\x{1F6FA}\x{1F7E0}-\x{1F7EB}\x{1F90D}-\x{1F93A}\x{1F93C}-\x{1F945}\x{1F947}-\x{1F971}\x{1F973}-\x{1F976}\x{1F97A}-\x{1F9A2}\x{1F9A5}-\x{1F9AA}\x{1F9AE}-\x{1F9CA}\x{1F9CD}-\x{1F9FF}\x{1FA70}-\x{1FA73}\x{1FA78}-\x{1FA7A}\x{1FA80}-\x{1FA82}\x{1FA90}-\x{1FA95}]/u', '', $string);
}

Kudos answered 26/11, 2013 at 3:44 Comment(11)

This was also the only one that worked for me. Thank you so much! – Nucleon 3/6, 2014 at 19:26

Same for me. The first (accepted answer) removed many of the symbols, but not all. As far as I can tell, this is doing the trick, though! Thank you! – Sheriff 31/7, 2014 at 17:37

This works really well but it also removes the pipe symbol | – Antebellum 10/5, 2016 at 2:29

@Antebellum Looks like I had a few stray | characters in there. I've updated it and it shouldn't do that anymore – Kudos 10/5, 2016 at 3:49

Thanks for the awesome function @AdamMerrifield. Quick question: how would I include these 2 emojis that aren't being detected? \uD83E\uDD13 and \uD83E\uDD14 – Notorious 15/6, 2016 at 22:58

@Notorious You should be able to just add |[\x{D83E}][\x{DD13}-\x{DD14}] near the end, right between the ? and the /u, so it should end with ...[\x{FE00}-\x{FEFF}]?|[\x{D83E}][\x{DD13}-\x{DD14}]/u but for some reason when I add that I get an error from PHP

Warning: preg_replace(): Compilation failed: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) at offset 468

. I'm not exactly sure how to resolve this. – Kudos 17/6, 2016 at 14:0
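The compilation error happens because \uD83E\uDD13 is a UTF-16 surrogate pair (the JavaScript/JSON notation), and surrogates are not valid code points in a UTF-8 pattern. Combining the pair gives the real code point, U+1F913 (🤓), so the range to add is [\x{1F913}-\x{1F914}]. Here is a sketch of the conversion; the helper name is hypothetical.

```php
<?php
// Hypothetical helper: combines a UTF-16 surrogate pair into the code point it encodes.
// High surrogates are D800-DBFF and low surrogates DC00-DFFF; each carries 10 bits.
function surrogatePairToCodePoint(int $high, int $low): int
{
    return 0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00);
}

printf("U+%X\n", surrogatePairToCodePoint(0xD83E, 0xDD13)); // nerd face
printf("U+%X\n", surrogatePairToCodePoint(0xD83E, 0xDD14)); // thinking face

// The resulting code points can be used directly in a /u pattern
echo preg_replace('/[\x{1F913}-\x{1F914}]/u', '', "a\u{1F913}b\u{1F914}c"), "\n";
```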

iOS 10 has more crazy emoticon fun. Some emoticons are actually several characters combined, e.g. the iOS 10 blonde man with one hand up = 🙋🏼‍♂️ – Uncommunicative 13/10, 2016 at 0:41
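Such emojis are sequences of several code points. 🙋🏼‍♂️ is made of:

- U+1F64B (person raising hand)
- U+1F3FC (skin tone modifier)
- U+200D (zero width joiner)
- U+2642 (male sign)
- U+FE0F (variation selector-16)

Range-based patterns like the first answer's remove the visible parts but leave the invisible joiner and selector behind. Here is a sketch that strips those too (the function name is hypothetical). U+200D is also used by some scripts, such as Indic ones, so removing it everywhere can alter that text.

```php
<?php
// Hypothetical function: the first answer's five blocks plus the invisible parts of emoji sequences
function removeEmojiSequences(string $text): string
{
    return preg_replace(
        '/[\x{1F600}-\x{1F64F}\x{1F300}-\x{1F5FF}\x{1F680}-\x{1F6FF}\x{2600}-\x{26FF}\x{2700}-\x{27BF}'
        . '\x{200D}'          // zero width joiner that glues the parts of a sequence together
        . '\x{FE0E}\x{FE0F}'  // text and emoji variation selectors
        . ']/u',
        '',
        $text
    );
}
```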

I'm also using this with WordPress, as it sometimes fails to serialize them when trying to save them as a transient. – Juvenal 13/1, 2019 at 13:2

@Juvenal when it fails, are you receiving any error message? does this failure return you an unsanitized string, or nothing? if it still returns a string, then does it seem to be a specific set of emojis that are not being replaced? – Kudos 14/1, 2019 at 15:53

@AdamMerrifield no. It was hard to find, but it seems that some emojis (almost all of them from what I remember; the classic smile was working, I think) failed to be serialized. So on unserialize, the data is corrupted. But there is no error, as something is saved and returned from the transient. (WP fails to serialize when saving the transient; your script works great.) – Juvenal 14/1, 2019 at 15:56

thank you thank you so much, apple and its emojis grrr ! this works as of mar 2019 – Aulos 20/3, 2019 at 15:47

Here is the accepted answer updated with more ranges; just a few emojis are left.

public static function removeEmoji($text) {

    $clean_text = "";

    // Match Emoticons
    $regexEmoticons = '/[\x{1F600}-\x{1F64F}]/u';
    $clean_text = preg_replace($regexEmoticons, '', $text);

    // Match Miscellaneous Symbols and Pictographs
    $regexSymbols = '/[\x{1F300}-\x{1F5FF}]/u';
    $clean_text = preg_replace($regexSymbols, '', $clean_text);

    // Match Transport And Map Symbols
    $regexTransport = '/[\x{1F680}-\x{1F6FF}]/u';
    $clean_text = preg_replace($regexTransport, '', $clean_text);

    // Match Miscellaneous Symbols
    $regexMisc = '/[\x{2600}-\x{26FF}]/u';
    $clean_text = preg_replace($regexMisc, '', $clean_text);

    // Match Dingbats
    $regexDingbats = '/[\x{2700}-\x{27BF}]/u';
    $clean_text = preg_replace($regexDingbats, '', $clean_text);

    // Match Flags (regional indicator symbols)
    $regexFlags = '/[\x{1F1E6}-\x{1F1FF}]/u';
    $clean_text = preg_replace($regexFlags, '', $clean_text);

    // Others (Supplemental Symbols and Pictographs)
    $regexOthers = '/[\x{1F910}-\x{1F95E}]/u';
    $clean_text = preg_replace($regexOthers, '', $clean_text);

    $regexOthers = '/[\x{1F980}-\x{1F991}]/u';
    $clean_text = preg_replace($regexOthers, '', $clean_text);

    $regexOthers = '/[\x{1F9C0}]/u';
    $clean_text = preg_replace($regexOthers, '', $clean_text);

    $regexOthers = '/[\x{1F9F9}]/u';
    $clean_text = preg_replace($regexOthers, '', $clean_text);

    return $clean_text;
}
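Every step above is the same preg_replace with a different range. The ranges can therefore be kept in one table and compiled into a single character class. Adding a range then becomes a one-line change, and the text is scanned once. Here is a sketch with the same ranges as this answer; the function name is hypothetical.

```php
<?php
// Hypothetical function: the ranges from the answer above, kept in one table
function removeEmojiFromTable(string $text): string
{
    $ranges = [
        [0x1F600, 0x1F64F], // Emoticons
        [0x1F300, 0x1F5FF], // Miscellaneous Symbols and Pictographs
        [0x1F680, 0x1F6FF], // Transport and Map Symbols
        [0x2600, 0x26FF],   // Miscellaneous Symbols
        [0x2700, 0x27BF],   // Dingbats
        [0x1F1E6, 0x1F1FF], // Flags (regional indicators)
        [0x1F910, 0x1F95E], // Others
        [0x1F980, 0x1F991],
        [0x1F9C0, 0x1F9C0],
        [0x1F9F9, 0x1F9F9],
    ];
    $class = '';
    foreach ($ranges as [$start, $end]) {
        // Emit "\x{START}-\x{END}" for each range
        $class .= sprintf('\x{%X}-\x{%X}', $start, $end);
    }
    // One pass over the text instead of one preg_replace per range
    return preg_replace('/[' . $class . ']/u', '', $text);
}
```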

Laveta answered 24/1, 2017 at 15:22 Comment(5)

Couldn't you just update the correct answer instead of posting a pseudo-new one? – Chantalchantalle 24/1, 2017 at 15:45

There is no harm in getting a downvote on an answer. Anyway, I think it is fair given that it is kind of a repost of an answer that is not yours. If you want to contribute (with an update or whatever) and find yourself with doubts how to, search first (or ask) at Meta Stack Overflow. Hope this helps. – Chantalchantalle 24/1, 2017 at 17:26

Yes, it works, but it also removes line breaks. How can I fix it? – Opalopalesce 11/9, 2020 at 15:13

doesn't work for example with 🧺 – Krouse 11/2, 2022 at 11:32

of course it doesn't, this answer is from 2017, many new emojis have been added since then. – Laveta 18/3, 2022 at 15:41

It is also possible to remove the emojis using iconv. It's pretty similar to the solution based on mb_convert_encoding in this thread, but iconv offers the //IGNORE option, so there's no need to protect/restore the "?". The emojis are simply dropped. That leaves two consecutive spaces wherever an emoji stood between two words, so the function also replaces runs of whitespace with a single space.

It only works well with text whose non-emoji characters all exist in Latin-9 (ISO-8859-15); any other character is removed too.

But:

It's about 100x faster than the best answer (as of Dec. 2020),
For Latin texts, it's more reliable (the best answer leaves unwanted characters with some "Dark Skin Tone" emojis, for instance 🙅🏿 🙅🏿‍♂️ 🙆🏿 🙆🏿‍♂️ 🙋🏿 🙋🏿‍♂️ 🤦🏿‍♀️ 🤦🏿‍♂️ 🤷🏿‍♀️ 🤷🏿‍♂️ 🙎🏿 🙎🏿‍♂️ 🙍🏿 🙍🏿‍♂️ 💇🏿 💇🏿‍♂️, or even 🤎),
Future emojis will also be removed.

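function removeEmoji(string $text): string
{
    // Drop every character that has no ISO-8859-15 (Latin-9) equivalent, emojis included
    $text = iconv('UTF-8', 'ISO-8859-15//IGNORE', $text);
    // Collapse the double spaces left where an emoji sat between two words
    // (this also turns line breaks and tabs into single spaces)
    $text = preg_replace('/\s+/', ' ', $text);
    // Convert the remaining Latin-9 text back to UTF-8
    return iconv('ISO-8859-15', 'UTF-8', $text);
}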

Barnabe answered 7/12, 2020 at 10:3 Comment(5)

Thank you, this fix is working on every emojis! – Plutonium 29/12, 2021 at 18:7

Does not work: removeEmoji('Lorem 🤷 ipsum ❤ dolor 🥺 med') returns Lorem ipsum dolor med instead of Lorem ipsum dolor med. – Blackface 23/7, 2022 at 23:51

@DavidVielhuber This is the expected behavior. If for some reason you want to keep the extra spaces, just comment out this line: $text = preg_replace('/\s+/', ' ', $text); – Barnabe 8/8, 2022 at 8:32

Note that this also removes a long dash "—" or characters like an arrow "→". Perfect in my case, but good to know before you blindly copy this code. Characters being preserved are listed here: en.wikipedia.org/wiki/ISO/IEC_8859-15 – Philipp 3/3, 2023 at 19:45

@Philipp Thank you for sharing it. What would be the best ISO to use? – Sclerosis 10/1 at 9:42

I developed a function that converts from UTF-8 to ISO-8859-1 in PHP. The conversion returns a ? character for every character it cannot represent.

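function removeEmojis( $string ) {
    // Protect literal question marks, since "?" marks the dropped characters below
    $string = str_replace( "?", "{%}", $string );
    // UTF-8 -> ISO-8859-1: every character without a Latin-1 equivalent (emojis included) becomes "?"
    $string = mb_convert_encoding( $string, "ISO-8859-1", "UTF-8" );
    $string = mb_convert_encoding( $string, "UTF-8", "ISO-8859-1" );
    // Remove the markers together with an adjacent space first, then any remaining bare marker
    $string = str_replace( array( "? ", " ?", "?" ), "", $string );
    // Restore the original question marks
    $string = str_replace( "{%}", "?", $string );
    return trim( $string );
}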

Explanation:

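Replace every ? in the original string with a placeholder ({%}) so it survives the round trip
Convert the string from UTF-8 to ISO-8859-1 (mb_convert_encoding replaces every character it cannot represent, emojis included, with ?)
Convert it back to UTF-8
Remove the ? markers
Restore the original ? characters from the placeholder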

Make sure your input is UTF-8 for this to work.

Dubitable answered 6/1, 2016 at 18:43 Comment(5)

Remember to look up the ISO-8859-1 character set before using this. ISO-8859-1 is a pretty limited character set. This method is a lot quicker than the accepted answer, but it also strips out a whole lot more than emojis. – Brann 22/9, 2016 at 7:19

What if you have ? chars in your $string – Uncommunicative 13/10, 2016 at 0:47

@Uncommunicative You can escape them, or save them somewhere else before doing this transformation – Copeland 15/12, 2016 at 8:30

Hey this works really well for slugs. Input that caused a memory heap error via the iconv method totally worked with this, thanks! – Cranston 10/12, 2019 at 0:53

Genius! :) That is the first true solution! – Flashing 15/2 at 12:19

While all of these approaches are valid, they are fundamentally regex-based blocklists of characters, which are hard to maintain and prone to error.

Emojis are actually spread over several different code blocks that see heavy use as icons on the web and elsewhere. Miscellaneous Symbols and Pictographs, Emoticons, and Transport and Map Symbols are only the most used. I could go on with symbols like Mahjong tiles and alchemical ones, all belonging to the Supplementary Multilingual Plane.

Unicode allocates code points (that is, symbol encodings) following a definite block structure that presumably won't change across versions, and you can very well leverage that:

Between 1F000 and 1F0FF you are -only- going to find game symbols
Between 1F300 and 1FBFF you are -never- going to find an alphabetic or language writing symbol, enclosed or otherwise
Between E0000 and E007F you are going to find the mysterious Tags block. Placed between 1F3F4 (which is this: 🏴) and the cancel tag E007F, tag characters render subdivision flags, acting as modifying characters. If you filter out the black flag, filter these out too!
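Here is a sketch for the last point (the function name is hypothetical). The whole subdivision flag can be matched as one unit: the black flag, one or more tag characters (U+E0020 to U+E007E), then the cancel tag U+E007F. A bare 🏴 without tags is left alone.

```php
<?php
// Hypothetical function: removes subdivision flags built from tag sequences,
// i.e. U+1F3F4, then tag characters U+E0020..U+E007E, then CANCEL TAG U+E007F
function removeTagFlags(string $text): string
{
    return preg_replace('/\x{1F3F4}[\x{E0020}-\x{E007E}]+\x{E007F}/u', '', $text);
}
```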

So, instead of relying on hacky preg_replace implementations (which are only safe for multibyte strings when the /u modifier is used), use the Intl module:

/**
 * Removes all characters within a Unicode code point range, *extremes included*, from a given UTF-8 string
 * @param string $input The text to filter
 * @param int $rangeStart The beginning of the Unicode range
 * @param int $rangeEnd The end of the Unicode range
 * @return string The filtered string
 */
function SanifyUnicodeRange(string $input, int $rangeStart, int $rangeEnd): string {
    /*
    If you have PHP >= 7.4, use mb_str_split($input, 1, "UTF-8") in place of the following 7 lines.
    If you are using another UTF encoding and you're not using mb_str_split,
    remember to change it below.
    */
    $inputLength = mb_strlen($input, "UTF-8");
    $charactersArray = array();
    while ($inputLength) {
        $charactersArray[] = mb_substr($input, 0, 1, "UTF-8");
        $input = mb_substr($input, 1, $inputLength, "UTF-8");
        $inputLength = mb_strlen($input, "UTF-8");
    }
    // Filter the characters array, then implode (which is mb-safe) it back into a string
    return implode('', array_filter($charactersArray, function ($unicodeCharacter) use ($rangeStart, $rangeEnd) {
        $codePoint = IntlChar::ord($unicodeCharacter);
        // Keep the character only if it falls outside the code block we're filtering
        return ($codePoint < $rangeStart || $codePoint > $rangeEnd);
    }));
}

Selfassurance answered 3/7, 2020 at 7:32 Comment(0)

Use the pattern below to remove most emojis. It stops at U+1F9FF, so newer ones from U+1FA70 onwards are not matched:

function removeEmoji($text) {
    return preg_replace('/([0-9#][\x{20E3}])|[\x{00ae}\x{00a9}\x{203C}\x{2047}\x{2048}\x{2049}\x{3030}\x{303D}\x{2139}\x{2122}\x{3297}\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{1F000}-\x{1FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F9FF}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F9FF}][\x{1F000}-\x{1FEFF}]?/u', '', $text);
}

reference

Musclebound answered 9/9, 2020 at 13:36 Comment(0)

We had a really long fight with emojis at work. We found a few regexes for this problem, but none of them worked. This one works:

Edit: This does not cover ALL the emojis. I'm still searching for the Holy Grail of Emoji Regexp, but not found it yet.

return preg_replace('/([0-9#][\x{20E3}])|[\x{00ae}\x{00a9}\x{203C}\x{2047}\x{2048}\x{2049}\x{3030}\x{303D}\x{2139}\x{2122}\x{3297}\x{3299}][\x{FE00}-\x{FEFF}]?|[\x{2190}-\x{21FF}][\x{FE00}-\x{FEFF}]?|[\x{2300}-\x{23FF}][\x{FE00}-\x{FEFF}]?|[\x{2460}-\x{24FF}][\x{FE00}-\x{FEFF}]?|[\x{25A0}-\x{25FF}][\x{FE00}-\x{FEFF}]?|[\x{2600}-\x{27BF}][\x{FE00}-\x{FEFF}]?|[\x{2900}-\x{297F}][\x{FE00}-\x{FEFF}]?|[\x{2B00}-\x{2BF0}][\x{FE00}-\x{FEFF}]?|[\x{1F000}-\x{1F6FF}][\x{FE00}-\x{FEFF}]?/u', '', $text);

Rishi answered 31/8, 2016 at 10:39 Comment(2)

Does this cover all existing emojis? – Dreddy 1/11, 2016 at 14:34

Well it's for every which I could find. It doesn't cover all of them. I think maybe 80% of them. – Rishi 1/11, 2016 at 14:46

PHP remove Emojis or 4 byte characters

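Most emojis lie outside the Basic Multilingual Plane (BMP) and take four bytes in UTF-8, the maximum for one character. To store such characters, MySQL needs the utf8mb4 character set, which is only available in MySQL 5.5.3 and above. Note that a few emojis, such as ❤ (U+2764), are in the BMP and take only three bytes, so the function below does not remove them.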

Otherwise, remove all 4-byte characters before storing the text in the database. An example script follows:

#to remove 4byte characters like emojis etc..
function replace_4byte($string) {
    return preg_replace('%(?:
          \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
        | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
        | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
    )%xs', '', $string);    
}

Test with:

$string = "We test those emojis 🙂 👍 🙏🏼 😔 🚀";
$string = replace_4byte($string);
echo $string;

Output:

We test those emojis
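With the /u modifier, the same filter can be written in terms of code points instead of bytes. The characters that take four bytes in UTF-8 are exactly U+10000 to U+10FFFF. Here is a sketch with a hypothetical function name. Like the byte version, it keeps three-byte emojis such as ❤ (U+2764).

```php
<?php
// Hypothetical function: removes every character outside the Basic Multilingual Plane,
// which is exactly the set of characters that take four bytes in UTF-8
function replace_4byte_u(string $string): string
{
    return preg_replace('/[\x{10000}-\x{10FFFF}]/u', '', $string);
}
```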

Credits go to http://scriptsof.com/php-remove-emojis-or-4-byte-characters-19

Leighannleighland answered 1/12, 2021 at 12:29 Comment(0)

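It's a simple regex covering the main emoji blocks. Inside a character class, parentheses, | and line breaks are literal characters, so the ranges are simply listed one after another. Otherwise the pattern would also remove (, ), | and newlines. Be aware that the \x{24C2}-\x{1F251} range spans far more than emojis, including CJK ideographs and Hangul.

$re = '/[\x{1F600}-\x{1F64F}\x{2700}-\x{27BF}\x{1F680}-\x{1F6FF}\x{24C2}-\x{1F251}\x{1F30D}-\x{1F567}\x{1F900}-\x{1F9FF}\x{1F300}-\x{1F5FF}]/u';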

Check out the result here (regex101).

So your PHP function can be:

function removeEmojis($input) {
    // All ranges in one character class (no parentheses, | or line breaks inside the brackets)
    $re = '/[\x{1F600}-\x{1F64F}\x{2700}-\x{27BF}\x{1F680}-\x{1F6FF}\x{24C2}-\x{1F251}\x{1F30D}-\x{1F567}\x{1F900}-\x{1F9FF}\x{1F300}-\x{1F5FF}]/u';
    return preg_replace($re, "", $input);
}

Iconostasis answered 1/9, 2019 at 8:11 Comment(5)

Your regex also matches spaces – Inesinescapable 27/2, 2020 at 16:11

@Inesinescapable Are you sure? I checked it on regex101 (whose link is mentioned in the answer), but it does not seem to match spaces. – Iconostasis 27/2, 2020 at 16:55

please double check your regex101 link: Matches: - 111 (before "People") - 112 (between "People" and "and") - 113 (between "and" and "fantasy") – Inesinescapable 28/2, 2020 at 17:14

@Iconostasis almost perfect THX ! Could you explain how it works please ? And could you have a look at this emojis list : regextester.com/106421 ==> Almost all emojis are detected by your REGEX (I don't know where I can find a list with all emojis). – Wahhabi 13/4, 2020 at 17:17

this removes \n and () – Beaufort 28/3 at 12:24

I solved this issue by using the same code WordPress uses to replace emojis with images.

Here is the code that I used. It worked perfectly because it has a comprehensive list of the most used emojis.

The full code exists here https://pastebin.com/8MqGdD6p

Here is how it works, but make sure to copy the code from pastebin, as the snippet below is incomplete.

$content = '<span class="do">⚫</span> where emojis exist';
$partials = array( '&#x1f469;&#x200d;' ); // the list of emojis (copy the full list from the pastebin)

foreach ( $partials as $emojum ) {
    // Turn the HTML entities into the actual UTF-8 emoji
    if ( version_compare( phpversion(), '5.4', '<' ) ) {
        $emoji_char = html_entity_decode( $emojum, ENT_COMPAT, 'UTF-8' );
    } else {
        $emoji_char = html_entity_decode( $emojum );
    }
    if ( false !== strpos( $content, $emoji_char ) ) {
        // A plain str_replace() avoids having to escape regex metacharacters in the emoji
        $content = str_replace( $emoji_char, '', $content );
    }
}

Scorekeeper answered 13/8, 2018 at 23:20 Comment(0)

You can use this regex too:

$text = preg_replace('([*#0-9](?>\\xEF\\xB8\\x8F)?\\xE2\\x83\\xA3|\\xC2[\\xA9\\xAE]|\\xE2..(\\xF0\\x9F\\x8F[\\xBB-\\xBF])?(?>\\xEF\\xB8\\x8F)?|\\xE3(?>\\x80[\\xB0\\xBD]|\\x8A[\\x97\\x99])(?>\\xEF\\xB8\\x8F)?|\\xF0\\x9F(?>[\\x80-\\x86].(?>\\xEF\\xB8\\x8F)?|\\x87.\\xF0\\x9F\\x87.|..(\\xF0\\x9F\\x8F[\\xBB-\\xBF])?|(((?<zwj>\\xE2\\x80\\x8D)\\xE2\\x9D\\xA4\\xEF\\xB8\\x8F\k<zwj>\\xF0\\x9F..(\k<zwj>\\xF0\\x9F\\x91.)?|(\\xE2\\x80\\x8D\\xF0\\x9F\\x91.){2,3}))?))',' ',$text);

I searched for a long time before finding it; I hope it will be useful.

Threadfin answered 6/5, 2021 at 18:30 Comment(0)


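function emojiFilter($text) {
    // json_encode() escapes every non-ASCII character as \uXXXX (emojis as surrogate pairs)
    $text = json_encode($text);
    preg_match_all("/(\\\\ud83c\\\\u[0-9a-f]{4})|(\\\\ud83d\\\u[0-9a-f]{4})|(\\\\u[0-9a-f]{4})/", $text, $matchs);
    if (!isset($matchs[0][0])) {
        return json_decode($text, true);
    }

    $emoji = $matchs[0];
    foreach ($emoji as $ec) {
        $hex = substr($ec, -4);
        if (strlen($ec) == 6) {
            // Single \uXXXX escape: remove U+2600..U+27FF (Miscellaneous Symbols, Dingbats)
            if ($hex >= '2600' and $hex <= '27ff') {
                $text = str_replace($ec, '', $text);
            }
        } else {
            // Surrogate pair starting with \ud83c or \ud83d (U+1F000..U+1F7FF): remove it
            if ($hex >= 'dc00' and $hex <= 'dfff') {
                $text = str_replace($ec, '', $text);
            }
        }
    }

    return json_decode($text, true);
}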

Farrow answered 30/10, 2014 at 8:23 Comment(1)

It would be useful if you could add some comments to this to help give some background in how it works. – Corcovado 30/10, 2014 at 8:27


@sglessard since the code is outdated, here is the full list of all emojis as of 07/12/2018. You can regenerate it by running the source code I posted.

Please let me know if you find any kind of issue, thank you.

public static function removeEmoji($text) {
    $regexEmoticons = [
        '/[\x{0023}]/u',
        '/[\x{002A}]/u',
        '/[\x{00A9}]/u',
        '/[\x{00AE}]/u',
        '/[\x{200D}]/u',
        '/[\x{203C}]/u',
        '/[\x{2049}]/u',
        '/[\x{20E3}]/u',
        '/[\x{2122}]/u',
        '/[\x{2139}]/u',
        '/[\x{2194}-\x{2199}]/u',
        '/[\x{21A9}-\x{21AA}]/u',
        '/[\x{231A}-\x{231B}]/u',
        '/[\x{2328}]/u',
        '/[\x{23CF}]/u',
        '/[\x{23E9}-\x{23F3}]/u',
        '/[\x{23F8}-\x{23FA}]/u',
        '/[\x{24C2}]/u',
        '/[\x{25AA}-\x{25AB}]/u',
        '/[\x{25B6}]/u',
        '/[\x{25C0}]/u',
        '/[\x{25FB}-\x{25FE}]/u',
        '/[\x{2600}-\x{2604}]/u',
        '/[\x{260E}]/u',
        '/[\x{2611}]/u',
        '/[\x{2614}-\x{2615}]/u',
        '/[\x{2618}]/u',
        '/[\x{261D}]/u',
        '/[\x{2620}]/u',
        '/[\x{2622}-\x{2623}]/u',
        '/[\x{2626}]/u',
        '/[\x{262A}]/u',
        '/[\x{262E}-\x{262F}]/u',
        '/[\x{2638}-\x{263A}]/u',
        '/[\x{2640}]/u',
        '/[\x{2642}]/u',
        '/[\x{2648}-\x{2653}]/u',
        '/[\x{265F}-\x{2660}]/u',
        '/[\x{2663}]/u',
        '/[\x{2665}-\x{2666}]/u',
        '/[\x{2668}]/u',
        '/[\x{267B}]/u',
        '/[\x{267E}-\x{267F}]/u',
        '/[\x{2692}-\x{2697}]/u',
        '/[\x{2699}]/u',
        '/[\x{269B}-\x{269C}]/u',
        '/[\x{26A0}-\x{26A1}]/u',
        '/[\x{26AA}-\x{26AB}]/u',
        '/[\x{26B0}-\x{26B1}]/u',
        '/[\x{26BD}-\x{26BE}]/u',
        '/[\x{26C4}-\x{26C5}]/u',
        '/[\x{26C8}]/u',
        '/[\x{26CE}-\x{26CF}]/u',
        '/[\x{26D1}]/u',
        '/[\x{26D3}-\x{26D4}]/u',
        '/[\x{26E9}-\x{26EA}]/u',
        '/[\x{26F0}-\x{26F5}]/u',
        '/[\x{26F7}-\x{26FA}]/u',
        '/[\x{26FD}]/u',
        '/[\x{2702}]/u',
        '/[\x{2705}]/u',
        '/[\x{2708}-\x{270D}]/u',
        '/[\x{270F}]/u',
        '/[\x{2712}]/u',
        '/[\x{2714}]/u',
        '/[\x{2716}]/u',
        '/[\x{271D}]/u',
        '/[\x{2721}]/u',
        '/[\x{2728}]/u',
        '/[\x{2733}-\x{2734}]/u',
        '/[\x{2744}]/u',
        '/[\x{2747}]/u',
        '/[\x{274C}]/u',
        '/[\x{274E}]/u',
        '/[\x{2753}-\x{2755}]/u',
        '/[\x{2757}]/u',
        '/[\x{2763}-\x{2764}]/u',
        '/[\x{2795}-\x{2797}]/u',
        '/[\x{27A1}]/u',
        '/[\x{27B0}]/u',
        '/[\x{27BF}]/u',
        '/[\x{2934}-\x{2935}]/u',
        '/[\x{2B05}-\x{2B07}]/u',
        '/[\x{2B1B}-\x{2B1C}]/u',
        '/[\x{2B50}]/u',
        '/[\x{2B55}]/u',
        '/[\x{3030}]/u',
        '/[\x{303D}]/u',
        '/[\x{3297}]/u',
        '/[\x{3299}]/u',
        '/[\x{FE0F}]/u',
        '/[\x{1F004}]/u',
        '/[\x{1F0CF}]/u',
        '/[\x{1F170}-\x{1F171}]/u',
        '/[\x{1F17E}-\x{1F17F}]/u',
        '/[\x{1F18E}]/u',
        '/[\x{1F191}-\x{1F19A}]/u',
        '/[\x{1F1E6}-\x{1F1FF}]/u',
        '/[\x{1F201}-\x{1F202}]/u',
        '/[\x{1F21A}]/u',
        '/[\x{1F22F}]/u',
        '/[\x{1F232}-\x{1F23A}]/u',
        '/[\x{1F250}-\x{1F251}]/u',
        '/[\x{1F300}-\x{1F321}]/u',
        '/[\x{1F324}-\x{1F393}]/u',
        '/[\x{1F396}-\x{1F397}]/u',
        '/[\x{1F399}-\x{1F39B}]/u',
        '/[\x{1F39E}-\x{1F3F0}]/u',
        '/[\x{1F3F3}-\x{1F3F5}]/u',
        '/[\x{1F3F7}-\x{1F3FA}]/u',
        '/[\x{1F400}-\x{1F4FD}]/u',
        '/[\x{1F4FF}-\x{1F53D}]/u',
        '/[\x{1F549}-\x{1F54E}]/u',
        '/[\x{1F550}-\x{1F567}]/u',
        '/[\x{1F56F}-\x{1F570}]/u',
        '/[\x{1F573}-\x{1F57A}]/u',
        '/[\x{1F587}]/u',
        '/[\x{1F58A}-\x{1F58D}]/u',
        '/[\x{1F590}]/u',
        '/[\x{1F595}-\x{1F596}]/u',
        '/[\x{1F5A4}-\x{1F5A5}]/u',
        '/[\x{1F5A8}]/u',
        '/[\x{1F5B1}-\x{1F5B2}]/u',
        '/[\x{1F5BC}]/u',
        '/[\x{1F5C2}-\x{1F5C4}]/u',
        '/[\x{1F5D1}-\x{1F5D3}]/u',
        '/[\x{1F5DC}-\x{1F5DE}]/u',
        '/[\x{1F5E1}]/u',
        '/[\x{1F5E3}]/u',
        '/[\x{1F5E8}]/u',
        '/[\x{1F5EF}]/u',
        '/[\x{1F5F3}]/u',
        '/[\x{1F5FA}-\x{1F64F}]/u',
        '/[\x{1F680}-\x{1F6C5}]/u',
        '/[\x{1F6CB}-\x{1F6D2}]/u',
        '/[\x{1F6E0}-\x{1F6E5}]/u',
        '/[\x{1F6E9}]/u',
        '/[\x{1F6EB}-\x{1F6EC}]/u',
        '/[\x{1F6F0}]/u',
        '/[\x{1F6F3}-\x{1F6F9}]/u',
        '/[\x{1F910}-\x{1F93A}]/u',
        '/[\x{1F93C}-\x{1F93E}]/u',
        '/[\x{1F940}-\x{1F945}]/u',
        '/[\x{1F947}-\x{1F970}]/u',
        '/[\x{1F973}-\x{1F976}]/u',
        '/[\x{1F97A}]/u',
        '/[\x{1F97C}-\x{1F9A2}]/u',
        '/[\x{1F9B0}-\x{1F9B9}]/u',
        '/[\x{1F9C0}-\x{1F9C2}]/u',
        '/[\x{1F9D0}-\x{1F9FF}]/u',
        '/[\x{E0062}-\x{E0063}]/u',
        '/[\x{E006C}]/u',
        '/[\x{E006E}]/u',
        '/[\x{E007F}]/u'
    ];

    return preg_replace($regexEmoticons, '', $text);
}

And here is the code to generate it:

<?php

$emojisAsHex = [];
$emojisAsDecHex = [];

// Collect the U+XXXX / U+XXXXX code points listed on the Unicode full emoji chart
preg_match_all(
    "/(?:>|\s)+(U\+)(?'emojis'[0-9ABCDEF]{4,5})(?:<|\s)+/",
    file_get_contents('http://unicode.org/emoji/charts/full-emoji-list.html'),
    $emojisAsHex
);

// Flip the array twice to remove duplicate code points
$emojisAsHex = array_flip(array_flip($emojisAsHex['emojis']));

// Index the hex code points by decimal value so they can be sorted numerically
foreach ($emojisAsHex as $emojiAsHex) {
    $emojisAsDecHex[hexdec($emojiAsHex)] = $emojiAsHex;
}

ksort($emojisAsDecHex);

$outputHexa = '';

$startI = min(array_keys($emojisAsDecHex));
$endI = max(array_keys($emojisAsDecHex)) + 1;

// Walk the code point space, merging consecutive code points into one range pattern
for ($i = $startI; $i < $endI; $i++) {
    if (isset($emojisAsDecHex[$i]) && isset($emojisAsDecHex[$i + 1])) {
        // Start of a run: emit the first code point, then skip to the last one in the run
        $outputHexa .= '\'/[\x{' . $emojisAsDecHex[$i] . '}';
        while (isset($emojisAsDecHex[$i + 1])) {
            $i++;
        }
        $outputHexa .= '-\x{' . $emojisAsDecHex[$i] . '}]/u\',' . PHP_EOL;
    } elseif (isset($emojisAsDecHex[$i])) {
        // Isolated code point
        $outputHexa .= '\'/[\x{' . $emojisAsDecHex[$i] . '}]/u\',' . PHP_EOL;
    }
}

// Print the array entries so they can be pasted into removeEmoji()
echo $outputHexa;
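The generator emits one pattern per run, so the finished list makes preg_replace() scan the text about 150 times, once per pattern. A possible variation, sketched here under the assumption that you already have the hex code points (the helper names and sample values are hypothetical), merges runs the same way but folds them into a single character class so the text is scanned once:

```php
<?php
// Format a run of code points [$first, $last] as a character-class fragment.
function classFragment(string $first, string $last): string
{
    return $first === $last
        ? '\x{' . $first . '}'
        : '\x{' . $first . '}-\x{' . $last . '}';
}

// Build ONE /u pattern from hex code points (any order, duplicates allowed).
function buildEmojiPattern(array $hexCodePoints): string
{
    if ($hexCodePoints === []) {
        throw new InvalidArgumentException('No code points given'); // '/[]/u' would be invalid
    }

    $byDec = [];
    foreach ($hexCodePoints as $hex) {
        $byDec[hexdec($hex)] = strtoupper($hex); // duplicates collapse onto one key
    }
    ksort($byDec); // numeric order, so consecutive code points become adjacent

    $fragments = [];
    $start = $prev = null;
    foreach (array_keys($byDec) as $dec) {
        if ($prev !== null && $dec !== $prev + 1) {
            // Gap found: close the current run
            $fragments[] = classFragment($byDec[$start], $byDec[$prev]);
            $start = null;
        }
        $start = $start ?? $dec;
        $prev = $dec;
    }
    $fragments[] = classFragment($byDec[$start], $byDec[$prev]); // close the last run

    return '/[' . implode('', $fragments) . ']/u';
}

$pattern = buildEmojiPattern(['1F601', '1F600', '2600', '1F602', '1F600']);
echo $pattern, PHP_EOL; // /[\x{2600}\x{1F600}-\x{1F602}]/u
echo preg_replace($pattern, '', "ok \u{1F601}\u{2600}"), PHP_EOL; // "ok " (trailing space kept)
```

A single class also avoids repeating the /u compilation for every entry; the trade-off is that one very long class is harder to read and diff than the generated list.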

Ainsworth answered 12/7, 2018 at 19:54 Comment(1)

Doesn't work properly: for example, it matches 0x23, which is #. You shouldn't split U+0023 U+FE0F U+20E3 into separate code points. – Krouse 11/2, 2022 at 11:12
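One way to act on this comment, sketched as a hypothetical helper: match the whole keycap sequence (a base of #, * or 0-9, an optional U+FE0F, then U+20E3) so that a lone # or * in ordinary text is kept. It covers keycaps only; other emoji would still need their own patterns, with the bare \x{0023} and \x{002A} entries dropped from the list.

```php
<?php
// Hypothetical helper: remove keycap emoji such as U+0023 U+FE0F U+20E3 as ONE unit.
// Pattern: base character (#, * or 0-9), optional VARIATION SELECTOR-16 (U+FE0F),
// then COMBINING ENCLOSING KEYCAP (U+20E3). A lone '#' or '*' does not match.
function removeKeycapEmoji(string $text): string
{
    return preg_replace('/[#*0-9]\x{FE0F}?\x{20E3}/u', '', $text);
}

echo removeKeycapEmoji("Room #1, press \u{0023}\u{FE0F}\u{20E3} or 5\u{FE0F}\u{20E3}!"), PHP_EOL;
// prints: Room #1, press  or !
```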

You could just use str_replace().

// Fill this array with every emoji you want to strip (only two are shown here)
$emojiArray = array("\u{1F600}", "\u{1F601}");
$strippedComment = str_replace($emojiArray, "", $originalComment);
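If you do go the str_replace() route, note that with an array of search values PHP processes them first to last, each on the already-modified string. A multi-code-point emoji such as a ZWJ family can then be broken apart by an earlier, shorter entry, leaving stray joiners behind. A minimal sketch (the helper name and the two-entry list are illustrative assumptions) sorts the list longest-first:

```php
<?php
// Hypothetical helper: strip every emoji listed in $emojiList from $text.
// Longest entries go first so multi-code-point sequences are removed whole.
function stripListedEmoji(string $text, array $emojiList): string
{
    usort($emojiList, fn (string $a, string $b): int => strlen($b) <=> strlen($a));
    return str_replace($emojiList, '', $text);
}

// Man + ZWJ + woman + ZWJ + girl: a single "family" emoji made of 5 code points
$family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}";
$list = ["\u{1F468}", $family]; // the short entry comes first on purpose

var_dump(stripListedEmoji("hi {$family}!", $list));          // string(4) "hi !"
var_dump(str_replace($list, '', "hi {$family}!") === "hi !"); // bool(false): joiners remain
```

Even with the ordering fixed, the list has to be maintained by hand for every emoji, which is the main drawback of this approach.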

Condom answered 9/10, 2012 at 19:42 Comment(1)

Not helpful for obvious reasons – Hallerson 6/8, 2015 at 8:46
