How to convert Emojis to their respective HTML code entities in PHP 5.3?

I need to convert the Emojis (e.g. 😀) in strings to their respective HTML code entities (e.g. &#128512;) on a PHP 5.3 site.

I need to do this so that user input is stored correctly in a legacy script's MySQL database and displays properly when it is shown back to the user. When Emojis from user input are saved directly, they are stored incorrectly as ? in the database. This legacy script does not support utf8mb4 in MySQL (this solution failed), and converting its database, tables, and columns to utf8mb4 has not solved the problem. The only option I have left, which I have already confirmed works, is to convert user-entered Emojis in strings to their HTML code entities. The entities are stored as-is in the database, and modern browsers render them as Emoji characters when the text is retrieved and displayed.

I have also tried this solution, but it does not work in PHP 5.3, only in 5.4 and above. (I cannot upgrade to 5.4 on this particular site because the legacy script it depends on only works in 5.3 and cannot be changed or upgraded under any circumstances.)

I have also tried this solution, which works in PHP 5.3, but it only accepts a single Emoji rather than a whole string, so it does not solve my problem.

I only need the Emojis in a string converted, nothing else. (If that is not possible, I can live with other HTML entities being converted as well, such as & to &amp;, but I would prefer to avoid that.)

So how can I convert Emojis in strings to their respective HTML code entities in PHP 5.3, so that a string like this & that 😎 is converted to this & that &#128526;?

Roomer answered 31/10, 2017 at 15:43
This will not convert to HTML entities, but if your concern is just storing special characters in the DB, you could use json_encode()/json_decode() for serialization: "this & that \ud83d\ude0e" – Sum
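A minimal sketch of the json_encode()/json_decode() round trip suggested in the comment above, assuming the stored value is read back unchanged. This does not produce HTML entities: the stored value has to go through json_decode() before it is displayed. On PHP 5.3, json_encode() escapes every non-ASCII character as \uXXXX (the JSON_UNESCAPED_UNICODE flag only arrived in PHP 5.4), so a 4-byte emoji becomes a pair of plain-ASCII surrogate escapes.

```php
<?php
// Sketch of the comment's alternative: store JSON instead of HTML entities.
// ASSUMPTION: the column simply stores and returns the ASCII string as-is.
$input   = "this & that \xF0\x9F\x98\x8E";   // "this & that 😎" written as UTF-8 bytes
$stored  = json_encode($input);             // ASCII only, so a 3-byte utf8 column can hold it
$decoded = json_decode($stored);            // back to the original UTF-8 string

echo $stored, "\n";                         // prints: "this & that \ud83d\ude0e"
var_dump($decoded === $input);              // prints: bool(true)
```

Note that the & is left as-is, but the stored text now carries JSON quotes and escapes, so every read path must decode it.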

The code to detect the emoji exceeds Stack Overflow's character limit, so here's a gist instead:

https://gist.github.com/BarryMode/432a7a1f9621e824c8a3a23084a50f60#file-htmlemoji-php

The entire function is essentially just

preg_replace_callback(pattern, callback, string);

The string is the input containing the emoji you want to change into HTML entities. The pattern is a regex that finds each emoji in the string, and each match is passed to the callback, which converts that emoji to its HTML entity.
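To illustrate the preg_replace_callback(pattern, callback, string) structure, here is a small sketch. It is not the gist's code: the character class is a hypothetical, much shorter set of ranges (U+2600 to U+27BF, the zero-width joiner U+200D, variation selector U+FE0F, and U+1F000 to U+1FAFF), and the callback decodes UTF-8 by hand so it does not need mbstring. It sticks to syntax available in PHP 5.3.

```php
<?php
// Callback: turn one matched UTF-8 character into a decimal HTML entity.
// Decodes the bytes by hand, so the mbstring extension is not required.
function emoji_to_entity($matches)
{
    $c  = $matches[0];
    $b0 = ord($c[0]);
    if ($b0 < 0x80) {            // 1-byte sequence (ASCII)
        $cp = $b0;
    } elseif ($b0 < 0xE0) {      // 2-byte sequence
        $cp = (($b0 & 0x1F) << 6) | (ord($c[1]) & 0x3F);
    } elseif ($b0 < 0xF0) {      // 3-byte sequence (e.g. U+2705)
        $cp = (($b0 & 0x0F) << 12) | ((ord($c[1]) & 0x3F) << 6) | (ord($c[2]) & 0x3F);
    } else {                     // 4-byte sequence (e.g. U+1F60E)
        $cp = (($b0 & 0x07) << 18) | ((ord($c[1]) & 0x3F) << 12)
            | ((ord($c[2]) & 0x3F) << 6) | (ord($c[3]) & 0x3F);
    }
    return '&#' . $cp . ';';
}

// Hypothetical helper: the ranges below are an illustrative subset, not the gist's pattern.
function htmlemoji_sketch($string)
{
    $pattern = '/[\x{2600}-\x{27BF}\x{200D}\x{FE0F}\x{1F000}-\x{1FAFF}]/u';
    return preg_replace_callback($pattern, 'emoji_to_entity', $string);
}

echo htmlemoji_sketch("this & that \xF0\x9F\x98\x8E"), "\n"; // prints: this & that &#128526;
```

Because the callback only runs on matched characters, the & in the input is left untouched, which is what the question prefers.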

In creating this function, htmlemoji(), I combined a few pieces of code that others had written. Credits:

The callback uses this Stack Overflow answer to build each entity.

The pattern was directly ripped from this source on GitHub.
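The entity-building step used by the callbacks in this thread, '&#' . hexdec(bin2hex(mb_convert_encoding($char, 'UTF-32', 'UTF-8'))) . ';', works because mbstring's 'UTF-32' output stores each character as one 4-byte big-endian integer equal to its code point, so the hex dump of those bytes is the code point in hex. The sketch below assumes pack('N', ...) as a stand-in for mb_convert_encoding(), building the same four bytes for U+1F60E without needing mbstring.

```php
<?php
// Why '&#' . hexdec(bin2hex($utf32Bytes)) . ';' yields the decimal entity.
// ASSUMPTION: pack('N', ...) stands in for mb_convert_encoding($char, 'UTF-32', 'UTF-8').
$codePoint = 0x1F60E;                  // U+1F60E, the sunglasses emoji
$utf32     = pack('N', $codePoint);    // 4 big-endian bytes: 00 01 F6 0E
$hex       = bin2hex($utf32);          // "0001f60e": the code point in hex
$entity    = '&#' . hexdec($hex) . ';';

echo $hex, ' => ', $entity, "\n";      // prints: 0001f60e => &#128526;
```

The leading zero bytes do not affect hexdec(), which is why the 4-byte padding of UTF-32 is harmless here.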

Pliske answered 31/10, 2017 at 16:13
I had already linked to that solution in my original post, tried it, and explained that although it works in PHP 5.3, it does not do what I want: it only converts a single emoji, instead of converting just the emojis in a string while leaving the rest of the string untouched. – Roomer
@ProgrammerGirl, my apologies, I'll work on a solution that will do as you wish. – Pliske
@Roomer I revised my answer. I think this is what you were looking for. – Pliske
That is much better, but still not complete. There are still several emojis not being converted to their respective HTML codes. Execute your code in the following sandbox with all the emojis and make sure to run it in PHP 5.3 to see the ones still not being converted in the results: sandbox.onlinephpfunctions.com/code/… – Roomer
Upon further testing, these are the 80 Emojis still not being converted by your current solution: 🤐🤑🤒🤓🤔🤕🤖🤗🤘🤙🤚🤛🤜🤝🤞🤠🤡🤢🤣🤤🤥🤦🤧🤰🤳🤴🤵🤶🤷🤸🤹🤺🤼🤽🤾🥀🥁🥂🥃🥄🥅🥇🥈🥉🥊🥋🥐🥑🥒🥓🥔🥕🥖🥗🥘🥙🥚🥛🥜🥝🥞🦀🦁🦂🦃🦄🦅🦆🦇🦈🦉🦊🦋🦌🦍🦎🦏🦐🦑🧀 ... Please fix this so that I can accept your solution. Thank you. – Roomer
@Roomer try the new one. – Pliske
Worked! Thank you! – Roomer
Although I don't need it, I was thinking it would be helpful to future readers who find this if you explain how your solution works and what it does. You know, the whole teach how to fish vs. giving the fish away. If you have time, please update your answer with an explanation about your solution. Thanks again! – Roomer
@ProgrammerGirl, that makes the answer look better, so good suggestion. I also included some credits for the sources of some of it as I am not responsible for the majority of the code, I just combined it in the right way. – Pliske
Amazing, thanks a thousand! This way we do not have to convert utf8_general_ci to utf8mb4_general_ci. We can stick with utf8 in the DB and avoid other issues. – Geffner
The link in this answer no longer works. – Etch

I have created a trait for this that combines the two ideas below, and it also covers emoji the gist misses, such as 🤩.

Ideas taken from https://gist.github.com/BarryMode/432a7a1f9621e824c8a3a23084a50f60#file-htmlemoji-php and https://github.com/chefkoch-dev/morphoji

Note that traits and the short [] array syntax used below require PHP 5.4 or later, so on PHP 5.3 the same methods have to go in a regular class, with the code list written as array(...).

trait ConvertEmojis {

/** @var string */
protected static $emojiPattern;

public function convert($str) {

    return preg_replace_callback($this->getEmojiPattern(), array(&$this, 'entity'), $str);
}

protected function entity($matches) {
    return '&#'.hexdec(bin2hex(mb_convert_encoding("$matches[0]", 'UTF-32', 'UTF-8'))).';';
}

/**
 * Returns a regular expression pattern to detect emoji characters.
 *
 * @return string
 */
protected function getEmojiPattern()
{
    if (null === self::$emojiPattern) {
        $codeString = '';

        foreach ($this->getEmojiCodeList() as $code) {
            if (is_array($code)) {

                $first = dechex(array_shift($code));
                $last  = dechex(array_pop($code));
                $codeString .= '\x{' . $first . '}-\x{' . $last . '}';

            } else {
                $codeString .= '\x{' . dechex($code) . '}';
            }
        }

        self::$emojiPattern = "/[$codeString]/u";
    }

    return self::$emojiPattern;
}

/**
 * Returns an array with all unicode values for emoji characters.
 *
 * This is a function so the array can be defined with a mix of hex values
 * and range() calls to conveniently maintain the array with information
 * from the official Unicode tables (where values are given in hex as well).
 *
 * With PHP > 5.6 this could be done in a class property, maybe even a
 * constant.
 *
 * @return array
 */
protected function getEmojiCodeList()
{
    return [
        // Various 'older' characters, dingbats etc. which over time have
        // received an additional emoji representation.
        0x203c,
        0x2049,
        0x2122,
        0x2139,
        range(0x2194, 0x2199),
        range(0x21a9, 0x21aa),
        range(0x231a, 0x231b),
        0x2328,
        range(0x23ce, 0x23cf),
        range(0x23e9, 0x23f3),
        range(0x23f8, 0x23fa),
        0x24c2,
        range(0x25aa, 0x25ab),
        0x25b6,
        0x25c0,
        range(0x25fb, 0x25fe),
        range(0x2600, 0x2604),
        0x260e,
        0x2611,
        range(0x2614, 0x2615),
        0x2618,
        0x261d,
        0x2620,
        range(0x2622, 0x2623),
        0x2626,
        0x262a,
        range(0x262e, 0x262f),
        range(0x2638, 0x263a),
        0x2640,
        0x2642,
        range(0x2648, 0x2653),
        0x2660,
        0x2663,
        range(0x2665, 0x2666),
        0x2668,
        0x267b,
        0x267f,
        range(0x2692, 0x2697),
        0x2699,
        range(0x269b, 0x269c),
        range(0x26a0, 0x26a1),
        range(0x26aa, 0x26ab),
        range(0x26b0, 0x26b1),
        range(0x26bd, 0x26be),
        range(0x26c4, 0x26c5),
        0x26c8,
        range(0x26ce, 0x26cf),
        0x26d1,
        range(0x26d3, 0x26d4),
        range(0x26e9, 0x26ea),
        range(0x26f0, 0x26f5),
        range(0x26f7, 0x26fa),
        0x26fd,
        0x2702,
        0x2705,
        range(0x2708, 0x270d),
        0x270f,
        0x2712,
        0x2714,
        0x2716,
        0x271d,
        0x2721,
        0x2728,
        range(0x2733, 0x2734),
        0x2744,
        0x2747,
        0x274c,
        0x274e,
        range(0x2753, 0x2755),
        0x2757,
        range(0x2763, 0x2764),
        range(0x2795, 0x2797),
        0x27a1,
        0x27b0,
        0x27bf,
        range(0x2934, 0x2935),
        range(0x2b05, 0x2b07),
        range(0x2b1b, 0x2b1c),
        0x2b50,
        0x2b55,
        0x3030,
        0x303d,
        0x3297,
        0x3299,

        // Modifier for emoji sequences.
        0x200d,
        0x20e3,
        0xfe0f,

        // 'Regular' emoji unicode space, containing the bulk of them.
        range(0x1f000, 0x1f9cf)
    ];
}    

}
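The trait's getEmojiPattern() turns each code-list entry into part of a single regex character class: a range() entry contributes \x{first}-\x{last} and a single code point contributes \x{code}. The standalone function below sketches that step with a hypothetical three-entry list (not the trait's full getEmojiCodeList()), written without traits or [] literals so it also runs on PHP 5.3.

```php
<?php
// getEmojiPattern() logic as a plain function (PHP 5.3-compatible).
// ASSUMPTION: the list passed below is a hypothetical excerpt of the code list.
function build_emoji_pattern($codeList)
{
    $codeString = '';
    foreach ($codeList as $code) {
        if (is_array($code)) {
            // range() entry: keep only its first and last value, as "\x{a}-\x{b}".
            $first = dechex(array_shift($code));
            $last  = dechex(array_pop($code));
            $codeString .= '\x{' . $first . '}-\x{' . $last . '}';
        } else {
            // Single code point: "\x{code}".
            $codeString .= '\x{' . dechex($code) . '}';
        }
    }
    // One character class; the u modifier makes \x{...} refer to code points.
    return '/[' . $codeString . ']/u';
}

$pattern = build_emoji_pattern(array(0x203c, range(0x2194, 0x2199), range(0x1f000, 0x1f9cf)));
echo $pattern, "\n"; // prints: /[\x{203c}\x{2194}-\x{2199}\x{1f000}-\x{1f9cf}]/u

// U+203C and U+1F929 (the 🤩 mentioned above) both match.
echo preg_match_all($pattern, "a \xE2\x80\xBC b \xF0\x9F\xA4\xA9", $matches), "\n"; // prints: 2
```

Because the bulk range stops at 0x1f9cf, emoji from U+1F9D0 onward (for example 🧐) fall outside the trait's pattern as written.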

Tewell answered 15/4, 2020 at 15:17

© 2022 - 2024, McMap. All rights reserved.