How do you get emojis to display in a Unity TextMeshPro element?

I can't seem to find any posts or videos online about this topic, so I'm starting to wonder if it's just not possible. Everything about "emojis" in Unity is a simple implementation of a sprite sheet that you then index manually with something like <sprite=0>. I'm trying to pull tweets from Twitter and display their text with emojis, so this clearly isn't feasible with the 1500+ emojis that Unicode supports.

I believe I've correctly created a TMP font asset from the default Windows emoji font, Segoe UI Emoji. Using some Unicode hex ranges I found in an online Unicode database, I was able to detect 1505 emojis in the font.

[Screenshot: Successful emoji font creation]

I then set the emoji font as a fallback font in the Project Settings:

[Screenshot: Setting emoji font as fall-back]

But when I run the game, I still get the same error: "The character with Unicode value \uD83D was not found in the [SEGOEUI SDF] font asset or any potential fallbacks. It was replaced by Unicode character \u25A1 in text object"

In the console, the tweet text looks something like this: #cat #cats #CatsOfTwitter #CatsOnTwitter #pet \nLike &amp; share , Thanks!\uD83D\uDE4F\uD83D\uDE4F\uD83D\uDE4F

From some looking around online and extremely basic knowledge of unicode, I theorize that the issue is that in the tweet body, the emojis are in UTF-16 surrogate pairs or whatever, where \uD83D\uDE4F is one emoji, but my emoji font is in UTF-32, so it's looking for u+0001f64f. So would I need to find a way to get it to read the full surrogate pair and then convert to UTF-32 to get the correct emoji to render?
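The decoding suspected above is the standard UTF-16 rule: subtract 0xD800 from the high surrogate and 0xDC00 from the low one, combine the two 10-bit values, and add 0x10000. Worked through for the \uD83D\uDE4F in the tweet text, as a quick arithmetic check:

```latex
\begin{aligned}
\mathrm{cp} &= (\mathtt{D83D}_{16} - \mathtt{D800}_{16}) \cdot \mathtt{400}_{16} + (\mathtt{DE4F}_{16} - \mathtt{DC00}_{16}) + \mathtt{10000}_{16} \\
&= \mathtt{3D}_{16} \cdot \mathtt{400}_{16} + \mathtt{24F}_{16} + \mathtt{10000}_{16} \\
&= \mathtt{F400}_{16} + \mathtt{24F}_{16} + \mathtt{10000}_{16} \\
&= \mathtt{1F64F}_{16}
\end{aligned}
```

U+1F64F is the folded-hands emoji (🙏), so the escaped pair and the code point U+1F64F refer to the same character.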

Any help would be greatly appreciated. I've tried asking around the Unity Discord server, but nobody else knows how to solve this issue either.

Mensch asked 23/12, 2022 at 19:53
Joy Pixels – Sezen
@YashVakil This is not what OP is looking for – Samons

Intro

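TMPro is natively able to do this if the emoji is written as an eight-digit \U escape, i.e. its UTF-32 code point. For example, \U0001F600 is '😀'. A \u escape, by contrast, holds exactly four hex digits (one UTF-16 code unit), so on its own it can't express an emoji above U+FFFF. If your emojis are written as \u1F600 (a five-digit code point with a lowercase u, which is not UTF-8 and not a standard escape), the only difference from the UTF-32 form is the capital U and the three zeros prepended to it, which makes it very easy to convert. Typing the UTF-32 version into TMPro shows the emoji as normal. The tweet text in the question, however, writes each emoji as a UTF-16 surrogate pair (\uD83D\uDE4F), so what you actually need is to convert UTF-16 surrogate pairs into UTF-32, which is covered further down.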
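A minimal sketch of how these forms relate in plain C#, written as a standalone .NET console program with C# 9 top-level statements (not a Unity script; variable names are illustrative). It shows that the eight-digit \U escape and the four-digit surrogate-pair escapes build the same UTF-16 string, and that a \u escape always stops after four hex digits.

```csharp
using System;

// U+1F600 (😀) written three ways; .NET strings are UTF-16, so all three
// produce the same surrogate pair, \uD83D\uDE00.
string fromUtf32Escape = "\U0001F600";          // eight-digit \U escape
string fromSurrogatePair = "\uD83D\uDE00";      // two four-digit \u escapes
string fromCodePoint = char.ConvertFromUtf32(0x1F600);

Console.WriteLine(fromUtf32Escape == fromSurrogatePair); // True
Console.WriteLine(fromUtf32Escape == fromCodePoint);     // True

// Decoding the pair back into a single code point.
int codePoint = char.ConvertToUtf32(fromSurrogatePair, 0);
Console.WriteLine(codePoint.ToString("X"));              // 1F600

// \u reads exactly four hex digits, so "\u1F600" is U+1F60 followed by '0'.
Console.WriteLine("\u1F600" == "\u1F60" + "0");          // True
```

JSON, which the Twitter API returns, uses the same four-digit \u escape, which is why tweet text spells each emoji outside the Basic Multilingual Plane as a surrogate pair.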

Luckily, this solution does not require any font modification: the default font is able to do this, and I didn't change any settings in the inspector.

Five-Digit \u Escape Solution

The solution below is for text that writes each emoji as \u followed by a five-digit code point (no surrogate pairs).

To convert this to UTF-32, we just need to change the 'u' to uppercase and prepend three zeros so the code point has eight digits. To do so, we can use System.Text.RegularExpressions.Regex.Replace.

using System.Text.RegularExpressions;

public string ToUTF32(string input)
{
    // A literal backslash, 'u', then exactly five hex digits (e.g. \u1F600).
    Regex pattern = new Regex(@"\\u([0-9A-Fa-f]{5})");

    // Replace every match in one pass: \u1F600 -> \U0001F600.
    // Text that doesn't match is returned unchanged.
    return pattern.Replace(input, m => @"\U000" + m.Groups[1].Value);
}

input is the string that contains the escaped emojis. The function converts every five-digit \u escape in the string and keeps everything else as it was.

Explanation

Here is how the function works.

It takes the input string, for example blah blah \u1F600 blah \u1F603 blah, which contains two escaped emojis, and looks for each \u escape in it.

For each escape, it takes the five characters after "\u" (the code point) and replaces the escape with "\U000" followed by those characters, so \u1F600 becomes \U0001F600.

Everything else in the string is kept as it was, so the output is blah blah \U0001F600 blah \U0001F603 blah, which is the string TMP needs.
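A quick check of the five-digit conversion on the example string, as a standalone .NET console sketch (C# 9 top-level statements). The local function ToUtf32 is an illustrative stand-in that mirrors ToUTF32, using the static Regex.Replace overload that takes a MatchEvaluator.

```csharp
using System;
using System.Text.RegularExpressions;

// Mirrors ToUTF32: "\u" plus exactly five hex digits becomes "\U000" plus those digits.
string ToUtf32(string input) =>
    Regex.Replace(input, @"\\u([0-9A-Fa-f]{5})", m => @"\U000" + m.Groups[1].Value);

// Verbatim string: the backslashes are literal text, as they would be in raw data.
string sample = @"blah blah \u1F600 blah \u1F603 blah";
string result = ToUtf32(sample);
Console.WriteLine(result); // blah blah \U0001F600 blah \U0001F603 blah
```

Because the pattern requires five hex digits, four-digit escapes such as \uD83D\uDE4F are left untouched and can be handled by the surrogate-pair conversion instead.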

If anyone thinks the above information is incorrect, please let me know. My vocabulary on this subject is not the best, so I am willing to take corrections.

Surrogate Pairs Solution

I have tinkered for a little while and come up with the function below.

using System;
using System.Text.RegularExpressions;

public string ToUTF32FromPair(string input)
{
    // An escaped high surrogate (\uD800-\uDBFF) immediately followed by an
    // escaped low surrogate (\uDC00-\uDFFF), four hex digits each.
    Regex pattern = new Regex(@"\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})");

    // Replace every pair in one pass; other text (including lone \u escapes) is left as-is.
    return pattern.Replace(input,
        m => {
            var firstInt = Convert.ToInt32(m.Groups[1].Value, 16);
            var secondInt = Convert.ToInt32(m.Groups[2].Value, 16);
            // Standard UTF-16 decoding: 10 bits from each surrogate, offset by 0x10000.
            var codePoint = (firstInt - 0xD800) * 0x400 + (secondInt - 0xDC00) + 0x10000;
            // e.g. \uD83D\uDE4F -> \U0001F64F
            return @"\U" + codePoint.ToString("X8");
        });
}

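This works the same way as the previous function, except that it finds each escaped surrogate pair (e.g. \uD83D\uDE4F), decodes it to a single code point (U+1F64F), and writes that code point as an eight-digit \U escape (\U0001F64F).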
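A variant of the same conversion, sketched as a standalone .NET console program (C# 9 top-level statements), that leaves the surrogate arithmetic to the built-in char.ConvertToUtf32(char, char). It computes the same code point and throws if the two values are not a valid high/low pair. The names ToUtf32Escapes and tweet are illustrative, and the sample is the tail of the tweet text from the question, kept as raw escape text.

```csharp
using System;
using System.Text.RegularExpressions;

// Only escaped pairs (high D800-DBFF followed by low DC00-DFFF) are matched.
var pairEscape = new Regex(@"\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})");

string ToUtf32Escapes(string input) =>
    pairEscape.Replace(input, m =>
    {
        char high = (char)Convert.ToInt32(m.Groups[1].Value, 16);
        char low = (char)Convert.ToInt32(m.Groups[2].Value, 16);
        // Same result as (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000.
        int codePoint = char.ConvertToUtf32(high, low);
        return @"\U" + codePoint.ToString("X8");
    });

// Verbatim string, so "\uD83D" here is six characters of text, not a decoded char.
string tweet = @"Like &amp; share , Thanks!\uD83D\uDE4F\uD83D\uDE4F";
string converted = ToUtf32Escapes(tweet);
Console.WriteLine(converted); // Like &amp; share , Thanks!\U0001F64F\U0001F64F
```

Using the framework call keeps the arithmetic in one well-tested place; the hand-written formula in ToUTF32FromPair gives the same result for every valid pair.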

[Screenshot: Emoji & Inspector]

Samons answered 30/12, 2022 at 5:33
You could convert to utf-32 – Kantian
To make sure I understand this clearly, TMP is looking for @"\U0001F600" i.e. a string that resembles a UTF-32 character, and not '\U0001F600' i.e. an actual UTF-32 character? At least that's my assumption because I think you are returning a string literal @"\U0001F600" from the function. – Mensch
Yes, TMP just wants the escape sequence as text in the string. Sorry for the confusion, but the answer on this post seems to fix this problem. – Samons
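To make the distinction in this exchange concrete, a small standalone C# sketch (plain .NET console program, names illustrative): the conversion functions return escape text like the verbatim string below, which the answer relies on TMP to parse, whereas a regular C# literal is decoded by the compiler into the emoji's surrogate pair.

```csharp
using System;

// Verbatim string: ten characters of text, a backslash, 'U' and eight hex digits.
string escapeText = @"\U0001F600";
// Regular string literal: the compiler decodes the escape into one
// UTF-16 surrogate pair (two chars).
string decoded = "\U0001F600";

Console.WriteLine(escapeText.Length);                // 10
Console.WriteLine(decoded.Length);                   // 2
Console.WriteLine(char.IsSurrogatePair(decoded, 0)); // True
```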

© 2022 - 2024, McMap. All rights reserved.