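Right now I'm using this piece of code:
    // requires: using System.Text.RegularExpressions;
    public static bool ContainsEmoji(this string text)
    {
        // \p{Cs} matches UTF-16 surrogate code units only
        Regex rgx = new Regex(@"\p{Cs}");
        return rgx.IsMatch(text);
    }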
It helps somewhat: most emojis appear to be detected, but some aren't.
Here's a reference list: http://unicode.org/emoji/charts/full-emoji-list.html
All the smiley faces appear to be fine, but these specific emojis are not caught by the regex:
1920 U+2614 ☔ umbrella with rain drops
1921 U+26F1 ⛱ umbrella on ground
1922 U+26A1 ⚡ high voltage
1923 U+2744 ❄ snowflake
These are not close to each other on the keyboard, but they are consecutive in the list, so I assumed there was a point in the emoji list past which detection stopped working. That doesn't hold up: from 1905 (the weather-like emojis) downward, some are caught by the regex and some aren't. There does not seem to be any rule.
I can't just restrict input to ASCII, because people need to enter characters such as Cyrillic, but I specifically can't accept emojis. I have no clue how to go forward from here.
I read the MSDN docs about high/low surrogate pairs, but at this stage they are very confusing to me, and I think a push in the right direction would go a long way.
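To make the surrogate point concrete, here is a minimal sketch of what `\p{Cs}` can and cannot see. It assumes the usual UTF-16 behaviour of .NET strings: code points above U+FFFF are stored as two surrogate `char`s, while anything in the Basic Multilingual Plane (such as U+2614) is a single ordinary `char`. The class and method names (`SurrogateDemo`, `HasSurrogate`) are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class SurrogateDemo
{
    // True if the text contains at least one UTF-16 surrogate code unit.
    // This is the same test as the \p{Cs} regex in the question.
    public static bool HasSurrogate(string text)
    {
        return Regex.IsMatch(text, @"\p{Cs}");
    }

    // Number of UTF-16 code units (chars) used to store the text.
    public static int CodeUnits(string text)
    {
        return text.Length;
    }
}
```

With this sketch, `SurrogateDemo.CodeUnits("\U0001F600")` (grinning face) is 2 and `HasSurrogate` returns true for it, whereas `SurrogateDemo.CodeUnits("\u2614")` (umbrella with rain drops) is 1 and `HasSurrogate` returns false. That would explain the apparent lack of a rule: whether an emoji is caught depends on whether its code point lies above U+FFFF, not on its position in the chart.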
Thank you very much for your time :)
{Cs} is equivalent to {Surrogate}. AFAIK there isn't a Unicode category for emojis, so you have to list each sub-range separately. – Descant
"but these specific emojis do not get caught" – Copra
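A minimal sketch of the "list each sub-range" idea from the comment above, not a complete emoji detector. The ranges are an assumption chosen only to cover the four symbols in the question: the Miscellaneous Symbols block (U+2600 to U+26FF) and the Dingbats block (U+2700 to U+27BF), added to the existing surrogate check. The names `EmojiCheckSketch` and `ContainsEmojiLike` are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class EmojiCheckSketch
{
    // Illustrative and NOT exhaustive:
    //   \p{Cs}         any surrogate, i.e. any code point above U+FFFF
    //   \u2600-\u26FF  Miscellaneous Symbols (U+2614, U+26A1, U+26F1)
    //   \u2700-\u27BF  Dingbats (U+2744)
    private static readonly Regex EmojiLike =
        new Regex(@"[\p{Cs}\u2600-\u26FF\u2700-\u27BF]");

    // True if the text contains a character from any of the ranges above.
    public static bool ContainsEmojiLike(this string text)
    {
        return EmojiLike.IsMatch(text);
    }
}
```

Cyrillic text such as `"Привет"` is not matched, since it lives in U+0400 to U+04FF. Two caveats apply: these blocks also contain symbols that are not emojis, and the surrogate test rejects every supplementary-plane character, emoji or not. Other BMP emojis outside these two blocks would need further ranges taken from the Unicode emoji list.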