grep for emojis in linux
Asked Answered
A

1

10

I am trying to grep across a list of tokens that include several non-ASCII characters. I want to match only emojis, other characters such as ð or ñ are fine. The unicode range for emojis appears to be U+1F600-U+1F1FF but when I search for it using grep this happens:

grep -P "[\x1F6-\x1F1]" contact_names.tokens                                                                                                                                                                                                                                
grep: range out of order in character class 

https://unicode.org/emoji/charts/full-emoji-list.html#1f3f4_e0067_e0062_e0077_e006c_e0073_e007f

Amazonite answered 10/9, 2018 at 22:31 Comment(1)
I personally get conflicting matchers specified error with this pattern.Metalinguistic
A
13

You need to specify the code points with full value (not 1F6 but 1F600) and wrap them with curly braces. In addition, the first value must be smaller than the last value. So the regex should be "[\x{1F1FF}-\x{1F600}]".

The unicode range for emojis is, however, more complex than you assumed. The page you referred does not sort characters by code point and emojis are placed in many blocks. If you want to cover almost all of emoji:

grep -P "[\x{1f300}-\x{1f5ff}\x{1f900}-\x{1f9ff}\x{1f600}-\x{1f64f}\x{1f680}-\x{1f6ff}\x{2600}-\x{26ff}\x{2700}-\x{27bf}\x{1f1e6}-\x{1f1ff}\x{1f191}-\x{1f251}\x{1f004}\x{1f0cf}\x{1f170}-\x{1f171}\x{1f17e}-\x{1f17f}\x{1f18e}\x{3030}\x{2b50}\x{2b55}\x{2934}-\x{2935}\x{2b05}-\x{2b07}\x{2b1b}-\x{2b1c}\x{3297}\x{3299}\x{303d}\x{00a9}\x{00ae}\x{2122}\x{23f3}\x{24c2}\x{23e9}-\x{23ef}\x{25b6}\x{23f8}-\x{23fa}]"  contact_names.tokens

(The range is borrowed from Suhail Gupta's answer on a similar question)

If you need to allow/disallow specific emoji blocks, see sequence data on unicode.org. List of emoji on Wikipedia also show characters in ordered tables but it might not list latest ones.

Amu answered 29/6, 2019 at 15:33 Comment(1)
Finally a clean solution to getting rid of all those emojis when I use youtube-dl.Francefrancene

© 2022 - 2024 — McMap. All rights reserved.