I am desperately trying to replace certain unicode characters (graphemes) from a file using sed. However I keep failing for some of them, namely the ones from unicode blocks:
\p{InHigh_Surrogates}: U+D800–U+DB7F
\p{InHigh_Private_Use_Surrogates}: U+DB80–U+DBFF
\p{InLow_Surrogates}: U+DC00–U+DFFF
I tried (in a sed config file loaded via the -f switch):
s/\p{InHigh_Surrogates}/###/ --> no effect at all
s/\\p\{InHigh_Surrogates\}/###_D-NON-UTF8_###/ -> error message 'Invalid content of \{\}'
Anybody got a suggestion? Also, I am not necessarily focused on using the blocks - but I also failed trying to define a character range of the form \xd800-\xdfff.
Thanks, Thomas