I've got text input from a mobile device, and it contains emoji. In C#, I have the text as:
Text 😀 text
Simply put, I want the output text to be:
Text text
I'm trying to remove all such emoji from the text with a regex, but I'm not sure how to convert an emoji into its Unicode sequence. How do I do that?
Edit:
I'm trying to save the user input into MySQL. It looks like MySQL's utf8 charset stores at most three bytes per character, so it can't hold four-byte characters such as emoji. The right fix would be to change the schema (to utf8mb4), but I don't think that is an option for me, so I'm trying to remove all the emoji characters before saving the text in the database.
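To make the goal concrete, here is a minimal sketch of the kind of stripping I have in mind. It assumes that removing every character outside the Basic Multilingual Plane is acceptable, since those are the ones that need four bytes in UTF-8. .NET strings store such characters as surrogate pairs, and the `\p{Cs}` category matches surrogate code units. The names `EmojiStripper` and `StripNonBmp` are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmojiStripper
{
    // \p{Cs} matches any UTF-16 surrogate code unit. Characters above U+FFFF
    // (which includes most emoji) are represented as surrogate pairs in .NET
    // strings, so removing all surrogates removes those characters.
    private static readonly Regex Surrogates =
        new Regex(@"\p{Cs}", RegexOptions.Compiled);

    // Hypothetical helper: returns the input without any non-BMP characters.
    public static string StripNonBmp(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }
        return Surrogates.Replace(input, string.Empty);
    }
}
```

Note that this would not collapse the whitespace left behind, and emoji that live inside the BMP (for example U+2764) would remain. Those encode in three bytes, so they should not trigger the error shown further down.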
This is my schema for the relevant column:
I'm using NHibernate as my ORM, and the insert query it generates looks like this:
Insert into `Content` (ContentTypeId, Comments, DateCreated)
values (?p0, ?p1, ?p2);
?p0 = 4 [Type: Int32 (0)]. ?p1 = 'Text 😀 text' [Type: String (20)], ?p2 = 19/01/2015 10:38:23 [Type: DateTime (0)]
When I copy this query from the logs and run it on MySQL directly, I get this error:
1 warning(s): 1366 Incorrect string value: '\xF0\x9F\x98\x80 t...' for column 'Comments' at row 1 0.000 sec
I've also tried converting the text to encoded bytes, but that doesn't really work.
public virtual String Comments { get; set; }
property. The insert query produced is fine; it's just that the MySQL database can't handle the Unicode. - Inbred