We are having problems with text that is encoded in several different ways but stored in a single column of a table. Long story. On MySQL, I can do "select hex(str) from table where" and I see the bytes of the string exactly as I set them.
On Oracle, I have a string which starts with the Turkish character İ, which is the Unicode character 0x0130 "LATIN CAPITAL LETTER I WITH DOT ABOVE". This is in my printed copy of the Unicode Version 2.0 book. In UTF-8, this character is 0xc4b0.
We have very old client apps we need to support. They would send us this text in "windows-1254". We used to just close our eyes, store it, and hand it back later. Now we need the Unicode, or are being given the Unicode.
So I have:
SQL> select id, name from table where that thing;
ID NAME
------ ------------------------
746 Ý
This makes sense because the "İ" is 0xdd in windows-1254 and 0xdd in windows-1252 is "Ý". My terminal is presumably set to the usual windows-1252.
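To make that concrete, here is a minimal Java sketch that decodes the single byte 0xdd under each code page. The class and method names are made up for illustration, and it assumes the JVM ships the windows-1254 and windows-1252 charsets, which a full JDK normally does.

```java
import java.nio.charset.Charset;

public class CodePageCheck {
    // Decode one raw byte under the named charset and return the resulting text.
    static String decode(int rawByte, String charsetName) {
        byte[] bytes = { (byte) rawByte };
        return new String(bytes, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String turkish = decode(0xdd, "windows-1254");
        String western = decode(0xdd, "windows-1252");
        // Print code points rather than glyphs so the terminal encoding cannot interfere.
        System.out.printf("windows-1254: U+%04X%n", (int) turkish.charAt(0));
        System.out.printf("windows-1252: U+%04X%n", (int) western.charAt(0));
    }
}
```

The same byte comes back as U+0130 (İ) under windows-1254 and as U+00DD (Ý) under windows-1252.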
But:
SQL> select id, rawtohex(name) from table where that thing;
ID RAWTOHEX(NAME)
------ ------------------------
746 C39D
There seems to be no equivalent to the hex(name) function in MySQL. But I must be missing something. What am I missing here?
My Java code has to take the UTF-8 that I am supplied and save a UTF-8 copy and a windows-1254 copy. The Java code gives me:
bytes (utf8): c4 b0
bytes (1254): dd
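A minimal sketch of how bytes like these can be produced is below. It is not our production code: the class and method names are invented for this post, and it assumes a Java 7 or later runtime with the windows-1254 charset available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDump {
    // Format bytes as lowercase two-digit hex values separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes do not print as ffffffXX.
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+0130, LATIN CAPITAL LETTER I WITH DOT ABOVE
        String text = "\u0130";
        System.out.println("bytes (utf8): " + hex(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println("bytes (1254): " + hex(text.getBytes(Charset.forName("windows-1254"))));
    }
}
```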
Yet, when I save it, the client does not get the correct character. And when I try to see what Oracle has actually stored, I get the garbage seen above. I have no idea where the C39D is coming from. Any suggestions?
We have ojdbc14.jar built into all of our applications and we are connecting to a database that says it is "Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production".