UTF-16BE to UTF-16LE, and back

I'm working on a BlackBerry project and need to convert byte arrays of strings encoded in UTF-16LE (little endian) to byte arrays of strings encoded in UTF-16BE (big endian), and vice versa. A server I'm connecting to sends the BlackBerry device byte arrays of strings in UTF-16LE, but the device doesn't natively support UTF-16LE. When I try to decode the byte arrays back into strings, the strings are illegible. The device does, however, support UTF-16BE. I also need to reverse this process, i.e. convert a byte array of a string encoded in UTF-16BE into what the server is expecting (UTF-16LE). Thanks.

I cannot do this on the device:

String test = "test";
byte[] testBytes = test.getBytes("UTF-16LE"); // throws UnsupportedEncodingException

I can do this:

String test = "test";
byte[] testBytes = test.getBytes("UTF-16BE"); // works
Zealotry answered 24/8, 2012 at 1:5 Comment(2)
What do you get if you just use byte[] testBytes = test.getBytes("UTF-16")? Does your server put the proper BOM character at the beginning of the string, and does BlackBerry automatically detect big endian? Jocundity
@Jocundity test.getBytes("UTF-16") throws an exception as well. I'm not sure if the server puts the BOM at the beginning of the string. It's an ASP.NET ADFS server (if that helps). BlackBerry does not auto-detect. Thanks. Zealotry

UTF-16 uses two bytes per code unit. Code points in the Basic Multilingual Plane are encoded as a single code unit; all other code points are encoded as two code units (called a surrogate pair).

To convert between UTF-16LE and UTF-16BE, loop through the byte array and swap the two bytes of each code unit. The order of the code units themselves, including the two halves of a surrogate pair, is the same in LE and BE. In other words, swap bytes 0 and 1 with each other, swap bytes 2 and 3 with each other, and so on.
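
The byte swap described above could be sketched roughly as follows. The class name `Utf16ByteSwap` and the method name `swapByteOrder` are made up for illustration, and the sketch assumes the input holds complete UTF-16 code units (an even number of bytes). Because the swap is its own inverse, the same method should work in both directions.

```java
// Hypothetical helper, not part of any BlackBerry or Java API.
final class Utf16ByteSwap {

    private Utf16ByteSwap() {
    }

    // Returns a new array in which the two bytes of every 16-bit code unit
    // are exchanged: UTF-16LE input yields UTF-16BE output and vice versa.
    // Assumption: the input length is even (whole code units only).
    static byte[] swapByteOrder(byte[] input) {
        if (input.length % 2 != 0) {
            throw new IllegalArgumentException(
                    "UTF-16 data must contain an even number of bytes");
        }
        byte[] output = new byte[input.length];
        for (int i = 0; i < input.length; i += 2) {
            output[i] = input[i + 1];     // second byte of the pair moves first
            output[i + 1] = input[i];     // first byte of the pair moves second
        }
        return output;
    }
}
```

On the device, incoming data could then be decoded with something like `new String(Utf16ByteSwap.swapByteOrder(serverBytes), "UTF-16BE")`, and outgoing text encoded with `Utf16ByteSwap.swapByteOrder(text.getBytes("UTF-16BE"))`, where `serverBytes` and `text` are placeholder names. A leading byte order mark, if the server sends one, is swapped like any other code unit and may need to be stripped after decoding.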

Skydive answered 24/8, 2012 at 1:49 Comment(1)
I see. I'll try that and report back. Thanks! Zealotry
