Is String.getBytes() big endian or little endian?

I need to send a String to a client socket. To get the byte sequence right, endianness is important, but I didn't see any endianness handling in the source code. Does it not need to be considered, or did I just miss that code?

Minimalist answered 29/2, 2016 at 16:31 Comment(0)

getBytes() uses the platform's default charset, which means all bets are off. It could be big-endian UTF-16, little-endian UTF-16, UTF-8, ISO-8859-1, or nearly anything else.

If you need to specify endianness, or anything else about the charset, you should use getBytes(Charset) or getBytes(String). There are a few standard charsets that all JREs support, including UTF_16BE (big endian) and UTF_16LE (little endian), both available as constants in StandardCharsets.
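As a rough illustration of the point above, here is a minimal sketch that passes the charset explicitly, so the bytes no longer depend on the platform default. The class and method names (ExplicitCharsetExample, encodeBigEndian, encodeLittleEndian) are made up for this example; only getBytes(Charset) and the StandardCharsets constants come from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
final class ExplicitCharsetExample {

    // Encodes text as UTF-16 with the most significant byte of each 16-bit unit first.
    static byte[] encodeBigEndian(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE);
    }

    // Encodes text as UTF-16 with the least significant byte of each 16-bit unit first.
    static byte[] encodeLittleEndian(String text) {
        return text.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        String text = "A"; // U+0041

        System.out.println(Arrays.toString(encodeBigEndian(text)));    // prints [0, 65]
        System.out.println(Arrays.toString(encodeLittleEndian(text))); // prints [65, 0]

        // The receiver has to decode with the same charset the sender used.
        String decoded = new String(encodeBigEndian(text), StandardCharsets.UTF_16BE);
        System.out.println(decoded.equals(text)); // prints true
    }
}
```

Whichever charset you pick, both ends of the socket have to agree on it. The byte order is part of the charset choice, not something the socket decides.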

Isochronize answered 29/2, 2016 at 16:33 Comment(5)
I'm a bit confused: does using String.getBytes(StandardCharsets.UTF_8) determine the endianness? - Minimalist
UTF-8 doesn't have endianness. You can read more here, but basically, endianness only matters if you read multiple bytes at once, in the same word. UTF-8 is defined as just a byte stream, with no multiple-byte words; even code points that require multiple bytes are defined in terms of multiple 1-byte words. So as a reader/writer, you're just dealing with "next byte, next byte, next byte," in which case endianness isn't a factor. - Isochronize
@Minimalist UTF-8 defines the order in which the bytes of a multi-byte character encoding must appear. It's not really little or big endian as such. - Habile
But then why do UTF-16BE and UTF-16LE exist? Isn't endianness supposed to not matter? - Minimalist
@Minimalist UTF-16 is defined not in terms of individual bytes, but in terms of two-byte (= 16-bit) words. So, since you have multi-byte words, endianness does matter. Basically, if you're reading UTF-8, you're always asking "give me the next 1-byte word," so the computer doesn't have to negotiate with you over which byte is most significant. But if you're reading UTF-16, you're always asking "give me the next 2-byte word," and for that the computer needs to know which of those two bytes in the underlying stream is most significant. - Isochronize
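
To make the difference discussed in these comments concrete, here is a small sketch that encodes the euro sign (U+20AC) in each charset and prints the bytes as hex. The class name ByteOrderDemo and the hex helper are invented for this example. UTF-8 gives one fixed byte sequence, while the two UTF-16 variants give the same two bytes in opposite order.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
final class ByteOrderDemo {

    // Formats bytes as space-separated, two-digit uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // euro sign, a single 16-bit code unit in UTF-16

        // UTF-8: three 1-byte units in a fixed order, so there is no byte order to choose.
        System.out.println(hex(euro.getBytes(StandardCharsets.UTF_8)));    // prints E2 82 AC

        // UTF-16: one 2-byte unit, so the order of its two bytes has to be chosen.
        System.out.println(hex(euro.getBytes(StandardCharsets.UTF_16BE))); // prints 20 AC
        System.out.println(hex(euro.getBytes(StandardCharsets.UTF_16LE))); // prints AC 20
    }
}
```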
