Is there any free java library which I can use to convert string in one encoding to other encoding, something like iconv
? I'm using Java version 1.3.
You don't need a library beyond the standard one - just use Charset. (You can just use the String constructors and getBytes methods, but personally I don't like just working with the names of character encodings. Too much room for typos.)
EDIT: As pointed out in comments, you can still use Charset instances but have the ease of use of the String methods: new String(bytes, charset) and String.getBytes(charset).
See "URL Encoding (or: 'What are those "%20
" codes in URLs?')".
CharsetDecoder
should be what you are looking for, no ?
Many network protocols and files store their characters with a byte-oriented character set such as ISO-8859-1
(ISO-Latin-1
).
However, Java's native character encoding is Unicode UTF16BE (Sixteen-bit UCS Transformation Format, big-endian byte order).
See Charset
. That doesn't mean UTF16
is the default charset (i.e.: the default "mapping between sequences of sixteen-bit Unicode code units and sequences of bytes"):
Every instance of the Java virtual machine has a default charset, which may or may not be one of the standard charsets.
[US-ASCII
,ISO-8859-1
a.k.a.ISO-LATIN-1
,UTF-8
,UTF-16BE
,UTF-16LE
,UTF-16
]
The default charset is determined during virtual-machine startup and typically depends upon the locale and charset being used by the underlying operating system.
This example demonstrates how to convert ISO-8859-1
encoded bytes in a ByteBuffer
to a string in a CharBuffer
and visa versa.
// Create the encoder and decoder for ISO-8859-1
Charset charset = Charset.forName("ISO-8859-1");
CharsetDecoder decoder = charset.newDecoder();
CharsetEncoder encoder = charset.newEncoder();
try {
// Convert a string to ISO-LATIN-1 bytes in a ByteBuffer
// The new ByteBuffer is ready to be read.
ByteBuffer bbuf = encoder.encode(CharBuffer.wrap("a string"));
// Convert ISO-LATIN-1 bytes in a ByteBuffer to a character ByteBuffer and then to a string.
// The new ByteBuffer is ready to be read.
CharBuffer cbuf = decoder.decode(bbuf);
String s = cbuf.toString();
} catch (CharacterCodingException e) {
}
I would just like to add that if the String is originally encoded using the wrong encoding it might be impossible to change it to another encoding without errors. The question does not state that the conversion here is made from wrong encoding to correct encoding but I personally stumbled to this question just because of this situation so just a heads up for others as well.
This answer in other question gives an explanation why the conversion does not always yield correct results https://mcmap.net/q/20949/-quot-fix-quot-string-encoding-in-java
It is a whole lot easier if you think of unicode as a character set (which it actually is - it is very basically the numbered set of all known characters). You can encode it as UTF-8 (1-3 bytes per character depending) or maybe UTF-16 (2 bytes per character or 4 bytes using surrogate pairs).
Back in the mist of time Java used to use UCS-2 to encode the unicode character set. This could only handle 2 bytes per character and is now obsolete. It was a fairly obvious hack to add surrogate pairs and move up to UTF-16.
A lot of people think they should have used UTF-8 in the first place. When Java was originally written unicode had far more than 65535 characters anyway...
© 2022 - 2024 — McMap. All rights reserved.