UTF-8 to EBCDIC in Java

Our requirement is to send EBCDIC text to a mainframe. We have some Chinese characters, so the text is in UTF-8 format. Is there a way to convert the UTF-8 characters to EBCDIC?

Thanks, Raj Mohan

Chivers answered 21/4, 2009 at 4:36 Comment(1)
Would UTF-8 to EBCDIC conversion be lossless? That is, can you transform back and forth and still get the same EBCDIC bytes every time? - Flue

Assuming your target system is an IBM mainframe or midrange, its JVM has built-in support for all of the EBCDIC encodings, under names of the form CPxxxx that correspond to the IBM CCSIDs (CP stands for code page). You will likely need to do the translations on the host side, since the client side may not have the necessary encoding support.

Since Unicode goes beyond DBCS and aims to support every known character, you will likely be targeting multiple EBCDIC encodings, so you will probably need to make those encodings configurable in some way. Try to keep your client Unicode-only (UTF-8, UTF-16, etc.), with the translations being done as data arrives on and/or leaves the host system.

Other than needing to do translations host-side, the mechanics are the same as any Java translation; e.g. new String(bytes,encoding) and String.getBytes(encoding), and the various NIO and writer classes. There's really no magic - it's no different than translating between, say, ISO 8859-x and Unicode, or any other SBCS (or limited DBCS).

For example:

byte[] ebcdta="Hello World".getBytes("CP037");  // get bytes for EBCDIC codepage 37

You can find more information on IBM's documentation website.
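
As a rough sketch of the mechanics described above, and of one way to check whether a conversion is lossless, the following uses only the standard java.nio.charset API. The class and method names (StrictEbcdic, encodeStrict, decodeStrict) are made up for illustration, and it assumes the runtime actually ships the IBM037 charset, which is why it checks Charset.isSupported first.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper: strict conversions that fail instead of silently
// substituting a replacement for characters the code page lacks.
class StrictEbcdic {

    static byte[] encodeStrict(String text, String charsetName)
            throws CharacterCodingException {
        ByteBuffer out = Charset.forName(charsetName).newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    static String decodeStrict(byte[] bytes, String charsetName)
            throws CharacterCodingException {
        return Charset.forName(charsetName).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        String cp = "IBM037"; // assumption: this charset is installed
        if (!Charset.isSupported(cp)) {
            System.out.println(cp + " is not available in this runtime");
            return;
        }
        byte[] ebcdic = encodeStrict("Hello World", cp);
        System.out.println("round trip ok: "
                + "Hello World".equals(decodeStrict(ebcdic, cp)));
        try {
            // Two Chinese characters, which code page 37 cannot represent
            encodeStrict("\u4e2d\u6587", cp);
        } catch (CharacterCodingException e) {
            System.out.println("not representable in " + cp + ": " + e);
        }
    }
}
```

The REPORT action is the point here: String.getBytes quietly replaces unmappable characters, so a strict encoder is one way to find out early that a given code page cannot carry your text.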

Entozoic answered 21/4, 2009 at 6:49 Comment(0)

EBCDIC has many 8-bit code pages, and many of them are supported by the VM. Have a look at Charset.availableCharsets().keySet(); the EBCDIC pages are named IBM... (there are aliases such as cp500 for IBM500, as you can see with Charset.forName("IBM500").aliases()).

There are two problems:

  1. If your text contains characters that are spread across different EBCDIC code pages, a single charset will not help.
  2. I am not sure whether these charsets are available in every VM outside Windows.

For the first, have a look at this approach. For the second, try it on the desired target runtime ;-)
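
To check the second point on a given runtime, something like the sketch below could be used. It lists the installed charsets that behave like EBCDIC, together with their aliases. The class name EbcdicCharsetProbe and the 'A' encodes to 0xC1 test are my own heuristic, not an official way to identify EBCDIC, so treat the output as a hint only.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical probe: lists installed charsets that behave like EBCDIC.
class EbcdicCharsetProbe {

    // Heuristic (an assumption, not an official test): EBCDIC code pages
    // encode the letter 'A' as the single byte 0xC1, while ASCII-based
    // ones encode it as 0x41.
    static boolean looksLikeEbcdic(Charset cs) {
        if (!cs.canEncode()) {
            return false; // decode-only charsets cannot be probed this way
        }
        byte[] b = "A".getBytes(cs);
        return b.length == 1 && (b[0] & 0xFF) == 0xC1;
    }

    // Canonical names of all installed charsets that pass the heuristic.
    static List<String> ebcdicCandidates() {
        List<String> names = new ArrayList<>();
        for (Charset cs : Charset.availableCharsets().values()) {
            if (looksLikeEbcdic(cs)) {
                names.add(cs.name());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : ebcdicCandidates()) {
            System.out.println(name + " aliases=" + Charset.forName(name).aliases());
        }
    }
}
```

Probing the encoding of a known letter, rather than matching on the IBM prefix, avoids picking up ASCII-based pages that happen to carry an IBM name.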

Hydrogenize answered 21/4, 2009 at 7:23 Comment(1)
Not all of the charsets that are named IBM* are EBCDIC. For example, IBM850 is the standard codepage used in U.S. and western European versions of Windows in the command prompt. - Sixtyfourmo

You can always make use of the IBM Toolbox for Java (JTOpen), specifically the com.ibm.as400.access.AS400Text class in the jt400.jar.

It goes as follows:

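import com.ibm.as400.access.AS400Text;

int codePageNumber = 420;      // CCSID of the EBCDIC Arabic code page
String codePage = "CP420";     // Java charset name for the same code page
String sourceUtfText = "أحمد يوسف صالح";

// Encode the Java String into EBCDIC bytes for the given CCSID
AS400Text converter = new AS400Text(sourceUtfText.length(), codePageNumber);
byte[] bytesData = converter.toBytes(sourceUtfText);

// Decode the EBCDIC bytes back into a Java String (for checking the result);
// note this constructor throws the checked UnsupportedEncodingException
String resultedEbcdicText = new String(bytesData, codePage);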

I used code page 420 and its corresponding Java encoding name, CP420. This code page is for Arabic text, so you should pick a suitable code page for Chinese text.
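
If you are not sure which code page fits your text, one option is to try a configured list of candidates and take the first one that can represent every character. The sketch below does that with the plain JDK rather than JTOpen. CodePagePicker and firstThatEncodes are hypothetical names, and x-IBM935 / x-IBM937 are, as far as I know, the JDK names for the Simplified and Traditional Chinese EBCDIC code pages; please verify them on your target runtime.

```java
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: pick the first configured charset that can
// represent every character of the text.
class CodePagePicker {

    static Optional<Charset> firstThatEncodes(String text, List<String> candidates) {
        for (String name : candidates) {
            if (!Charset.isSupported(name)) {
                continue; // not installed in this runtime
            }
            Charset cs = Charset.forName(name);
            if (cs.canEncode() && cs.newEncoder().canEncode(text)) {
                return Optional.of(cs);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumption: these names exist on the target runtime; check them there.
        List<String> configured = Arrays.asList("IBM037", "x-IBM935", "x-IBM937");
        String text = "\u4e2d\u6587"; // two Chinese characters
        System.out.println(firstThatEncodes(text, configured)
                .map(Charset::name)
                .orElse("no configured code page can encode this text"));
    }
}
```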

Moorfowl answered 17/8, 2009 at 7:11 Comment(0)

For the midrange AS/400 (IBM i these days) the best bet is to use the IBM Toolbox for Java (jt400.jar), which handles all these conversions transparently (perhaps with a few hints).

Please note that inside Java a char is a 16-bit value (a UTF-16 code unit), not UTF-8. UTF-8 is an encoding, and it only comes into play when text is converted to or from bytes.
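
To make that concrete: "converting UTF-8 to EBCDIC" in Java really means decoding the UTF-8 bytes into a String and then encoding that String into the target code page. A minimal sketch, assuming the IBM037 charset is installed (Utf8Transcoder and transcodeFromUtf8 are names I made up):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: transcode UTF-8 bytes to another charset via a String.
class Utf8Transcoder {

    static byte[] transcodeFromUtf8(byte[] utf8Bytes, Charset target) {
        // Step 1: decode the UTF-8 bytes into Java's internal 16-bit chars.
        String text = new String(utf8Bytes, StandardCharsets.UTF_8);
        // Step 2: encode those chars into the target code page.
        // Note: getBytes(Charset) silently substitutes unmappable characters.
        return text.getBytes(target);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an acute accent: 4 chars
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println("chars=" + text.length() + ", UTF-8 bytes=" + utf8.length);

        String cp = "IBM037"; // assumption: installed in this runtime
        if (Charset.isSupported(cp)) {
            byte[] ebcdic = transcodeFromUtf8(utf8, Charset.forName(cp));
            System.out.println(cp + " bytes=" + ebcdic.length);
        }
    }
}
```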

Kinetic answered 21/4, 2009 at 14:27 Comment(0)
