Oracle JDBC charset and 4000 char limit

We are trying to store a UTF-16 encoded String in an AL32UTF8 Oracle database.

Our program works perfectly on a database that uses WE8MSWIN1252 as its character set. When we run it against a database that uses AL32UTF8, it fails with java.sql.SQLException: ORA-01461: can bind a LONG value only for insert into a LONG column.

In the testcase below everything works fine as long as our input data doesn't get too long.

The input String can exceed 4000 chars. We wish to retain as much information as possible, even though we realise the input will have to be cut off.

Our database tables are defined using the CHAR keyword (see below). We hoped that this would allow us to store up to 4000 chars of any character set. Can this be done? If so, how?

We have tried converting the String to UTF8 using a ByteBuffer without success. OraclePreparedStatement.setFormOfUse(...) also didn't help us out.

Switching to a CLOB is not an option. If the string is too long it needs to be cut.

This is our code at the moment:

public static void main(String[] args) throws Exception {
    String ip ="193.53.40.229";
    int port = 1521;
    String sid = "ora11";
    String username = "obasi";
    String password = "********";

    String driver = "oracle.jdbc.driver.OracleDriver";
    String url = "jdbc:oracle:thin:@" + ip + ":" + port + ":" + sid;
    Class.forName(driver);

    String shortData = "";
    String longData = "";
    String data;

    for (int i = 0; i < 5; i++)
        shortData += "é";

    for (int i = 0; i < 4000; i++)
        longData += "é";

    Connection conn = DriverManager.getConnection(url, username, password);

    PreparedStatement stat = null;
    try  {
        stat = conn.prepareStatement("insert into test_table_short values (?)");
        data = shortData.substring(0, Math.min(5, shortData.length()));
        stat.setString(1, data);
        stat.execute();

        stat = conn.prepareStatement("insert into test_table_long values (?)");
        data = longData.substring(0, Math.min(4000, longData.length()));
        stat.setString(1, data);
        stat.execute();
    } finally {
        try {
            stat.close();
        } catch (Exception ex){}
    }
}

This is the create script of the simple table:

CREATE TABLE test_table_short (
    DATA    VARCHAR2(5 CHAR)
);

CREATE TABLE test_table_long (
    DATA    VARCHAR2(4000 CHAR)
);

The test case works perfectly on the short data. On the long data, however, it keeps failing with the error above. Even when our longData is only 3000 characters long, the insert still does not succeed.

Thanks in advance!

Geniegenii answered 19/7, 2012 at 14:17 Comment(0)

Prior to Oracle 12.1, a VARCHAR2 column is limited to storing 4000 bytes of data in the database character set even if it is declared VARCHAR2(4000 CHAR). Since every character in your string requires 2 bytes of storage in the UTF-8 character set, you won't be able to store more than 2000 characters in the column. Of course, that number will change if some of your characters actually require just 1 byte of storage or if some of them require more than 2 bytes of storage. When the database character set is Windows-1252, every character in your string requires only a single byte of storage so you'll be able to store 4000 characters in the column.
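
As a rough illustration of the arithmetic above, the following sketch measures how many bytes the test string from the question occupies in each encoding. The class name ByteLengthDemo and the helper encodedLength are made up for this example, and it assumes the JVM ships the windows-1252 charset (standard JDKs do, but the Java specification does not guarantee it).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question or answer.
class ByteLengthDemo {
    // Number of bytes the string occupies once encoded in the given charset.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // Same test data as in the question: 4000 copies of 'é' (U+00E9).
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 4000; i++) {
            sb.append('\u00e9');
        }
        String longData = sb.toString();

        // Assumes the windows-1252 charset is available in this JVM.
        Charset cp1252 = Charset.forName("windows-1252");
        System.out.println(encodedLength(longData, cp1252));                 // 4000
        System.out.println(encodedLength(longData, StandardCharsets.UTF_8)); // 8000
    }
}
```

The 4000-character string fits the 4000-byte limit in Windows-1252 but needs 8000 bytes in UTF-8, which is why the same insert behaves differently on the two databases.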

Since you have longer strings, would it be possible to declare the column as a CLOB rather than as a VARCHAR2? That would (effectively) remove the length limitation (there is a limit on the size of a CLOB that depends on the Oracle version and the block size but it's at least in the multiple GB range).

If you happen to be using Oracle 12.1 or later, setting the max_string_size parameter to EXTENDED increases the maximum size of a VARCHAR2 column from 4000 bytes to 32767 bytes.

Fylfot answered 19/7, 2012 at 14:32 Comment(3)
Thank you for your answer. Sadly, in this case, using CLOBs is out of the question for us. According to link this is the right answer. However, link is pretty misleading in my humble opinion. Would you know where this is explained in the documentation? We have been searching a lot, but could not find this. Geniegenii
@Geniegenii - I added a comment to the SO thread. The answer is correct in so far as it goes. It just doesn't note that if a particular 4000 characters requires more than 4000 bytes of storage, the 4000 byte capacity limit still kicks in. Fylfot
UTF-8 is a variable-length encoding. Many Asian characters require at least three bytes to encode. Barnum

I solved this problem by cutting the String down to the required byte length. Note that this can't be done by simply using

data.substring(0, length)

since substring counts characters, not bytes, and the UTF-8 encoding of the result can be up to three times as long as allowed.

// Drop one char at a time until the UTF-8 encoding fits into `length` bytes.
while (data.getBytes("UTF-8").length > length) {
  data = data.substring(0, data.length() - 1);
}

Note: do not use data.getBytes() without a charset argument, since its result depends on the platform default encoding ('file.encoding') and may be either Windows-1252 or UTF-8 bytes!
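
The loop above re-encodes the whole string on every pass and can cut a surrogate pair in half. A possible alternative, sketched here under the assumption that the target is a true UTF-8 character set such as AL32UTF8, counts the bytes per code point in a single pass. The class name Utf8Truncate and the method truncateToUtf8Bytes are invented for this example and are not part of any library.

```java
// Hypothetical helper, not part of the original answer.
class Utf8Truncate {
    // Returns the longest prefix of s whose UTF-8 encoding fits into maxBytes
    // without splitting a surrogate pair.
    static String truncateToUtf8Bytes(String s, int maxBytes) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            // UTF-8 length of one code point. An unpaired surrogate is counted
            // as 3 bytes here, which can only overestimate the real size.
            int size = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
            if (bytes + size > maxBytes) {
                break;
            }
            bytes += size;
            i += Character.charCount(cp);
        }
        return s.substring(0, i);
    }

    public static void main(String[] args) {
        // Three times 'é' (2 bytes each) cut to 4 bytes leaves two characters.
        System.out.println(truncateToUtf8Bytes("\u00e9\u00e9\u00e9", 4).length()); // 2
    }
}
```

With the code from the question, the bind would then look like stat.setString(1, Utf8Truncate.truncateToUtf8Bytes(longData, 4000)), where 4000 is the byte limit of the VARCHAR2 column.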

If you use Hibernate you can do this using org.hibernate.Interceptor!

Foxhole answered 15/3, 2013 at 10:2 Comment(0)
