String's Maximum length in Java - calling length() method
E

7

168

In Java, what is the maximum size a String object may have, referring to the length() method call?

I know that length() returns the size of a String as the length of its underlying char[].

Ertha answered 3/5, 2009 at 2:31 Comment(1)
While the length of a String is theoretically Integer.MAX_VALUE, the length of a string literal in the source appears to be limited to only 65535 bytes of UTF-8 data.Abeyance
N
185

Considering the String class' length method returns an int, the maximum length that would be returned by the method would be Integer.MAX_VALUE, which is 2^31 - 1 (or approximately 2 billion.)

In terms of lengths and indexing of arrays, (such as char[], which is probably the way the internal data representation is implemented for Strings), Chapter 10: Arrays of The Java Language Specification, Java SE 7 Edition says the following:

The variables contained in an array have no names; instead they are referenced by array access expressions that use nonnegative integer index values. These variables are called the components of the array. If an array has n components, we say n is the length of the array; the components of the array are referenced using integer indices from 0 to n - 1, inclusive.

Furthermore, the indexing must be by int values, as mentioned in Section 10.4:

Arrays must be indexed by int values;

Therefore, it appears that the limit is indeed 2^31 - 1, as that is the maximum value for a nonnegative int value.

However, there probably are going to be other limitations, such as the maximum allocatable size for an array.
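To put numbers on the reasoning above, here is a minimal sketch. The class and method names are made up for illustration, and the 2-bytes-per-char figure assumes a char[]-backed String as discussed in this answer; actual allocation limits depend on the JVM and heap settings.

```java
// Hypothetical helper class, for illustration only.
class MaxStringLength {
    // Upper bound implied by the int return type of String.length().
    static int theoreticalMaxLength() {
        return Integer.MAX_VALUE;
    }

    // Bytes needed for the character data alone, assuming 2 bytes per char.
    static long bytesForChars(int length) {
        return 2L * length;
    }

    public static void main(String[] args) {
        int max = theoreticalMaxLength();
        System.out.println("max length      = " + max);
        System.out.println("equals 2^31 - 1 : " + (max == (1L << 31) - 1));
        System.out.println("char data bytes = " + bytesForChars(max));
    }
}
```

The byte count comes out just under 4 GiB for the character data alone, which is why memory rather than the int range is usually the first limit you hit.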

Nogood answered 3/5, 2009 at 2:35 Comment(13)
Integer.MAX_VALUE is 2^31-1, actually. :)Mestee
You're absolutely correct! Thanks for pointing that out, I've fixed my answer.Nogood
Great answer man! I took a look at the String.java source code and it's right: 'count' is the int variable that holds the length of the char array, and the char array is stored in the 'value' variable (as char[]). This means that the String size could be around 2GB. Of course there could be limitations on allocating that much memory. Thanks!Ertha
The maximum allocatable size for an array is Integer.MAX_VALUE. The maximum size for an array and a String is the same. Note: each character uses two bytes and to build this string you need another two bytes. e.g. using StringBuilder or a plain char[] so you need over 8 GB to create a maximum length string.Kohima
As applications get larger, the Integer.MAX_VALUE will be a limitation. There isn't an obvious way around this unfortunately. You could create a LongString using a char[][], i.e. without increasing the maximum length of an array. However, it would be a better solution if an array had a maximum length of Long.MAX_VALUE.Kohima
I just tried defining a string literal in a hello world java program which was longer than 65546. javac gives an error about that literal being too long: javac HelloWorld.java 2>&1|head -c 80 HelloWorld.java:3: constant string too longCalisa
@dlamblin: That sounds like a limitation of javac for String literals (not String objects), as I cannot find any reference to size limits to String literals in the Java Language Specification and JVM Specification. I tried making a String literal that was larger than 100,000 characters, and the Eclipse compiler didn't have a problem compiling it. (And running the program was able to show that the literal had a String.length larger than 100,000.)Nogood
@PeterLawrey I didn't understand this statement -" Note: each character uses two bytes and to build this string you need another two bytes. e.g. using StringBuilder or a plain char[]" .. can you pls explain which are these 'another two bytes' ?Invercargill
@Invercargill It was three years ago so I had to think about it. ;) What I meant was: to build a maximum sized string you need a lot of memory, possibly more than you have anyway. You need two bytes per character ~ 4GB, but you need to build this from a StringBuilder or char[], which means you need another two bytes per character to create it in the first place, i.e. another ~ 4 GB (at least temporarily)Kohima
@PeterLawrey So the theoretical value is 2^31 - 1 but the actual value really does depend on the memory available in the JVM heap, and for Java 8 it depends on the physical memory, right?Christachristabel
@xkm The physical memory is a practical limitation in small machines. Note: to build a String with 2^31-1 chars (2 bytes each) you need the same again as a char[] or StringBuilder. You can buy 64 GB for about $400, so even this doesn't represent a lot of memory.Kohima
I just tried with current Oracle’s JVM implementation and got a maximum char[] length of Integer.MAX_VALUE-2. Since String uses a char array, the current practical limit is slightly lower than the theoretical limit.Inconformity
How can I store more text than a String can hold? Is there a Big String?Effeminize
H
31

java.io.DataInput.readUTF() and java.io.DataOutput.writeUTF(String) say that a String object is represented by two bytes of length information and the modified UTF-8 representation of every character in the string. This implies that the length of a String is limited by the number of bytes of its modified UTF-8 representation when used with DataInput and DataOutput.

In addition, the specification of CONSTANT_Utf8_info found in the Java virtual machine specification defines the structure as follows.

CONSTANT_Utf8_info {
    u1 tag;
    u2 length;
    u1 bytes[length];
}

You can find that the size of 'length' is two bytes.

That the return type of a certain method (e.g. String.length()) is int does not always mean that its allowed maximum value is Integer.MAX_VALUE. Instead, in most cases, int is chosen just for performance reasons. The Java language specification says that integers whose size is smaller than that of int are converted to int before calculation (if my memory serves me correctly) and it is one reason to choose int when there is no special reason.

The maximum length at compilation time is at most 65535, the largest value the two-byte 'length' field can hold. Note again that the length is the number of bytes of the modified UTF-8 representation, not the number of characters in a String object.

String objects may be able to have many more characters at runtime. However, if you want to use String objects with the DataInput and DataOutput interfaces, it is better to avoid overly long String objects. I found this limitation when I implemented Objective-C equivalents of DataInput.readUTF() and DataOutput.writeUTF(String).
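The DataOutput limit is easy to observe. The following is a small sketch (the class and helper names are invented for this example) that relies on DataOutputStream.writeUTF throwing UTFDataFormatException when the encoded form would exceed 65535 bytes:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.UTFDataFormatException;
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
class WriteUtfLimit {
    // Builds a string consisting of `count` copies of `c`.
    static String repeat(char c, int count) {
        char[] chars = new char[count];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    // Returns the number of bytes writeUTF produced (2 length bytes + payload),
    // or -1 if writeUTF rejected the string as too long.
    static int tryWriteUtf(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (UTFDataFormatException e) {
            return -1;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.size();
    }

    public static void main(String[] args) {
        // 65535 ASCII chars encode to 65535 bytes: accepted.
        System.out.println(tryWriteUtf(repeat('a', 65535)));
        // One more char pushes the encoded length past the u2 limit: rejected.
        System.out.println(tryWriteUtf(repeat('a', 65536)));
        // Only 32768 chars, but each needs 2 bytes (65536 in total): rejected.
        System.out.println(tryWriteUtf(repeat('\u00e9', 32768)));
    }
}
```

The third case is the important one: the limit applies to encoded bytes, so a string well under 65535 characters can still be rejected.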

Hysteric answered 3/5, 2009 at 2:31 Comment(2)
This should be the default answer.Giffie
This is the correct answer. Specifically the part about the specification of CONSTANT_Utf8_info :)Mclain
M
20

Since arrays must be indexed with integers, the maximum length of an array is Integer.MAX_VALUE (2^31 - 1, or 2 147 483 647). This is assuming you have enough memory to hold an array of that size, of course.

Mestee answered 3/5, 2009 at 2:34 Comment(0)
B
19

I have a 2010 iMac with 8GB of RAM, running Eclipse Neon.2 Release (4.6.2) with Java 1.8.0_25. With the VM argument -Xmx6g, I ran the following code:

StringBuilder sb = new StringBuilder();
for (int i = 0; i < Integer.MAX_VALUE; i++) {
    try {
        sb.append('a');
    } catch (Throwable e) {
        System.out.println(i);
        break;
    }
}
System.out.println(sb.toString().length());

This prints:

Requested array size exceeds VM limit
1207959550

So, it seems that the max array size is ~1,207,959,549. Then I realized that we don't actually care if Java runs out of memory: we're just looking for the maximum array size (which seems to be a constant defined somewhere). So:

for (int i = 0; i < 1_000; i++) {
    try {
        char[] array = new char[Integer.MAX_VALUE - i];
        Arrays.fill(array, 'a');
        String string = new String(array);
        System.out.println(string.length());
    } catch (Throwable e) {
        System.out.println(e.getMessage());
        System.out.println("Last: " + (Integer.MAX_VALUE - i));
        System.out.println("Last: " + i);
    }
}

Which prints:

Requested array size exceeds VM limit
Last: 2147483647
Last: 0
Requested array size exceeds VM limit
Last: 2147483646
Last: 1
Java heap space
Last: 2147483645
Last: 2

So, it seems the max is Integer.MAX_VALUE - 2, or (2^31) - 3

P.S. I'm not sure why my StringBuilder maxed out at 1207959550 while my char[] maxed out at (2^31)-3. It seems that AbstractStringBuilder doubles the size of its internal char[] to grow it, so that probably causes the issue.
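The growth explanation can be checked with a little arithmetic. This is a sketch that assumes the "twice the old capacity, plus 2" rule described in the StringBuilder.ensureCapacity documentation and a default initial capacity of 16; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, for illustration only.
class BuilderGrowth {
    // Simulates capacity -> 2 * capacity + 2, starting from 16, and returns the
    // last capacity whose successor would still fit in an int.
    static long lastCapacityBeforeOverflow() {
        long capacity = 16;
        while (2 * capacity + 2 <= Integer.MAX_VALUE) {
            capacity = 2 * capacity + 2;
        }
        return capacity;
    }

    // Appends one char at a time and records each distinct capacity observed.
    static List<Integer> observedCapacities(int appends) {
        StringBuilder sb = new StringBuilder();
        List<Integer> seen = new ArrayList<>();
        seen.add(sb.capacity());
        for (int i = 0; i < appends; i++) {
            sb.append('a');
            int capacity = sb.capacity();
            if (capacity != seen.get(seen.size() - 1)) {
                seen.add(capacity);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observedCapacities(1_000));
        System.out.println(lastCapacityBeforeOverflow()); // prints 1207959550
    }
}
```

Under that assumption the simulated sequence stops at 1207959550, which matches the number printed by the StringBuilder loop: the builder filled that capacity, and the next growth step would have needed an array larger than an int can index.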

Bushido answered 3/5, 2009 at 2:31 Comment(1)
A very useful practical treatment of the questionGrapheme
L
6

Apparently it's bound to an int, whose maximum value is 0x7FFFFFFF (2147483647).

Lamp answered 3/5, 2009 at 2:36 Comment(0)
K
5

The return type of the length() method of the String class is int.

public int length()

See http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#length()

So the maximum value of int is 2147483647.

A String is treated as a char array internally, so indexing is done within the maximum range. This means we cannot index the 2147483648th member. So the maximum length of a String in Java is 2147483647.

The primitive data type int is 4 bytes (32 bits) in Java. As 1 bit (the MSB) is used as a sign bit, the range is constrained to -2^31 to 2^31-1 (-2147483648 to 2147483647). We cannot use negative values for indexing. So obviously the range we can use is from 0 to 2147483647.

Kentledge answered 3/5, 2009 at 2:31 Comment(0)
B
2

As mentioned in Takahiko Kawasaki's answer, Java represents Unicode strings in class files in modified UTF-8, and in the JVM specification's CONSTANT_Utf8_info structure, 2 bytes are allocated to the length (which counts encoded bytes, not the number of characters in the String).
To extend that answer, the putUTF8 method of the ASM JVM bytecode library contains this:

public ByteVector putUTF8(final String stringValue) {
    int charLength = stringValue.length();
    if (charLength > 65535) {
      // With more than 65535 chars, the UTF-8 encoded length cannot fit in 2 bytes either.
      throw new IllegalArgumentException("UTF8 string too large");
    }
    for (int i = 0; i < charLength; ++i) {
      char charValue = stringValue.charAt(i);
      if (charValue >= '\u0001' && charValue <= '\u007F') {
        // Unicode code-point encoding in utf-8 fits in 1 byte.
        currentData[currentLength++] = (byte) charValue;
      } else {
        // Doesn't fit in 1 byte.
        length = currentLength;
        return encodeUtf8(stringValue, i, 65535);
      }
    }
    ...
}

But when a character's encoding needs more than 1 byte, it calls the encodeUtf8 method:

final ByteVector encodeUtf8(final String stringValue, final int offset, final int maxByteLength /*= 65535 */) {
    int charLength = stringValue.length();
    int byteLength = offset;
    for (int i = offset; i < charLength; ++i) {
      char charValue = stringValue.charAt(i);
      if (charValue >= 0x0001 && charValue <= 0x007F) {
        byteLength++;
      } else if (charValue <= 0x07FF) {
        byteLength += 2;
      } else {
        byteLength += 3;
      }
    }
   ...
}

In this sense, the maximum string length is 65535 bytes, i.e. the length of the UTF-8 encoding, not the char count.
You can find the modified UTF-8 code-point ranges used by the JVM in the CONSTANT_Utf8_info structure reference above.
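The byte-counting rule from the encodeUtf8 excerpt can be reproduced on its own. This is only a sketch of that rule, not ASM's actual API: the class, constant and method names here are invented for illustration.

```java
// Hypothetical helper class, for illustration only; this is not part of ASM.
class ModifiedUtf8 {
    // Largest value that fits in the u2 length field of CONSTANT_Utf8_info.
    static final int MAX_CONSTANT_BYTES = 65535;

    // Counts the bytes a string needs in modified UTF-8, char by char.
    static long encodedLength(String s) {
        long bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                bytes += 1;          // plain ASCII except NUL
            } else if (c <= 0x07FF) {
                bytes += 2;          // includes U+0000, which is never a single zero byte
            } else {
                bytes += 3;          // everything else, including each surrogate half
            }
        }
        return bytes;
    }

    // True if the string could be stored as a single class-file constant.
    static boolean fitsInConstantPool(String s) {
        return encodedLength(s) <= MAX_CONSTANT_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(encodedLength("abc"));          // prints 3
        System.out.println(encodedLength("\u20ac"));       // euro sign, prints 3
        System.out.println(encodedLength("\uD83D\uDE00")); // surrogate pair, prints 6
    }
}
```

With three-byte characters such as the euro sign, a literal of just 21846 chars already exceeds the 65535-byte limit.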

Ballance answered 3/5, 2009 at 2:31 Comment(0)
