How many characters can a Java String have?
M

7

169

I'm trying the Next Palindrome problem from Sphere Online Judge (SPOJ), where I need to find a palindrome for an integer of up to a million digits. I thought about using Java's functions for reversing Strings, but would they allow a String to be this long?

Mitrewort answered 24/7, 2009 at 20:24 Comment(3)
Are you saying that you need to write a function that generates palindromes, the size of which is user specified and can be up to 1 million characters in length?Saltatory
The Problem (from SPOJ) may contain a 100Gigabyte file, and you like to load it into a string at once? Seriously... please use a Scanner!Impresario
Possible duplicate of String's Maximum length in Java - calling length() methodDerrickderriey
N
254

You should be able to get a String of length

  1. Integer.MAX_VALUE, always 2,147,483,647 (2^31 - 1)
    (Defined by the Java specification, the maximum size of an array, which the String class uses for internal storage)
    OR

  2. Half your maximum heap size (since each character is two bytes) whichever is smaller.
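To check both limits on a given JVM, a minimal sketch follows. It assumes two bytes per char (as in the answer above) and that the whole heap is available to one String, which is optimistic. The class name `StringLimitEstimate` and the helper `estimateMaxChars` are made up for illustration; `Runtime.getRuntime().maxMemory()` is the standard call that reports the maximum heap the JVM will try to use.

```java
public class StringLimitEstimate {

    // Rough upper bound on String length: the smaller of Integer.MAX_VALUE
    // and half the heap in bytes (assumes 2 bytes per char, nothing else on the heap).
    static long estimateMaxChars(long maxHeapBytes) {
        return Math.min((long) Integer.MAX_VALUE, maxHeapBytes / 2);
    }

    public static void main(String[] args) {
        // maxMemory() returns Long.MAX_VALUE when the JVM reports no inherent limit.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap (bytes): " + maxHeap);
        System.out.println("Rough upper bound on String length: " + estimateMaxChars(maxHeap));
    }
}
```

The real limit is usually lower, since building the String typically needs a second copy of the data in memory.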

Neodarwinism answered 24/7, 2009 at 20:27 Comment(11)
... or your maximum heap size divided by 2 ... since character is 2 bytesDistributary
How do I find out the maximum heap size? Also, I don't know which Java virtual machine the judge is using to test my problem. Is Integer.MAX_VALUE part of the spec or JVM dependent?Mitrewort
Integer.MAX_VALUE is always 2147483647 (2^31 - 1), that's part of the Java Specification.Crew
Assuming a 64-bit JVM, since you'd need 8GB of virtual memory to store a string of that length.Pileup
@dmindreader: Integer.MAX_VALUE is JVM independent, so you can always guarantee it will be the same. @CD1: Thanks for clarifying that while I was AFK, I added it to my answer. :)Neodarwinism
Actually you want to divide your memory by 4-6, as you need a StringBuilder or the like to build your String, i.e. there must be two copies in memory at some point. If your StringBuilder's capacity is just right, divide by 4; if it's not, dividing by 6 is safer.Labbe
@Peter: I don't follow you. Why do you say "there must be two copies in memory at some point"? Is this due to some limitation of the JVM, or are you talking about an implementation of the palindrome problem that dmindreader is trying to solve?Neodarwinism
@ChssPly76: Valid for current JVMs - but it is quite possible to create a JVM without a maximum heap size. In fact it is quite easy: just request more memory from the OS when the heap runs out and garbage collection failed to free the needed memory.Guy
Java 9 is going to use a single byte per character for strings having only iso-latin-1 content, so such strings can have as many characters as the heap in bytes (or max array length, whatever is smaller), but on the other hand, since non-latin strings use two bytes in an array, the maximum string length will be halved for them in Java 9, only supporting 1073741823 characters.Touber
Doesn't the two bytes required for a char object depend on the encoding? UTF8 requires 1 byte per ASCII character, 2 for BMP, 3-4 for nonBMP. Or are Char objects always 2?Tannin
@ITIA: the comment by Holger and the answer by Peter Lawrey are correct. Through Java 8, String uses UTF16: 2 bytes for a BMP char, 4 for supplementary; in 9 and up it uses 1 byte per char if all chars are 8859-1 aka Latin-1 aka block 0, otherwise UTF16 as before. It never uses UTF8 in String (or Builder/Buffer), although you can use UTF8 for I/O. Primitive char was and remains always 2 bytes (16 bits), but a String element may now be converted from/to char. PS: UTF8 in 2 bytes only covers up to U+07FF, which is not nearly all of the BMP.Epilate
S
21

I believe they can be up to 2^31-1 characters, as they are held by an internal array, and arrays are indexed by integers in Java.
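A million digits is far below that limit. As a minimal sketch (the class `MillionDigits` and its helper names are hypothetical, chosen for illustration), the following builds a million-character String of digits and reverses it with the standard `StringBuilder.reverse()`:

```java
public class MillionDigits {

    // Builds a String of n decimal digits: 0123456789012...
    static String repeatDigits(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append((char) ('0' + i % 10));
        }
        return sb.toString();
    }

    // Reverses a String by copying it into a StringBuilder.
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    public static void main(String[] args) {
        String digits = repeatDigits(1_000_000);
        String reversed = reverse(digits);
        System.out.println("length: " + digits.length());
        System.out.println("reversed length: " + reversed.length());
    }
}
```

With two bytes per char this needs only a few megabytes, even counting the temporary copies.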

Shroyer answered 24/7, 2009 at 20:26 Comment(2)
The internal implementation is irrelevant - there's no reason why the character data couldn't be stored in an array of longs, for instance. The problem is the interface uses ints for length. getBytes and similar may have problems if you try for a very large string.Erick
That is true - I was implying that fact. My bad.Shroyer
L
16

While you can in theory have Integer.MAX_VALUE characters, the JVM limits the size of the array it can allocate.

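public class MaxArrayLength {
    public static void main(String... args) {
        // Try the four largest int lengths and report which ones can be allocated.
        for (int i = 0; i < 4; i++) {
            int len = Integer.MAX_VALUE - i;
            try {
                char[] ch = new char[len];
                System.out.println("len: " + len + " OK");
            } catch (Error e) {
                // Usually an OutOfMemoryError; the message depends on the JVM and heap size.
                System.out.println("len: " + len + " " + e);
            }
        }
    }
}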

On Oracle Java 8 update 92, this prints:

len: 2147483647 java.lang.OutOfMemoryError: Requested array size exceeds VM limit
len: 2147483646 java.lang.OutOfMemoryError: Requested array size exceeds VM limit
len: 2147483645 OK
len: 2147483644 OK

Note: in Java 9, Strings use a byte[] encoded as either Latin-1 (one byte per char) or UTF-16 (two bytes per char), so any String that is not pure Latin-1 has its maximum length roughly halved. If the content is all four-byte code points, e.g. emojis, you will only get around 500 million characters.

Labbe answered 7/12, 2016 at 18:42 Comment(2)
Compact Strings in Java 9 use either Latin-1 or UTF-16 encoding. No variable length encoding, that is, no three byte characters.Quarry
@Quarry "It is not a goal to use alternate encodings such as UTF-8" thank you for the correction.Labbe
V
5

Have you considered using BigDecimal instead of String to hold your numbers?

Vacillatory answered 24/7, 2009 at 21:58 Comment(2)
It depends on what the application is going to do with the numbers. If it is going to just do textual things like finding palindromes, counting (decimal) digits, then a String is better. If it is going to be doing arithmetic, a BigDecimal (or BigInteger) is better.Polliwog
The problem is "For each K, output the smallest palindrome larger than K." (where K is the number given). It would be trivially simple to output the first palindrome smaller than K. You require arithmetic to find one larger than K. Example: Find the next palindrome larger than 999999999999, or the next palindrome larger than 12922.Marko
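The carry this needs can be done digit by digit on a char array, so a million-digit input does not require BigInteger. The following is a minimal sketch, not the judge's reference solution. It assumes the input is a non-empty String of decimal digits with no leading zeros; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class NextPalindrome {

    // Returns the smallest palindrome strictly larger than k.
    // Assumes k is a non-empty decimal digit string without leading zeros.
    static String nextPalindrome(String k) {
        char[] p = k.toCharArray();
        int n = p.length;

        // Special case: 99...9 becomes 100...001 (one digit longer).
        boolean allNines = true;
        for (char c : p) {
            if (c != '9') { allNines = false; break; }
        }
        if (allNines) {
            char[] r = new char[n + 1];
            Arrays.fill(r, '0');
            r[0] = '1';
            r[n] = '1';
            return new String(r);
        }

        // Mirror the left half onto the right half.
        mirror(p);
        // Equal-length digit strings compare numerically under compareTo.
        if (new String(p).compareTo(k) > 0) {
            return new String(p);
        }

        // Otherwise add 1 at the middle digit, carrying to the left.
        // A non-9 digit exists in the left half, or the mirror above would have won.
        int i = (n - 1) / 2;
        while (p[i] == '9') {
            p[i] = '0';
            i--;
        }
        p[i]++;
        mirror(p);
        return new String(p);
    }

    // Copies the left half of the array onto the right half in reverse order.
    private static void mirror(char[] p) {
        for (int i = 0; i < p.length / 2; i++) {
            p[p.length - 1 - i] = p[i];
        }
    }

    public static void main(String[] args) {
        System.out.println(nextPalindrome("12922"));        // prints 13031
        System.out.println(nextPalindrome("999999999999")); // prints 1000000000001
    }
}
```

Each step is a single pass over the digits, so the running time is linear in the length of K.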
T
4

Integer.MAX_VALUE is the maximum size of a String, and the practical limit also depends on your memory size. But for the problem on Sphere Online Judge, you don't have to use those functions.

Tarpon answered 24/7, 2009 at 20:29 Comment(0)
M
3

Java 9 uses a byte[] to store String.value, so a String that is not pure Latin-1 can only hold about 1G chars in Java 9. Java 8, on the other hand, can have Strings of about 2G chars.

By character I mean "char"s. Some characters are not representable in the BMP (like some of the emojis), so they take more (currently 2) chars.
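The difference between chars and characters is easy to observe. In this minimal sketch (the class name `SurrogateDemo` and its helpers are hypothetical), U+1F600 is a code point outside the BMP, so it is stored as a surrogate pair:

```java
public class SurrogateDemo {

    // Number of 16-bit char units, which is what String.length() reports.
    static int charCount(String s) {
        return s.length();
    }

    // Number of Unicode code points, counting each surrogate pair once.
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the BMP and needs two chars in UTF-16.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println("chars: " + charCount(emoji));            // prints chars: 2
        System.out.println("code points: " + codePointCount(emoji)); // prints code points: 1
    }
}
```

The length limits discussed here apply to the char count, not the code point count.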

Milone answered 29/4, 2017 at 19:37 Comment(1)
Could you attach reference for Java-9 limiting String size to 1 GB from 2 GBCaloyer
C
-1

The heap part gets worse, my friends. UTF-16 isn't guaranteed to be limited to 16 bits per character and can expand to 32.

Carboni answered 5/8, 2012 at 15:22 Comment(2)
Except Java's char type is 16 bits exactly, so the number of bits UTF-16 uses doesn't really matter...Sylviasylviculture
@awksp: char is 16 bits, but a character in a String can occupy two chars (two 'surrogate' code elements to represent one character in UTF16). However, the Q was only for decimal digits, and these are not only BMP but 8859-1/block0 and ASCII.Epilate
