Java array with more than 4gb elements

11

I have a big file, it's expected to be around 12 GB. I want to load it all into memory on a beefy 64-bit machine with 16 GB RAM, but I think Java does not support byte arrays that big:

File f = new File(file);
long size = f.length();
byte data[] = new byte[size]; // <- does not compile, not even on 64bit JVM

Is it possible with Java?

The compile error from the Eclipse compiler is:

Type mismatch: cannot convert from long to int

javac gives:

possible loss of precision
found   : long
required: int
         byte data[] = new byte[size];
License answered 18/5, 2009 at 15:26 Comment(5)
Just curious: Why do you need to keep that much data in memory at the same time? Wouldn't it be possible to split that into chunks?Nonalcoholic
+1 to bruno's comment. The only way that having the entire file in memory will be a benefit is if you need to make random accesses into different points of the file, and in that case you'd almost certainly be better off parsing it into a more computable representationSuprarenal
I am going to try to use a prefix tree (trie) to keep the data; this may shrink it enough to fit into 2 GB of memory.License
possible duplicate of converting 'int' to 'long' or accessing too long array with 'long'Siloxane
Wow. Very frustrating. Java must solve this in the next 5 years.Wesson
22

Java array indices are of type int (4 bytes or 32 bits), so I'm afraid you're limited to 2^31 − 1, or 2147483647, slots in your array. I'd read the data into another data structure, like a 2D array.
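
For example, a single long index can be split across two int indices, which is all such a wrapper structure needs to do. A minimal sketch (the 1 GiB chunk size is an arbitrary choice):

static final int CHUNK = 1 << 30; // 1 GiB per inner array

static byte get(byte[][] data, long index) {
    return data[(int) (index / CHUNK)][(int) (index % CHUNK)];
}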

Vinylidene answered 18/5, 2009 at 15:32 Comment(2)
@OmryYadan, The real limit will actually be less than 2147483647.Sadden
you mean MAX_INT - 8 ? github.com/omry/banana/blob/…License
15
package com.deans.rtl.util;

import java.io.FileInputStream;
import java.io.IOException;

/**
 * 
 * @author [email protected]
 *
 * Written to work with byte arrays requiring address space larger than 32 bits. 
 * 
 */

public class ByteArray64 {

    private static final long CHUNK_SIZE = 1024*1024*1024; // 1 GiB

    private final long size;
    private final byte [][] data;

    public ByteArray64( long size ) {
        this.size = size;
        if( size == 0 ) {
            data = null;
        } else {
            int chunks = (int)(size/CHUNK_SIZE);
            int remainder = (int)(size - ((long)chunks)*CHUNK_SIZE);
            data = new byte[chunks+(remainder==0?0:1)][];
            for( int idx=chunks; --idx>=0; ) {
                data[idx] = new byte[(int)CHUNK_SIZE];
            }
            if( remainder != 0 ) {
                data[chunks] = new byte[remainder];
            }
        }
    }
    public byte get( long index ) {
        if( index<0 || index>=size ) {
            throw new IndexOutOfBoundsException("Error attempting to access data element "+index+".  Array is "+size+" elements long.");
        }
        int chunk = (int)(index/CHUNK_SIZE);
        int offset = (int)(index - (((long)chunk)*CHUNK_SIZE));
        return data[chunk][offset];
    }
    public void set( long index, byte b ) {
        if( index<0 || index>=size ) {
            throw new IndexOutOfBoundsException("Error attempting to access data element "+index+".  Array is "+size+" elements long.");
        }
        int chunk = (int)(index/CHUNK_SIZE);
        int offset = (int)(index - (((long)chunk)*CHUNK_SIZE));
        data[chunk][offset] = b;
    }
    /**
     * Simulates a single read which fills the entire array via several smaller reads.
     * 
     * @param fileInputStream
     * @throws IOException
     */
    public void read( FileInputStream fileInputStream ) throws IOException {
        if( size == 0 ) {
            return;
        }
        for( int idx=0; idx<data.length; idx++ ) {
            // InputStream.read(byte[]) may legitimately fill only part of the
            // buffer, so keep reading until each chunk is complete.
            int filled = 0;
            while( filled < data[idx].length ) {
                int n = fileInputStream.read( data[idx], filled, data[idx].length - filled );
                if( n < 0 ) {
                    throw new IOException("short read");
                }
                filled += n;
            }
        }
    }
    public long size() {
        return size;
    }
}
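
For example, loading the whole file from the question into one of these might look like this (a sketch, with error handling elided; file is the path String from the question):

File f = new File(file);
ByteArray64 data = new ByteArray64(f.length());
try (FileInputStream in = new FileInputStream(f)) {
    data.read(in);
}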
Asleep answered 3/4, 2011 at 21:38 Comment(2)
A good idea to implement your own ByteArray for solving this case. If it wasn't for your answer I probably wouldn't have thought of doing so.Lanoralanose
Anybody care to add an update(byte[] b, int start, int size) method? :)Garland
7

If necessary, you can load the data into an array of arrays, which will give you a maximum of Integer.MAX_VALUE squared bytes (about 2^62, roughly 4 exabytes), more than even the beefiest machine could hold in memory.

Conferva answered 18/5, 2009 at 15:32 Comment(2)
That would be my next step. Since I intend to do a binary search on the data, it will uglify the code, but I'm afraid there is no choice.License
You could make a class that manages an array of arrays but provides an abstraction similar to a regular array, e.g., with get and set methods that take a long index.Alvita
4

You might consider using FileChannel and MappedByteBuffer to memory-map the file:

FileChannel fCh = new RandomAccessFile(file,"rw").getChannel();
long size = fCh.size();
ByteBuffer map = fCh.map(FileChannel.MapMode.READ_WRITE, 0, size);

Edit:

Ok, I'm an idiot: it looks like ByteBuffer only takes a 32-bit index as well, which is odd since the size parameter to FileChannel.map is a long... But if you decide to break the file up into multiple 2GB chunks for loading, I'd still recommend memory-mapped IO, as there can be pretty large performance benefits. You're basically moving all IO responsibility to the OS kernel.
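
Here's a sketch of that chunked approach (read-only, with an arbitrary 1 GiB mapping size; MappedChunks is just a made-up name):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedChunks {
    private static final long CHUNK = 1L << 30; // 1 GiB per mapping

    private final MappedByteBuffer[] maps;
    private final long size;

    public MappedChunks(String path) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r");
             FileChannel ch = raf.getChannel()) {
            size = ch.size();
            int n = (int) ((size + CHUNK - 1) / CHUNK);
            maps = new MappedByteBuffer[n];
            for (int i = 0; i < n; i++) {
                long start = i * CHUNK;
                // the mappings stay valid after the channel is closed
                maps[i] = ch.map(FileChannel.MapMode.READ_ONLY, start,
                                 Math.min(CHUNK, size - start));
            }
        }
    }

    public byte get(long index) {
        return maps[(int) (index / CHUNK)].get((int) (index % CHUNK));
    }
}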

Written answered 18/5, 2009 at 16:30 Comment(3)
I also hit the same limitation of ByteBuffer, which I think should be able to deal with long offsets and indexes, at least at the interface level; concrete implementations could check ranges explicitly. Unfortunately it is not possible to map more than a 2GB file into memory.Nib
Upvote as this is the right way to go, even if you have to partition the data into 2G chunks - wrap the chunks in a class which indexes with a long if you like.Leonoraleonore
MappedByteBuffer is also capped at 2GB, so it's practically useless here. See nyeggen.com/post/… for a solution which calls internal JNI methods to work around this.Tiptop
2

I suggest you define some "block" objects, each of which holds (say) 1Gb in an array, then make an array of those.

Osman answered 18/5, 2009 at 15:32 Comment(0)
2

No, arrays are indexed by ints (except some versions of JavaCard that use shorts). You will need to slice it up into smaller arrays, probably wrapping them in a type that gives you get(long), set(long,byte), etc. With sections of data that large, you might want to map the file using java.nio.

Hydroelectric answered 18/5, 2009 at 15:34 Comment(0)
2

Don't limit yourself with Integer.MAX_VALUE.

Although this question was asked many years ago, I wanted to contribute a simple example using only Java SE, without any external libraries.

At first, let's say it's theoretically impossible, but practically possible.

A new look: if an array is an object of elements, what about having an object that is an array of arrays?

Here's the example:

import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.List;

/**
 *
 * @author Anosa
 */
public class BigArray<T> {

    private static final int ARRAY_LENGTH = 1000000;

    public final long length;
    private final List<T[]> arrays;

    public BigArray(long length, Class<T> clazz)
    {
        this.length = length;
        arrays = new ArrayList<>();
        setupInnerArrays(clazz);
    }

    @SuppressWarnings("unchecked")
    private void setupInnerArrays(Class<T> clazz)
    {
        long numberOfArrays = length / ARRAY_LENGTH;
        long remainder = length % ARRAY_LENGTH;
        /*
            we could also use Java 8 lambdas and streams:
            LongStream.range(0, numberOfArrays).
                            forEach(i ->
                            {
                                arrays.add((T[]) Array.newInstance(clazz, ARRAY_LENGTH));
                            });
         */
        for (int i = 0; i < numberOfArrays; i++)
        {
            arrays.add((T[]) Array.newInstance(clazz, ARRAY_LENGTH));
        }
        if (remainder > 0)
        {
            // the remainder is guaranteed to be less than ARRAY_LENGTH (an int),
            // so the cast is safe
            arrays.add((T[]) Array.newInstance(clazz, (int) remainder));
        }
    }

    public void put(T value, long index)
    {
        if (index >= length || index < 0)
        {
            throw new IndexOutOfBoundsException("index " + index + " is out of range; it must be in [0, " + length + ")");
        }
        int indexOfArray = (int) (index / ARRAY_LENGTH);
        // multiply as long to avoid int overflow for indices beyond 2^31
        int indexInArray = (int) (index - (long) indexOfArray * ARRAY_LENGTH);
        arrays.get(indexOfArray)[indexInArray] = value;
    }

    public T get(long index)
    {
        if (index >= length || index < 0)
        {
            throw new IndexOutOfBoundsException("index " + index + " is out of range; it must be in [0, " + length + ")");
        }
        int indexOfArray = (int) (index / ARRAY_LENGTH);
        int indexInArray = (int) (index - (long) indexOfArray * ARRAY_LENGTH);
        return arrays.get(indexOfArray)[indexInArray];
    }
}

And here's the test:

public static void main(String[] args)
{
    long length = 60085147514L;
    BigArray<String> array = new BigArray<>(length, String.class);
    array.put("peace be upon you", 1);
    array.put("yes it works", 1755);
    String text = array.get(1755);
    System.out.println(text + "  i am a string coming from an array ");
}

This code is limited only by Long.MAX_VALUE and the Java heap; you can raise the heap as needed (I ran it with 3800 MB).

I hope this is useful and provides a simple answer.

Behoof answered 23/1, 2017 at 9:36 Comment(2)
Since then I wrote Banana: github.com/omry/banana, a lib that lets you do that, among other things.License
@OmryYadan Good work, I had a look at some of the examples. Good job, bro (:Behoof
1

Java arrays use integers for their indices. As a result, the maximum array size is Integer.MAX_VALUE.

(Unfortunately, I can't find any proof from Sun themselves about this, but there are plenty of discussions on their forums about it already.)

I think the best you could do in the meantime would be to make a 2D array, i.e.:

byte[][] data;
Conduction answered 18/5, 2009 at 15:33 Comment(0)
1

As others have said, all Java arrays of all types are indexed by int, and so can be of max size 2^31 − 1, or 2147483647, elements (~2 billion). This is specified by the Java Language Specification, so switching to another operating system or Java Virtual Machine won't help.

If you wanted to write a class to overcome this, as suggested above, you could: it might use an array of arrays (for a lot of flexibility) or change element types (a long is 8 bytes, so a long[] can hold 8 times as many bytes as a byte[]).
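
A sketch of the type-changing idea, packing 8 bytes little-endian into each slot of a long[] (getByte and setByte are hypothetical helpers, not a standard API):

// a long[] of Integer.MAX_VALUE slots can hold roughly 17 billion bytes
static byte getByte(long[] packed, long index) {
    int shift = (int) (index & 7) << 3; // byte position within the long
    return (byte) (packed[(int) (index >>> 3)] >>> shift);
}

static void setByte(long[] packed, long index, byte b) {
    int slot = (int) (index >>> 3);
    int shift = (int) (index & 7) << 3;
    packed[slot] = (packed[slot] & ~(0xFFL << shift)) | ((b & 0xFFL) << shift);
}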

Syllabism answered 18/5, 2009 at 15:42 Comment(0)
1

Java doesn't presently support direct arrays with more than 2^32 elements.

I hope to see this feature in Java in the future.

Kalil answered 5/4, 2011 at 9:26 Comment(1)
No, the limit is 2^31 − 1 elements. And your second line does not cite any references.Ragucci
1

I think the idea of memory-mapping the file (using the CPU's virtual memory hardware) is the right approach, except that MappedByteBuffer has the same 2Gb limitation as native arrays. This guy claims to have solved the problem with a pretty simple alternative to MappedByteBuffer:

http://nyeggen.com/post/2014-05-18-memory-mapping-%3E2gb-of-data-in-java/

https://gist.github.com/bnyeggen/c679a5ea6a68503ed19f#file-mmapper-java

Unfortunately the JVM crashes when you read beyond 500Mb.

Decadent answered 6/7, 2016 at 4:27 Comment(1)
While in this specific example my use case was to read a file, this is not the only use case for large arrays.License