The answer by Eugene explains the reason of why you are observing such an increase in memory consumption for a large number of arrays. The question in the title, "How to efficiently store small byte arrays in Java?", may then be answered with: Not at all. 1
However, there probably are ways to achieve your goals. As usual, the "best" solution here will depend on how this data is going to be used. A very pragmatic approach would be: Define an interface
for your data structure.
In the simplest case, this interface could just be
interface ByteArray2D
{
int getNumRows();
int getNumColumns();
byte get(int r, int c);
void set(int r, int c, byte b);
}
Offering a basic abstraction of a "2D byte array". Depending on the application case, it may be beneficial to offer additional methods here. The patterns that could be employed here are frequently relevant for Matrix libraries, which handle "2D matrices" (usually of float
values), and they often offer methods like these:
interface Matrix {
Vector getRow(int row);
Vector getColumn(int column);
...
}
However, when the main purpose here is to handle a set of byte[]
arrays, methods for accessing each array (that is, each row of the 2D array) could be sufficient:
ByteBuffer getRow(int row);
Given this interface, it is simple to create different implementations. For example, you could create a simple implementation that just stores a 2D byte[][]
array internally:
class SimpleByteArray2D implements ByteArray2D
{
private final byte array[][];
...
}
Alternatively, you could create an implementation that stores a 1D byte[]
array, or analogously, a ByteBuffer
internally:
class CompactByteArray2D implements ByteArray2D
{
private final ByteBuffer buffer;
...
}
This implementation then just has to compute the (1D) index when calling one of the methods for accessing a certain row/column of the 2D array.
Below you will find a MCVE that shows this interface and the two implementations, the basic usage of the interface, and that does a memory footprint analysis using JOL.
The output of this program is:
For 10 rows and 1000 columns:
Total size for SimpleByteArray2D : 10240
Total size for CompactByteArray2D: 10088
For 100 rows and 100 columns:
Total size for SimpleByteArray2D : 12440
Total size for CompactByteArray2D: 10088
For 1000 rows and 10 columns:
Total size for SimpleByteArray2D : 36040
Total size for CompactByteArray2D: 10088
Showing that
the SimpleByteArray2D
implementation that is based on a simple 2D byte[][]
array requires more memory when the number of rows increases (even if the total size of the array remains constant)
the memory consumption of the CompactByteArray2D
is independent of the structure of the array
The whole program:
package stackoverflow;
import java.nio.ByteBuffer;
import org.openjdk.jol.info.GraphLayout;
public class EfficientByteArrayStorage
{
public static void main(String[] args)
{
showExampleUsage();
anaylyzeMemoryFootprint();
}
private static void anaylyzeMemoryFootprint()
{
testMemoryFootprint(10, 1000);
testMemoryFootprint(100, 100);
testMemoryFootprint(1000, 10);
}
private static void testMemoryFootprint(int rows, int cols)
{
System.out.println("For " + rows + " rows and " + cols + " columns:");
ByteArray2D b0 = new SimpleByteArray2D(rows, cols);
GraphLayout g0 = GraphLayout.parseInstance(b0);
System.out.println("Total size for SimpleByteArray2D : " + g0.totalSize());
//System.out.println(g0.toFootprint());
ByteArray2D b1 = new CompactByteArray2D(rows, cols);
GraphLayout g1 = GraphLayout.parseInstance(b1);
System.out.println("Total size for CompactByteArray2D: " + g1.totalSize());
//System.out.println(g1.toFootprint());
}
// Shows an example of how to use the different implementations
private static void showExampleUsage()
{
System.out.println("Using a SimpleByteArray2D");
ByteArray2D b0 = new SimpleByteArray2D(10, 10);
exampleUsage(b0);
System.out.println("Using a CompactByteArray2D");
ByteArray2D b1 = new CompactByteArray2D(10, 10);
exampleUsage(b1);
}
private static void exampleUsage(ByteArray2D byteArray2D)
{
// Reading elements of the array
System.out.println(byteArray2D.get(2, 4));
// Writing elements of the array
byteArray2D.set(2, 4, (byte)123);
System.out.println(byteArray2D.get(2, 4));
// Bulk access to rows
ByteBuffer row = byteArray2D.getRow(2);
for (int c = 0; c < row.capacity(); c++)
{
System.out.println(row.get(c));
}
// (Commented out for this MCVE: Writing one row to a file)
/*/
try (FileChannel fileChannel =
new FileOutputStream(new File("example.dat")).getChannel())
{
fileChannel.write(byteArray2D.getRow(2));
}
catch (IOException e)
{
e.printStackTrace();
}
//*/
}
}
interface ByteArray2D
{
int getNumRows();
int getNumColumns();
byte get(int r, int c);
void set(int r, int c, byte b);
// Bulk access to rows, for convenience and efficiency
ByteBuffer getRow(int row);
}
class SimpleByteArray2D implements ByteArray2D
{
private final int rows;
private final int cols;
private final byte array[][];
public SimpleByteArray2D(int rows, int cols)
{
this.rows = rows;
this.cols = cols;
this.array = new byte[rows][cols];
}
@Override
public int getNumRows()
{
return rows;
}
@Override
public int getNumColumns()
{
return cols;
}
@Override
public byte get(int r, int c)
{
return array[r][c];
}
@Override
public void set(int r, int c, byte b)
{
array[r][c] = b;
}
@Override
public ByteBuffer getRow(int row)
{
return ByteBuffer.wrap(array[row]);
}
}
class CompactByteArray2D implements ByteArray2D
{
private final int rows;
private final int cols;
private final ByteBuffer buffer;
public CompactByteArray2D(int rows, int cols)
{
this.rows = rows;
this.cols = cols;
this.buffer = ByteBuffer.allocate(rows * cols);
}
@Override
public int getNumRows()
{
return rows;
}
@Override
public int getNumColumns()
{
return cols;
}
@Override
public byte get(int r, int c)
{
return buffer.get(r * cols + c);
}
@Override
public void set(int r, int c, byte b)
{
buffer.put(r * cols + c, b);
}
@Override
public ByteBuffer getRow(int row)
{
ByteBuffer r = buffer.slice();
r.position(row * cols);
r.limit(row * cols + cols);
return r.slice();
}
}
Again, this is mainly intended as a sketch, to show one possible approach. The details of the interface will depend on the intended application pattern.
1 A side note:
The problem of the memory overhead is similar in other languages. For example, in C/C++, the structure that most closely resembles a "2D Java array" would be an array of manually allocated pointers:
char** array;
array = new (char*)[numRows];
array[0] = new char[numCols];
...
In this case, you also have an overhead that is proportional to the number of rows - namely, one (usually 4 byte) pointer for each row.
public final
fieldlength
, which contains the number of components of the array.length
may be positive or zero. Every array has a minimum overhead of 4-bytes (plus whatever overhead it inherits fromjava.lang.Object
). That's a minimum 40% penalty (per array) for arrays of 10 bytes. What exactly are you trying to achieve? – DominationByteBuffer
, even more specifically atMappedByteBuffer
. – Dominationinterface Data { byte get(int x, int y); void set(int x, int y, byte b)
. You can then store everything in a single array. If it is more convenient, you can also return "slices" of this large array in form of aByteBuffer
(using theByteBuffer#slice
method). – Burnish