Java: Efficiently converting an array of longs to an array of bytes

I have an array of longs I want to write to disk. The most efficient disk I/O functions take in byte arrays, for example:

FileOutputStream.write(byte[] b, int offset, int length)

...so I want to begin by converting my long[] to byte[] (8 bytes for each long). I'm struggling to find a clean way to do this.

Direct typecasting doesn't seem allowed:

ConversionTest.java:6: inconvertible types
found   : long[]
required: byte[]
    byte[] byteArray = (byte[]) longArray;
                            ^

It's easy to do the conversion by iterating over the array, for example:

ByteBuffer bytes = ByteBuffer.allocate(longArray.length * (Long.SIZE/8));
for( long l: longArray )
{
    bytes.putLong( l );
}
byte[] byteArray = bytes.array();

...however that seems far less efficient than simply treating the long[] as a series of bytes.

Interestingly, when reading the file, it's easy to "cast" from byte[] to longs using Buffers:

LongBuffer longs = ByteBuffer.wrap(byteArray).asLongBuffer();
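
For instance, copying the values back out into a long[] (a minimal sketch building on the line above) is just:

long[] readBack = new long[longs.remaining()];
longs.get(readBack);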

...but I can't seem to find any functionality to go the opposite direction.

I understand there are endian considerations when converting from long to byte, but I believe I've already addressed those: I'm using the Buffer framework shown above, which defaults to big endian, regardless of native byte order.

Standardbearer answered 28/4, 2015 at 18:32 Comment(4)
First of all, are you sure all of your longs can be converted to bytes? What is the largest value you're dealing with? – Tatyanatau
@KevinWorkman I think he means converting them to their actual byte components - 8 bytes for each long. – Miltie
Yes, that's correct. Clarified post. – Standardbearer
Do you want there to be no copying whatsoever? – Miltie

Concerning efficiency, many details will, in fact, hardly make a difference. The hard disk is by far the slowest part involved here: in the time it takes to write a single byte to the disk, you could have converted thousands or even millions of longs to bytes. A performance test here will tell you little about the performance of the conversion and mostly about the performance of the hard disk. When in doubt, one should make dedicated benchmarks comparing the different conversion strategies and the different writing methods.

Assuming that the main goal is a convenient conversion that does not impose unnecessary overhead, I'd like to propose the following approach:

One can create a ByteBuffer of sufficient size, view it as a LongBuffer, use the bulk LongBuffer#put(long[]) method (which takes care of endianness conversion, if necessary, and does so as efficiently as possible), and finally write the original ByteBuffer (which is now filled with the long values) to the file using a FileChannel.

Following this idea, I think that this method is convenient and (most likely) rather efficient:

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

private static void bulkAndChannel(String fileName, long[] longArray)
{
    // Allocate a byte buffer large enough to hold all the long values
    ByteBuffer bytes =
        ByteBuffer.allocate(longArray.length * Long.BYTES);

    // View the byte buffer as a LongBuffer and copy the array in one bulk put
    bytes.order(ByteOrder.nativeOrder()).asLongBuffer().put(longArray);

    // Write the backing byte buffer to the file through a FileChannel
    try (FileOutputStream fos = new FileOutputStream(fileName))
    {
        fos.getChannel().write(bytes);
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
}

(Of course, one could argue about whether allocating one "large" buffer is the best idea. But thanks to the convenience methods of the Buffer classes, this can easily be modified to write "chunks" of data of an appropriate size, for the case where the array is so huge that the memory overhead of the corresponding ByteBuffer would be prohibitive; a sketch of such a chunked variant follows.)
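
For illustration, here is a minimal sketch of such a chunked variant. The chunk size and the helper name writeLongsInChunks are illustrative choices, not part of the original answer; the inner loop also repeats the channel write until the whole chunk has been accepted, which relates to the partial-write discussion in the comments below.

import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;

private static void writeLongsInChunks(String fileName, long[] longArray) throws IOException
{
    final int CHUNK_LONGS = 64 * 1024; // longs per chunk (an assumed, tunable size)
    ByteBuffer bytes = ByteBuffer.allocate(CHUNK_LONGS * Long.BYTES)
                                 .order(ByteOrder.nativeOrder());
    try (FileOutputStream fos = new FileOutputStream(fileName);
         FileChannel channel = fos.getChannel())
    {
        int offset = 0;
        while (offset < longArray.length)
        {
            int count = Math.min(CHUNK_LONGS, longArray.length - offset);

            // Fill the buffer with the next chunk of long values
            bytes.clear();
            bytes.asLongBuffer().put(longArray, offset, count);
            bytes.limit(count * Long.BYTES);

            // Repeat the write until the channel has accepted the whole chunk
            while (bytes.hasRemaining())
            {
                channel.write(bytes);
            }
            offset += count;
        }
    }
}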

Quiche answered 28/4, 2015 at 20:51 Comment(5)
This makes sense! Thanks for the elegant approach -- and the big picture about I/O performance. – Standardbearer
Writing to a channel is not quite that simple. The write method does not guarantee to write the entire contents, so this must be checked and the write method may need to be called multiple times. docs.oracle.com/javase/8/docs/api/java/nio/channels/… – Helping
@BrettOkken Are you sure? It says that it will attempt to write b.remaining() bytes, and will only return after writing all of the requested bytes. Maybe I misunderstood something here; in that case I would edit the answer accordingly, but from my understanding, it should write the whole buffer in its current form... – Quiche
The following statement, however, is "Some types of channels, depending upon their state, may write only some of the bytes or possibly none at all." The documentation on FileChannel does not exclude this as a possibility. Indeed, the "return" documentation states "The number of bytes written, possibly zero." – Helping
@BrettOkken It says "Unless otherwise specified, a write operation will return only after writing all of the r requested bytes", and I don't see it "otherwise specified" for FileChannel, except for the return value (which is copied from the superclass). I think this might be worth its own question: #29946185 – Quiche

No, there is not a trivial way to convert from a long[] to a byte[].

Your best option is likely to wrap your FileOutputStream with a BufferedOutputStream and then write out the individual byte values for each long (using bitwise operators).
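
A minimal sketch of that first option might look like this (assuming big-endian byte order, to match what the Buffer classes use by default; the method name writeBitwise is only illustrative):

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

private static void writeBitwise(String fileName, long[] longArray) throws IOException
{
    try (BufferedOutputStream out =
             new BufferedOutputStream(new FileOutputStream(fileName)))
    {
        for (long value : longArray)
        {
            // Emit the 8 bytes of each long, most significant byte first
            for (int shift = 56; shift >= 0; shift -= 8)
            {
                out.write((int) (value >>> shift)); // write(int) keeps only the low 8 bits
            }
        }
    }
}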

Another option is to create a ByteBuffer and put your long values into the ByteBuffer and then write that to a FileChannel. This handles the endianness conversion for you, but makes the buffering more complicated.

Helping answered 28/4, 2015 at 18:47 Comment(0)

OP here.

I have thought of one approach: ByteBuffer.asLongBuffer() returns an instance of ByteBufferAsLongBufferB, a class which wraps ByteBuffer in an interface for treating the data as longs while properly managing endianness. I could extend ByteBufferAsLongBufferB, and add a method to return the raw byte buffer (which is protected).

But this seems so esoteric and convoluted I feel there must be an easier way. Either that, or something in my approach is flawed.

Standardbearer answered 28/4, 2015 at 18:33 Comment(1)
But you still have to copy your long[] into that LongBuffer. – Miltie
