I'm currently writing some code to convert an arbitrary data structure to Apache Arrow vectors and got stuck on something relatively simple, namely, how to write a `byte[]` to a `ListVector`.
When writing data to a `ListVector` through a `BaseWriter.ListWriter`, primitive types can be added very easily, e.g. `writer.integer().writeInt(i)` or `writer.float4().writeFloat4(f)`.
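For reference, the easy primitive case looks roughly like this. It is a minimal sketch assuming a recent Arrow Java release (`ListVector.empty`, `ListVector.getWriter()` returning a `UnionListWriter`, and a no-argument `RootAllocator`); the class and method names are hypothetical.

```java
import java.util.List;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.complex.ListVector;
import org.apache.arrow.vector.complex.impl.UnionListWriter;

// Hypothetical example class: writes one list of ints through the list writer.
class ListOfIntsExample {
    // Writes one list entry (index 0) holding the given ints, then reads it back.
    static List<?> writeInts(int... values) {
        try (BufferAllocator allocator = new RootAllocator();
             ListVector vector = ListVector.empty("ints", allocator)) {
            UnionListWriter writer = vector.getWriter();
            writer.allocate();
            writer.setPosition(0);              // list entry 0
            writer.startList();
            for (int v : values) {
                writer.integer().writeInt(v);   // the Int child vector is created on first use
            }
            writer.endList();
            vector.setValueCount(1);
            // getObject builds a java.util.List of boxed values, so it outlives the vector
            return vector.getObject(0);
        }
    }
}
```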
However, for variable-length types like bytes (or strings), the only methods available have signatures similar to:
public void write(VarBinaryHolder h);
public void writeVarBinary(int start, int end, ArrowBuf buffer);
Here, `VarBinaryHolder` is a simple generated wrapper class around an `ArrowBuf` that does not even have a constructor taking arguments.
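The holder-based signature can still be used by filling the holder's public fields (`buffer`, `start`, `end`) by hand. This is a sketch under the assumption of a recent Arrow Java release, where `VarBinaryHolder` is in `org.apache.arrow.vector.holders` and the bytes are first copied into a temporary buffer from the allocator; `HolderExample` and `writeWithHolder` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.arrow.memory.ArrowBuf;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.complex.ListVector;
import org.apache.arrow.vector.complex.impl.UnionListWriter;
import org.apache.arrow.vector.holders.VarBinaryHolder;

// Hypothetical example class: writes byte[] elements via write(VarBinaryHolder).
class HolderExample {
    // Writes each byte[] as one element of list entry 0, then reads them back.
    static List<byte[]> writeWithHolder(List<byte[]> blobs) {
        try (BufferAllocator allocator = new RootAllocator();
             ListVector vector = ListVector.empty("blobs", allocator)) {
            UnionListWriter writer = vector.getWriter();
            writer.allocate();
            writer.setPosition(0);
            writer.startList();
            for (byte[] bytes : blobs) {
                try (ArrowBuf tmp = allocator.buffer(bytes.length)) {
                    tmp.setBytes(0, bytes);
                    // No convenience constructor: populate the public fields directly.
                    VarBinaryHolder holder = new VarBinaryHolder();
                    holder.buffer = tmp;
                    holder.start = 0;
                    holder.end = bytes.length;
                    writer.varBinary().write(holder); // copies the bytes into the child vector
                }
            }
            writer.endList();
            vector.setValueCount(1);
            List<byte[]> out = new ArrayList<>();
            for (Object element : vector.getObject(0)) {
                out.add((byte[]) element);   // VarBinary values come back as byte[] copies
            }
            return out;
        }
    }
}
```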
I was expecting something similar to what `VarBinaryVector` offers: its `Mutator` has a `setSafe(int index, byte[] bytes)` method that does exactly what I need.
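For comparison, in newer Arrow Java releases the `Mutator` classes are gone and `setSafe(int, byte[])` sits directly on `VarBinaryVector`. A minimal sketch under that assumption (`VarBinaryVectorExample` and `roundTrip` are hypothetical names):

```java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VarBinaryVector;

// Hypothetical example class: the byte[] convenience path on a flat VarBinaryVector.
class VarBinaryVectorExample {
    static byte[] roundTrip(byte[] bytes) {
        try (BufferAllocator allocator = new RootAllocator();
             VarBinaryVector vector = new VarBinaryVector("bytes", allocator)) {
            vector.allocateNew();
            vector.setSafe(0, bytes);   // grows the buffers if needed, then copies the array
            vector.setValueCount(1);
            return vector.get(0);       // returns a fresh byte[] copy
        }
    }
}
```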
Furthermore, there seems to be no straightforward way to wrap a byte array into an `ArrowBuf`; the only one I see is to write the data to a fresh `VarBinaryVector` and fetch the underlying `ArrowBuf` afterwards.
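One workaround that avoids the intermediate `VarBinaryVector` is to copy the array into a buffer taken straight from the allocator and pass it to `writeVarBinary(int start, int end, ArrowBuf buffer)`. This is a sketch assuming a recent Arrow Java release (`ArrowBuf` in `org.apache.arrow.memory`, `BufferAllocator.buffer(long)`, `ArrowBuf.setBytes(long, byte[])`); the helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.arrow.memory.ArrowBuf;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.complex.ListVector;
import org.apache.arrow.vector.complex.impl.UnionListWriter;
import org.apache.arrow.vector.complex.writer.BaseWriter;

// Hypothetical helper class: byte[] -> ArrowBuf -> list element.
class ByteArrayToArrowBuf {
    // Copies a byte[] into a freshly allocated ArrowBuf; the caller must close it.
    static ArrowBuf toArrowBuf(BufferAllocator allocator, byte[] bytes) {
        ArrowBuf buf = allocator.buffer(bytes.length);
        buf.setBytes(0, bytes);
        return buf;
    }

    // Appends one byte[] element to the list currently open on `writer`.
    static void writeBytes(BaseWriter.ListWriter writer, BufferAllocator allocator, byte[] bytes) {
        try (ArrowBuf tmp = toArrowBuf(allocator, bytes)) {
            // writeVarBinary copies [start, end) of tmp into the child vector,
            // so the temporary buffer can be released right afterwards.
            writer.varBinary().writeVarBinary(0, bytes.length, tmp);
        }
    }

    // Writes all blobs into list entry 0 and reads them back as byte[] copies.
    static List<byte[]> writeAll(List<byte[]> blobs) {
        try (BufferAllocator allocator = new RootAllocator();
             ListVector vector = ListVector.empty("blobs", allocator)) {
            UnionListWriter writer = vector.getWriter();
            writer.allocate();
            writer.setPosition(0);
            writer.startList();
            for (byte[] bytes : blobs) {
                writeBytes(writer, allocator, bytes);
            }
            writer.endList();
            vector.setValueCount(1);
            List<byte[]> out = new ArrayList<>();
            for (Object element : vector.getObject(0)) {
                out.add((byte[]) element);
            }
            return out;
        }
    }
}
```

The copy into `tmp` costs one extra memcpy per element; in exchange, the temporary buffer's lifetime is limited to a single write.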
So, my questions are:
- Is the API just missing a method, or should I not even be using a list vector to store lists of bytes¹?
- Is there another obvious way to do this that I missed?
¹ A `VarBinaryVector` would do for simple cases, but I want to be able to nest lists as well. The list's ability to contain multiple types could also be useful.
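Nesting appears to work with the same per-element helper, following the `list().startList()` / `list().endList()` pattern the list writer uses for inner lists. This is a sketch assuming a recent Arrow Java release in which `UnionListWriter.list()` hands back the writer for the inner list; the class and method names are hypothetical, and mixed-type (union) lists are not covered.

```java
import java.util.List;
import org.apache.arrow.memory.ArrowBuf;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.complex.ListVector;
import org.apache.arrow.vector.complex.impl.UnionListWriter;
import org.apache.arrow.vector.complex.writer.BaseWriter;

// Hypothetical example class: a list of lists of byte[] in one ListVector.
class NestedBinaryListExample {
    // Appends one byte[] element to the list currently open on `writer`.
    static void writeBytes(BaseWriter.ListWriter writer, BufferAllocator allocator, byte[] bytes) {
        try (ArrowBuf tmp = allocator.buffer(bytes.length)) {
            tmp.setBytes(0, bytes);
            writer.varBinary().writeVarBinary(0, bytes.length, tmp); // copied into the vector
        }
    }

    // Writes entry 0 as a list of inner lists of byte[] and reads it back.
    static List<?> writeNested(List<List<byte[]>> innerLists) {
        try (BufferAllocator allocator = new RootAllocator();
             ListVector vector = ListVector.empty("nested", allocator)) {
            UnionListWriter outer = vector.getWriter();
            outer.allocate();
            outer.setPosition(0);
            outer.startList();
            for (List<byte[]> inner : innerLists) {
                // list() returns the writer for the inner list at the current position
                outer.list().startList();
                for (byte[] bytes : inner) {
                    writeBytes(outer.list(), allocator, bytes);
                }
                outer.list().endList();
            }
            outer.endList();
            vector.setValueCount(1);
            return vector.getObject(0); // nested java.util.Lists holding byte[] copies
        }
    }
}
```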