How to write a simple, unwrapped byte array to an Apache Arrow ListWriter

I'm currently writing some code to convert an arbitrary data structure to Apache Arrow vectors and got stuck on something that should be simple: how to write a byte[] to a ListVector.

When writing data to a ListVector through a BaseWriter.ListWriter, primitive types are easy to add, e.g. writer.integer().writeInt(i) or writer.float4().writeFloat4(f).
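For comparison, here is a minimal sketch of the fixed-width case, assuming a recent Arrow Java release in which ListVector.getWriter() returns a UnionListWriter (which is also a BaseWriter.ListWriter). IntListExample and writeIntLists are hypothetical names. The point is that integer().writeInt(...) takes a plain Java value and needs no buffer management.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.complex.ListVector;
import org.apache.arrow.vector.complex.impl.UnionListWriter;

// Hypothetical example class, not part of Arrow.
public class IntListExample {

  /** Writes each int[] as one entry of a ListVector, then copies the entries back out. */
  static List<List<?>> writeIntLists(int[][] rows) {
    try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
         ListVector listVector = ListVector.empty("ints", allocator)) {
      UnionListWriter writer = listVector.getWriter();
      for (int row = 0; row < rows.length; row++) {
        writer.setPosition(row); // select the list entry to fill
        writer.startList();
        for (int value : rows[row]) {
          writer.integer().writeInt(value); // fixed-width: no ArrowBuf needed
        }
        writer.endList();
      }
      listVector.setValueCount(rows.length);

      List<List<?>> result = new ArrayList<>();
      for (int row = 0; row < rows.length; row++) {
        // copy each entry so the result outlives the vector's memory
        result.add(new ArrayList<>(listVector.getObject(row)));
      }
      return result;
    }
  }

  public static void main(String[] args) {
    System.out.println(writeIntLists(new int[][] {{1, 2}, {3}}));
  }
}
```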

However, for variable-length types such as bytes (or strings), the only methods available have signatures like:

public void write(VarBinaryHolder h);

public void writeVarBinary(int start, int end, ArrowBuf buffer);

Here VarBinaryHolder is a simple generated wrapper around an ArrowBuf that does not even offer a constructor taking a buffer.
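Filling in the holder by hand is one way to reach write(VarBinaryHolder). A minimal sketch, assuming a recent Arrow Java release (ArrowBuf in org.apache.arrow.memory) whose VarBinaryHolder exposes public start, end and buffer fields; HolderExample and writeWithHolder are hypothetical names. The holder still needs an ArrowBuf, so each array is first copied into a scratch buffer taken from the allocator.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.arrow.memory.ArrowBuf;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.complex.ListVector;
import org.apache.arrow.vector.complex.impl.UnionListWriter;
import org.apache.arrow.vector.holders.VarBinaryHolder;

// Hypothetical example class, not part of Arrow.
public class HolderExample {

  /** Writes all arrays into a single list entry via write(VarBinaryHolder), then reads them back. */
  static List<byte[]> writeWithHolder(byte[]... items) {
    try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
         ListVector listVector = ListVector.empty("bytes", allocator)) {
      UnionListWriter writer = listVector.getWriter();
      writer.setPosition(0);
      writer.startList();
      for (byte[] item : items) {
        // scratch buffer that holds the raw bytes only for the duration of the write
        try (ArrowBuf scratch = allocator.buffer(item.length)) {
          scratch.setBytes(0, item);
          VarBinaryHolder holder = new VarBinaryHolder(); // no-arg only: set the public fields
          holder.start = 0;
          holder.end = item.length;
          holder.buffer = scratch;
          writer.write(holder); // copies [start, end) into the list's child vector
        }
      }
      writer.endList();
      listVector.setValueCount(1);

      List<byte[]> result = new ArrayList<>();
      for (Object element : listVector.getObject(0)) {
        result.add((byte[]) element); // binary elements come back as byte[] copies
      }
      return result;
    }
  }

  public static void main(String[] args) {
    for (byte[] b : writeWithHolder(new byte[] {1, 2, 3}, new byte[] {4})) {
      System.out.println(java.util.Arrays.toString(b));
    }
  }
}
```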

I was expecting something like what VarBinaryVector offers: its Mutator has a setSafe(int index, byte[] bytes) method that does exactly what I need.
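For reference, in later Arrow Java releases the Mutator indirection is gone and setSafe(int, byte[]) is called directly on the vector. A minimal sketch of that flat, non-nested path, assuming such a release; VarBinaryVectorExample and storeAndRead are hypothetical names.

```java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VarBinaryVector;

// Hypothetical example class, not part of Arrow.
public class VarBinaryVectorExample {

  /** Stores each array in one VarBinaryVector slot and reads copies back. */
  static byte[][] storeAndRead(byte[]... items) {
    try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
         VarBinaryVector vector = new VarBinaryVector("bytes", allocator)) {
      vector.allocateNew();
      for (int i = 0; i < items.length; i++) {
        vector.setSafe(i, items[i]); // copies the array, growing buffers if needed
      }
      vector.setValueCount(items.length);

      byte[][] out = new byte[items.length][];
      for (int i = 0; i < items.length; i++) {
        out[i] = vector.get(i); // a fresh copy of slot i
      }
      return out;
    }
  }

  public static void main(String[] args) {
    byte[][] out = storeAndRead(new byte[] {1, 2}, new byte[] {3});
    System.out.println(java.util.Arrays.deepToString(out));
  }
}
```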

Furthermore, there seems to be no straightforward way to wrap a byte array in an ArrowBuf; the only way I see is to write the data to a fresh VarBinaryVector and fetch the underlying ArrowBuf afterwards.
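One workaround that avoids the temporary VarBinaryVector, sketched under the assumption of a recent Arrow Java release: allocate a scratch ArrowBuf from the same BufferAllocator, copy the array in with ArrowBuf.setBytes, and pass it to writeVarBinary(start, end, buffer), which copies the bytes into the list's child vector. ByteArrayListWriter, writeBytes and roundTrip are hypothetical names, not Arrow API. In 2017-era releases ArrowBuf lived in io.netty.buffer, so the imports would differ there.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.arrow.memory.ArrowBuf;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.complex.ListVector;
import org.apache.arrow.vector.complex.impl.UnionListWriter;
import org.apache.arrow.vector.complex.writer.BaseWriter;

// Hypothetical helper class, not part of Arrow.
public class ByteArrayListWriter {

  /** Hypothetical helper: appends one byte[] to the list currently open on the writer. */
  static void writeBytes(BaseWriter.ListWriter writer, BufferAllocator allocator, byte[] bytes) {
    try (ArrowBuf scratch = allocator.buffer(bytes.length)) {
      scratch.setBytes(0, bytes);
      // the bytes are copied into the child vector, so the scratch buffer can be freed afterwards
      writer.varBinary().writeVarBinary(0, bytes.length, scratch);
    }
  }

  /** Writes each row of byte arrays as one list entry, then reads every entry back. */
  static List<List<byte[]>> roundTrip(byte[][][] rows) {
    try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE);
         ListVector listVector = ListVector.empty("byteLists", allocator)) {
      UnionListWriter writer = listVector.getWriter();
      for (int row = 0; row < rows.length; row++) {
        writer.setPosition(row);
        writer.startList();
        for (byte[] bytes : rows[row]) {
          writeBytes(writer, allocator, bytes);
        }
        writer.endList();
      }
      listVector.setValueCount(rows.length);

      List<List<byte[]>> result = new ArrayList<>();
      for (int row = 0; row < rows.length; row++) {
        List<byte[]> entry = new ArrayList<>();
        for (Object element : listVector.getObject(row)) {
          entry.add((byte[]) element);
        }
        result.add(entry);
      }
      return result;
    }
  }

  public static void main(String[] args) {
    byte[][][] rows = {{{1, 2}, {3}}, {{4, 5, 6}}};
    for (List<byte[]> entry : roundTrip(rows)) {
      for (byte[] b : entry) {
        System.out.print(java.util.Arrays.toString(b) + " ");
      }
      System.out.println();
    }
  }
}
```

Allocating one scratch buffer per array keeps the sketch simple; for many small arrays, reusing a single scratch buffer and reallocating only when an array is larger would reduce allocations.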

So, my questions are:

  • Is the API just missing a method, or should I not be using a list vector to store lists of bytes at all?¹
  • Is there another obvious way to do this that I missed?

¹ A VarBinaryVector would suffice for simple cases, but I also want to be able to nest lists. In addition, a list's ability to hold multiple types can be useful.

Macassar answered 30/10, 2017 at 8:3

Did you find an answer for this? - Aeronautics
No, I probably found a workaround back then (would not be able to remember, though >.>) - Macassar
