How can we store a hash table in Apache Arrow?

I am pretty new to Apache Arrow, so this question may be naive. Apache Arrow can store data such as primitive types, structs, and arrays in a standardized memory format. Is it possible to store more complex data structures, such as a hash table or a balanced search tree, with Apache Arrow?

Many algorithms rely on these data structures. Do Apache Arrow users need to convert Arrow data into language-specific data structures in this case?

Showalter answered 16/12, 2019 at 2:9 Comment(2)
Would something like this help? arrow.apache.org/docs/java/org/apache/arrow/vector/util/…Photosensitive
@DavidAirapetyan That looks relevant, but there is not much documentation on this class and I am not sure how it works internally. It also seems to be Java-specific. Does that mean that, instead of a standardized memory format for hash tables, each Arrow language binding needs to create its own implementation for storing complex data structures like hash tables? Showalter

You can certainly define a static/immutable hash table backed by the Arrow columnar format (e.g. if you want to be able to memory-map an on-disk hash table). You have to decide on the "schema" of the hash table. For example, it could be:

is_filled: boolean
key: KeyType
value: ValueType

This presumes that the hash and comparison functions for the key type are known to the application and never change.
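As a rough illustration of the three-column schema above, here is a minimal sketch of a static table built with linear probing. Everything in it is an assumption for illustration: the names `stable_hash`, `build_columns` and `lookup` are hypothetical, the keys are strings, the values are integers, and CRC32 stands in for whatever fixed hash function the application agrees on. The columns are plain Python lists; the last few lines optionally wrap them in a `pyarrow` table if that package is installed.

```python
import zlib


def stable_hash(key: str) -> int:
    # Python's built-in hash() for str is salted per process, so a table
    # written to disk needs a fixed function instead (CRC32 is an assumption).
    return zlib.crc32(key.encode("utf-8"))


def build_columns(pairs: dict, capacity: int):
    """Lay out key/value pairs in three parallel columns using linear probing."""
    if len(pairs) >= capacity:
        raise ValueError("capacity must exceed the number of entries")
    is_filled = [False] * capacity
    keys = [None] * capacity
    values = [None] * capacity
    for k, v in pairs.items():
        slot = stable_hash(k) % capacity
        while is_filled[slot]:          # collision: try the next slot
            slot = (slot + 1) % capacity
        is_filled[slot], keys[slot], values[slot] = True, k, v
    return is_filled, keys, values


def lookup(is_filled, keys, values, key: str):
    """Return the value stored for key, or None if the key is absent."""
    capacity = len(is_filled)
    slot = stable_hash(key) % capacity
    while is_filled[slot]:              # an empty slot ends the probe sequence
        if keys[slot] == key:
            return values[slot]
        slot = (slot + 1) % capacity
    return None


is_filled, keys, values = build_columns({"a": 1, "b": 2, "c": 3}, capacity=8)
found = lookup(is_filled, keys, values, "b")
missing = lookup(is_filled, keys, values, "zzz")

# Optional: wrap the columns in an Arrow table when pyarrow is available.
try:
    import pyarrow as pa
    table = pa.table({"is_filled": is_filled, "key": keys, "value": values})
except ImportError:
    table = None
```

Because the table is never modified after it is built, no deletion markers are needed, and a lookup only has to walk forward until it reaches an unfilled slot.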

If you want the keys and values to be next to each other in memory, you could encode them together in a binary type:

is_filled: boolean
keyvalue: binary
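One possible way to pack a key and a value into a single binary cell is sketched below. The layout is purely an assumption (a 4-byte little-endian key length, the UTF-8 key bytes, then an 8-byte signed integer value), and `encode_keyvalue` and `decode_keyvalue` are hypothetical helper names; any encoding that the reader and writer agree on would do.

```python
import struct


def encode_keyvalue(key: str, value: int) -> bytes:
    key_bytes = key.encode("utf-8")
    # 4-byte key length, key bytes, then the value as a signed 64-bit integer.
    return struct.pack("<I", len(key_bytes)) + key_bytes + struct.pack("<q", value)


def decode_keyvalue(blob: bytes):
    (key_len,) = struct.unpack_from("<I", blob, 0)
    key = blob[4:4 + key_len].decode("utf-8")
    (value,) = struct.unpack_from("<q", blob, 4 + key_len)
    return key, value


blob = encode_keyvalue("ab", 7)
decoded = decode_keyvalue(blob)
```

Each filled slot of the `keyvalue` column would then hold one such blob, so a probe that lands on a slot reads the key and the value from the same contiguous bytes.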

The actual implementation of the hash table is up to you. You're welcome to contribute such code to the Apache Arrow codebase itself.

Affray answered 16/12, 2019 at 16:25 Comment(0)
