Read Parquet Files using Apache Arrow
I have some Parquet files that I've written in Python using PyArrow (Apache Arrow):

pyarrow.parquet.write_table(table, "example.parquet")

Now I want to read these files (and preferably get an Arrow Table) using a Java program.

In Python, I can simply use the following to get an Arrow Table from my Parquet file:

table = pyarrow.parquet.read_table("example.parquet")

Is there an equivalent and easy solution in Java?

I couldn't find any good, working examples or any useful documentation for Java (only for Python), and some examples don't list all the required Maven dependencies. I also don't want to use a Hadoop file system; I just want to read local files.

Note: I also found out that I can't use "Apache Avro", because my Parquet files contain column names with the symbols [, ] and $, which are invalid characters in Apache Avro.

Also, please provide the Maven dependencies if your solution uses Maven.


I am on Windows and using Eclipse.


Update (November 2020): I never found a suitable solution and just stuck with Python for my use case.
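For anyone landing here later: newer Apache Arrow Java releases ship an arrow-dataset module that can read local Parquet files without Spark or Avro, so it does not hit the invalid-column-name problem. A sketch, assuming the Maven artifacts org.apache.arrow:arrow-dataset and org.apache.arrow:arrow-memory-netty are on the classpath (the exact scanner API has varied between Arrow versions, and the file URI and batch size below are placeholders):

```java
import org.apache.arrow.dataset.file.FileFormat;
import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
import org.apache.arrow.dataset.jni.NativeMemoryPool;
import org.apache.arrow.dataset.scanner.ScanOptions;
import org.apache.arrow.dataset.scanner.Scanner;
import org.apache.arrow.dataset.source.Dataset;
import org.apache.arrow.dataset.source.DatasetFactory;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowReader;

public class ReadParquet {
    public static void main(String[] args) throws Exception {
        String uri = "file:///C:/data/example.parquet"; // placeholder path

        try (BufferAllocator allocator = new RootAllocator()) {
            DatasetFactory factory = new FileSystemDatasetFactory(
                    allocator, NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri);
            try (Dataset dataset = factory.finish();
                 Scanner scanner = dataset.newScan(new ScanOptions(/*batchSize=*/ 32768));
                 ArrowReader reader = scanner.scanBatches()) {
                while (reader.loadNextBatch()) {
                    // Each batch arrives as a VectorSchemaRoot (Arrow's columnar
                    // batch); there is no PyArrow-style Table object in Java.
                    VectorSchemaRoot root = reader.getVectorSchemaRoot();
                    System.out.println(root.rowCount() + " rows in this batch");
                }
            }
        }
    }
}
```

Note that this module did not exist in a usable form when the question was asked in May 2020, which is consistent with the update above.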

Weatherly answered 27/5, 2020 at 15:42 Comment(2)
The PyArrow Table object is not part of the Apache Arrow specification and was not implemented in Java. I am trying to find a solution too. In the meantime I implemented it with Spark 3.0.1, reading the Parquet files there instead. I keep looking for a framework-independent solution.Ineffaceable
Perhaps Dremio (github.com/dremio/dremio-oss) can provide a solution.Ineffaceable
-3

It's somewhat overkill, but you can use Spark.

https://spark.apache.org/docs/latest/sql-data-sources-parquet.html
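If the Spark dependency is acceptable, reading a local Parquet file from Java is short. A sketch, assuming spark-sql (e.g. org.apache.spark:spark-sql_2.12) is on the classpath; the file path is a placeholder:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkParquetExample {
    public static void main(String[] args) {
        // Local-mode session: no cluster required.
        SparkSession spark = SparkSession.builder()
                .appName("read-parquet")
                .master("local[*]")
                .getOrCreate();

        // Spark reads Parquet natively; a plain local path works.
        Dataset<Row> df = spark.read().parquet("example.parquet");
        df.printSchema();
        df.show();

        spark.stop();
    }
}
```

One caveat for the asker's setup: on Windows, Spark has historically also needed the winutils.exe Hadoop binaries to be present, so this is not entirely Hadoop-free.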

Luck answered 27/5, 2020 at 16:13 Comment(0)
