Is it possible to convert a Pandas dataframe from/to an ORC file? I can write the df to a parquet file, but pandas doesn't seem to have ORC support. Is there an available solution in Python? If not, what would be the best strategy? One option could be converting the parquet file to ORC using an external tool, but I have no clue where to find one.
Convert Pandas dataframe from/to ORC file
Are you using Hive or Spark (or both)? What you are trying to do is much easier, and less error-prone, if you have one of those. In particular, I strongly suggest you use Hive to manage your ORC files. You can connect to it in Python using the pyodbc or pyhive packages. –
Geddes
@Reign I have just finished the ORC adapter in C++ and Python so it is possible to write ORC files now if you use my fork: github.com/mathyingzhou/arrow. –
Condescension
This answer is tested with pyarrow==4.0.1 and pandas==1.2.5.
It first creates a pyarrow table using pyarrow.Table.from_pandas. It then writes the orc file using pyarrow.orc.write_table.
Read orc
import pandas as pd
import pyarrow.orc # This prevents: AttributeError: module 'pyarrow' has no attribute 'orc'
df = pd.read_orc('/tmp/your_df.orc')
Write orc
import pandas as pd
import pyarrow as pa
import pyarrow.orc as orc
# Here prepare your pandas df.
table = pa.Table.from_pandas(df, preserve_index=False)
orc.write_table(table, '/tmp/your_df.orc')
As of pandas==1.3.0, there isn't a pd.to_orc writer yet.
Do you have any idea if it is possible to set the compression type while writing an ORC file using your described solution? –
Arella
To add to the answer above, Pandas v1.5.0 natively supports writing to ORC files. I'll update this with more documentation when it's released.
my_df.to_orc('myfile.orc')
I have used pyarrow recently which has ORC support, although I've seen a few issues where the pyarrow.orc module is not being loaded.
pip install pyarrow
to use:
import pandas as pd
import pyarrow.orc as orc
with open(filename, 'rb') as file:  # ORC is a binary format, so open in binary mode
    data = orc.ORCFile(file)
    df = data.read().to_pandas()