pyarrow Questions

4

Using pyarrow to convert a pandas.DataFrame containing Player objects to a pyarrow.Table with the following code import pandas as pd import pyarrow as pa class Player: def __init__(self, name, a...
Baskerville asked 7/1, 2020 at 22:07
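Arrow has no column type for arbitrary Python objects, so a conversion like this generally needs the object's attributes flattened into plain columns first. A minimal sketch (the name/age fields are hypothetical completions of the truncated excerpt):

```python
import pandas as pd
import pyarrow as pa

class Player:
    def __init__(self, name, age):  # hypothetical fields
        self.name = name
        self.age = age

players = [Player("Ann", 30), Player("Bo", 25)]

# Flatten object attributes into ordinary columns; pa.Table.from_pandas
# cannot serialize a column of raw Player instances.
df = pd.DataFrame({"name": [p.name for p in players],
                   "age": [p.age for p in players]})
table = pa.Table.from_pandas(df)
```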

2

Solved

I am trying to pip install Superset (pip install apache-superset) and am getting the error below: Traceback (most recent call last): File "c:\users\saurav_nimesh\appdata\local\programs\python\python3...
Coxcomb asked 27/2, 2020 at 16:41

1

Solved

I'm not sure where to begin, so looking for some guidance. I'm looking for a way to create some arrays/tables in one process, and have them accessible (read-only) from another. So I create a pyarrow....
Silverplate asked 8/2, 2023 at 23:34
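One common zero-copy route for this is writing the table to an Arrow IPC file in the producer process and memory-mapping it in the readers. A sketch (file name hypothetical):

```python
import pyarrow as pa

table = pa.table({"x": [1, 2, 3]})

# Producer: persist the table in Arrow IPC format.
with pa.OSFile("shared.arrow", "wb") as sink:
    with pa.ipc.new_file(sink, table.schema) as writer:
        writer.write_table(table)

# Consumer (another process): memory-map the file; the read is
# zero-copy and effectively read-only.
with pa.memory_map("shared.arrow", "r") as source:
    shared = pa.ipc.open_file(source).read_all()
```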

5

Solved

How do you append/update to a parquet file with pyarrow? import pandas as pd import pyarrow as pa import pyarrow.parquet as pq table2 = pd.DataFrame({'one': [-1, np.nan, 2.5], 'two': ['foo', '...
Marshy asked 4/11, 2017 at 17:59
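A finished Parquet file can't be appended to in place; the usual pattern is to keep a ParquetWriter open and emit each new table as another row group. A sketch under that assumption:

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df1 = pd.DataFrame({"one": [1.0, 2.0], "two": ["foo", "bar"]})
df2 = pd.DataFrame({"one": [3.0, 4.0], "two": ["baz", "qux"]})
t1 = pa.Table.from_pandas(df1)
t2 = pa.Table.from_pandas(df2)

# Each write_table call adds a row group to the same file; once the
# writer closes, the file is final.
with pq.ParquetWriter("out.parquet", t1.schema) as writer:
    writer.write_table(t1)
    writer.write_table(t2)
```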

3

I am trying to read a decently large Parquet file (~2 GB, about 30 million rows) into my Jupyter Notebook (in Python 3) using the Pandas read_parquet function. I have also installed the pyarro...
Antacid asked 11/2, 2020 at 3:59
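When the file is bigger than is comfortable for RAM, two things usually help: read only the needed columns, or stream the file in bounded batches. A sketch (file and column names hypothetical):

```python
import pandas as pd
import pyarrow.parquet as pq

# Option 1: column pruning - Parquet is columnar, so untouched columns
# are never read off disk.
df = pd.read_parquet("big.parquet", columns=["col_a", "col_b"])

# Option 2: stream fixed-size record batches instead of one big read.
pf = pq.ParquetFile("big.parquet")
for batch in pf.iter_batches(batch_size=100_000):
    chunk = batch.to_pandas()
    ...  # process the chunk, then let it go out of scope
```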

5

Solved

I have a python script that reads in a parquet file using pyarrow. I'm trying to loop through the table to update values in it. If I try this: for col_name in table2.column_names: if col_name in m...
Undrape asked 22/1, 2021 at 13:01
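Arrow tables are immutable, so the loop can't write into the existing columns; instead, compute a replacement array and swap it in with set_column, which returns a new table. A sketch (the column names and the doubling are stand-ins for the truncated condition):

```python
import pyarrow as pa
import pyarrow.compute as pc

table2 = pa.table({"price": [1.0, 2.5, 4.0], "name": ["a", "b", "c"]})

for col_name in table2.column_names:
    if col_name == "price":  # stand-in for the truncated membership test
        idx = table2.schema.get_field_index(col_name)
        updated = pc.multiply(table2.column(col_name), 2.0)
        # set_column returns a *new* table; rebind the name to keep it.
        table2 = table2.set_column(idx, col_name, updated)
```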

3

I have the below code, which queries a database of about 500k rows, and it throws a SIGKILL when it hits rows = cur.fetchall(). I've tried to iterate through the cursor rather than load it all up in...
Elonore asked 2/9, 2020 at 20:38
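fetchall() materializes all ~500k rows at once, which is what the OOM killer reacts to; fetching in fixed-size chunks with fetchmany keeps memory flat. A sketch using the standard DB-API (sqlite3 stands in for whatever driver the question uses):

```python
import sqlite3

conn = sqlite3.connect("example.db")  # any DB-API connection works alike
cur = conn.cursor()
cur.execute("SELECT * FROM big_table")

# Pull rows in bounded chunks instead of one giant fetchall().
while True:
    rows = cur.fetchmany(10_000)
    if not rows:
        break
    for row in rows:
        ...  # process each row; nothing accumulates in memory
```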

2

It appears the most common way in Python to create Parquet files is to first create a Pandas dataframe and then use pyarrow to write the table to parquet. I worry that this might be overly taxing i...
Unclasp asked 11/11, 2020 at 17:48
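pyarrow can build the Table straight from Python data, so pandas never enters the picture. A sketch:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Build the Arrow table directly from columns; no DataFrame involved.
table = pa.table({
    "id": pa.array([1, 2, 3], type=pa.int64()),
    "name": pa.array(["a", "b", "c"], type=pa.string()),
})
pq.write_table(table, "direct.parquet")
```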

9

Solved

I have a hacky way of achieving this using boto3 (1.4.4), pyarrow (0.4.1) and pandas (0.20.3). First, I can read a single parquet file locally like this: import pyarrow.parquet as pq path = 'par...
Anglicanism asked 11/7, 2017 at 20:01
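With recent pandas/pyarrow/s3fs versions the hack is unnecessary: s3:// URIs are handled directly. A sketch (bucket and key hypothetical):

```python
import pandas as pd
import pyarrow.parquet as pq
import s3fs

# High level: pandas resolves s3:// through s3fs on its own.
df = pd.read_parquet("s3://my-bucket/path/file.parquet")

# Lower level, through pyarrow explicitly:
fs = s3fs.S3FileSystem()
table = pq.read_table("my-bucket/path/file.parquet", filesystem=fs)
```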

5

Is it possible to use Pandas' DataFrame.to_parquet functionality to split writing into multiple files of some approximate desired size? I have a very large DataFrame (100M x 100), and am using df.t...
Cocker asked 6/9, 2020 at 20:33
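to_parquet itself always emits a single file, so the usual workaround is to slice the frame and write each slice separately, tuning the row count until files land near the target size. A sketch (the helper is hypothetical):

```python
import numpy as np
import pandas as pd

def write_chunked(df: pd.DataFrame, rows_per_file: int, prefix: str) -> None:
    """Write df as several Parquet files of ~rows_per_file rows each."""
    for i, start in enumerate(range(0, len(df), rows_per_file)):
        df.iloc[start:start + rows_per_file].to_parquet(
            f"{prefix}_{i:04d}.parquet")

# Small stand-in frame; for the question's 100M x 100 frame you would
# raise rows_per_file until each file is roughly the size you want.
df = pd.DataFrame(np.random.rand(10_000, 4), columns=list("abcd"))
write_chunked(df, 2_500, "part")
```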

3

I'm trying to overwrite my parquet files in S3 with pyarrow. I've seen the documentation and I haven't found anything. Here is my code: from s3fs.core import S3FileSystem import pyarrow ...
Trotta asked 30/8, 2018 at 11:22
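S3 objects can't be edited in place, but writing to the same key through an s3fs file handle replaces the object, which amounts to an overwrite. A sketch (key hypothetical):

```python
import pyarrow as pa
import pyarrow.parquet as pq
from s3fs.core import S3FileSystem

table = pa.table({"x": [1, 2, 3]})

fs = S3FileSystem()
# Writing to an existing key replaces the whole object.
with fs.open("my-bucket/path/data.parquet", "wb") as f:
    pq.write_table(table, f)
```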

2

Solved

I created an egg and a whl file of pyarrow and put them on S3, to be called in a Python shell job. I received this message. Job code: import pyarrow. It raises an error (same structure for the whl): Traceback (most...
Minimus asked 3/3, 2020 at 17:47

1

Solved

I am trying to use awswrangler to read into a pandas dataframe an arbitrarily-large parquet file stored in S3, but limiting my query to the first N rows due to the file's size (and my poor bandwidt...
Honeysuckle asked 25/5, 2022 at 12:15
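One bandwidth-friendly approach is to stream record batches from the S3 object and stop as soon as N rows have arrived, so only a prefix of the file is downloaded. A sketch using pyarrow directly (bucket and key hypothetical):

```python
import pyarrow as pa
import pyarrow.parquet as pq
import s3fs

N = 1_000
fs = s3fs.S3FileSystem()
pf = pq.ParquetFile(fs.open("my-bucket/path/big.parquet", "rb"))

batches, rows = [], 0
for batch in pf.iter_batches(batch_size=min(N, 65_536)):
    batches.append(batch)
    rows += batch.num_rows
    if rows >= N:
        break  # stop reading; the rest of the file is never fetched
df = pa.Table.from_batches(batches).to_pandas().head(N)
```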

2

Solved

Both are columnar (disk-)storage formats for use in data analysis systems. Both are integrated within Apache Arrow (the pyarrow package for Python) and are designed to correspond with Arrow as a colum...
Cheatham asked 3/1, 2018 at 18:48
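The practical difference shows up at write time: Parquet is a compressed storage format, while Feather is essentially the Arrow IPC layout written to disk, so reads can be near zero-copy at the cost of larger files. A quick illustration:

```python
import pyarrow as pa
import pyarrow.feather as feather
import pyarrow.parquet as pq

table = pa.table({"x": [1, 2, 3]})

# Parquet: compressed, column-chunked, built for long-term storage.
pq.write_table(table, "data.parquet")

# Feather / Arrow IPC: the in-memory layout on disk, optimized for
# fast (near zero-copy) reads rather than small files.
feather.write_feather(table, "data.feather")
```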

5

Solved

I'm looking for ways to read data from multiple partitioned directories from S3 using Python. data_folder/serial_number=1/cur_date=20-12-2012/abcdsd0324324.snappy.parquet data_folder/serial_number=2...
Draughtsman asked 13/7, 2017 at 13:56
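pyarrow's dataset API discovers hive-style key=value directories like these and turns them into queryable columns. A sketch (bucket name hypothetical):

```python
import pyarrow.dataset as ds
import s3fs

fs = s3fs.S3FileSystem()

# serial_number=.../cur_date=... directory names become columns.
dataset = ds.dataset("my-bucket/data_folder", filesystem=fs,
                     format="parquet", partitioning="hive")

# Partition filters prune whole directories before any file is read.
df = dataset.to_table(filter=ds.field("serial_number") == 1).to_pandas()
```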

0

I have a parquet dataset stored in my S3 bucket with multiple partition files. I want to read it into my pandas dataframe, but am getting this ArrowInvalid error when I didn't before. Occasionally,...
Emanuele asked 28/4, 2022 at 18:09
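An ArrowInvalid that appears only occasionally on a multi-file dataset often means one partition file's schema has drifted from the rest. A diagnostic sketch that diffs each file's schema against the first (the layout is hypothetical):

```python
import pyarrow.parquet as pq
import s3fs

fs = s3fs.S3FileSystem()
files = fs.glob("my-bucket/dataset/**/*.parquet")  # hypothetical layout

# Compare every file's Arrow schema to the first file's.
reference = pq.ParquetFile(fs.open(files[0], "rb")).schema_arrow
for path in files[1:]:
    schema = pq.ParquetFile(fs.open(path, "rb")).schema_arrow
    if not schema.equals(reference):
        print("schema mismatch:", path)  # the file to repair or rewrite
```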

2

Solved

I am trying to store a Python Pandas DataFrame as a Parquet file, but I am experiencing some issues. One of the columns of my Pandas DF contains dictionaries, like this: import pandas as pd df = ...
Latanya asked 5/8, 2020 at 16:42
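When the dicts are ragged (different keys per row), they don't map onto a single Arrow struct type; a common workaround is serializing them to JSON strings before writing. A sketch:

```python
import json
import pandas as pd

df = pd.DataFrame({"id": [1, 2],
                   "meta": [{"a": 1}, {"b": [2, 3]}]})  # ragged dicts

# JSON strings form a plain string column, which Parquet handles fine;
# apply json.loads after reading to get the dicts back.
df["meta"] = df["meta"].map(json.dumps)
df.to_parquet("with_dicts.parquet")
```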

2

Could somebody give me a hint on how I can copy a file from a local filesystem to an HDFS filesystem using PyArrow's new filesystem interface (i.e. upload, copyFromLocal)? I have read the documentat...
Ejective asked 28/7, 2021 at 11:11
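With the new pyarrow.fs interface, one way is to stream the local file into an HDFS output stream, which achieves the same upload as copyFromLocal. A sketch (host, port, and paths all hypothetical):

```python
import shutil
from pyarrow import fs

hdfs = fs.HadoopFileSystem("namenode-host", port=8020)  # hypothetical

# Stream the local bytes straight into an HDFS output stream.
with open("/tmp/local.csv", "rb") as src, \
        hdfs.open_output_stream("/data/local.csv") as dst:
    shutil.copyfileobj(src, dst)
```

Recent pyarrow versions also expose pyarrow.fs.copy_files, which may cover this in a single call.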

1

Solved

I am aware that "Many Arrow objects are immutable: once constructed, their logical properties cannot change anymore" (docs). In this blog post by one of the Arrow creators it's said Tabl...
Performance asked 10/3, 2022 at 17:58
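The immutability is at the buffer level; "mutating" methods on Table hand back a new Table that shares the old buffers where possible. A quick demonstration:

```python
import pyarrow as pa

t1 = pa.table({"x": [1, 2, 3]})

# append_column returns a new Table; t1 itself never changes, and the
# new table reuses t1's immutable buffers for column "x".
t2 = t1.append_column("y", pa.array([4, 5, 6]))

print(t1.column_names)  # ['x']
print(t2.column_names)  # ['x', 'y']
```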

1

Solved

I have a calculator that iterates over a couple hundred objects and produces an Nx1 array for each of those objects, N here being 1-10m depending on configuration. Right now I am summing over these by ...
Doretha asked 24/2, 2022 at 18:50
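If the intermediate Nx1 arrays don't need to be kept, accumulating into one preallocated buffer avoids holding hundreds of them alive at once. A sketch (the per-object computation is a stand-in, since the calculator isn't shown):

```python
import numpy as np

N = 1_000_000
total = np.zeros(N)

for _ in range(200):               # stand-in for the calculator's loop
    arr = np.random.rand(N)        # hypothetical per-object Nx1 result
    np.add(total, arr, out=total)  # accumulate in place, no intermediates
```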

3

Solved

I have a problem using the pyarrow.orc module in Anaconda on Windows 10. import pyarrow.orc as orc throws an exception: Traceback (most recent call last): File "<stdin>", line 1, in <modu...
Solvable asked 12/11, 2019 at 15:47

2

Solved

I'm currently working on a project and I am having a hard time understanding how Pandas UDFs in PySpark work. I have a Spark cluster with one master node with 8 cores and 64GB, along with...
Exclave asked 26/12, 2019 at 20:53
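The short version: Spark ships each partition to the Python workers as Arrow record batches, and a pandas UDF receives whole pd.Series rather than one value at a time. A minimal scalar example:

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf

spark = SparkSession.builder.getOrCreate()
df = spark.range(10).withColumnRenamed("id", "x")

@pandas_udf("double")
def plus_one(x: pd.Series) -> pd.Series:
    # Called once per Arrow batch with a whole Series, which is why
    # pandas UDFs amortize serialization cost vs. row-at-a-time UDFs.
    return (x + 1).astype("float64")

df.withColumn("y", plus_one("x")).show()
```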

5

Solved

Using Python, Parquet, and Spark and running into ArrowNotImplementedError: Support for codec 'snappy' not built after upgrading to pyarrow=3.0.0. My previous version without this error was pyarrow...
Squaw asked 2/2, 2021 at 21:19
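The durable fix is reinstalling a pyarrow build with snappy support, but as a stopgap the codec can simply be avoided at write time. A sketch:

```python
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({"x": [1, 2, 3]})

# Sidestep the missing codec: write uncompressed (or e.g. "gzip").
pq.write_table(table, "out.parquet", compression="none")
```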

2

I'm looking for fast ways to store and retrieve numpy array using pyarrow. I'm pretty satisfied with retrieval. It takes less than 1 second to extract columns from my .arrow file that contains 1.00...
Nunatak asked 9/11, 2021 at 16:44
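For a plain ndarray, Arrow's tensor IPC functions store the buffer plus shape/stride metadata, and a memory-mapped read restores it with (near) zero copies. A sketch:

```python
import numpy as np
import pyarrow as pa

arr = np.random.rand(1_000_000)

# Store: wrap the ndarray as an Arrow tensor and write it out.
tensor = pa.Tensor.from_numpy(arr)
with pa.OSFile("array.arrow", "wb") as sink:
    pa.ipc.write_tensor(tensor, sink)

# Retrieve: memory-map the file and rebuild the ndarray without copying.
with pa.memory_map("array.arrow", "r") as source:
    restored = pa.ipc.read_tensor(source).to_numpy()
```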

1

Solved

In the huggingface library, there is a particular dataset format called an Arrow dataset: https://arrow.apache.org/docs/python/dataset.html https://huggingface.co/datasets/wiki_lingua I have to convert...
Lindly asked 8/11, 2021 at 4:20
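The datasets library can build its Arrow-backed Dataset straight from a DataFrame, which covers most conversions of this kind. A sketch:

```python
import pandas as pd
from datasets import Dataset  # huggingface `datasets` package

df = pd.DataFrame({"text": ["hello", "world"], "label": [0, 1]})

# Creates an Arrow-backed huggingface Dataset from the DataFrame.
ds = Dataset.from_pandas(df)
ds.save_to_disk("my_dataset")  # persists the Arrow files for later reuse
```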
