pyarrow Questions
2
I'm new to Python and Pandas - please be gentle!
I'm using SQLAlchemy with pymssql to execute a SQL query against a SQL Server database and then convert the result set into a dataframe. I'm then at...
7
I have tried installing it in the terminal and in JupyterLab, and it says that it has been successfully installed, but when I run df = query_job.to_dataframe() I keep getting the error "
ValueE...
Spelter asked 13/12, 2020 at 13:3
7
Solved
I'm working on a script where I'm sending a dataframe to BigQuery:
load_job = bq_client.load_table_from_dataframe(
df, '.'.join([PROJECT, DATASET, PROGRAMS_TABLE])
)
# Wait for the load job to c...
Doughy asked 10/1, 2020 at 13:38
3
Solved
Does Parquet allow appending to a Parquet file periodically?
How does appending relate to partitioning, if at all? For example, if I were able to identify a column that had low cardinality and partitio...
Clew asked 9/9, 2021 at 20:23
3
Solved
I have the following dataframe in pandas that is saved as a parquet
import pandas as pd
df = pd.DataFrame({"a":['1','2','3']}).astype("category")
Upon inspection of the only fi...
5
My dataset has 5000000 rows, and I would like to add a column called 'embeddings' to it.
dataset = dataset.add_column('embeddings', embeddings)
The variable embeddings is a numpy memmap ...
Confectioner asked 22/11, 2021 at 10:56
7
I am trying to run a simple pandas UDF example on my server, taken from here.
I have created a fresh environment just for the purpose of running this code.
(PySparkEnv) $ conda list
# packages in envir...
Carlton asked 13/9, 2018 at 19:12
3
Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0)
Every time I execute "import pandas as pd" in PyCharm, I get this error.
Give me solutio...
6
Solved
After some searching I failed to find a thorough comparison of fastparquet and pyarrow.
I found this blog post (a basic comparison of speeds)
and a GitHub discussion that claims that files crea...
Electrolyte asked 16/7, 2018 at 12:0
6
I'm trying to install pyarrow on a master instance of my EMR cluster, but I always receive this error.
[hadoop@ip-XXX-XXX-XXX-XXX ~]$ sudo /usr/bin/pip-3.4 install pyarrow
Collecting p...
Willman asked 5/9, 2018 at 9:12
1
I have the following pandas dataframe object using the pyarrow back end:
crsp_m.info(verbose = True)
out:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4921811 entries, 0 to 4921810
Data...
Friend asked 19/4, 2023 at 17:50
3
For the parsing of a larger file, I need to write in a loop to a large number of parquet files successively. However, it appears that the memory consumed by this task increases over each iteration,...
7
Solved
Using
dd = {'ID': ['H576','H577','H578','H600', 'H700'],
'CD': ['AAAAAAA', 'BBBBB', 'CCCCCC','DDDDDD', 'EEEEEEE']}
df = pd.DataFrame(dd)
Before pandas 0.25, the code below worked.
set: redisConn.se...
4
I have a CSV stored in GCS that I want to load into a BigQuery table. I need to do some preprocessing first, so I load it into a DataFrame and later load that into the BigQuery table.
import pandas as pd
import j...
Lobectomy asked 21/2, 2022 at 8:47
0
The pyarrow documentation builds a custom UUID type many times like this:
import pyarrow as pa
class UuidType(pa.PyExtensionType):
    def __init__(self):
        pa.PyExtensionType.__init__(self, pa.binary(...
Cigarillo asked 5/9, 2023 at 22:16
3
Solved
How can I directly connect MS SQL Server to polars?
The documentation does not list any supported connections but recommends the use of pandas.
Update:
SQL Server Authentication works per answer, b...
Dicotyledon asked 31/12, 2022 at 4:48
4
Solved
I'm trying to use a specific AWS profile with Apache PyArrow. The documentation shows no option to pass a profile name when instantiating S3FileSystem using pyarrow fs [https://arrow.apache.org/do...
Ketcham asked 22/6, 2022 at 16:50
0
I am using pyarrow and am struggling to understand the big difference in memory usage between the Dataset.to_batches method compared to ParquetFile.iter_batches.
Using pyarrow.dataset
>>> ...
Ting asked 4/8, 2023 at 1:11
1
Solved
I am starting to work with the parquet file format.
The official Apache site recommends large row groups of 512MB to 1GB (here).
Several online sources (e.g. this one) suggest that the default row g...
Romanfleuve asked 27/7, 2023 at 17:6
3
Solved
I know I can connect to an HDFS cluster via pyarrow using pyarrow.hdfs.connect().
I also know I can read a parquet file using pyarrow.parquet's read_table().
However, read_table() accepts a filepat...
1
Solved
I'm working with the Arrow dtypes in pandas, and my dataframe has a variable that should be categorical, but I can't figure out how to transform it into the pyarrow data type for categorical data (di...
4
I have a Feather-format file, sales.feather, that I use to exchange data between Python and R.
In R I use the following command:
df = arrow::read_feather("sales.feather", as_data_fra...
Constrained asked 1/12, 2018 at 9:49
1
Solved
Let's say I have the following dataframe:
Code  Price
AA1   10
AA1   20
BB2   30
And I want to perform the following operation on it:
df.groupby("code").aggregate({
"price&...
Tnt asked 3/4, 2023 at 9:6
2
I have a problem with filetypes when converting a parquet file to a dataframe.
I do
bucket = 's3://some_bucket/test/usages'
import pyarrow.parquet as pq
import s3fs
s3 = s3fs.S3FileSystem()
rea...
Urbannal asked 25/2, 2019 at 12:45
3
Solved
I am trying to use pypy3 to install pyarrow, but some errors occur.
Basic information is below:
macOS 10.15.7
Xcode 12.3
python version 3.7.9
pypy3 version 7.3.3
pyarrow version 0.17.1
cmd is 'pip_...