How to use categorical data type with pyarrow dtypes?

I'm working with the Arrow dtypes in pandas, and my DataFrame has a column that should be categorical, but I can't figure out how to convert it to the pyarrow data type for categorical data (dictionary).

According to the Arrow documentation (https://arrow.apache.org/docs/python/pandas.html#pandas-arrow-conversion), the Arrow data type I should be using is dictionary.

Usually, if you want pandas to use a pyarrow dtype, you just append [pyarrow] to the name of the pyarrow type, for example dtype='string[pyarrow]'. I tried using dtype='dictionary[pyarrow]', but that yields the error:

data type 'dictionary[pyarrow]' not understood

I also tried 'categorical[pyarrow]', 'category[pyarrow]', pyarrow.dictionary, and pyarrow.dictionary(pyarrow.int16(), pyarrow.string()), and none of them worked either.

How can I use the dictionary dtype on a pandas Series?

pd.Series(['Chocolate','Candy','Waffles'], dtype='what_to_put_here????')

Donnelly answered 10/5, 2023 at 19:34 Comment(0)

I believe pd.ArrowDtype is required (with pandas imported as pd and pyarrow imported as pa):

dtype=pd.ArrowDtype(pa.dictionary(pa.int16(), pa.string()))
Allometry answered 10/5, 2023 at 19:57 Comment(2)
It's been a year and a half since this answer. I'm curious whether there is a newer way. Is there? - Colloid
@Colloid I haven't worked with pyarrow recently enough to tell you either way. - Allometry