CSV to Feather in Pandas with slicing Rows

I am processing a huge dataset (50 million rows) stored as CSV. I am trying to slice it and save the slice in Feather format so that loading it later takes less memory.

As a workaround, I loaded the CSV file in chunks and later merged the chunks into one data frame.
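A minimal sketch of that kind of chunked loading, assuming a tiny in-memory CSV in place of the real file (the column names `id` and `value` and the chunk size are made up for illustration):

```python
import io
import pandas as pd

# Tiny in-memory CSV standing in for the real 50-million-row file (hypothetical data).
csv_text = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(10))

# With chunksize set, read_csv yields DataFrames of at most 4 rows each.
chunks = list(pd.read_csv(io.StringIO(csv_text), chunksize=4))

# ignore_index=True gives the merged frame a fresh 0..n-1 index.
df = pd.concat(chunks, ignore_index=True)

print(len(chunks), len(df))
```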

This is what I have tried so far:

df[2000000:4000000].to_feather('name')

I have got the following error:

ValueError: feather does not support serializing a non-default index for the index; you can .reset_index() to make the index into column(s)

Then I tried to reset the index, but I still get the same error.

Hi answered 6/9, 2018 at 19:3 Comment(3)
When you reset the index, did you add the inplace=True argument? Calling df.reset_index() on its own does not change your df. - Visualize
I had the same problem and a reset index fixed it, but as d_kennetz says, you have to either do it in place or assign the result back to your data frame. - Castro
This looks like a bug; I would suggest reporting it on GitHub at github.com/wesm/feather/issues - Sherlock
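The point made in the comments above can be illustrated with a small sketch (the 10-row frame and the name `part` are stand-ins, not the asker's data): a slice keeps its original row labels, and `reset_index` returns a new frame rather than modifying the one it is called on.

```python
import pandas as pd

df = pd.DataFrame({"value": range(10)})

# A positional slice keeps the original row labels, so the index starts at 2.
part = df[2:5]

# reset_index returns a NEW frame; the result is discarded here, so `part` is unchanged.
part.reset_index(drop=True)
labels_before = list(part.index)

# Assigning the result back is what actually replaces the index.
part = part.reset_index(drop=True)
labels_after = list(part.index)

print(labels_before, labels_after)
```

If the first form is what was tried in the question, that would explain why the same error kept appearing.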

Try with .loc:

df.loc[2000000:4000000].reset_index().to_feather("./myfeather.ftr")

You'll have to reset the index to save the dataframe in Feather format. Works for me.

Alnico answered 9/3, 2021 at 15:15 Comment(1)
If you don't need to store the index, you may want to use reset_index(drop=True). - Kapoor
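One detail worth checking when swapping `df[a:b]` for `df.loc[a:b]`: on a default index, `.loc` slices by label and includes both endpoints, while `df[a:b]` slices by position and excludes the stop. The sketch below shows the difference on a small stand-in frame, with two hypothetical helpers (`take_rows` and `slice_to_feather` are illustrative names, not pandas functions) that combine a positional slice with `reset_index(drop=True)`. Writing Feather assumes `pyarrow` is installed.

```python
import pandas as pd

def take_rows(df, start, stop):
    """Hypothetical helper: rows start..stop-1 by position, with a fresh default index."""
    return df.iloc[start:stop].reset_index(drop=True)

def slice_to_feather(df, start, stop, path):
    """Hypothetical helper: write the slice to Feather (to_feather needs pyarrow)."""
    take_rows(df, start, stop).to_feather(path)

df = pd.DataFrame({"value": range(10)})

# Label-based .loc includes both endpoints; positional slicing excludes the stop.
n_loc = len(df.loc[2:4])
n_pos = len(df[2:4])
print(n_loc, n_pos)

# drop=True discards the old labels instead of adding them as an "index" column.
part = take_rows(df, 2, 4)
print(list(part.columns))
```

With helpers like these, `slice_to_feather(df, 2000000, 4000000, "./myfeather.ftr")` would write the same rows as the original `df[2000000:4000000]`.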

Save the required slice of the data to CSV with df.to_csv(), load the data again from that CSV, and then save it in Feather format. This method worked for me.

Diorama answered 7/2, 2020 at 21:34 Comment(1)
This seems like a roundabout way to do df.reset_index(). It will also take some time for very large dataframes, which is what Feather is trying to avoid. - Kerwon
