Convert Pandas DataFrame to & from In-Memory Feather

Using the IO tools in pandas it is possible to convert a DataFrame to an in-memory feather buffer:

import pandas as pd  
from io import BytesIO 

df = pd.DataFrame({'a': [1,2], 'b': [3.0,4.0]})  

buf = BytesIO()

df.to_feather(buf)

However, using the same buffer to convert back to a DataFrame

pd.read_feather(buf)

results in an error:

ArrowInvalid: Not a feather file

How can a DataFrame be converted to an in-memory feather representation and, correspondingly, back to a DataFrame?

Thank you in advance for your consideration and response.

Whitebeam answered 8/6, 2018 at 13:31 Comment(8)
@EdChum The documentation explicitly names the variable path, which would indicate it was purposeful, since all of the other methods name the variable filepath_or_buffer. - Lithoid
Hmm, could you try buf = io.BytesIO()? - Pardew
@Pardew That seems to have worked! - Lithoid
Looking at the impl, it accepts a file path, so it will also accept a file-like object. I tried buf = io.BytesIO(), but I don't have the feather-format library installed, so I'm just waiting for pip to complete before confirming. - Pardew
This does seem to work, but I'm not familiar with feather files, so I can't confirm if all is OK. - Pardew
@Pardew I tried to verify by converting the feather back to a dataframe but got another error. Updated the question accordingly. - Lithoid
I get the same problem. I've not investigated how to convert the bytes object to a file-like object so that pandas can read it again. - Pardew
I think this may be something to ask on GitHub, as it may be functionality that could be added. - Pardew

With pandas==0.25.2 this can be accomplished in the following way:

import pandas
import io
df = pandas.DataFrame(data={'a': [1, 2], 'b': [3.0, 4.0]})
buf = io.BytesIO()
df.to_feather(buf)
output = pandas.read_feather(buf)

Then a call to output.head(2) returns:

    a    b
 0  1  3.0
 1  2  4.0

Note that you could do the same with CSV files, but that would require you to use StringIO instead of BytesIO, because CSV is a text format.
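
A rough sketch of the CSV variant, for comparison. Passing index=False is my own choice here, so that the row index is not written out as an extra column:

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})

# CSV is text, so the in-memory buffer is a StringIO rather than a BytesIO.
csv_buf = io.StringIO()
df.to_csv(csv_buf, index=False)

# Rewind before reading; read_csv starts from the current stream position.
csv_buf.seek(0)
csv_output = pd.read_csv(csv_buf)
```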


If you have a DataFrame with a non-default index (for example, a MultiIndex), you may see an error like

ValueError: feather does not support serializing <class 'pandas.core.indexes.base.Index'> for the index; you can .reset_index() to make the index into column(s)

In that case, call .reset_index() before to_feather, and call .set_index([...]) after read_feather to restore the index.


One last thing I would like to add: if you hand the BytesIO to anything that reads from the current stream position (an upload call, for instance), you need to seek back to 0 after writing the feather bytes. For example:

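buffer = io.BytesIO()
df.reset_index(drop=False).to_feather(buffer)
buffer.seek(0)  # rewind so the upload reads from the first byte
# s3_client is assumed to be a boto3 S3 client, e.g. s3_client = boto3.client('s3')
s3_client.put_object(Body=buffer, Bucket='bucket', Key='file')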
Karney answered 21/11, 2019 at 17:5 Comment(0)
