How to set/get pandas.DataFrame to/from Redis?
U

6

43

After I store a DataFrame in Redis and read it back, Redis returns a string, and I can't figure out how to convert that str back into a DataFrame.

How can I set and get a DataFrame properly?

Upper answered 21/6, 2016 at 11:56 Comment(1)
use serialization before putting to Redis and deserialization when read from Redis.Guardant
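To make the comment above concrete, here is a minimal sketch of the serialize/deserialize round trip. It is not from the original thread: `FakeRedis` is a hypothetical in-memory stand-in so the snippet runs without a server, and JSON with `orient="split"` is just one possible format. With a real `redis.Redis` client the `set`/`get` calls should look the same.

```python
import io

import pandas as pd


class FakeRedis:
    """Hypothetical in-memory stand-in for redis.Redis (set/get only)."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        # Like Redis, keep every value as bytes.
        if not isinstance(value, bytes):
            value = str(value).encode("utf-8")
        self._store[key] = value
        return True

    def get(self, key):
        # Missing keys come back as None.
        return self._store.get(key)


client = FakeRedis()
df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Serialize: DataFrame -> JSON text, then store it.
client.set("df", df.to_json(orient="split"))

# What comes back is raw bytes, not a DataFrame.
raw = client.get("df")

# Deserialize: bytes -> text -> DataFrame.
restored = pd.read_json(io.StringIO(raw.decode("utf-8")), orient="split")
```

The key point is that Redis only stores bytes, so the conversion in both directions has to happen in your own code.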
U
60

set:

redisConn.set("key", df.to_msgpack(compress='zlib'))

get:

pd.read_msgpack(redisConn.get("key"))
Upper answered 22/6, 2016 at 2:20 Comment(2)
As of pandas 0.25.1, to_msgpack is deprecated in favor of pyarrow. Check this SO post for a full example of pandas + pyarrow + redisTrumpery
pyarrow is deprecating serialization/deserialization in 2.0.0 arrow.apache.org/blog/2020/10/22/2.0.0-releasePack
L
9

I couldn't use msgpack because of Decimal objects in my dataframe. Instead I combined pickle and zlib together like this, assuming a dataframe df and a local instance of Redis:

import pickle
import redis
import zlib

EXPIRATION_SECONDS = 600

r = redis.StrictRedis(host='localhost', port=6379, db=0)

# Set
r.setex("key", EXPIRATION_SECONDS, zlib.compress(pickle.dumps(df)))

# Get
rehydrated_df = pickle.loads(zlib.decompress(r.get("key")))

Nothing here is specific to DataFrames; the same approach works for any picklable object.

Caveats

  • the other answer using msgpack is better -- use it if it works for you
  • pickling can be dangerous -- your Redis server needs to be secure or you're asking for trouble
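One thing the snippet above does not handle is a missing or expired key: `r.get` returns `None` in that case, and `zlib.decompress(None)` raises a `TypeError`. Below is a rough sketch of wrapping the two steps in helpers that return `None` on a cache miss. The helper names `cache_df` and `load_df` are made up for illustration, and `FakeRedis` is a hypothetical stand-in (it ignores the TTL) so the example runs without a server.

```python
import pickle
import zlib
from decimal import Decimal

import pandas as pd


class FakeRedis:
    """Hypothetical in-memory stand-in for redis.Redis (setex/get only)."""

    def __init__(self):
        self._store = {}

    def setex(self, key, ttl_seconds, value):
        # The TTL is ignored here; a real server would expire the key.
        self._store[key] = value
        return True

    def get(self, key):
        return self._store.get(key)


def cache_df(client, key, df, ttl_seconds=600):
    """Pickle, compress and store a DataFrame with an expiry."""
    client.setex(key, ttl_seconds, zlib.compress(pickle.dumps(df)))


def load_df(client, key):
    """Return the cached DataFrame, or None if the key is absent."""
    raw = client.get(key)
    if raw is None:
        return None
    # Only unpickle data from a Redis instance you trust.
    return pickle.loads(zlib.decompress(raw))


client = FakeRedis()
df = pd.DataFrame({"price": [Decimal("1.10"), Decimal("2.25")]})

cache_df(client, "prices", df)
restored = load_df(client, "prices")
missing = load_df(client, "no-such-key")
```

The `Decimal` objects survive the round trip because pickle stores the Python objects themselves rather than converting them to a neutral format.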
Lombardy answered 15/2, 2018 at 22:27 Comment(0)
W
7

to_msgpack is not available in recent versions of pandas.

import io

import pandas as pd
import redis

# Create a Redis client
redisClient = redis.StrictRedis(host='localhost', port=6379, db=0)
# Create a DataFrame
dd = {'ID': ['H576','H577','H578','H600', 'H700'],
  'CD': ['AAAAAAA', 'BBBBB', 'CCCCCC','DDDDDD', 'EEEEEEE']}
df = pd.DataFrame(dd)
# Serialize to a JSON string and store it
data = df.to_json()
redisClient.set('dd', data)
# Retrieve the data; Redis returns bytes, so decode before parsing
blob = redisClient.get('dd')
df_from_redis = pd.read_json(io.StringIO(blob.decode('utf-8')))
df_from_redis.head()
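One caveat with the JSON route: the default JSON output carries no dtype information, so columns such as datetimes may not come back with the type they had. A possible workaround, sketched here as an assumption rather than part of the original answer, is `orient="table"`, which embeds a schema in the payload. The `when` column and its values are made up for illustration; the resulting string would be stored with `redisClient.set` and decoded from bytes after `redisClient.get`, as in the code above.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "ID": ["H576", "H577"],
    # Hypothetical datetime column, used to show dtype preservation.
    "when": pd.to_datetime(["2021-04-18", "2021-04-19"]),
})

# The "table" orient writes a schema section alongside the data.
payload = df.to_json(orient="table")

# Reading with the same orient uses that schema to restore dtypes.
restored = pd.read_json(io.StringIO(payload), orient="table")
```

The payload is larger than the default orient because of the schema, so this is a trade-off between size and type fidelity.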

Whitebook answered 18/4, 2021 at 20:56 Comment(0)
H
5

For caching a dataframe use this.

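import pandas as pd
import pyarrow as pa
import redis

def cache_df(alias, df):
    # 'host', 'port' and 'db' are placeholders for your own connection settings
    pool = redis.ConnectionPool(host='host', port='port', db='db')
    cur = redis.Redis(connection_pool=pool)
    # Note: default_serialization_context() is deprecated as of pyarrow 2.0
    context = pa.default_serialization_context()
    df_compressed = context.serialize(df).to_buffer().to_pybytes()

    # set() returns a truthy value on success
    res = cur.set(alias, df_compressed)
    if res:
        print('df cached')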

For fetching the cached dataframe use this.

def get_cached_df(alias):
    # 'host', 'port' and 'db' are placeholders for your own connection settings
    pool = redis.ConnectionPool(host='host', port='port', db='db')
    cur = redis.Redis(connection_pool=pool)
    context = pa.default_serialization_context()

    # get() returns None for a missing key, so there is no need to list every key first
    result = cur.get(alias)
    if result is None:
        return None

    return pd.DataFrame.from_dict(context.deserialize(result))
Hintz answered 30/4, 2020 at 8:55 Comment(0)
H
2
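import io
import pandas as pd
import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)
df = pd.DataFrame([1, 2])
# Store the JSON form with a 100-second expiry
r.setex('df', 100, df.to_json())
# Redis returns bytes; decode before parsing
df = pd.read_json(io.StringIO(r.get('df').decode('utf-8')))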
Hermie answered 12/6, 2020 at 22:12 Comment(1)
Remember to offer an explanation, and not just code. It's important to help readers understand why your code works, not just what to do. This is especially important when answering old questions with established answers—in this case, an accepted answer from nearly four years ago with quite a few votes. What value does your approach offer beyond that suggestion? Are you using new techniques that are faster, cleaner, or more reliable?Entrain
S
0

It's 2021, which means df.to_msgpack() is deprecated AND pyarrow has deprecated its custom serialization functionality as of pyarrow 2.0 (see the "Arbitrary Object Serialization" section on pyarrow's serialization page).

That leaves good old trusty msgpack to serialize objects so that they can be stored in Redis.

import msgpack
import redis 

# ...Writing to Redis (assumes an existing redis_client, and `data` that msgpack
# can pack, e.g. a dict or list of plain Python values, not a DataFrame itself)
redis_client.set('data_key_name', msgpack.packb(data))

# ...Retrieving from redis
retrieved_data = msgpack.unpackb(redis_client.get('data_key_name'))


Screed answered 20/10, 2021 at 17:46 Comment(2)
And what is "data" in your example?Wampumpeag
this does not work assuming data = pandas dataframe, maybe clarify what you meanGrieve
