Writing dictionary of dataframes to file

3

9

I have a dictionary, and for each key in my dictionary, I have one pandas dataframe. The dataframes from key to key are of unequal length.

It takes some time to get to the dataframes that are connected to each key, and therefore I wish to save my dictionary of dataframes to a file, so I can just read the file into Python instead of running my script every time I open Python.

My question is: how would you suggest writing the dictionary of dataframes to a file, and reading it back in again? I have tried the following, where dictex is the dictionary:

w = csv.writer(open("output.csv", "w"))
for key, val in dictex.items():
    w.writerow([key, val])

But I am not really sure if I get what I want, as I struggle to read the file into Python again.

Thank you for your time.

Outpouring answered 10/6, 2018 at 17:26 Comment(3)

Do these dataframes have the same set of columns? If so, you can just add an additional column to each dataframe and store a key there, then merge all dataframes and save the result into a file. – Ladner 10/6, 2018 at 17:36

But it would be fine if you added a minimal working example here. – Ladner 10/6, 2018 at 17:38

Nope the dataframes are of unequal dimensions (both unequal rows and/or unequal columns) – Outpouring 10/6, 2018 at 19:1

2

If you want to save each dataframe to its own file, without using SQL or another database format, the following code could work:

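import ast
import pandas as pd

def saver(dictex):
    """Write each dataframe to its own CSV file and record the keys."""
    for key, val in dictex.items():
        val.to_csv("data_{}.csv".format(key))

    with open("keys.txt", "w") as f:  # saving keys to file
        f.write(str(list(dictex.keys())))

def loader():
    """Reading data from keys"""
    with open("keys.txt", "r") as f:
        # literal_eval parses the saved list without executing arbitrary code
        keys = ast.literal_eval(f.read())

    dictex = {}
    for key in keys:
        # index_col=0 restores the index that to_csv wrote as the first column
        dictex[key] = pd.read_csv("data_{}.csv".format(key), index_col=0)

    return dictex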

(...)

dictex = loader()

Macintyre answered 10/6, 2018 at 18:51 Comment(1)

Hi @artona, thank you for your answer. It doesn't work with the dictionary I need to write to file, because its values are both dataframes and integers. I've posted my question here: #65098846, in case you can help :) – Thermography 1/12, 2020 at 20:55

5

You can use pickle to save a dictionary of dataframes in Python.

import pickle
import pandas as pd

df1 = pd.DataFrame(data={'a':[1,2,3], 'b':[4,5,6]})
df2 = pd.DataFrame(data={'a':[5,5,5,5,5], 'b':[5,5,5,5,5]})

d = {}
d['df1'] = df1
d['df2'] = df2

with open('dict_of_dfs.pickle', 'wb') as f:
    pickle.dump(d, f)
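
The question also asks how to read the file back in. A minimal sketch of the full round trip follows, assuming the same example dataframes as above. The temporary directory is only there to keep the example self-contained, and any writable path would do. Note that pickle files should only be loaded from sources you trust.

```python
import os
import pickle
import tempfile

import pandas as pd

# Same example data as in the answer above.
d = {
    "df1": pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}),
    "df2": pd.DataFrame({"a": [5, 5, 5, 5, 5], "b": [5, 5, 5, 5, 5]}),
}

# Assumption: a temporary directory stands in for wherever you keep the file.
path = os.path.join(tempfile.mkdtemp(), "dict_of_dfs.pickle")

# Write the whole dictionary to a single file.
with open(path, "wb") as f:
    pickle.dump(d, f)

# Read it back; the result is again a dict mapping keys to dataframes.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(list(loaded.keys()))
```

Because pickle serializes ordinary Python objects, the same pattern should also work when some dictionary values are not dataframes (for example, integers).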

Lactiferous answered 28/3, 2022 at 20:13 Comment(1)

this worked perfectly for me. according to digitalocean.com, "Python Pickle is used to serialize and deserialize a python object structure. Any object on python can be pickled so that it can be saved on disk." I tried some of the other answers, but because I had some datetime objects stored in the df's it made things complicated. Pickle worked easily on first try!! – Gallic 27/2, 2023 at 2:55

0

I added some new parameters to the saver and loader functions from the answer above: output, to write the files into a specific folder; type_, a label added to the file names so you can save more than one dictionary; and index=False, to drop the unwanted index column when you save your dataframe.

import ast
import pandas as pd

def saver(dictex, type_, output):
    for key, val in dictex.items():
        # index=False leaves the index column out of the CSV
        val.to_csv("{}/data_{}_{}.csv".format(output, key, type_), index=False)

    with open(f"{output}/keys_{type_}.txt", "w") as f:  # saving keys to file
        f.write(str(list(dictex.keys())))

def loader(type_, output='your_file'):
    """Reading data from keys"""
    with open(f"{output}/keys_{type_}.txt", "r") as f:
        # literal_eval parses the saved list without executing arbitrary code
        keys = ast.literal_eval(f.read())

    dictex = {}
    for key in keys:
        dictex[key] = pd.read_csv("{}/data_{}_{}.csv".format(output, key, type_))

    return dictex
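
The unwanted column mentioned above appears because to_csv writes the dataframe index by default, and read_csv then loads it as an ordinary column. The short sketch below illustrates this with a made-up dataframe, using an in-memory buffer instead of a file. It shows two ways around it: skip the index when writing, or restore it when reading.

```python
import io

import pandas as pd

# Hypothetical example dataframe, for illustration only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Default to_csv writes the index as an unnamed first column.
with_index = pd.read_csv(io.StringIO(df.to_csv()))

# index=False leaves the index out, so the columns come back unchanged.
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

# Alternatively, keep the index in the file and restore it when reading.
restored = pd.read_csv(io.StringIO(df.to_csv()), index_col=0)

print(list(with_index.columns))     # includes the extra 'Unnamed: 0' column
print(list(without_index.columns))  # only 'a' and 'b'
print(list(restored.columns))       # only 'a' and 'b'
```

If the index carries real information (dates, labels), restoring it with index_col=0 is probably the better fit; if it is just the default row numbering, index=False keeps the files smaller.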

Invest answered 5/2 at 16:28 Comment(0)
