With Python, can I keep a persistent dictionary and modify it?
Asked Answered
X

9

15

So, I want to store a dictionary in a persistent file. Is there a way to use regular dictionary methods to add, print, or delete entries from the dictionary in that file?

It seems that I would be able to use cPickle to store the dictionary and load it, but I'm not sure where to take it from there.

Xenia answered 4/8, 2009 at 18:10 Comment(2)
When you read the pickle documentation, what questions did you have? Can you post some code to show what you have working and what you need help with?Salty
Basically, I would want to use the dictionary as a database type thing. So I could write the dictionary to a file, and then load the file in my script when I wanted to add something to the dictionary, but using regular dictionary methods. Is there a way I can just load the file, and then modify the dictionary with the typical dict["key"] = "items" or del dict["key"]? I've tried to do this now, and python tells me that dict is undefined in this particular example.Xenia
B
21

If your keys (not necessarily the values) are strings, the shelve standard library module does what you want pretty seamlessly.

Blockish answered 4/8, 2009 at 18:17 Comment(2)
Il piacere é tutto mio, Stefano!-)Blockish
It's worth noting that this does not work for edits made to elements contained in the dictionary. So if you have a shelve called data with a list of items then data['items'].append(123) will fail.Transvestite
A
14

Use JSON

Similar to Pete's answer, I like using JSON because it maps very well to python data structures and is very readable:

Persisting data is trivial:

>>> import json
>>> db = {'hello': 123, 'foo': [1,2,3,4,5,6], 'bar': {'a': 0, 'b':9}}
>>> fh = open("db.json", 'w')
>>> json.dump(db, fh)

and loading it is about the same:

>>> import json
>>> fh = open("db.json", 'r')
>>> db = json.load(fh)
>>> db
{'hello': 123, 'bar': {'a': 0, 'b': 9}, 'foo': [1, 2, 3, 4, 5, 6]}
>>> del new_db['foo'][3]
>>> new_db['foo']
[1, 2, 3, 5, 6]

In addition, JSON loading doesn't suffer from the same security issues that shelve and pickle do, although IIRC it is slower than pickle.

If you want to write on every operation:

If you want to save on every operation, you can subclass the Python dict object:

import os
import json

class DictPersistJSON(dict):
    def __init__(self, filename, *args, **kwargs):
        self.filename = filename
        self._load();
        self.update(*args, **kwargs)

    def _load(self):
        if os.path.isfile(self.filename) 
           and os.path.getsize(self.filename) > 0:
            with open(self.filename, 'r') as fh:
                self.update(json.load(fh))

    def _dump(self):
        with open(self.filename, 'w') as fh:
            json.dump(self, fh)

    def __getitem__(self, key):
        return dict.__getitem__(self, key)

    def __setitem__(self, key, val):
        dict.__setitem__(self, key, val)
        self._dump()

    def __repr__(self):
        dictrepr = dict.__repr__(self)
        return '%s(%s)' % (type(self).__name__, dictrepr)

    def update(self, *args, **kwargs):
        for k, v in dict(*args, **kwargs).items():
            self[k] = v
        self._dump()

Which you can use like this:

db = DictPersistJSON("db.json")
db["foo"] = "bar" # Will trigger a write

Which is woefully inefficient, but can get you off the ground quickly.

Adin answered 17/4, 2014 at 15:38 Comment(2)
I've read it is better to subclass collections.abc.MutableMapping instead of dict. The current implementation fails if someone calls db.setdefault('baz', 123). This article explains why: kr41.net/2016/03-23-dont_inherit_python_builtin_dict_type.htmlTransvestite
Found what I was looking for. Have a usecase where performance not relevant. ThanksSidran
B
6

Unpickle from file when program loads, modify as a normal dictionary in memory while program is running, pickle to file when program exits? Not sure exactly what more you're asking for here.

Blockus answered 4/8, 2009 at 18:13 Comment(0)
Z
1

Assuming the keys and values have working implementations of repr, one solution is that you save the string representation of the dictionary (repr(dict)) to file. YOu can load it using the eval function (eval(inputstring)). There are two main disadvantages of this technique:

1) Is will not work with types that have an unuseable implementation of repr (or may even seem to work, but fail). You'll need to pay at least some attention to what is going on.

2) Your file-load mechanism is basically straight-out executing Python code. Not great for security unless you fully control the input.

It has 1 advantage: Absurdly easy to do.

Zara answered 4/8, 2009 at 18:24 Comment(1)
ast.literal_eval() if your dict contains only basic Python types.Debark
B
1

My favorite method (which does not use standard python dictionary functions): Read/write YAML files using PyYaml. See this answer for details, summarized here:

Create a YAML file, "employment.yml":

new jersey:
  mercer county:
    pumbers: 3
    programmers: 81
  middlesex county:
    salesmen: 62
    programmers: 81
new york:
  queens county:
    plumbers: 9
    salesmen: 36

Step 3: Read it in Python

import yaml
file_handle = open("employment.yml")
my__dictionary = yaml.safe_load(file_handle)
file_handle.close()

and now my__dictionary has all the values. If you needed to do this on the fly, create a string containing YAML and parse it wth yaml.safe_load.

Bolivia answered 4/8, 2009 at 19:9 Comment(0)
R
1

If using only strings as keys (as allowed by the shelve module) is not enough, the FileDict might be a good way to solve this problem.

Richie answered 21/8, 2012 at 16:2 Comment(1)
The link to FileDict does not workWillettewilley
S
0

pickling has one disadvantage. it can be expensive if your dictionary has to be read and written frequently from disk and it's large. pickle dumps the stuff down (whole). unpickle gets the stuff up (as a whole).

if you have to handle small dicts, pickle is ok. If you are going to work with something more complex, go for berkelydb. It is basically made to store key:value pairs.

Synergist answered 4/8, 2009 at 19:3 Comment(0)
M
0

Have you considered using dbm?

import dbm
import pandas as pd
import numpy as np
db = b=dbm.open('mydbm.db','n')

#create some data
df1 = pd.DataFrame(np.random.randint(0, 100, size=(15, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(101,200, size=(10, 3)), columns=list('EFG'))

#serialize the data and put in the the db dictionary
db['df1']=df1.to_json()
db['df2']=df2.to_json()


# in some other process:
db=dbm.open('mydbm.db','r')
df1a = pd.read_json(db['df1'])
df2a = pd.read_json(db['df2'])

This tends to work even without a db.close()

Maryannmaryanna answered 30/12, 2020 at 19:26 Comment(4)
Please avoid using rhetorical questions in answers. This can make it seem like you are asking a question in an answer.Loritalorn
Its ok! we are dealing with human beings who understand the neuances of the english languageMaryannmaryanna
I was reviewing this answer in a review queue over a year ago and that message is a default option that SO offers. Hope that clears it up.Loritalorn
The dbm module is very much like a dict out of the box. No pandas or numpy needed. The only catch is that the keys and values need to be strings or bytes.Georgenegeorges
R
0

I made a module for this https://github.com/tintin10q/persistentdict. I hope this is helpfull.

from persistentdict import persistentdict

with persistentdict("test") as d:
    d["test"] = "test"
    d["test2"] = "test2"

with persistentdict("test") as d:
    print(d["test"]) # test

It is just a context manager that gives you a dict which it fills with the content of the file and then writes back to the file. The specific file with the code is here:

https://github.com/tintin10q/persistentdict/blob/production/persistentdict.py

Recess answered 20/3, 2023 at 23:42 Comment(1)
A link to a solution is welcome, but please ensure your answer is useful without it: add context around the link so your fellow users will have some idea what it is and why it is there, then quote the most relevant part of the page you are linking to in case the target page is unavailable. Answers that are little more than a link may be deleted.Hammerlock

© 2022 - 2024 — McMap. All rights reserved.