When is it appropriate to use a database , in Python

Asked 9/6, 2012 at 2:16 Answered 9/6, 2012 at 2:49

I am making a little add-on for a game, and it needs to store information on a player:

username
ip-address
location in game
a list of alternate user names that have came from that ip or alternate ip addresses that come from that user name

I read an article a while ago that said that unless I am storing a large amount of information that can not be held in ram, that I should not use a database. So I tried using the shelve module in python, but I'm not sure if that is a good idea.

When do you guys think it is a good idea to use a database, and when it better to store information in another way , also what are some other ways to store information besides databases and flat file databases.

Sustainer answered 9/6, 2012 at 2:16 Comment(5)

think about this: do you need to store something that is gonna be "persistent"? – Procter 9/6, 2012 at 2:19

I would store it in a configuration file, read it into memory for the duration of the program, and then write it back into the config file if any changes were made. – Anitaanitra 9/6, 2012 at 2:21

@xvatar: And if you've answered that question with "yes", the next question would be "do I really need anything heavier than a text file?" – Lh 9/6, 2012 at 2:23

There are many options open to you for persistent data in the default distribution of Python: JSON, pickle, shelf, sqlite3, simple text file. Look at each and choose. – Chewy 9/6, 2012 at 2:27

This question isn't really answerable, as there are too many unknowns (and any answer is subject to opinion and discussion). The FAQ mentions discussion questions as not being a good fit for SO, as it's designed around "ask a question, get a definitive answer". There is no single answer that can be given that is "right" here. – Individuality 9/6, 2012 at 2:58

Most importantly, unless you specifically need performance or high reliability, do whatever will make your code simplest/easiest to write.

If your data is extremely structured (and you know SQL or are willing to learn) then using a database like sqlite3 might be appropriate. (You should ignore the comment about database size and RAM: there are times when databases are appropriate for even very small data sets, because of how the data is structured.)

If the data is relatively simple and you don't need the reliability that a database (normally) has then storing it in one of the builtin datatypes while the program is running is probably fine.

If you'd like the data stored on disk to be human readable (and editable, with a bit of effort), then a format like JSON (there is builtin json module) is nice, since the basic Python objects serialise without any effort. If the data not so simple then YAML is essentially an extended version of JSON (PyYAML is very good.). Similarly, you could use CSV files (the csv modules), although this is not nearly as good as JSON or YAML, or just a custom text format (but this is quite a lot of effort to get error handling and so on implemented neatly).

Finally, if your data contains more advanced objects (e.g. recursive dictionaries, or complicated custom datatypes) then using one of the builtin binary serialisation techniques (pickle, shelve etc.) might be appropriate, however, YAML can handle many of these things (including recursive data structures).

Some general points:

Plain text formats are nice, as they allow values to be tweaked easily and debugging/testing is easy
Binary formats are nice, as they mean that values can't be tweaked without a little bit of extra effort (this is not saying they can't be adjusted though), and the file size is smaller (probably not relevant)

Kaleena answered 9/6, 2012 at 2:49 Comment(0)

Assuming by 'database' you mean 'relational database' - even the embedded databases like SQLite come with some overhead compared to a plain text file. But, sometimes that overhead is worth it compared to rolling your own.

The biggest question you need to ask is whether you are storing relational data - whether things like normalisation and SQL queries make any sense at all. If you need to lookup data across multiple tables using joins, you should certainly use a relational database - that's what they're for. On the other hand, if all you need to do is lookup into one table based on its primary key, you probably want a CSV file. Pickle and shelve are useful if what you're persisting is the objects you use in your program - if you can just add the relevant magic methods to your existing classes and expect it all to make sense.

Certainly "you shouldn't use databases unless you have a lot of data" isn't the best advice - the amount of data goes more to what database you might use if you are using one. SQLite, for example, wouldn't be suitable for something the size of Stackoverflow - but, MySQL or Postgres would almost certainly be overkill for something with five users.

Tamar answered 9/6, 2012 at 2:39 Comment(0)

Recommended topics

Hot tags