Most importantly, unless you specifically need performance or high reliability, do whatever will make your code simplest/easiest to write.
If your data is highly structured (and you know SQL or are willing to learn), then a database such as sqlite3 might be appropriate. (You should ignore the comment about database size and RAM: a database can be appropriate even for a very small data set, because of how the data is structured.)
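As a rough illustration of the database route, here is a minimal sketch using the standard library's sqlite3 module. The table name, column names and rows are made up for the example, and it uses an in-memory database so nothing is written to disk.

```python
import sqlite3

# ":memory:" keeps the database in RAM; pass a file path instead to persist it.
conn = sqlite3.connect(":memory:")

# Hypothetical table and data, purely for illustration.
conn.execute("CREATE TABLE scores (name TEXT PRIMARY KEY, points INTEGER)")
conn.executemany(
    "INSERT INTO scores (name, points) VALUES (?, ?)",
    [("alice", 12), ("bob", 7), ("carol", 15)],
)
conn.commit()

# Parameterised query: everyone with more than 10 points, highest first.
top = conn.execute(
    "SELECT name, points FROM scores WHERE points > ? ORDER BY points DESC",
    (10,),
).fetchall()
conn.close()

print(top)
```

Even with three rows, the query says what you want rather than how to loop over the data, which is the sort of case where a database can pay off for a small data set.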
If the data is relatively simple and you don't need the reliability that a database (normally) provides, then storing it in one of the builtin datatypes while the program is running is probably fine.
If you'd like the data stored on disk to be human readable (and editable, with a bit of effort), then a format like JSON (there is a builtin json module) is nice, since the basic Python objects serialise without any effort. If the data is not so simple, then YAML is essentially an extended version of JSON (PyYAML is very good). Similarly, you could use CSV files (the csv module), although this is not nearly as good as JSON or YAML, or just a custom text format (but getting error handling and so on implemented neatly takes quite a lot of effort).
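To show what "serialise without any effort" looks like in practice, here is a small sketch that writes a dictionary to a JSON file and reads it back with the builtin json module. The settings dictionary and the file name are invented for the example, and the file goes into a temporary directory.

```python
import json
import os
import tempfile

# Hypothetical data made only of basic Python objects.
settings = {"name": "example", "retries": 3, "tags": ["a", "b"], "ratio": 0.5}

# Write to a throwaway directory; use a real path in your own program.
path = os.path.join(tempfile.mkdtemp(), "settings.json")
with open(path, "w", encoding="utf-8") as f:
    # indent and sort_keys make the file easier to read and edit by hand.
    json.dump(settings, f, indent=2, sort_keys=True)

with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == settings)
```

Note that JSON only has a handful of types, so things like tuples come back as lists and dictionary keys come back as strings.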
Finally, if your data contains more advanced objects (e.g. recursive dictionaries, or complicated custom datatypes), then one of the builtin binary serialisation techniques (pickle, shelve, etc.) might be appropriate. However, YAML can handle many of these things (including recursive data structures).
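For instance, a dictionary that contains a reference to itself is one case where json gives up but pickle copes. The sketch below uses a made-up node dictionary; only unpickle data you trust, since loading a pickle can run arbitrary code.

```python
import json
import pickle

# Hypothetical recursive structure: the dictionary refers back to itself.
node = {"name": "root", "children": []}
node["self"] = node

# json refuses circular references and raises ValueError.
try:
    json.dumps(node)
    json_ok = True
except ValueError:
    json_ok = False

# pickle records the cycle and rebuilds it when loading.
blob = pickle.dumps(node)
restored = pickle.loads(blob)

print(json_ok, restored["self"] is restored)
```

The pickled bytes could equally be written to a file opened in binary mode; shelve builds on pickle to give a persistent dictionary-like object.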
Some general points:
- Plain text formats are nice, as they allow values to be tweaked easily and debugging/testing is easy
- Binary formats are nice, as they mean that values can't be tweaked without a little bit of extra effort (which is not to say they can't be adjusted), and the file size is smaller (probably not relevant)