I have a large set of strings that I'm using for natural language processing research, and I'd like a nice way to store it in Python.
I could use pickle, but I believe loading the entire list into memory would be impossible: it's about 10 GB, and I don't have that much main memory. Currently I have the list stored with the shelve library... The shelf is indexed by the strings "0", "1", ..., "n", which is a bit clunky.
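To make the current setup concrete, here is a minimal sketch of what the shelve arrangement looks like. The function names `write_strings` and `read_string` and the shelf path are made up for illustration; the only thing taken from my actual setup is that each string is stored under its position converted to a string key.

```python
import shelve


def write_strings(path, strings):
    """Store each string in a shelf under its position as a str key.

    shelve keys must be strings, hence str(i) rather than i.
    Returns the number of strings written.
    """
    count = 0
    with shelve.open(path) as shelf:
        for i, s in enumerate(strings):
            shelf[str(i)] = s
            count = i + 1
    return count


def read_string(path, index):
    """Fetch one string by position without loading the others."""
    with shelve.open(path, flag="r") as shelf:
        return shelf[str(index)]
```

Only the requested entry is unpickled on lookup, which is why this works within limited memory, but every access has to go through the `str(index)` conversion.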
Are there nicer ways to store such an object in a single file and still have random(ish) access to it?
It may be that the best option is to split it into multiple lists.
Thanks!