I have been trying to process a good chunk of data (a few GBs), but my personal computer can't get through it in a reasonable time span, so I'm wondering what options I have. I was using Python's csv.reader, but it was painfully slow even to fetch 200,000 lines. I then migrated the data to an SQLite database, which retrieved results a bit faster and without using as much memory, but speed was still a major issue.
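For reference, here is roughly what I've tried so far. This is just a sketch: the file, table, and column names are made up, and `process()` stands in for the real per-row computation.

```python
import csv
import sqlite3

def process(row):
    # Stand-in for the real per-row work I need to do.
    pass

# Attempt 1: plain csv.reader -- painfully slow past ~200,000 rows.
with open("data.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        process(row)

# Attempt 2: the same data loaded into SQLite -- uses less memory,
# and queries return a bit faster, but it's still too slow overall.
conn = sqlite3.connect("data.db")
for row in conn.execute("SELECT * FROM records"):
    process(row)
conn.close()
```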
So, again: what options do I have for processing this data? I was thinking about Amazon's spot instances, which seem suited to this kind of task, but maybe there are other solutions worth exploring.
Supposing spot instances are a good option, and considering I have never used them before, what can I expect from them? Does anyone have experience using them for this kind of thing? If so, what is your workflow? I thought I'd find a few blog posts detailing workflows for scientific computing, image processing, or that kind of thing, but I didn't find anything, so if you can explain a bit of that or point me to some links, I'd appreciate it.
Thanks in advance.