I've successfully followed this example for my own text classification script.
The problem is that I don't want to process chunks of a large but fixed dataset in a loop of `partial_fit` calls, as the example does. I want to add data as it becomes available, even if I shut down my Python script in the meantime.
Ideally I'd like to do something like this:
```
# sometime in 2015:
model2015 = partial_fit(dataset2015)
save_to_file(model2015)
# shut down my Python script

# sometime in 2016:
# open my Python script again
model2015 = load_from_file(model2015)
model2016 = partial_fit(dataset2016, starting_from=model2015)
save_to_file(model2016)

# sometime in 2017:
# open my Python script again
# etc...
```
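To make the idea concrete, here is a minimal sketch of what I'm after, assuming `SGDClassifier` (one of scikit-learn's estimators that supports `partial_fit`) and `joblib` for saving the model to disk; the file path, helper name, and class list are just placeholders for illustration:

```python
import os

import joblib
from sklearn.linear_model import SGDClassifier

MODEL_PATH = "model.joblib"   # hypothetical location for the persisted model
CLASSES = [0, 1]              # partial_fit needs all classes declared up front

def update_model(X, y):
    """Load last year's model if it exists, update it with new data, save it."""
    if os.path.exists(MODEL_PATH):
        model = joblib.load(MODEL_PATH)       # e.g. the 2015 model
    else:
        model = SGDClassifier()               # first run: start fresh
    model.partial_fit(X, y, classes=CLASSES)  # incremental update
    joblib.dump(model, MODEL_PATH)            # persist for the next run
    return model
```

Each call to `update_model` would correspond to one of the yearly sessions above, with the script shut down in between.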
Is there any way I can do this in scikit-learn? Or in some other package (TensorFlow, perhaps)?