Can anyone point me to a large corpus that I can use for classification?
By large I don't mean Reuters or 20 Newsgroups; I'm talking about a corpus that is gigabytes in size, not 20 MB or so.
So far I have only been able to find Reuters and 20 Newsgroups, which are far too small for what I need.