I'm running into a problem involving regular expressions and CategorizedPlaintextCorpusReader in Python.
I want to create a custom categorized corpus and train a Naive Bayes classifier on it. I need two categories, "pos" and "neg". The positive files are all in one directory, main_dir/pos/*.txt, and the negative ones are in a separate directory, main_dir/neg/*.txt.
How can I use the CategorizedPlaintextCorpusReader to load and label all the positive files in the pos directory, and do the same for the negative ones?
NB: The directory layout is exactly the same as that of the movie_reviews corpus (~nltk_data\corpora\movie_reviews).
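To make the layout concrete, here is a minimal sketch of the setup described above. It is an illustration under stated assumptions, not a confirmed solution: the temporary directory stands in for main_dir, the example.txt files and their contents are made up, and the two regular expressions are modelled on how NLTK appears to load movie_reviews (a file ID pattern plus a cat_pattern whose first group is taken as the category). The NLTK part only runs if NLTK is installed.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical stand-in for main_dir: a temporary directory with the same pos/neg layout.
main_dir = Path(tempfile.mkdtemp())
for label, text in [("pos", "a great film"), ("neg", "a dull film")]:
    (main_dir / label).mkdir()
    (main_dir / label / "example.txt").write_text(text, encoding="utf-8")

# Both regexes are applied to file IDs relative to the corpus root, e.g. "pos/example.txt".
FILEID_PATTERN = r"(?!\.).*\.txt"  # any .txt file whose ID does not start with a dot
CAT_PATTERN = r"(neg|pos)/.*"      # group 1 (the directory name) is used as the category

# Plain `re` check of what the category pattern extracts from each file ID.
fileids = sorted(p.relative_to(main_dir).as_posix() for p in main_dir.rglob("*.txt"))
labels = {f: re.match(CAT_PATTERN, f).group(1) for f in fileids}
print(labels)

try:
    from nltk.corpus.reader import CategorizedPlaintextCorpusReader
except ImportError:
    reader = None  # NLTK is not installed; the regex check above still runs
else:
    reader = CategorizedPlaintextCorpusReader(
        str(main_dir), FILEID_PATTERN, cat_pattern=CAT_PATTERN
    )
    print(reader.categories())
    print(reader.fileids(categories="pos"))
```

If this behaves as assumed, reader.fileids(categories="neg") and reader.words(fileids=...) would give the per-category documents needed to build features for the classifier.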