As of October 2017, NLTK includes a collection of Arabic stopwords. If you ran nltk.download() after that date, this issue will not arise. If you have been using NLTK for some time and you now lack the Arabic stopwords, use nltk.download() to update your stopwords corpus.
If you call nltk.download() without arguments, you'll find that the stopwords corpus is shown as "out of date" (in red). Download the current version, which includes Arabic.
Alternatively, you can update just the stopwords corpus by running the following code once from the interactive prompt:
>>> import nltk
>>> nltk.download("stopwords")
Note: Looking words up in a list is slow, because every membership test scans the list from the start. Use a set, not a list. E.g.,
arb_stopwords = set(nltk.corpus.stopwords.words("arabic"))
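To make the set-versus-list point concrete, here is a minimal sketch of filtering tokens against a stopword set. The helper name remove_stopwords and the three-word sample are my own illustration, not part of NLTK; in real use you would pass in the words returned by nltk.corpus.stopwords.words("arabic").

```python
def remove_stopwords(tokens, stopwords):
    """Return the tokens that do not appear in the stopword collection."""
    # Convert once up front: set membership is O(1) on average,
    # while "tok in some_list" scans the list on every test.
    stopword_set = set(stopwords)
    return [tok for tok in tokens if tok not in stopword_set]


# Hypothetical three-word sample standing in for the real corpus,
# e.g. nltk.corpus.stopwords.words("arabic").
sample_stopwords = ["في", "من", "على"]
tokens = ["الكتاب", "على", "الطاولة", "في", "البيت"]

filtered = remove_stopwords(tokens, sample_stopwords)
print(filtered)
```

Building the set once and reusing it is what matters; converting inside a loop over documents would throw the benefit away.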
Original answer (still applicable to languages that are not included)
Why don't you just check what the stopwords collection contains:
>>> from nltk.corpus import stopwords
>>> stopwords.fileids()
['danish', 'dutch', 'english', 'finnish', 'french', 'german', 'hungarian',
'italian', 'norwegian', 'portuguese', 'russian', 'spanish', 'swedish',
'turkish']
So no, there's no list for Arabic. I'm not sure what you mean by "add it", but the stopword lists are just lists of words. They don't involve any morphological analysis, or other things you might want in an inflecting language. So if you have (or can put together) a list of Arabic stopwords, just put them in a set() (see the note above on lookup speed) and you're one step ahead of where you'd be if your code worked.
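If you do put together your own list, one way to get it into a set is sketched below. It assumes a plain UTF-8 text file with one word per line; the file name arabic_stopwords.txt and the helper load_stopwords are hypothetical, and the demo writes a throwaway file only so the snippet runs on its own.

```python
import tempfile
from pathlib import Path


def load_stopwords(path):
    """Read a UTF-8 file with one stopword per line into a set."""
    text = Path(path).read_text(encoding="utf-8")
    # Strip whitespace and skip blank lines.
    return {line.strip() for line in text.splitlines() if line.strip()}


# Demo with a temporary file; in practice, point this at your own list.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "arabic_stopwords.txt"  # hypothetical file name
    demo_file.write_text("في\nمن\n\nعلى\n", encoding="utf-8")
    custom_stopwords = load_stopwords(demo_file)

print(len(custom_stopwords))
```

This works the same way for any language the stopwords corpus doesn't cover. Note that it only matches exact word forms, so inflected variants have to be listed explicitly or handled by a separate normalization step.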