I have a set of pickled text documents which I would like to stem using nltk's `PorterStemmer`. For reasons specific to my project, I would like to do the stemming inside of a django app view.
However, when stemming the documents inside the django view, I receive an `IndexError: string index out of range` exception from `PorterStemmer().stem()` for the string `'oed'`. As a result, running the following:
# xkcd_project/search/views.py
from django.shortcuts import render
from nltk.stem.porter import PorterStemmer


def get_results(request):
    s = PorterStemmer()
    s.stem('oed')  # this call raises the IndexError inside the view
    return render(request, 'list.html')
raises the mentioned error:
Traceback (most recent call last):
  File "//anaconda/envs/xkcd/lib/python2.7/site-packages/django/core/handlers/exception.py", line 39, in inner
    response = get_response(request)
  File "//anaconda/envs/xkcd/lib/python2.7/site-packages/django/core/handlers/base.py", line 187, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "//anaconda/envs/xkcd/lib/python2.7/site-packages/django/core/handlers/base.py", line 185, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/Users/jkarimi91/Projects/xkcd_search/xkcd_project/search/views.py", line 15, in get_results
    s.stem('oed')
  File "//anaconda/envs/xkcd/lib/python2.7/site-packages/nltk/stem/porter.py", line 665, in stem
    stem = self._step1b(stem)
  File "//anaconda/envs/xkcd/lib/python2.7/site-packages/nltk/stem/porter.py", line 376, in _step1b
    lambda stem: (self._measure(stem) == 1 and
  File "//anaconda/envs/xkcd/lib/python2.7/site-packages/nltk/stem/porter.py", line 258, in _apply_rule_list
    if suffix == '*d' and self._ends_double_consonant(word):
  File "//anaconda/envs/xkcd/lib/python2.7/site-packages/nltk/stem/porter.py", line 214, in _ends_double_consonant
    word[-1] == word[-2] and
IndexError: string index out of range
Now what is really odd is that running the same stemmer on the same string outside django (be it a separate python file or an interactive python console) produces no error. In other words:
# test.py
from nltk.stem.porter import PorterStemmer
s = PorterStemmer()
print s.stem('oed')
followed by:
python test.py
# successfully prints 'o'
What is causing this issue?
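The last frames of the traceback hint at where things go wrong: `_ends_double_consonant` evaluates `word[-1] == word[-2]`, which cannot work on a one-character string. Stripping the "ed" suffix from 'oed' presumably leaves the stem 'o', which would trigger exactly this. The following is a minimal sketch of that failure mode; both functions are hypothetical stand-ins written for illustration, not nltk's actual code.

```python
def ends_double_consonant(word):
    """Hypothetical stand-in mirroring the check shown in the traceback."""
    vowels = "aeiou"
    return word[-1] == word[-2] and word[-1] not in vowels


def ends_double_consonant_guarded(word):
    """Same check with a length guard, so one-character stems cannot raise."""
    vowels = "aeiou"
    return len(word) >= 2 and word[-1] == word[-2] and word[-1] not in vowels


stem = "oed"[:-2]  # dropping the "ed" suffix leaves the one-character stem "o"

try:
    ends_double_consonant(stem)
    unguarded_error = None
except IndexError as exc:
    unguarded_error = str(exc)

print(unguarded_error)                      # the message of the IndexError
print(ends_double_consonant_guarded(stem))  # the guarded variant returns False
```

If this reading is right, the error depends on the stemmer implementation that gets imported rather than on django itself, which would explain why the same call can behave differently in two environments.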
Check `nltk.__version__` once you have imported it. Maybe you use two different versions for django and external python. Could you also check the python version that you use in django and to run the external script? I suppose it's always 2.7, given the `print` statement. – Gretchengrete

`s = PorterStemmer()` should be put somewhere where your global variables are. Putting it in the view means loading the `PorterStemmer` object for every page that loads this view function. – Fagaly

In `get_result`, can you do `x = 'oed'` and then `print x` and see what you get on the console where you run `python manage.py runserver`? I suspect it's django swallowing words. – Fagaly

In `views.py` add this: `# coding: utf-8` in the first line and `from __future__ import unicode_literals`. The django and nltk versions should also be reported in the OP as well as the github issue. – Fagaly

`str` or `char` in #41503627 =( – Fagaly

`//anaconda/envs/xkcd/bin/` but I had been running test.py using ipython, not python as stated above. The ipython installation was defined in my root environment, `//anaconda/bin/ipython`, which must have given it access to the nltk version specified in my root environment (version 3.2.0). I downgraded my virtual environment's nltk to version 3.2.0 and ran the code successfully on the django app. Does this mean it is an issue with nltk 3.2.2? – Lakeshialakey
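The mismatch described in the comments (one interpreter for the django app, another for ipython) can be confirmed by printing the interpreter path and package version from both places. Below is a minimal sketch; `describe_environment` is a hypothetical helper name, and it relies only on the standard library plus the `__version__` and `__file__` attributes that packages such as nltk commonly expose.

```python
import importlib
import sys


def describe_environment(package_name):
    """Report which interpreter is running and which copy of a package it sees."""
    info = {
        "executable": sys.executable,
        "python": sys.version.split()[0],
        "package": package_name,
        "version": None,
        "location": None,
    }
    try:
        module = importlib.import_module(package_name)
    except ImportError:
        return info  # the package is not installed for this interpreter
    info["version"] = getattr(module, "__version__", None)
    info["location"] = getattr(module, "__file__", None)
    return info


print(describe_environment("nltk"))
```

Calling this once inside the view and once from the shell used to run test.py should show whether the two runs share an interpreter and an nltk installation. Differing `executable` or `location` values would point to an environment mix-up rather than a django problem.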