Python stemmer issue: wrong stem

Hi, I'm trying to stem words with a Python stemmer. I tried Porter and Lancaster, but they have the same problem: they can't correctly stem words that end with "er" or "e".

For example, they stem:

computer -->  comput

rotate   -->  rotat

This is part of the code:

line = line.lower()
line = re.sub(r'[^a-z0-9 ]', ' ', line)
line = line.split()
line = [x for x in line if x not in stops]
line = [porter.stem(word, 0, len(word) - 1) for word in line]
# or: line = [st.stem(word) for word in line]
return line

Any idea how to fix this problem?

Spurious answered 7/8, 2014 at 22:19 Comment(5)
What stemmer? Can you please include a minimal, complete, and verifiable example (with the source code)? - Physicist
Hi, I've updated the question. This issue occurs with both stemmers, in the line where the stemmer is called. - Spurious
What are you trying to achieve? - Achromatin
I'm trying to obtain the stem of each word, for example cats -> cat or playing -> play. - Spurious
Why is computer -> comput not correct? I might be wrong, but comput looks like a stem for computing, computed, computer, computation. Likewise, rotat seems common to rotate, rotation, etc. - Dispose

To quote the page on Wikipedia: In computational linguistics, a stem is the part of the word that never changes even when morphologically inflected, whilst a lemma is the base form of the word. For example, given the word "produced", its lemma is "produce", however the stem is "produc": this is because there are words such as production.

So your code is likely giving you correct results. You seem to expect a lemma, which is not what a stemmer produces (except when the lemma happens to equal the stem).
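To make the "part of the word that never changes" idea concrete, here is a minimal sketch that takes the longest prefix shared by a few related forms. This is not how Porter or Lancaster work internally (they apply suffix-stripping rules); the word lists and the `shared_stem` helper are assumptions chosen only for illustration.

```python
import os

# Hypothetical word families, picked only to illustrate the idea.
families = [
    ["computer", "computing", "computed", "computation"],
    ["rotate", "rotation", "rotating", "rotated"],
    ["produce", "produced", "production", "producing"],
]


def shared_stem(words):
    """Return the longest prefix common to every word in the list."""
    return os.path.commonprefix(words)


stems = [shared_stem(forms) for forms in families]

for forms, stem in zip(families, stems):
    print(f"{', '.join(forms)} -> {stem}")
```

The shared part of each family is a truncated form, not a dictionary word, which is why a stemmer returning "comput" or "rotat" is behaving as intended. If dictionary forms are what you need, a lemmatizer is the tool to look for rather than a stemmer.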

Dispose answered 8/8, 2014 at 0:14 Comment(0)
