Porter Stemming of fried

2

2

Why does the Porter stemming algorithm demo at

http://text-processing.com/demo/stem/

stem "fried" to "fri" and not "fry"?

I can't recall any English past-tense word ending in "-ied" whose base form ends in "i".

Is this a bug?

Kennith answered 26/12, 2014 at 15:57 Comment(0)
5

A stem returned by the Porter stemmer is not necessarily the base form of a verb, or even a valid word. If that is what you need, use a lemmatizer instead.
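To see why the result is "fri", it may help to look at a single rule from step 1b of Porter's published algorithm: drop a trailing "ed" if the remaining stem contains a vowel. The sketch below implements only that one rule, not the full stemmer; the function names are made up for illustration, and the online demo's implementation may differ in detail.

```python
def is_consonant(word, i):
    """Porter's definition: a, e, i, o, u are vowels; y is a vowel only after a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        # y is a consonant at the start of a word or after a vowel
        return i == 0 or not is_consonant(word, i - 1)
    return True


def contains_vowel(stem):
    return any(not is_consonant(stem, i) for i in range(len(stem)))


def strip_ed(word):
    """Apply only the '(*v*) ED -> (nothing)' rule; every other Porter rule is omitted."""
    if word.endswith("eed"):
        return word  # Porter handles "eed" with a separate rule, left out of this sketch
    if word.endswith("ed"):
        stem = word[:-2]
        if contains_vowel(stem):
            return stem
    return word


print(strip_ed("fried"))      # fri
print(strip_ed("plastered"))  # plaster
print(strip_ed("bled"))       # bled ("bl" contains no vowel, so nothing is removed)
```

The rule is purely orthographic: it never checks whether "fri" is a word, which is why the output is a stem rather than a dictionary form.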

Audacious answered 26/12, 2014 at 16:35 Comment(2)
Great! Could someone link to an algorithm (paper or code) describing how a stemmer and a lemmatizer are used together to, for instance, convert "friedness" to "fry"? Does the lemmatizer always operate on the output of the stemming algorithm, or does it need both the original and the stemmed version of the word? - Envisage
Not always. For example, the lemma of most forms of "be" ("is", "was", "were", "are") can't be determined from the stem. Stemming is simpler but error-prone. In certain applications, though, that's acceptable. A lemmatizer may also use a stemmer as a fall-back. - Gausman
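The arrangement described in the last comment, dictionary lookups first and a stemmer only as a fall-back, could be sketched roughly as follows. Everything here is hypothetical: the tiny word lists stand in for a real dictionary such as WordNet, and `crude_stem` is a deliberately naive suffix stripper, not the Porter algorithm.

```python
# Hypothetical toy data; a real lemmatizer would use a full dictionary.
IRREGULAR = {"is": "be", "was": "be", "were": "be", "are": "be"}
VOCABULARY = {"be", "fry", "try", "walk"}


def crude_stem(word):
    """Fall-back: strip a common suffix without checking that the result is a word."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    # 1. Irregular forms cannot be derived by suffix rules, so look them up.
    if word in IRREGULAR:
        return IRREGULAR[word]
    # 2. A word that is already a known lemma is returned unchanged.
    if word in VOCABULARY:
        return word
    # 3. Try rewrite rules, accepting a candidate only if it is a known word.
    candidates = []
    if word.endswith("ied"):
        candidates.append(word[:-3] + "y")
    if word.endswith("ed"):
        candidates.append(word[:-2])
    for candidate in candidates:
        if candidate in VOCABULARY:
            return candidate
    # 4. Nothing matched: fall back to the stemmer, which may return a non-word.
    return crude_stem(word)


print(toy_lemmatize("fried"))   # fry
print(toy_lemmatize("was"))     # be
print(toy_lemmatize("plied"))   # pli ("ply" is not in the toy vocabulary)
```

Note that the lemmatizer works on the original word, not on stemmer output; the stemmer is only consulted when the dictionary has nothing to offer.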
2

First, a stemmer is not a lemmatizer (see also "Stemmers vs Lemmatizers"):

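>>> from nltk.stem import PorterStemmer, WordNetLemmatizer
>>> porter = PorterStemmer()
>>> wnl = WordNetLemmatizer()  # requires the WordNet corpus: nltk.download('wordnet')
>>> fried = 'fried'
>>> porter.stem(fried)  # rule-based suffix stripping; the result need not be a word
'fri'
>>> wnl.lemmatize(fried)  # no POS given, so the word is looked up as a noun
'fried'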

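Second, a lemmatizer is part-of-speech (POS) sensitive. Without a POS argument, WordNetLemmatizer treats the word as a noun, so you have to tell it that "fried" is a verb: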

>>> wnl.lemmatize(fried, pos='v')  # 'v' = verb
'fry'

Renie answered 26/12, 2014 at 18:1 Comment(0)
