The previous example applies stemming to a search query, so if you are interested in stemming a full text you can try the following:
import java.io.*;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.tokenattributes.*;
import org.apache.lucene.analysis.snowball.*;
import org.apache.lucene.util.*;
...
public class Stemmer{
    // Stems every token of the text with the Snowball stemmer for the given language
    public static String Stem(String text, String language){
        StringBuilder result = new StringBuilder();
        if (text!=null && text.trim().length()>0){
            StringReader tReader = new StringReader(text);
            Analyzer analyzer = new SnowballAnalyzer(Version.LUCENE_35,language);
            TokenStream tStream = analyzer.tokenStream("contents", tReader);
            // The attribute exposes the current (stemmed) token after each incrementToken()
            TermAttribute term = tStream.addAttribute(TermAttribute.class);
            try {
                while (tStream.incrementToken()){
                    result.append(term.term());
                    result.append(" ");
                }
            } catch (IOException ioe){
                System.out.println("Error: "+ioe.getMessage());
            }
        }
        // If, for some reason, the stemming did not happen, return the original text
        if (result.length()==0)
            result.append(text);
        return result.toString().trim();
    }
    public static void main (String[] args){
        // Print the stemmed text so the result is visible
        System.out.println(Stemmer.Stem("Michele Bachmann amenities pressed her allegations that the former head of her Iowa presidential bid was bribed by the campaign of rival Ron Paul to endorse him, even as one of her own aides denied the charge.", "English"));
    }
}
The TermAttribute class has been deprecated and will no longer be supported in Lucene 4, but the documentation is not clear on what to use in its place. As far as I can tell, CharTermAttribute (available since Lucene 3.1) is the intended replacement.
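Since the attribute API is changing between Lucene versions, one option is to keep the per-token stemming behind a small function so that only one place in your code touches Lucene. The following is a minimal sketch of that idea in plain Java, not Lucene code: the names TextStemmer, stemText and toyStem are hypothetical, the whitespace split is a simplification of what an analyzer does, and toyStem is a toy suffix stripper standing in for the real Snowball stemmer.

```java
import java.util.function.UnaryOperator;

public class TextStemmer {
    // Applies a per-token stemmer to every whitespace-separated token
    // and joins the results with single spaces.
    // Returns the input unchanged if it is null or blank.
    public static String stemText(String text, UnaryOperator<String> stemToken) {
        if (text == null || text.trim().isEmpty()) {
            return text;
        }
        StringBuilder result = new StringBuilder();
        for (String token : text.trim().split("\\s+")) {
            if (result.length() > 0) {
                result.append(' ');
            }
            result.append(stemToken.apply(token));
        }
        return result.toString();
    }

    // Toy stand-in, NOT the Snowball/Porter algorithm:
    // lowercases the token and strips at most one of a few common suffixes.
    public static String toyStem(String token) {
        String t = token.toLowerCase();
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (t.length() > suffix.length() + 2 && t.endsWith(suffix)) {
                return t.substring(0, t.length() - suffix.length());
            }
        }
        return t;
    }

    public static void main(String[] args) {
        // prints "aide deni the charge"
        System.out.println(stemText("Aides denied the charges", TextStemmer::toyStem));
    }
}
```

In real use you would pass a function that delegates to the Lucene stemmer instead of toyStem, so a later change of attribute class stays confined to that one function.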
Also, in the first example, PorterStemmer is not publicly accessible as a class (it is hidden), so you cannot use it directly.
Hope this helps.
new SnowballAnalyzer("English"); – Cruz