The current best practice, as another Stack Overflow answer indicates, seems to be to add an attribute to the token stream and read from that attribute afterward, rather than getting an attribute directly from the token stream. For good measure, you can also make sure the analyzer gets closed. With the latest Lucene (v8.6.2 at the time of writing), the code looks like this:
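import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// The TokenStream calls below can throw IOException, so place this in a method that declares or handles it.
String text = "foo bar";
String fieldName = "myField";
List<String> tokens = new ArrayList<>();
try (Analyzer analyzer = new StandardAnalyzer()) {
    try (final TokenStream tokenStream = analyzer.tokenStream(fieldName, text)) {
        // Register the attribute once, before consuming the stream.
        CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);
        tokenStream.reset();
        while (tokenStream.incrementToken()) {
            // Copy the current term; the attribute instance is reused for each token.
            tokens.add(charTermAttribute.toString());
        }
        tokenStream.end();
    }
}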
After that code finishes, tokens will contain the list of parsed tokens.
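To illustrate why the attribute is fetched once before the loop and copied with toString() inside it, here is a minimal toy model in plain Java. It is only a sketch: ToyTermAttribute, ToyTokenStream, and tokenize are hypothetical stand-ins written for this illustration, not Lucene classes, and the whitespace splitting is far simpler than what StandardAnalyzer does. The toy assumes, as the Lucene workflow above suggests, that the stream keeps overwriting a single attribute instance and that reset() must come before incrementToken().

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical stand-ins, not Lucene's API.
class ToyTokenStreamDemo {

    /** Stand-in for CharTermAttribute: one mutable buffer reused for every token. */
    static final class ToyTermAttribute {
        private final StringBuilder buffer = new StringBuilder();

        void set(String term) {
            buffer.setLength(0);   // overwrite the previous token in place
            buffer.append(term);
        }

        @Override
        public String toString() {
            return buffer.toString();   // independent copy of the current term
        }
    }

    /** Stand-in for TokenStream: splits on whitespace and requires reset() first. */
    static final class ToyTokenStream implements AutoCloseable {
        private final String[] words;
        private final ToyTermAttribute term = new ToyTermAttribute();
        private int next = -1;   // -1 means reset() has not been called yet

        ToyTokenStream(String text) {
            String trimmed = text.trim();
            words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }

        ToyTermAttribute termAttribute() {
            return term;   // always the same instance
        }

        void reset() {
            next = 0;
        }

        boolean incrementToken() {
            if (next < 0) {
                throw new IllegalStateException("call reset() before incrementToken()");
            }
            if (next >= words.length) {
                return false;
            }
            term.set(words[next++]);   // advance by mutating the shared attribute
            return true;
        }

        void end() {
            // nothing to finalize in the toy model
        }

        @Override
        public void close() {
            // nothing to release in the toy model
        }
    }

    /** Same consume loop as the Lucene snippet: attribute, reset, iterate, end, close. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        try (ToyTokenStream stream = new ToyTokenStream(text)) {
            ToyTermAttribute term = stream.termAttribute();
            stream.reset();
            while (stream.incrementToken()) {
                tokens.add(term.toString());   // copy before the next overwrite
            }
            stream.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("foo bar"));   // prints [foo, bar]
    }
}
```

Storing the attribute object itself in the list, instead of the string copy, would leave every entry pointing at the same buffer, which is why the loop calls toString() on each pass.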
See also: Lucene Analysis Overview.
Caveat: I'm just starting to write Lucene code, so I don't have a lot of Lucene experience. I have taken the time to research the latest documentation and related posts, however, and I believe that the code I've placed here follows the latest recommended practices slightly better than the current answers.