I'm trying to break up a paragraph into sentences. Here is my code so far:
import java.util.*;

public class StringSplit {
    public static void main(String[] args) {
        String testString = "The outcome of the negotiations is vital, because the current tax levels signed into law by President George W. Bush expire on Dec. 31. Unless Congress acts, tax rates on virtually all Americans who pay income taxes will rise on Jan. 1. That could affect economic growth and even holiday sales.";
        // Split on every '.', '!' or '?' character.
        String[] sentences = testString.split("[\\.\\!\\?]");
        for (int i = 0; i < sentences.length; i++) {
            System.out.println(i);
            System.out.println(sentences[i]);
        }
    }
}
I've found two problems:
- The code splits every time it encounters a period ("."), even when the period belongs to an abbreviation such as "W." or "Dec." and does not end a sentence. How do I prevent this?
- Every sentence after the first starts with a space. How do I remove that leading space?
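For what it's worth, here is a minimal sketch of one regex-based heuristic that handles both problems for this particular paragraph. It is not a general sentence detector. The class name SentenceSplitter and the method split are made up for illustration, and it rests on two assumptions: a sentence boundary is whitespace that follows ".", "!" or "?" and is followed by an uppercase letter, and a single uppercase letter followed by a period (like "W.") is an initial, not a sentence end. Because the whitespace itself is the delimiter, the pieces no longer start with a space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SentenceSplitter {
    // Heuristic boundary (an assumption, not a complete rule set):
    //   (?<=[.!?])        the whitespace must follow '.', '!' or '?'
    //   (?<!\b[A-Z]\.)    but not a single-letter initial such as "W."
    //   \s+               the whitespace itself is consumed as the delimiter
    //   (?=[A-Z])         and the next sentence must start with an uppercase letter
    private static final Pattern BOUNDARY =
            Pattern.compile("(?<=[.!?])(?<!\\b[A-Z]\\.)\\s+(?=[A-Z])");

    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        for (String part : BOUNDARY.split(text.trim())) {
            if (!part.isEmpty()) {
                sentences.add(part);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String testString = "The outcome of the negotiations is vital, because the current tax levels signed into law by President George W. Bush expire on Dec. 31. Unless Congress acts, tax rates on virtually all Americans who pay income taxes will rise on Jan. 1. That could affect economic growth and even holiday sales.";
        List<String> sentences = split(testString);
        for (int i = 0; i < sentences.size(); i++) {
            System.out.println(i + ": " + sentences.get(i));
        }
    }
}
```

"Dec. 31" and "Jan. 1" stay intact here only because a digit follows the abbreviation. An abbreviation followed by a capitalized word (for example "Dec. Monday"), or a sentence that really ends in a single capital letter, would still be handled incorrectly, so a list of known abbreviations or a proper NLP library is probably needed for arbitrary text.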
BreakIterator is a good idea, but it suffers from many of these same types of problems. See this question: #17160013 – Twostep
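For reference, a minimal sketch of what the BreakIterator approach mentioned in this comment could look like, using java.text.BreakIterator from the standard library. The class name BreakIteratorSentences and the method split are hypothetical, and Locale.US is an assumption about the text. As the comment warns, the boundaries it reports may still fall after some abbreviations, so treat the result as a starting point and check it against your own data.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BreakIteratorSentences {
    // Returns the trimmed text between consecutive sentence boundaries
    // reported by BreakIterator for the given locale.
    static List<String> split(String text, Locale locale) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(locale);
        iterator.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = iterator.first();
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            // trim() removes the whitespace that trails each segment
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String testString = "Unless Congress acts, tax rates will rise on Jan. 1. That could affect economic growth and even holiday sales.";
        for (String sentence : split(testString, Locale.US)) {
            System.out.println(sentence);
        }
    }
}
```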