How to get logical parts of a sentence with Java?

Let's say there is a sentence:

On March 1, he was born.

Changing it to

He was born on March 1.

doesn't change the meaning of the sentence, and it is still valid. Shuffling the words in any other way would produce odd or invalid sentences. So basically, I'm talking about the parts of a sentence that make the information more specific but can be removed without breaking the whole sentence. Is there an NLP library that can identify such parts?

Ethelstan answered 23/4, 2010 at 15:9 Comment(0)

Constituents

It sounds like you want to identify the sentence's constituents, which are groups of words that operate as a single unit according to the grammar of a language.

In fact, when linguists are trying to discover a language's grammar, they do it in part by looking at movement. As in your example, this is where a group of words can be moved to a different position in a sentence while still preserving the meaning of the sentence.

Constituents can be individual words, phrases, or even larger groups such as whole clauses. Within a sentence, they have a nested hierarchical structure. For instance, the first example sentence you gave could be analyzed as:

(S  (PP (IN On) (NP (NNP March) (CD 1)))
    (NP (PRP he))
    (VP (VBD was) (VP (VBN born))))

The whole sentence is made up of a prepositional phrase, followed by a noun phrase, and then a verb phrase. The prepositional phrase can be further decomposed into a unit consisting of the single word 'On' followed by a noun phrase.
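To make the nesting concrete, here is a minimal sketch that reads a bracketed parse like the one above and lists the phrases sitting directly under the sentence node. It uses only the Java standard library; the class name `BracketedTree` and its methods are hypothetical names chosen for illustration and are not part of any parser's API. It also assumes the input is well-formed.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal reader for Penn Treebank style bracketed parses (illustrative only). */
final class BracketedTree {
    final String label;                                   // e.g. "PP", "NP", "IN", or a word
    final List<BracketedTree> children = new ArrayList<>();

    BracketedTree(String label) { this.label = label; }

    boolean isLeaf() { return children.isEmpty(); }

    /** Parses a well-formed string such as "(NP (PRP he))" into a tree. */
    static BracketedTree parse(String text) {
        // Pad the brackets with spaces so that they become separate tokens.
        String[] tokens = text.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        return read(tokens, new int[] {0});
    }

    private static BracketedTree read(String[] tokens, int[] pos) {
        if (tokens[pos[0]].equals("(")) {
            pos[0]++;                                     // consume "("
            BracketedTree node = new BracketedTree(tokens[pos[0]++]);
            while (!tokens[pos[0]].equals(")")) {
                node.children.add(read(tokens, pos));
            }
            pos[0]++;                                     // consume ")"
            return node;
        }
        return new BracketedTree(tokens[pos[0]++]);       // a bare word
    }

    /** Joins the words under this node, left to right. */
    String words() {
        if (isLeaf()) return label;
        List<String> parts = new ArrayList<>();
        for (BracketedTree child : children) parts.add(child.words());
        return String.join(" ", parts);
    }

    /** Describes each direct child of the root as "LABEL: words". */
    static List<String> topLevelConstituents(String text) {
        List<String> out = new ArrayList<>();
        for (BracketedTree child : parse(text).children) {
            out.add(child.label + ": " + child.words());
        }
        return out;
    }

    public static void main(String[] args) {
        String parse = "(S (PP (IN On) (NP (NNP March) (CD 1)))"
                + " (NP (PRP he))"
                + " (VP (VBD was) (VP (VBN born))))";
        // Prints the three top-level constituents, one per line.
        topLevelConstituents(parse).forEach(System.out::println);
    }
}
```

For the example sentence this lists `PP: On March 1`, `NP: he` and `VP: was born`. The prepositional phrase is the unit that moves to the end of the sentence in your second version.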

Phrase Structure Parsers

To find constituents automatically, you will probably want to use a phrase structure parser. There are many such parsers to choose from that are available as open source, including the Stanford, Berkeley, Charniak, and Bikel parsers.

The Stanford and Berkeley parsers are probably the easiest to install and use. As seen in Cer et al. 2010, the most accurate parsers are Berkeley and Charniak. The Bikel parser is slower and less accurate than the others.

Online Demo

There's an online demo for the Stanford parser. I used the demo to produce the parse given above of your example sentence.

A Note About Deletion

Within each constituent, there will be a head word. For example, take the noun phrase:

(NP (DT The) (JJ big) (JJ blue) (NN ball))

The head word here is the noun ball, and it is modified by the adjectives big and blue. If this noun phrase was embedded in a sentence, you could delete those modifiers and still have something that was consistent with, but less specific than, the meaning of the original sentence.

Within noun phrases, you can generally delete the adjectives, nouns that are not the head, and nested prepositional phrases.
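As a rough illustration of the adjective case only, the sketch below drops the adjective-tagged words from a flat bracketed noun phrase. It assumes Penn Treebank tags, where adjective tags start with `JJ`, and it uses a regular expression that only sees word-level nodes, so it does not handle non-head nouns or nested prepositional phrases. The class name `ModifierStripper` and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Removes adjectives from a flat bracketed noun phrase (illustrative only). */
final class ModifierStripper {
    // Matches a word-level node such as "(JJ big)": group 1 is the tag, group 2 the word.
    private static final Pattern PRETERMINAL =
            Pattern.compile("\\(([^()\\s]+)\\s+([^()\\s]+)\\)");

    /** Returns the words of the phrase, skipping those tagged JJ, JJR or JJS. */
    static String withoutAdjectives(String bracketedNp) {
        List<String> kept = new ArrayList<>();
        Matcher m = PRETERMINAL.matcher(bracketedNp);
        while (m.find()) {
            String tag = m.group(1);
            if (!tag.startsWith("JJ")) {                  // keep everything but adjectives
                kept.add(m.group(2));
            }
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        String np = "(NP (DT The) (JJ big) (JJ blue) (NN ball))";
        System.out.println(withoutAdjectives(np));        // prints "The ball"
    }
}
```

A fuller version would walk the parse tree and decide per phrase type what is safe to drop, rather than filtering on tags alone.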

Within verb phrases and complete clauses, things get trickier, since deleting material that serves as an argument to the verb can completely change the interpretation of a sentence. For example, deleting 'the book' from 'He sold Jim the book' results in 'He sold Jim'.

Singapore answered 24/4, 2010 at 3:29 Comment(6)
I was just looking at all these parsers and found a paper written by Daniel at nlp.stanford.edu/pubs/lrecstanforddeps_final_final.pdf - Thorpe
Yeah, that's actually my paper :) - Singapore
I was wondering about the Link Grammar Parser's performance, but I see in your paper you've covered it under the RelEx parser. - Thorpe
Hi @dmcer, does the conclusion in the paper still hold, particularly that Charniak's parser performs better than Stanford's parser and that Charniak's is the one recommended for use with Stanford dependencies? - Airmail
Thanks @dmcer, I just wanted to check whether the recent updates to the software would have a large impact on the paper's conclusion. - Airmail
Does anyone know if there is a public API endpoint I can hit for any of the implementations of the parsers listed above? - Georgie

OpenNLP may do some of this for you. Phrase chunking and parsing should help you with this. However, this is not a particularly simple problem, and algorithms will tend to get confused as sentence structure becomes more complex and ambiguous. You should sometimes be able to reorder phrases within a sentence and maintain meaning.

Englis answered 23/4, 2010 at 15:13 Comment(0)
