Algorithms or data structures for dealing with ambiguity

I'm looking for algorithms or data structures specifically for dealing with ambiguities.

In my particular current field of interest I'm looking into ambiguous parses of natural languages, but I assume there must be many fields in computing where ambiguity plays a part.

I can find a lot out there on trying to avoid ambiguity but very little on how to embrace ambiguity and analyse ambiguous data.

Say a parser generates these alternative token streams or interpretations:

  • A B1 C
  • A B2 C
  • A B3 B4 C

It can be seen that some parts of the stream are shared between interpretations (A and C) while other parts branch into alternative interpretations and often merge back into the main stream.

Of course there may be many more interpretations, nesting of alternatives, and interpretations which have no main stream.

This is obviously some kind of directed graph, but I don't know whether it has an established name.
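
To make what I mean concrete, here is a rough sketch (in Python; the node names n0..n3 are just made up for illustration) of the kind of branching-and-merging structure I have in mind:

    # Hypothetical sketch: the three interpretations A B1 C / A B2 C /
    # A B3 B4 C stored as a directed acyclic graph, so that the shared
    # prefix and suffix exist only once.
    dag = {
        "n0": [("A", "n1")],                # shared prefix
        "n1": [("B1", "n2"), ("B2", "n2"),  # single-token alternatives
               ("B3", "n1b")],              # first half of the B3 B4 branch
        "n1b": [("B4", "n2")],
        "n2": [("C", "n3")],                # the streams merge again
        "n3": [],                           # final node
    }

    def interpretations(node="n0"):
        """Yield every token stream encoded by the graph."""
        if not dag[node]:
            yield []
        for token, nxt in dag[node]:
            for rest in interpretations(nxt):
                yield [token] + rest

    for stream in interpretations():
        print(" ".join(stream))  # A B1 C / A B2 C / A B3 B4 C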

Are there extant algorithms or data structures I can study that are intended to deal with just this kind of ambiguous graph?

Sannyasi answered 18/9, 2014 at 1:54 Comment(7)
I would describe this as shared structure (e.g., SRFI 38) and memoization.Ferriage
I've found that for ambiguous parsing there is a structure called an "SPPF" - shared packed parse forest. I can't find a good dedicated source on it and I'm not sure how useful it would be for dealing with ambiguity in fields other than parsing.Sannyasi
SPPF's are described by Elizabeth Scott in dinhe.net/~aredridel/.notmine/PDFs/… They are basically the same as Marpa's bocages -- I came up with them independently, but after Scott did. In any case, she's written them up nicely.Micronutrient
@JeffreyKegler: That's interesting. Reading around what I could find I was getting the impression SPPF's were due to Masaru Tomita, who's responsible for GLR parsing. I was coming across phrases like "Tomita-style SPPF" or such.Sannyasi
In fact I hadn't realized that I had earlier asked pretty much the same question on linguistics.SE: Is there a favoured data structure for storing ambiguous parse trees in Natural Language Processing?Sannyasi
@DavidEisenstat: That sounds like some terms I've run into many times as I research this: shared forest, shared parse forest, shared packed parse forest (SPPF).Sannyasi
There is the concept of shared forest and there is the expression shared forest. I do not know when the expression was first used as such, or by whom. Regarding the concept, it is at least as old as the CYK algorithm (circa 1965), if not older, since the CYK algorithm with back-pointers does build a cubic shared forest. However, the forest was not identified as simply being a grammar until the early 1990s. On the other hand, the principles of GLR parsing date back to 1974.Tetragram

Ambiguity and sharing in Natural Language Parsing

Ambiguity and sharing in general

Given the generality of your question, I am trying to match that generality.

The concept of ambiguity arises as soon as you consider a mapping or function f: A -> B which is not injective.

An injective function (also called one-to-one function) is one such that when a≠a' then f(a) ≠ f(a'). Given a function f, you are often interested in reversing it: given an element b of the codomain B of f, you want to know what element a of the domain A is such that f(a)=b. Note that there may be none if the function is not surjective (i.e. onto).

When the function is not injective, there may be several values a in A such that f(a)=b. In other words, if you use values in B to actually represent values in A through the mapping f, you have an ambiguous representation b that may not determine the value a uniquely.

From this you realize that the concept of ambiguity is so general that it is unlikely that there is a unified body of knowledge about it, even when limiting this to computer science and programming.

However, if you wish to consider reversing a function creating such ambiguities, for example to compute the set f'(b)={a∈A | f(a)=b}, or the best element(s) in that set according to some optimality criterion, there are indeed some techniques that may help you in situations where the problem can be decomposed into subproblems that often recur with the same arguments. Then, if you memorize the result(s) for the various combinations of arguments encountered, you never compute the same thing twice (the subproblem is said to be memoized). Note that ambiguity may exist for subproblems too, so that there may be several answers for some subproblem instances, or optimal answers among several others.

This amounts to sharing a single copy of a subproblem between all the situations that require solving it with this set of parameters. The whole technique is called dynamic programming, and the difficulty is often to find the right decomposition into subproblems. Dynamic programming is primarily a way to share the repeated subcomputations for a solution, so as to reduce complexity. However, if each subcomputation produces a fragment of a structure that is reused recursively in larger structures to find an answer which is a structured object (a graph for example), then sharing a subcomputation step may also result in sharing the corresponding substructure in all the places where it is needed. When many answers are to be found (because of ambiguity for example), these answers can share subparts.

Rather than finding all the answers, dynamic programming can be used to find those satisfying some optimality criterion. This requires that an optimal solution of a problem uses optimal solutions of subproblems.
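
As a toy illustration (the coin example and every name in it are my own invention, not a standard reference), here is how memoization shares ambiguous subproblems while enumerating a whole preimage set f'(b):

    from functools import lru_cache

    # Here f maps a non-increasing sequence of coins to its sum; f is far
    # from injective, so a given total has many "interpretations".
    # lru_cache solves each subtotal once and shares the resulting tuple
    # among all the larger solutions that use it.
    COINS = (1, 2, 5)

    @lru_cache(maxsize=None)
    def decompositions(total):
        """All non-increasing coin sequences summing to total."""
        if total == 0:
            return ((),)
        results = []
        for c in COINS:
            if c <= total:
                for rest in decompositions(total - c):
                    if not rest or c >= rest[0]:  # keep non-increasing order
                        results.append((c,) + rest)
        return tuple(results)

    print(decompositions(5))
    # ((1, 1, 1, 1, 1), (2, 1, 1, 1), (2, 2, 1), (5,))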

The case of linguistic processing

Things can be more specific in the case of linguistics and language processing. For that purpose, you have to identify the domains you are dealing with, and the kind of functions you use with these domains.

The purpose of language is to exchange information, concepts, ideas that reside in our brains, with the very approximate assumption that our brains use the same functions to represent these ideas linguistically. I must also simplify things considerably (sorry about it) because this is not exactly the place for a full theory of language, which would be disputed anyway. And I cannot even consider all types of syntactic theories.

So the linguistic exchange of information, of an idea, from a person P to a person Q goes as follows:

idea in P ---f--> syntactic tree ---g--> lexical sequence ---h--> sound sequence
                                                                        |
                                                                        s
                                                                        |
                                                                        V
idea in Q <--f'-- syntactic tree <--g'-- lexical sequence <--h'-- sound sequence

The first line is about sentence generation taking place in person P, and the second line is about sentence analysis taking place in person Q. The function s stands for speech transmission, and should be the identity function. The functions f', g' and h' are supposed to be the inverses of the functions f, g, and h that compute the successive representations down to the spoken representation of the idea. But each of these functions may be (and usually is) non-injective, so that ambiguities are introduced at each level, making it difficult for Q to invert them and retrieve the original meaning from the sound sequence it receives (I am deliberately using the word sound to avoid getting into details). The same diagram holds, with some variations in details, for written communication.

We ignore f and f' since they are concerned with semantics, which may be less formalized, and for which I do not have competence. Syntax trees are often defined by grammatical formalisms (here I am skipping over important refinements such as feature structures, but they can be taken into account).

Both the function g and the function h are usually not injective, and thus are sources of ambiguity. There are actually other sources of ambiguity, due to all kinds of errors inherent in the speech chain, but we will ignore them for simplicity, as they do not much change the nature of the problems. The presence of errors, due to sentence generation or transmission, or to language specification mismatch between the speaker and the listener, is an extra source of ambiguity, since the listener attempts to correct potential errors without knowing what they may have been or whether they exist at all.

We assume that the listener does not make mistakes, and that he attempts to best "decode" the sentence according to his own linguistic standards and knowledge, including knowledge of error sources and statistics.

Lexical ambiguity

Given a sound sequence, the listening system has to invert the effect of the lexical generation function g with a function g'. A first problem is that several different words may give the same sound sequence, which is a first source of ambiguity. The second problem is that the listening system actually receives the sequence corresponding to a string of words, and there may be no indication of where words begin or end. So there may be different ways of cutting the sound sequence into subsequences corresponding to recognizable words. This problem may be worsened when noise creates more confusion between words.

An example is the following pair of holorime verses, taken from the web, which are pronounced more or less identically:

Ms Stephen, without a first-rate stakeholder sum or deal,
Must, even with outer fur straight, stay colder - some ordeal.

The analysis of the sound sequence can be performed by a finite state non-deterministic automaton, interpreted in a dynamic programming mode, which produces a directed acyclic graph where the nodes correspond to possible word boundaries and the edges to recognized words. Any path through the graph, from its initial node to its final node, corresponds to a possible way of analyzing the sound sequence as a sequence of words.

The above example gives the (fairly simple) word lattice (oriented left to right):

         the-*-fun
        /         \
   Ms -*-- Stephen \  without --*-- a    first -*- ...
 /                 \ /               \ /
*                   *                 *
 \                 / \               / \
   must --*-- even     with -*- outer   fur -*-  ...

So that the sound sequence could also correspond to the following word sequences (among several others):

Ms Stephen, with outer first-rate ... 
Must, even with outer first-rate ...

This makes the lexical analysis ambiguous.

Probabilities may be used to choose a best sequence. But it is also possible to keep all the ambiguous readings and use the lattice as is in the next stage of sentence analysis.

Note that the word lattice may be seen as a finite state automaton that generates or recognizes all the possible lexical readings of the sound sequence.
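
As a hedged sketch (the vertex names and the probabilities are invented, and only the first words of the lattice are encoded), both options can be read off the same structure: enumerate every path to keep the full ambiguity, or run a Viterbi-style dynamic program to pick the most probable reading:

    from functools import lru_cache

    # Toy encoding of the start of the lattice above. Each edge carries a
    # recognized word, an invented probability, and the target vertex;
    # "t" is the final vertex.
    LATTICE = {
        "s": (("Ms", 0.6, "a"), ("must", 0.4, "b")),
        "a": (("Stephen", 0.8, "m"), ("the", 0.2, "c")),
        "c": (("fun", 1.0, "m"),),
        "b": (("even", 1.0, "m"),),
        "m": (("without", 0.55, "t"), ("with", 0.45, "d")),
        "d": (("outer", 1.0, "t"),),
        "t": (),
    }

    def all_readings(v="s"):
        """One lexical reading per path from v to the final vertex."""
        if not LATTICE[v]:
            yield ()
        for word, _, nxt in LATTICE[v]:
            for rest in all_readings(nxt):
                yield (word,) + rest

    @lru_cache(maxsize=None)
    def best_reading(v="s"):
        """Most probable reading below v, Viterbi-style."""
        if not LATTICE[v]:
            return 1.0, ()
        return max((p * best_reading(nxt)[0], (word,) + best_reading(nxt)[1])
                   for word, p, nxt in LATTICE[v])

    print(len(list(all_readings())))  # 6 readings kept, ambiguity intact
    print(best_reading())             # about (0.264, ('Ms', 'Stephen', 'without'))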

Syntactic ambiguity

Syntactic structure is often based on a context-free grammar skeleton. The problem of ambiguity of context-free languages is well known and analyzed. A number of general CF parsers have been devised to parse ambiguous sentences, and produce a structure (which varies somewhat) from which all parses can be extracted. Such structures have come to be known as parse forests, or shared parse forests.

It is known that this structure is at worst cubic in the length of the analyzed sentence, provided the grammar is binarized, i.e. has no more than 2 non-terminals in each rule right-hand side (or, more simply, no more than 2 symbols in each rule right-hand side).

Actually, all these general CF parsing algorithms are more or less sophisticated variations around a simple concept: the intersection of the language L(A) of a finite state automaton A and the language L(G) of a CF grammar G. Construction of such an intersection goes back to the early papers on context-free languages (Bar-Hillel, Perles and Shamir 1961), and was intended as the proof of a closure property. It took some thirty years before a 1995 paper recognized it as also being a very general parsing algorithm.

This classical cross-product construction yields a CF grammar for the intersection of the two languages L(A) and L(G). If you consider a sentence w to be parsed, represented as a sequence of lexical elements, it can also be viewed as a finite state automaton W that generates only the sentence w. For example:

        this     is    a      finite     state     automaton
 => (1)------(2)----(3)---(4)--------(5)-------(6)-----------((7))

is a finite state automaton W accepting only the sentence w="this is a finite state automaton". So L(W)={w}.

If the grammar of the language is G, then the intersection construction gives a grammar G_w for the language L(G_w)=L(W)∩L(G).

If the sentence w is not in L(G), then L(G_w) is empty, and the sentence is not recognized. Otherwise L(G_w)={w}. Furthermore, it is then easily proved that the grammar G_w generates the sentence w with exactly the same parse-trees (hence the same ambiguity) as the grammar G, up to a simple renaming of the non-terminals.

The grammar G_w is the (shared) parse forest for w, and the set of parse trees of w is precisely the set of derivations with this grammar. So this gives a very simple view organizing the concepts, and explaining the structure of shared parse forests and general CF parsers.
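
To make the construction concrete, here is a rough sketch (every name is my own invention, and the grammar is assumed to be in Chomsky normal form, i.e. rules A -> B C or A -> 'a'); a non-terminal of G_w is a triple (p, A, q) meaning that A derives the portion of w between positions p and q:

    from itertools import product

    # Hypothetical sketch of the cross-product construction. W is given
    # by its transitions; here it is the straight-line automaton of
    # w = "a a a". The ambiguous toy grammar S -> S S | 'a' yields two
    # parses, (a a) a and a (a a), sharing the sub-forests they agree on.
    GRAMMAR = {"S": [("S", "S"), ("a",)]}
    SENTENCE = ["a", "a", "a"]
    W = {(i, t, i + 1) for i, t in enumerate(SENTENCE)}

    def intersect(grammar, transitions, states):
        """Build G_w: its derivations are exactly the parses of w by G."""
        g_w = {}
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                if len(rhs) == 1:            # terminal rule A -> 'a'
                    for p, a, q in transitions:
                        if a == rhs[0]:
                            g_w.setdefault((p, lhs, q), []).append((a,))
                else:                        # binary rule A -> B C
                    B, C = rhs
                    for p, q, r in product(states, repeat=3):
                        if p < q < r:        # enough when states are numbered topologically
                            g_w.setdefault((p, lhs, r), []).append(
                                ((p, B, q), (q, C, r)))
        return g_w

    def count_trees(g_w, sym):
        """Count parse trees below sym; an unproductive non-terminal counts 0."""
        if not isinstance(sym, tuple):       # a terminal leaf
            return 1
        total = 0
        for rhs in g_w.get(sym, ()):
            n = 1
            for child in rhs:
                n *= count_trees(g_w, child)
            total += n
        return total

    g_w = intersect(GRAMMAR, W, range(len(SENTENCE) + 1))
    print(count_trees(g_w, (0, "S", len(SENTENCE))))  # 2 parses, one shared forest

In general (with richer grammars, or with a lattice as automaton), the raw construction also creates useless non-terminals; the standard cleanup of unproductive and unreachable symbols removes them.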

But there is more to it, because it shows how to generalize to different grammars and to different structures to be parsed.

Constructive closure of intersection with regular sets by cross-product constructions is common to a lot of grammatical formalisms that extend the power of CF grammars somewhat into the context-sensitive realm. This includes tree-adjoining grammars and linear context-free rewriting systems. Hence this is a guideline on how to build, for these more powerful formalisms, general parsers that can handle ambiguity and produce shared parse forests, which are simply specialized grammars of the same type.

The other generalization is that, when there is lexical ambiguity so that lexical analysis produces many candidate sentences represented with sharing by a word lattice, this word lattice can be read as a finite state automaton recognizing all these sentences. Then, the same intersection construction will eliminate all sentences that are not in the language (not grammatical), and produce a CF grammar that is a shared parse forest for all possible parses of all admissible (grammatical) sentences from the word lattice.
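
Continuing the sketch above (same invented names), nothing changes in the code: one simply feeds intersect a lattice automaton instead of a straight line. With topologically numbered vertices, the p < q < r filter still applies:

    # The same toy grammar, but the automaton is now a tiny lattice with
    # two lexical readings: "a a" (two short edges) and a single longer
    # "a" spanning vertices 0 to 2. G_w then packs the parses of both.
    LATTICE_W = {(0, "a", 1), (1, "a", 2), (0, "a", 2)}
    g_lat = intersect(GRAMMAR, LATTICE_W, range(3))
    print(count_trees(g_lat, (0, "S", 2)))  # 2: one parse per lexical reading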

As requested in the question, all possible ambiguous readings are preserved as long as compatible with available linguistic or utterance information.

The handling of noise and ill-formed sentences is usually modelled also with finite state devices, and can thus be addressed by the same techniques.

There are actually many other issues to be considered. For example, there are many ways of building the shared forest, with more or less sharing. The techniques used to precompile pushdown automata for general context-free parsing may have an effect on the quality of the sharing. Being too smart is not always very smart.

See also other answers I made on SE on this topic.

Tetragram answered 1/10, 2014 at 0:10 Comment(0)

I'm experimenting with PFGs -- Parse-Forest Grammars built using Marpa::R2 ASF.

The approach is to represent ambiguous parses as a grammar, design a criterion to prune unneeded rules, apply it, and then remove unproductive and unreachable symbols from the PFG, thus arriving at a parse tree.

This test case is an illustration: it parses arithmetic expressions with a highly ambiguous grammar, then prunes the PFG rules based on associativity and precedence, cleans up the grammar, and converts it to an abstract syntax tree (Problem 3.10 from the cited source — Grune and Jacobs).
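
The pruning criterion is problem-specific, but the cleanup step is standard grammar hygiene. Here is a rough language-agnostic sketch of it in Python (the actual test case uses Marpa::R2 in Perl, and all names below are mine):

    # A grammar maps each non-terminal to a list of right-hand sides;
    # any symbol absent from the dict is treated as a terminal.
    def cleanup(grammar, start):
        """Drop unproductive non-terminals, then symbols unreachable from start."""
        def is_term(s):
            return s not in grammar
        # 1. Grow the set of productive non-terminals to a fixed point.
        productive, changed = set(), True
        while changed:
            changed = False
            for lhs, rhss in grammar.items():
                if lhs not in productive and any(
                        all(is_term(s) or s in productive for s in rhs)
                        for rhs in rhss):
                    productive.add(lhs)
                    changed = True
        # 2. Keep only rules whose symbols are all terminal or productive.
        pruned = {lhs: [rhs for rhs in rhss
                        if all(is_term(s) or s in productive for s in rhs)]
                  for lhs, rhss in grammar.items() if lhs in productive}
        # 3. Keep only what is reachable from the start symbol.
        reachable, stack = set(), [start]
        while stack:
            sym = stack.pop()
            if sym in reachable or sym not in pruned:
                continue
            reachable.add(sym)
            for rhs in pruned[sym]:
                stack.extend(rhs)
        return {lhs: rhss for lhs, rhss in pruned.items() if lhs in reachable}

    G = {"S": [("A",), ("B",)], "A": [("a",)], "B": [("B",)]}  # B never terminates
    print(cleanup(G, "S"))  # {'S': [('A',)], 'A': [('a',)]}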

Sacksen answered 26/9, 2014 at 16:14 Comment(2)
I have been looking through the Marpa documentation, but I cannot find a place where it talks explicitly of Parse-Forest Grammars, or of Parse-Forests as grammars. Do you have a pointer to it? I know Parse-Forest Grammars in general, but it is their use in Marpa I am interested in.Tetragram
Parse-forest grammars (PFGs) are not used in Marpa. Marpa::R2 uses abstract syntax forests (ASFs) to represent ambiguous parses. PFGs look useful to work with multiple parses at once, so I use ASFs to build PFGs and then work with PFGs.Sacksen

I'd call this data structure a lattice, see for instance Lexicalized Parsing (PDF).

Dozen answered 19/9, 2014 at 16:42 Comment(4)
Yes I've been coming across the term "word lattice" quite a bit, mostly in connection with stages before parsing such as word segmentation and morphological analysis.Sannyasi
The name lattice, or more precisely word lattice, denotes the sharing structure used to represent lexical ambiguity (simply cutting the sentence into lexical units), but not syntactic ambiguity in general. The proper representation of syntactic ambiguity (at the tree-structure level, for sentence organization) is a grammar that generates only the given sentence in different ways (or all the sentences defined by a lattice, in case of lexical ambiguity). CC @SannyasiTetragram
@babou: Another term/concept related to lattices that I've started coming across the more I dig into this is confusion network or even word confusion network (WCN). I don't yet grok the difference between these and lattices though.Sannyasi
I am working on an answer to your original question. Regarding WCNs, they seem to be a fairly recent variation on word lattices. The intent is to use a different optimization criterion when selecting a best answer in an ambiguous situation due to noise. The reference paper is apparently Mangu, Brill, Stolcke 2000: Finding consensus in speech recognition: word error minimization and other applications of confusion networks (it is actually a pdf file with an inappropriate name).Tetragram
