Algorithm to detect and remove least number of inconsistent facts (probably in PROLOG)?

Asked 13/9, 2014 at 22:57 Answered 15/9, 2014 at 1:18

algorithm prolog logic algebraic-data-types symbolic-computation

How do you program the following algorithm?

Imagine a list of "facts" like this, where the letters represent variables bound to numeric values:

x = 1
y = 2
z = 3
a = 1
b = 2
c = 3
a = x
b = z
c = z

These "facts" clearly must not all be "true": it's not possible for b=z while b=2 and z=3. If either b=z or b=2 were removed, all facts would be consistent. If z=3 and either b=z or c=z were removed, then all facts would be consistent but there would be one fewer fact than above. This set contains many such consistent subsets. For instance, a=1, b=2, c=3 is a consistent subset, and so are many others.

Two consistent subsets are larger than any other consistent subset in this example:

x = 1
y = 2
z = 3
a = 1
b = 2
c = 3
a = x
c = z

and

x = 1
y = 2
z = 3
a = 1
c = 3
a = x
b = z
c = z

Using an appropriate programming language (I'm thinking PROLOG, but maybe I'm wrong) how would you process a large set containing consistent and inconsistent facts, and then output the largest possible sub-set of consistent facts (or multiple subsets as in the example above)?

Blub answered 13/9, 2014 at 22:57 Comment(1)

Do you have a clear strategy of preferring one fact over another? Do you have (in your particular application) the concept of "evidence", and are there facts that have different source of evidence? – Unreflective 16/9, 2014 at 5:30

This is very closely related to the NP-hard multiway cut problem. In the (unweighted) multiway cut problem, we have an undirected graph and a set of terminal vertices. The goal is to remove as few edges as possible so that each terminal vertex lies in its own connected component.

For this problem, we can interpret each variable and each constant as a vertex, and each equality as an edge from its left-hand side to its right-hand side. The terminal vertices are those associated with constants.

For only two terminals, the multiway cut problem is the polynomial-time solvable s-t minimum cut problem. We can use minimum cuts to get a polynomial-time 2-approximation to the multiway cut problem, by finding the cheapest cut separating two terminals, deleting the edges involved, and then recursing on the remaining connected components. Several approximation algorithms with better ratios have been proposed in the theoretical literature on multiway cut.

Practically speaking, multiway cut arises in applications to computer vision, so I would expect that there has been some work on obtaining exact solutions. I don't know what's out there, though.

Kanara answered 13/9, 2014 at 23:34 Comment(1)

Thank you for your comments. My application is data cleansing (finding most likely correct values in a "dirty" dataset). I will wait to give others a chance before selecting this as best answer. – Blub 14/9, 2014 at 3:10

Prolog could serve as a convenient implementation language, but some thought about algorithms suggests that a specialized approach may be advantageous.

Among statements of these kinds (equalities between two variables or between one variable and one constant) the only inconsistency that can arise is a path connecting two distinct constants.

Thus, if we find all "inconsistency" paths that connect pairs of distinct constants, it is necessary and sufficient to find a set of edges (original equalities) that disconnect all of those paths.

It is tempting to think a greedy algorithm is optimal here: always select an edge to remove which is common to the largest number of remaining "inconsistency" paths. Thus I propose:

1) Find all simple paths P connecting two different constants (without passing through any third constant), and construct a linkage structure between these paths and their edges.

2) Count the frequencies of edges E appearing along those "inconsistency" paths P.

3) Find a sufficient number of edges to remove by pursuing a greedy strategy, removing the next edge that appears most frequently and updating the counts of edges in remaining paths accordingly.

4) Given that upper bound on edges necessary to remove (to leave a consistent subset of statements), apply a backtracking strategy to determine whether any smaller number of edges would suffice.

As applied to the example in the Question, there prove to be exactly two "inconsistency" paths:

    2 -- b -- z -- 3
 2 -- b -- z -- c -- 3

Removing either of the two edges, 2 -- b or b -- z, common to both of these paths suffices to disconnect both "inconsistency" paths (removing all inconsistencies among remaining statements).

Moreover it is evident that no other single edge's removal would suffice to accomplish that.

Colored answered 15/9, 2014 at 1:18 Comment(0)

Recommended topics

Hot tags