What is the difference between naive and semi naive evaluation?
Asked Answered
B

2

12

I have been trying to implement an algorithm for the semi naive evaluation of a datalog program but couldn't get a straightforward answer anywhere that explains the difference in simple words.

According to my understanding naive is a bottom up evaluation technique so is semi-naive.

In the first iteration both evaluation techniques start with an empty set.

As the iterations proceed further both end up having iterations and producing tuples until a new tuple is reached.

So the semi-naive starts from head or body of the rule?

path (X,Y) :- edge(X,Y).

path (X,Y) :- edge(X,Z), path(Z,Y).

Can someone please explain how the EDB and IDB gets updated at the end of each iteration for the above program. Is the tuples stored under each predicate. Like a separate column for edge and a separate column for path or they get stored as a collection.

Also what is the difference between global and local unification?

Both answered 31/10, 2017 at 20:22 Comment(2)
"This algorithm has been termed the naive algorithm for datalog evaluation. The central idea of the seminaive algorithm is to focus, to the extent possible, on the new facts generated at each level and thereby avoid recomputing the same facts." — Foundations of Databases (TOC)Outline
Thank you very much for providing me the link to the algorithm. It is really simple and the book also is very informative for a beginner! Cheers @DanielLyonsBoth
R
14

The difference between the naîve and semi-naîve evaluation in Datalog is that when you're evaluating using the naïve implementation you take all the initial dataset (existing EDBs) plus the news ones (inferred EDBs) for each iteration. For example, if you have the IDBs like this:

reachable(X,Y) :- link(X,Y).
reachable(X,Y) :- link(X,Z), reachable(Z,Y).

And a set of EDBs like this: link = {(a,b), (b,c), (c,c), (c,d)} The procedure to execute the evaluation is:

  1. Begin by assuming all IDB relations are empty.
  2. Repeatedly evaluate the rules using the EDB and the previous IDB to get a new IDB.
  3. End when there is no change to IDB.

When you're using a naîve approach in every step you'll have the follow data as input and output:

 | Iteration |Input for the current iteration I_{i}            | New facts inferred           |
 |-----------|-------------------------------------------------|------------------------------|
 |  1        | {}                                              | {(a,b), (b,c), (c,c), (c,d)} |
 |  2        | {(a,b), (b,c), (c,c), (c,d)}                    | {(a,c),(b,c),(b,d),(c,d)}    |
 |  3        | {(a,b), (b,c), (c,c), (c,d), (a,c), (b,d)}      | {(a,d)}                      |
 |  4        | {(a,b), (b,c), (c,c), (c,d), (a,c), (b,d),(a,d)}| {}                           |

At the 4th iteration, you'll stop because the fixpoint is reached, and no new facts could be inferred. However, in the semi-naïve approach, you apply an optimization, instead of using all derived facts as input to rules at each iteration, it's possible to send to each iteration only the tuples already learned in prior iterations, for avoid duplicates tuples.

 | Iteration |Input for the current iteration I_{i}  | New facts inferred           |
 |-----------|---------------------------------------|------------------------------|
 |  1        | {}                                    | {(a,b), (b,c), (c,c), (c,d)} |
 |  2        | {(a,b), (b,c), (c,c), (c,d)}          | {(a,c),(b,c),(b,d),(c,d)}    |
 |  3        | {(a,c), (b,d)}                        | {(a,d)}                      |
 |  4        | {(a,d)}                               | {}                           |

Source: Datalog and Recursive Query Processing

Registered answered 4/4, 2018 at 15:49 Comment(0)
O
3

I am going to tell you about Prolog. I don't know if it applies equally to Datalog or not, so if I'm wrong, someone will just have to correct me.

So the semi-naive starts from head or body of the rule?

According to this handout, you can proceed with a variable-based or a tuple-based algorithm, but in both cases you start with the body and only after it succeeds do you add a tuple representing the head:

Variable-based: Consider all possible assignments to the variable of the body. If the assignment makes the body true, add the tuple for the head to the result.

Tuple-based: Consider all assignments of tuples from the nonnegated relational subgoals. If the assignment makes the body true, add the tuple for the head to the result.

This jibes well with what I know about Prolog and backward-chaining: you want to conclude the head, so you must first prove the body.

You seem also to be asking if semi-naive has something to say about whether you start from the head or the body. Based on what I have reviewed today, it looks to me like semi-naive is a twist on the naive algorithm rather than a completely new thing. It's like a tabled version of the naive approach; if there are multiple naive approaches, you'll have just as many semi-naive approaches. If there's just one naive, there will be only one semi-naive.

Can someone please explain how the EDB and IDB gets updated at the end of each iteration for the above program.

That's easy: they do not. The EDB is the set of facts in the database. The IDB is the set of rules in the database. A query is just a query, it doesn't modify the database. The tuples that the query returns are another matter.

Are the tuples stored under each predicate?

The tuples that represent facts in the EDB are already stored in the EDB as facts. The tuples derived from the rules in the IDB are computed and become part of the result set and are not stored. In neither case is the store updated as a result of performing a query.

If we were talking about Prolog here, there would be this stack of recursive invocations going on. The outermost call might say path(a, z); inside that call might be something like edge(a, b), path(b, z), which will beget a call edge(b, c), path(c, z). In Prolog terms, each time you enter another invocation you wind up with a fresh set of variables, some bound, some yet-to-be-bound. In your Datalog world, it seems to me that edge(a,b) and edge(b,c) already exist as tuples in your EDB. During the query they will be part of the tuples in the stack of tuples that represents your result. The IDB contained a rule called path/2, and once you satisfy the recursive call, you will wind up with some new tuples like path(a,z) in your result. But that result tuple is not a fact stored in your EDB (which only contains facts like edge/2) nor is it going to replace the rule path/2. It's just part of the result of your query.

Also what is the difference between global and local unification?

I wasn't able to find anything using those terms and I can't imagine what they would mean.

So, there's my guess. Let's see how wide of the mark I am.

Outline answered 1/11, 2017 at 4:26 Comment(2)
Actually according to the properties of Deductive database system, IDB does get updated at the end of each iteration until a new tuple has been found. The database in the sequence where recursion does not add any new tuples is called the fixed point (Naturally, this depends on the initial EDB and on the IDB.) I agree with your other answers. Thanks for your explanation in simple terms :)Both
@confucious I don't see anything in the linked document that shows the IDB itself being modified as part of query resolution. Indeed, I just see I and I_new being updated with tuples. This corroborates my understanding that the IDB is the set of rules and is a not mutable during a query. However, IDB rules do generate tuples for the result set, and there is obviously an implied relationship between the rules in the IDB and the tuples in the result set. Is that what you mean?Outline

© 2022 - 2024 — McMap. All rights reserved.