The difference between the naîve and semi-naîve evaluation in Datalog is that when you're evaluating using the naïve implementation you take all the initial dataset (existing EDBs) plus the news ones (inferred EDBs) for each iteration.
For example,
if you have the IDBs like this:
reachable(X,Y) :- link(X,Y).
reachable(X,Y) :- link(X,Z), reachable(Z,Y).
And a set of EDBs like this: link = {(a,b), (b,c), (c,c), (c,d)}
The procedure to execute the evaluation is:
- Begin by assuming all IDB relations are empty.
- Repeatedly evaluate the rules using the EDB and the previous IDB to get a new IDB.
- End when there is no change to IDB.
When you're using a naîve approach in every step you'll have the follow data as input and output:
| Iteration |Input for the current iteration I_{i} | New facts inferred |
|-----------|-------------------------------------------------|------------------------------|
| 1 | {} | {(a,b), (b,c), (c,c), (c,d)} |
| 2 | {(a,b), (b,c), (c,c), (c,d)} | {(a,c),(b,c),(b,d),(c,d)} |
| 3 | {(a,b), (b,c), (c,c), (c,d), (a,c), (b,d)} | {(a,d)} |
| 4 | {(a,b), (b,c), (c,c), (c,d), (a,c), (b,d),(a,d)}| {} |
At the 4th iteration, you'll stop because the fixpoint is reached, and no new facts could be inferred.
However, in the semi-naïve approach, you apply an optimization, instead of using all derived facts as input to rules at each iteration, it's possible to send to each iteration only the tuples already learned in prior iterations, for avoid duplicates tuples.
| Iteration |Input for the current iteration I_{i} | New facts inferred |
|-----------|---------------------------------------|------------------------------|
| 1 | {} | {(a,b), (b,c), (c,c), (c,d)} |
| 2 | {(a,b), (b,c), (c,c), (c,d)} | {(a,c),(b,c),(b,d),(c,d)} |
| 3 | {(a,c), (b,d)} | {(a,d)} |
| 4 | {(a,d)} | {} |
Source: Datalog and Recursive Query Processing