SQL optimization and Disjunctive Normal Form
Asked Answered
P

2

7

So I was writing a query in Visual Studio 2010 (by which I mean I opened the server explorer, right clicked the server and chose New Query). The query includes the condition

A AND B AND C AND D AND E AND F AND (G OR H)

which is conjunctive normal form (CNF). When I ran the query(attached to MSSQL Server 2008), it changed the text into

A AND B AND C AND D AND E AND F AND G OR
A AND B AND C AND D AND E AND F AND H

which is disjunctive normal form (DNF).

From the little I found on-line, it seems like DNF allows SQL to run the conjunctives separately and union them at the end.

However, for something like this, with so many repeated conditions, does DNF actually provide an advantage over CNF? If it doesn't, how can I force the optimizer to take the condition as is? If it does, should I write the query in my application code in CNF form because it's shorter and neater or in DNF form because it saves time for the optimizer?

Permafrost answered 25/7, 2011 at 16:35 Comment(5)
Where did it change the query text? Can you post a screenshot?Rosen
Unfortunately, no easy way to post a screenshot (the computers at work are weird). As for the first question, I was in a query tab in VS2010 (where there are four panes, Diagram, Criteria, SQL, Results) and it overwrote the SQL I had written with the new text.Permafrost
I never looked into any concrete implementations of SQL engines, other than my own. ^^ So I don't know if the actual query execution will be accelerated through DNF. What I CAN tell is that DNF is at least an interesting approach for caching. The SQL engine could transform the query to DNF to be able to check if the query has been executed before. And if so, maybe query translations from SQL to native representations or even query results could be delivered from cache. But I agree with Clockwork-Muse and would leave that up to the SQL engine and write readable query myself.Cancellation
Visual studio 2012 Query Builder (when designing a TableAdapter) converted my SQL where clause to normal form. Is there a setting to disable this behavior?Haydenhaydn
Well, i'm perplexed... I never heard of DNF. I also never knew that SQLServer would interpret OR's differently if there were linebreaks... I would never make sense of a WHERE-clause like A AND B AND C AND D AND E AND F AND G OR A AND B AND C AND D AND E AND F AND HAdelina
D
2

I don't know about the relative advantages of DNF/CNF in this situation, or even how to force the optimizer in this fashion.

Generally speaking, you don't want to force the optimizer to take your 'perceived', 'current', optimization over the one it will generate (there are exceptions to this, but these are usually rare). This largely has to do with the fact that the 'best' optimization may change over time, as a side effect of other actions (like adding an index). If you're forcing the optimizer to adopt a particular optimization, you're locking it into that path, even if a new one may perform better.

Given that, you should write the query in the form that is easiest to read and maintain (CNF), and let the optimizer change it if necessary - this is the whole point of SQL being a declarative language, to allow the optimizer to muck with things as necessary.

Discomfort answered 25/7, 2011 at 17:1 Comment(0)
S
0

Off the top of my head, I wonder about the indexing on G or H. If G was indexed, but H wasn't...maybe a disjunctive would make more sense.

anyway, you can run the perfmance analyzer yourself to see the net difference in performance.

Besides that, here is some research you might be able to get access to if you want to dive-in: Reserach Material: http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=842265&abstractAccess=no&userType=inst

Spiderwort answered 25/7, 2011 at 17:18 Comment(1)
In this case, G and H actually refer to the same column; G is "col like 'word1%'" and H is "col like 'word2%'", so I can't use IN or BETWEEN. But I like the thought.Permafrost

© 2022 - 2024 — McMap. All rights reserved.