Why does SQL cost explode with simple "or"?
Asked Answered
K

3

7

I have the following statement to find unambiguous names in my data (~1 Million entries):

select Prename, Surname from person p1 
where Prename is not null and Surname is not null 
and not exists (
   select * from person p2 where (p1.Surname = p2.Surname OR p1.Surname = p2.Altname) 
   and p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and inv_date IS NULL

Oracle shows a huge cost of 1477315000 and execution does not end after 5 minutes. Simply splitting the OR into an own exists subclause boosts performance to 0,5 s and costs to 45000:

select Prename, Surname from person p1 
where Prename is not null and Surname is not null 
and not exists (
   select * from person p2 where p1.Surname = p2.Surname and
   p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and not exists (
   select * from person p2 where p1.Surname = p2.Altname and 
   p2.Prename LIKE CONCAT(CONCAT('%', p1.Prename), '%') and p2.id <> p1.id
) and inv_date IS NULL

It's not my question to tweak this to the best, as it is only a seldomly executed query, and I know CONTACT is surpassing any index, but I just wonder where this high cost comes from. Both queries seem semantically equivalent to me.

Kingsley answered 23/5, 2011 at 14:29 Comment(0)
F
6

The answer is in the EXPLAIN PLAN for your queries. They may semantically be equivalent but the execution plan behind the scenes for your queries are vastly different.

EXISTS operates differently from a JOIN and essentially, your OR filter statement is what joins the tables together.

No JOIN occurs in the second query as you are only retrieving records from one table.

Fidelity answered 23/5, 2011 at 14:38 Comment(3)
+1 - To elaborate, EXISTS short circuits and OR does not (at least in SQL Server, I'm assuming Oracle is similar). By including the OR in the EXISTS sub it checks both options each time. Separating means it only checks the second if the first is false.Canal
+1 - Execution plan1: Filter Not Exists (...) 1477315000 | Table access Person by Index ROWID 13863 | Table access Person by Index ROWID 4019; Plan 2 is huge and uses two hash joinsKingsley
Accepted asnwer. It seems I overestimated Oracle's query analyzerKingsley
E
2

The results of your two queries may be semantically equivalent, but the execution is not operationally equivalent. Your second example never makes use of an OR operator to combine predicates. All of your predicates in the second example are combined using an AND.

The performance is better because, if the first predicate that is combined with an AND does not evaluate to true then the second (or any other predicate) is skipped, (not evaluated). If you used an OR then both (or all) predicates would have to be evaluated frequently thus slowing down your query. (ORed predicates are checked until one evaluates to true.)

Emulate answered 23/5, 2011 at 14:42 Comment(2)
With semantically equivalent I meant producing the same result set, which I sill think they do...Kingsley
@stacktracer: good point. i'll modify my answer with something like 'operationally equivalent'. Though I would not assume semantic equivalency on differing queries. But I think you're not only faster but safer with your 2nd example by omitting OR. ORs can play havoc with results.Emulate
P
1

I would consider testing the query rewritten as below... Do a direct join from one to the other on the criteria that "Qualifies" what IS considered a match... Then, in the WHERE clause, throw it out if it doesn't come up with a match

select 
      p1.Prename, 
      p1.Surname
   from 
      person p1 
         join person p2
            on p1.ID <> p2.ID
            and (  p1.Surname = p2.Surname
                or p1.SurName = p2.AltName )
            and p2.PreName like concat( concat( '%', p1.Prename ), '%' )
   where
          p1.PreName is not null
      and p1.SurName is not null
      and p1.Inv_date is null
      and p2.id is null

Per your comments, but from what it appears you were looking for... NO, do NOT do a left outer join... If you are looking for names that are ALIKE that you want to PURGE out (however you'll handle that), you only want to PREQUALIFY those records that DO HAVE A MATCH via the self-join (hence normal join). If you have a name that DOES NOT have a similar name, you probably want to leave it alone... thus it will automatically be left OUT of the result set.

Now, the WHERE clause kicks in... You have a valid person on the left... that HAS a person on the right.. These ARE the duplicates... so you have the match, now by throwing in the logical "p2.ID IS NULL" creates the same result as NOT EXIST giving the final results.

I put my query back to a normal "join".

Purehearted answered 23/5, 2011 at 14:53 Comment(5)
Doesn't this give me the ambiguous names?Kingsley
I corrected the query to reflect that you mean a LEFT OUTER JOIN, not a JOIN. Using a JOIN will return probably no results if no id is null.Takashi
@stracktracer: using LEFT JOIN b WHERE b.id IS NULL is a clever way of doing a NOT EXISTS.Takashi
@stracktracstracktracer, @Benoit, see revised comments going back to a NORMAL join...Purehearted
Normal join gives no results. Left outer join seems to work but duration is 9 seconds and cost 195377045.Kingsley

© 2022 - 2024 — McMap. All rights reserved.