Performance of nested select
Asked Answered
D

3

8

I know this is a common question and I have read several other posts and papers but I could not find one that takes into account indexed fields and the volume of records that both queries could return.

My question is simple really. Which of the two is recommended here written in an SQL-like syntax (in terms of performance).

First query:

Select *
from someTable s
where s.someTable_id in
                    (Select someTable_id 
                     from otherTable o
                     where o.indexedField = 123)

Second query:

Select *
from someTable
where someTable_id in
                  (Select someTable_id 
                   from otherTable o
                   where o.someIndexedField = s.someIndexedField
                   and o.anotherIndexedField = 123)

My understanding is that the second query will query the database for every tuple that the outer query will return where the first query will evaluate the inner select first and then apply the filter to the outer query.

Now the second query may query the database superfast considering that the someIndexedField field is indexed but say that we have thousands or millions of records wouldn't it be faster to use the first query?

Note: In an Oracle database.

Divider answered 4/7, 2013 at 19:51 Comment(3)
. . In general, a performance question without specifying the database is meaningless. SQL is a descriptive language, not a procedural language, so the optimizer (part of the engine) is free to choose whatever query plan is best for a given query.Jahdal
@GordonLinoff Good point. The database is an Oracle database. The language should not really matter I suppose just wrote it in an SQL like syntax.Divider
. . According to Tom Kyte, the Oracle optimizer is smart enough to recognize correlated subqueries and can turn them into the appropriate joins (asktom.oracle.com/pls/apex/…). Oracle has a very good optimizer. What is amusing is that the first version has totally aweful performance in MySQL . . . until version 5.6 which fixed the problem.Jahdal
S
5

In MySQL, if nested selects are over the same table, the execution time of the query can be hell.

A good way to improve the performance in MySQL is create a temporary table for the nested select and apply the main select against this table.

For example:

Select *
from someTable s1
where s1.someTable_id in
                    (Select someTable_id 
                     from someTable s2
                     where s2.Field = 123);

Can have a better performance with:

create temporary table 'temp_table' as (
  Select someTable_id 
  from someTable s2
  where s2.Field = 123
);

Select *
from someTable s1
where s1.someTable_id in
                    (Select someTable_id 
                     from tempTable s2);

I'm not sure about performance for a large amount of data.

Spalla answered 11/5, 2014 at 9:34 Comment(0)
C
2

About first query:

first query will evaluate the inner select first and then apply the filter to the outer query.

That not so simple.

In SQL is mostly NOT possible to tell what will be executed first and what will be executed later.

Because SQL - declarative language.

Your "nested selects" - are only visually, not technically.

Example 1 - in "someTable" you have 10 rows, in "otherTable" - 10000 rows.

In most cases database optimizer will read "someTable" first and than check otherTable to have match. For that it may, or may not use indexes depending on situation, my filling in that case - it will use "indexedField" index.

Example 2 - in "someTable" you have 10000 rows, in "otherTable" - 10 rows.

In most cases database optimizer will read all rows from "otherTable" in memory, filter them by 123, and than will find a match in someTable PK(someTable_id) index. As result - no indexes will be used from "otherTable".

About second query:

It completely different from first. So, I don't know how compare them:

  • First query link two tables by one pair: s.someTable_id = o.someTable_id
  • Second query link two tables by two pairs: s.someTable_id = o.someTable_id AND o.someIndexedField = s.someIndexedField.

Common practice to link two tables - is your first query. But, o.someTable_id should be indexed.

So common rules are:

  • all PK - should be indexed (they indexed by default)
  • all columns for filtering (like used in WHERE part) should be indexed
  • all columns used to provide match between tables (including IN, JOIN, etc) - is also filtering, so - should be indexed.
  • DB Engine will self choose the best order operations (or in parallel). In most cases you can not determine this.
  • Use Oracle EXPLAIN PLAN (similar exists for most DBs) to compare execution plans of different queries on real data.
Credo answered 4/7, 2013 at 20:57 Comment(0)
A
0

When i used directly

where not exists (select VAL_ID FROM @newVals = OLDPAR.VAL_ID) it was cost 20sec. When I added the temp table it costs 0sec. I don't understand why. Just imagine as c++ developer that internally there loop by values)

-- Temp table for IDX give me big speedup
  declare @newValID table (VAL_ID int INDEX IX1 CLUSTERED);
    insert into @newValID select VAL_ID  FROM @newVals

  insert into @deleteValues
    select OLDPAR.VAL_ID
    from @oldVal AS OLDPAR
    where 
        not exists (select VAL_ID from @newValID where VAL_ID=OLDPAR.VAL_ID) 
    or      exists (select VAL_ID from @VaIdInternals where VAL_ID=OLDPAR.VAL_ID);
Arthrospore answered 18/3, 2022 at 9:43 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.