SELECT clause using IN ... very slow?
Asked Answered
E

3

5

Could you guys please review the following query to an Oracle DB and point out what's wrong:

SELECT t1.name FROM t1, t2 WHERE t1.id = t2.id AND t2.empno IN (1, 2, 3, …, 200)

Query statistics:

  • Time taken: 10.53 seconds.

Indices:

  • t2.empno is indexed.

  • t1.id is indexed.

  • t2.id is indexed.

Update


The above query was just a sample replica of the query i use. Here below in a more true form

Explain Plan Explain Plan

Query:

SELECT 
    PRODUCT_REPRESENTATION_SK 
FROM 
    Product_Representation pr 
    , Design_Object do
    , Files files  
    ,EPS_STATUS epsStatus 
    ,EPS_ERROR_CODES epsError
    ,VIEW_TYPE viewTable  
WHERE  
    pr.DESIGN_OBJECT_SK = do.DESIGN_OBJECT_SK 
    AND  pr.LAYER_NAME !='Layer 0' 
    AND  epsStatus.EPS_STATUS_SK = pr.EPS_STATUS  
    AND epsError.EPS_ERROR_CODE = pr.EPS_ERROR_CODE 
    AND viewTable.VIEW_TYPE_ID = pr.VIEW_TYPE_ID 
    AND  files.pim_id = do.PIM_ID 
    AND  do.DESIGN_OBJECT_ID IN
        (
147086,149924,140458,135068,145197,134774,141837,138568,141731,138772,143769,141739,149113,148809,141072,141732,143974,147076,143972,141078,141925,134643,139701,141729,147078,139120,137097,147072,138261,149700,149701,139127,147070,149702,136766,146829,135762,140155,148459,138061,138762.............................................  200 such numbers
        )

Indexed Colums:

pr.DESIGN_OBJECT_SK
do.DESIGN_OBJECT_SK
do.DESIGN_OBJECT_ID
files.pim_id

Table


TABLE "PIM"."DESIGN_OBJECT" 
(   
"DESIGN_OBJECT_SK" NUMBER(*,0) NOT NULL ENABLE, 
"PIM_ID" NUMBER(*,0) NOT NULL ENABLE, 
"DESIGN_OBJECT_TYPE_SK" NUMBER(*,0) NOT NULL ENABLE, 
"DESIGN_OBJECT_ID" VARCHAR2(40 BYTE) NOT NULL ENABLE, 
"DIVISION_CD" NUMBER(*,0), 
"STAT_IND" NUMBER(*,0) NOT NULL ENABLE, 
"STAT_CHNG_TMST" TIMESTAMP (6), 
"CRTD_BY" VARCHAR2(45 BYTE), 
"CRT_TMST" TIMESTAMP (6), 
"MDFD_BY" VARCHAR2(45 BYTE), 
"CHNG_TMST" TIMESTAMP (6), 
"UPDATE_CNT" NUMBER(*,0), 
"GENDER" VARCHAR2(1 BYTE), 

 PRIMARY KEY ("DESIGN_OBJECT_SK")
)
TABLESPACE "PIM"  ENABLE, 

FOREIGN KEY ("DESIGN_OBJECT_TYPE_SK")
    REFERENCES "PIM"."DESIGN_OBJECT_TYPE" ("DESIGN_OBJECT_TYPE_SK")
        ON DELETE CASCADE ENABLE, 

FOREIGN KEY ("PIM_ID")
    REFERENCES "PIM"."FILES" ("PIM_ID")
        ON DELETE CASCADE ENABLE

)

Table 2


CREATE TABLE "PIM"."PRODUCT_REPRESENTATION" 
(
"PRODUCT_REPRESENTATION_SK" NUMBER(*,0) NOT NULL ENABLE, 
"DESIGN_OBJECT_SK" NUMBER(*,0) NOT NULL ENABLE, 
"VIEW_TYPE_ID" NUMBER(*,0) NOT NULL ENABLE, 
"LAYER_NAME" VARCHAR2(255 BYTE), 
"STAT_IND" NUMBER(*,0) NOT NULL ENABLE, 
"STAT_CHNG_TMST" TIMESTAMP (6), 
"CRTD_BY" VARCHAR2(45 BYTE), 
"CRT_TMST" TIMESTAMP (6), 
"MDFD_BY" VARCHAR2(45 BYTE), 
"CHNG_TMST" TIMESTAMP (6), 
"UPDATE_CNT" NUMBER(*,0), 
"EPS_STATUS" VARCHAR2(30 BYTE) NOT NULL ENABLE, 
"EPS_GENERATED_TIME" TIMESTAMP (6), 
"EPS_ERROR_CODE" NUMBER, 
"EPS_ERROR_DETAILS" VARCHAR2(500 BYTE), 
"DEEPSERVER_ASSET_LAYER_ID" VARCHAR2(255 BYTE), 
"PRODUCT_REPRESENTATION_LOC" VARCHAR2(255 BYTE), 

 PRIMARY KEY ("PRODUCT_REPRESENTATION_SK")
)
TABLESPACE "PIM"  ENABLE, 

FOREIGN KEY ("DESIGN_OBJECT_SK")
    REFERENCES "PIM"."DESIGN_OBJECT" ("DESIGN_OBJECT_SK") 
        ON DELETE CASCADE ENABLE, 
FOREIGN KEY ("VIEW_TYPE_ID")
    REFERENCES "PIM"."VIEW_TYPE" ("VIEW_TYPE_ID")
        ON DELETE CASCADE ENABLE, 

CONSTRAINT "EPS_ERROR_CODE_FK"
FOREIGN KEY ("EPS_ERROR_CODE")
     REFERENCES "PIM"."EPS_ERROR_CODES" ("EPS_ERROR_CODE") 
        ON DELETE CASCADE ENABLE, 
CONSTRAINT "EPS_STATUS_FK" 
FOREIGN KEY ("EPS_STATUS")
    REFERENCES "PIM"."EPS_STATUS" ("EPS_STATUS_SK") 
        ON DELETE CASCADE ENABLE
) 
Exsect answered 11/8, 2011 at 15:19 Comment(11)
What database are you using? SQL Server?Forint
any execution plan? seems like the parse may be expensive, not so much the query part. (plus you have an extra comma - which i assume is a paste problem )Bleach
also, maybe just an artifact of your example.. but you should use <200 and not that big IN string...Bleach
t2.empno in (1,2,3,........,200) != t2.empno > 0 and t2.empno < 201 ?Heavenly
@Sudhakar: Is that your actual query? Then you could have t2.empno BETWEEN 1 AND 200 or the similar ones that @Bleach and @Heavenly suggested.Molluscoid
In general, the SQL engine has to read through the entire index when using an IN clause. BETWEEN is faster.League
@Sudhakar: Can you provide the execution plan? And the tables' definitions? There may some other slowing factor. For example, t2.empno and t2.id may be indexed but the query could benefit perhaps from a id, empno or an empno, id index.Molluscoid
I have to assume his (1,2,3,...,200) isn't actually as simple in practice as his example here. You are all certainly correct that if he is just interested in BETWEEN 1 AND 200 that he should use that... but in practice I would assume he uses non-sequential and incomplete sets for the real query.... assuming he's looking for particular rows by a set of ID'sLeporine
@Mathew PK : yes indeed . the IN value are not sequentialExsect
Sudhakar: small question; Why do.DESIGN_OBJECT_ID is a VARCHAR if you store numbers there?Molluscoid
More questions: do you have indices on the other fields that are used in the (hidden) JOINs? Did you try to rewrite the query using JOIN instead of WHERE? Did you try my EXISTS approach?Molluscoid
G
8

Lets forget for a moment the empno BETWEEN 1 and 200 suggestion and assume that you have you have t2.empno IN (3,7,...,5209) (200 entries).

You could also write your query (which is a hidden JOIN query) to the non-equivalent EXISTS query which would show same results (but possibly fewer rows) and should be faster than the JOIN:

SELECT
    t1.name
FROM
    t1
WHERE EXISTS
      ( SELECT *
        FROM t2
        WHERE t2.id = t1.id
          AND t2.empno IN (3,7,...,5209)
      )

(Wild speculation)

If on the other hand, it's not even t2.empno IN (3,7,...,5209) but t2.empno IN (SELECT tx.empno FROM tx WHERE someConditions) and you are using MySQL, then this is the root of your problem (MySQL is known to not handle field IN (SELECT f FROM x) in the best possible way). So, you could change the query into:

SELECT
    t1.name
FROM
    t1
  JOIN t2
    ON t2.id = t2.id
  JOIN tx
    ON tx.empno = t2.empno
WHERE
    someConditions

or even to:

SELECT
    t1.name
FROM
    t1
WHERE EXISTS
      ( SELECT *
        FROM t2
          JOIN tx
            ON tx.empno = t2.empno
        WHERE t2.id = t1.id
          AND someConditions
      )
Genus answered 11/8, 2011 at 15:50 Comment(1)
+1 an excellent point about MySQL and its handling of IN(subquery)Leporine
F
7

The first thing that is wrong is using implict join syntax. That is a SQL antipattern.

If you have a large list in the IN clause, have you tried putting them in a table instead and using a join?

What database? Have you looked at your explain plan or execution plan to see where the slowdown is?

Federicofedirko answered 11/8, 2011 at 15:23 Comment(1)
+1 putting the IN (..) in a table may improve performance depending on the size of all the tables here... Or, if you have SQL2008 you can send them in a TVPLeporine
L
3

Don't use the cross-join.

try this

SELECT
    t1.name
FROM
    t1
JOIN t2
    ON t2.id = t1.id
WHERE
    t2.empno IN (1,...,200)

EDIT: After you edit, seeing your multiple tables in the cartesian products, it is probably very important that you use proper JOIN syntax.

Leporine answered 11/8, 2011 at 15:23 Comment(6)
This is a valid answer to the question. Downvoter, please provide feedback.Stendhal
I have not downvoted as this is a good suggestion. But it is not an answer for the slow performance. For such a simple query, either way to write the query should result in the same execution plan.Molluscoid
@ypercube you may be right but it's still speculation on his RDBMS and his actual tables (rather than his, I assume, over-simplified example here). Nonetheless, using a proper JOIN may help his RDBMS plan the query and increase speed more than scanning 200 ints. Using the JOIN is certainly proper practice and may yield the results he can deal with.Leporine
I agree, JOIN is proper practice.Molluscoid
the thing i see is that t2.id = t2.id is obviously incorrect, and the syntax here is equivalent otherwise... not much of an answer.Bleach
@Bleach thanks for pointing out the typo, edited! This syntax may produce an equivalent query plan... but it's still proper syntax and may improve perf.Leporine

© 2022 - 2024 — McMap. All rights reserved.