sql query joins multiple tables - too slow (8 tables)
I'm trying to join 8 tables into one in order to create an index used by another application. My query looks like this (my MySQL skills are very amateur):

SELECT t1_id, t2_name, t3_name, t4_name, t5_name, 
       t6_name, t7_name, t8_name, t9_name 
FROM t1 
  LEFT JOIN t2 ON (t1_id = t2_id) 
  LEFT JOIN t3 ON (t3_id = t1_id) 
  LEFT JOIN t4 ON (t4_id = t1_id)
  LEFT JOIN t5 ON (t5_id = t1_id)
  LEFT JOIN t6 ON (t6_id = t1_id) 
  LEFT JOIN t7 ON (t7_id = t1_id)
  LEFT JOIN t8 ON (t8_id = t1_id)
  LEFT JOIN t9 ON (t9_id = t1_id)

I can't even see the query results when I execute it. Any way to speed it up? :) Any kind of help is appreciated, but it should preferably stay a single query (application rules require it).

thanks in advance

Mafalda answered 30/4, 2009 at 9:2

I had a similar problem with several lookup tables joining to a large table with all id fields indexed. To monitor the effect of the joins on query execution time, I ran my query several times (limiting to the first 100 rows), adding a join to an additional table each time. Up to 12 joined tables, there was no significant change in query execution time. By the time I had joined the 13th table, execution time jumped to 1 second; with the 14th table, 4 seconds; the 15th, 20 seconds; the 16th, 90 seconds.
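The measurement loop above can be sketched as follows, reusing the question's placeholder table names (the real names would come from your schema). Each run adds one more join while the LIMIT keeps result transfer cheap, so any jump in elapsed time is attributable to the new join:

```sql
-- Run 1: base table plus one join
SELECT t1_id, t2_name
FROM t1
  LEFT JOIN t2 ON (t1_id = t2_id)
LIMIT 100;

-- Run 2: add the next table, note the elapsed time, repeat
SELECT t1_id, t2_name, t3_name
FROM t1
  LEFT JOIN t2 ON (t1_id = t2_id)
  LEFT JOIN t3 ON (t3_id = t1_id)
LIMIT 100;
```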

Keijro's suggestion to use correlated subqueries instead of joins, e.g.

SELECT t1_id, 
        (select t2_name from t2 where t1_id = t2_id), 
        (select t3_name from t3 where t1_id = t3_id), 
        (select t4_name from t4 where t1_id = t4_id), 
        (select t5_name from t5 where t1_id = t5_id), 
        (select t6_name from t6 where t1_id = t6_id), 
        (select t7_name from t7 where t1_id = t7_id), 
        (select t8_name from t8 where t1_id = t8_id), 
        (select t9_name from t9 where t1_id = t9_id)  FROM t1

improved query performance dramatically. In fact, the subqueries did not seem to lengthen the query's execution time at all (the query was almost instantaneous).

I am a little surprised, as I thought correlated subqueries performed worse than joins.
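One caveat worth noting with this pattern (an addition, not part of the original answer): a scalar subquery in the SELECT list must return at most one row per outer row, or MySQL aborts with "Subquery returns more than 1 row". If a lookup table can contain duplicate ids, each subquery needs to be capped, for example:

```sql
-- LIMIT 1 guards against duplicate ids in the lookup table;
-- which duplicate wins is then unspecified unless you add ORDER BY.
SELECT t1_id,
       (SELECT t2_name FROM t2 WHERE t1_id = t2_id LIMIT 1) AS t2_name
FROM t1;
```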

Schutz answered 16/6, 2009 at 16:32
I'd give you much more than +1 for this if I could. I had exactly the same problem and this solution works perfectly. Coming from an MSSQL/Oracle background, where the perceived wisdom is that correlated subqueries are to be avoided wherever possible, this is very counter-intuitive, but it works. Reading around, it seems it may be because MySQL handles joins as nested loops. - Coinsure
I'm coming from a 15-table join. Even with indexes, the query still takes 1 or 2 seconds with a 50-result limit. I'll try to rewrite the big method using your recommendations. Thanks! - Overhand
"Correlated subqueries" can cause query time to increase dramatically if there is a lot of data (or if doing a lot of aggregation). If you don't need every column from each table being joined you can also look into the WITH statement block to return just the data you need and then join on those table references.Coryphaeus

Depending on how much data is in the tables, you may need to place indexes on the columns that are being joined against. Often slow querying speed comes down to lack of an index in the right place.
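For the schema in the question, that advice would translate into something like the following (assuming the tN_id columns are the join keys and are not already indexed; the index names are arbitrary):

```sql
-- One index per lookup table, on the column used in its ON clause
CREATE INDEX ix_t2_id ON t2 (t2_id);
CREATE INDEX ix_t3_id ON t3 (t3_id);
-- ...and so on for t4 through t9
```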

Also:

LEFT JOINs are slower than INNER JOINs (though this is dependent on what you're doing exactly) - can you accomplish what you're looking for with inner joins?
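If every t1 row is guaranteed to have a match in each lookup table, the query could be rewritten with inner joins. Note this is a semantic change, not just an optimization: rows without a match in any joined table are dropped from the result.

```sql
SELECT t1_id, t2_name, t3_name
FROM t1
  INNER JOIN t2 ON (t1_id = t2_id)
  INNER JOIN t3 ON (t3_id = t1_id);
-- ...same pattern for the remaining tables
```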

Flywheel answered 30/4, 2009 at 9:14
It could be good to explain the difference between inner and left joins: inner joins don't include rows where the data is not found, whereas left joins (or outer joins) do. That is, if say t8 lacks an id corresponding to t1_id, then that row will not be included at all (even if other tables do have that id). - Autocrat
Agree on both points. Use INNER JOINs and make sure there is an index on the fields you are joining. - Southern
BUT see the answer about using correlated subqueries below. I had a 15-table join (all inner joins for decodes on primary indexes) which ran 1000x faster once the 10 joins that could be moved were placed into correlated subqueries instead. - Coinsure

It would help a bit if you could post the explain plan of the query.
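Getting the plan is just a matter of prefixing the query with EXPLAIN (shown here on a shortened version of the question's query):

```sql
EXPLAIN
SELECT t1_id, t2_name
FROM t1
  LEFT JOIN t2 ON (t1_id = t2_id);
-- A "type" of ALL with NULL in possible_keys on a large table
-- indicates a full table scan, i.e. a missing index.
```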

But first of all, do you have indexes on all the fields used in the joins? Something like CREATE INDEX ix_t2_id ON t2 (t2_id, t2_name);

Instead of the joins, you could do something like:

SELECT t1_id, 
    (select t2_name from t2 where t1_id = t2_id), 
    (select t3_name from t3 where t1_id = t3_id), 
    (select t4_name from t4 where t1_id = t4_id), 
    (select t5_name from t5 where t1_id = t5_id), 
    (select t6_name from t6 where t1_id = t6_id), 
    (select t7_name from t7 where t1_id = t7_id), 
    (select t8_name from t8 where t1_id = t8_id), 
    (select t9_name from t9 where t1_id = t9_id) 
FROM t1 

But, with a good query planner, that shouldn't differ from the joins.

Autocrat answered 30/4, 2009 at 9:13
Thanks for your answer, here is the explain plan (columns: id, select_type, table, type, possible_keys, key, key_len, ref, rows):

1 SIMPLE r   ALL  NULL     NULL     NULL  NULL         4977
1 SIMPLE m   ref  release  release  4     main.r.r_id  2
1 SIMPLE t   ref  release  release  4     main.r.r_id  2
1 SIMPLE s   ALL  NULL     NULL     NULL  NULL         4889
1 SIMPLE mu  ref  release  release  4     main.r.r_id  2
1 SIMPLE n   ALL  NULL     NULL     NULL  NULL         6
1 SIMPLE q   ALL  NULL     NULL     NULL  NULL         13
1 SIMPLE o   ref  release  release  4     main.r.r_id  2
1 SIMPLE g   ref  release  release  4     main.r.r_id  2

- Mafalda
I'm not sure inline selects are a good route to SQL performance. - Truelove
Some books call them correlated subqueries. They are selects inside other select statements, before the FROM; they are evaluated row by row, so for large datasets they are very inefficient. - Truelove
True, they should mostly be avoided, but I have had several occasions where they made a great contribution to performance, so in some cases they should be considered. - Autocrat

How much data are we talking about? It might be that you have a lot of data, and since the WHERE clause is applied late in the query process, you are joining huge volumes of data before filtering it.

In that case it's better to filter the data as soon as possible, so if you can restrict the data from t1 in the first inner select, all the other joins will be against a more limited set of data.

SELECT <your fields>
FROM
(
    SELECT * FROM t1 WHERE t1_id = t1_value
) t1
INNER JOIN t2
    ON t1.ID = t2.ID
...

If it's not masses of data, check that your indexes are correct, then check server-level things: index fragmentation, disk queues, etc.

Truelove answered 30/4, 2009 at 9:31
Is there even a t1_value to check for? He/she didn't say so in the question. - Passerby

If you need all the rows of t1, and you left join on the primary key (I guess it's also the clustered index) of the other tables, there is no way to improve the speed of the query.

To improve performance you either need to reduce the result set or perform a nasty trick (eg make a denormalized copy of the data).
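A minimal sketch of the denormalized-copy trick, assuming it is acceptable to run the expensive join once offline (the table and index names here are placeholders):

```sql
-- Materialize the joined result once, then serve reads from the copy.
CREATE TABLE t1_denorm AS
SELECT t1_id, t2_name, t3_name  -- ...plus the remaining name columns
FROM t1
  LEFT JOIN t2 ON (t1_id = t2_id)
  LEFT JOIN t3 ON (t3_id = t1_id);

CREATE INDEX ix_t1_denorm_id ON t1_denorm (t1_id);
-- Caveat: the copy goes stale and must be refreshed
-- whenever the source tables change.
```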

Gytle answered 30/4, 2009 at 9:41

From your query plan I can conclude that the tables referred to as s, n and q do not have an index on the field they are being joined on.

Since there are a lot of rows in these tables (about 400,000 rows in their Cartesian product) and MySQL's only join strategy is nested loops, it will really take forever.

Create an index on these tables or define the joined field as a PRIMARY KEY.
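Assuming those tables join on columns named s_id, n_id and q_id (hypothetical names; the real ones come from the schema behind the question's aliases), that would be:

```sql
CREATE INDEX ix_s_id ON s (s_id);
CREATE INDEX ix_n_id ON n (n_id);
CREATE INDEX ix_q_id ON q (q_id);
-- or, if the values are unique, make the column the primary key:
-- ALTER TABLE s ADD PRIMARY KEY (s_id);
```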

Butterfield answered 3/5, 2009 at 21:47

As I can see, the t1 table is the one being joined with all the other tables. Instead of putting them in a single query with so many joins, you could try a UNION of different queries, something like this:

SELECT  t1_id, t2_name 
FROM    t1 LEFT JOIN t2 ON (t1_id = t2_id)
UNION
SELECT  t1_id, t3_name 
FROM    t1 LEFT JOIN t3 ON (t1_id = t3_id)

However, in that case the result you get will not have the name values spread across 8 columns side by side; it will have a single name column (alongside t1_id), with one row per match. Not sure if that is an option available to you.

One more thing which you must do in whatever solution you implement: create appropriate indexes on all your tables. The best practice is to index the columns most frequently used for joins or in the WHERE clause.

Militiaman answered 30/4, 2009 at 9:18

Depending on your version of SQL Server, simply putting your query into a stored procedure may make a big difference. Try this after you have tried the other optimizations first. (Yes, I know there are cached execution plans and other internal server optimizations, but in my practical real-world experience, stored procedures can execute faster.)
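Since the question is about MySQL (the answer seems aimed at SQL Server, where the syntax differs), a hedged sketch of the same idea in MySQL looks like this, with a placeholder procedure name:

```sql
DELIMITER //
CREATE PROCEDURE get_index_rows()  -- hypothetical name
BEGIN
  SELECT t1_id, t2_name
  FROM t1
    LEFT JOIN t2 ON (t1_id = t2_id);
  -- ...remaining joins as in the original query
END //
DELIMITER ;

CALL get_index_rows();
```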

Bracey answered 16/6, 2009 at 16:57
If using SQL Server, this does make a difference. Microsoft even documents that stored procedures benefit from saving the execution plan, as they are a persistent object versus running an ad-hoc query. - Coryphaeus

Joins can slow down dramatically when the dataset you're working with is so large you exceed working memory when performing the join. Postgres will then save its work to disk as it goes along. This is why you may see a slowdown only after the nth join regardless of which tables you're joining, or whether you have indexes configured properly.

In my case, EXPLAIN only showed a few tens of thousands of rows, nothing to write home about. But what I learned is that even if the slowdown happens during a join, the join might not be the problem. The culprit in my case turned out to be a very large uuid[] column that contained a lot of entries. I excluded that one column from my query and it sped everything up.
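In Postgres, two ways to check whether a query is spilling to disk (generic examples with placeholder table names): EXPLAIN ANALYZE reports when a sort or hash join exceeds memory, and work_mem can be raised per session to see whether the spill is the bottleneck.

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT t1.id, t2.name
FROM t1 JOIN t2 ON t2.id = t1.id;
-- "Sort Method: external merge  Disk: ..." or "Batches: > 1"
-- on a Hash Join node means the step spilled to disk.

SET work_mem = '256MB';  -- session-only; choose a value your RAM allows
```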

Ramsay answered 23/12, 2021 at 14:39

© 2022 - 2024 — McMap. All rights reserved.