INNER JOIN on Linked Server Table much slower than Sub-Query
Asked Answered
A

4

6

I came across this very odd situation, and i thought i would throw it up to the crowd to find out the WHY.

I have a query that was joining a table on a linked server:

select a.*, b.phone
from table_a a, 
join remote.table_b b on b.id = a.id
 (lots of data on A, but very few on B)

this query was talking forever (never even found out the actual run time), and that is when I noticed B had no index, so I added it, but that didn't fix the issue. Finally, out of desperation I tried:

select a.*, b.phone
from table_a a, 
join (select id, phone from remote.B) as b on b.id = a.id

This version of the query, in my mind as least, should have the same results, but lo and behold, its responding immediately!

Any ideas why one would hang and the other process quickly? And yes, I did wait to make sure the index had been built before running both.

Amoeboid answered 25/1, 2012 at 20:20 Comment(6)
Have you looked at the execution plan for both? Do they differ? Did you issue a dbcc dropcleanbuffers between each method?Caeoma
Roughly how many records are in remote.B?Clupeid
The "remote" DB is on another server accessed via a linkserver and there are about 600 rows.Amoeboid
Please show both execution plans. Without that these are just guesses.Handley
Are there more columns in table_b than just id and phone?Herbertherbicide
There are other columns in B, however they are not referenced at allAmoeboid
J
4

It's because sometimes(very often) execution plans automatically generated by sql server engine are not as good and obvious as we want to. You can look at execution plan in both situations. I suggest use hint in first query, something like that: INNER MERGE JOIN.

Here is some more information about that:

http://msdn.microsoft.com/en-us/library/ms181714.aspx

Japhetic answered 25/1, 2012 at 20:26 Comment(5)
This is a nice new thing I have learned today! I know some other places I can use this kind of merge.Amoeboid
I'm using merge hint to join view (created from 3 tables) and table. View already has executuion plan but even that engine creates new, much much slower. That is another case when you can find it useful.Japhetic
MERGE isn't a magic "go faster" hint. If you aren't getting the right plan it is better to first understand why.Handley
Of course. I agree with that. I looked at execution plan and I found that he was joining first one big table from view with table not from view and then rest which is completly odd.Japhetic
I agree, I don't think you can use this everywhere either, but in my situation I knew that this would solve the problem as soon as i read what it would do (there were a lot of other tables I was not showing in examples that could change the execution plan greatly.Amoeboid
F
3

For linked servers 2nd variant prefetches all the data locally and do the join, since 1st variant may do inner loop join roundtrip to linked server for every row in A

Feriga answered 25/1, 2012 at 20:26 Comment(5)
@Amoeboid for 1st variant look at Number of executions in Query plan for remote call - if it more than 1- then it does roundtripd for dataFeriga
Checked the estimated execution plan; its no inner loop joins going on for that table.Amoeboid
@Liney - What is the Number of executions for remote table item? Or show us the whole query plan as XMLFeriga
The number of executions for the remote table is 1Amoeboid
@Liney How about XML query plan?Feriga
H
1

Remote table as in not on that server? Is it possible that the join is actually making multiple calls out to the remote table while the subquery is making a single request for a copy of the table data, thus resulting in less time waiting on network?

Herbertherbicide answered 25/1, 2012 at 20:24 Comment(6)
~600 rows in the remote table. Find something to support the idea that the subquery is resulting in the entire table being cached and this gets my +1Clupeid
The estimated execution plan shows it only accessing it once in both versions of the query. The bad one is estimated using 40% of the processing, while the 2nd one is 1%Amoeboid
Limey, what is the exact wording in the execution plan? I would be surprised if it actually indicates the number of calls that were made internal to the process.Herbertherbicide
when using a join, its estimate number of executions is 1Amoeboid
@Amoeboid - What is the join type? Merge? Hash? Nested Loops? If you do a Diff of the XML plans what is different between the two?Handley
The join type was inner for both queries. I have tried merge now, and that does resolve the issue.Amoeboid
Q
1

I'm just going to have a guess here. When you access remote.b is it a table on another server?

If it is, the reason the second query is faster is because, you do one query to the other server and get all the fields you need from b, before processing the data. In the first query you are processing data and at the same time you are making several requests to the other server.

Hope this help you.

Quechuan answered 25/1, 2012 at 20:27 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.