slow query performance issue with partition and max
This query of mine performs poorly... what have I done wrong? Please help: it is executed a huge number of times in my system, so fixing it would be a godsend.

I checked the system with sp_Blitz and no serious issues were found.

Here is the query:

SELECT MAX(F.id) OVER (PARTITION BY idstato ORDER BY F.id DESC) AS id
FROM jfel_tagxml_invoicedigi F
     INNER JOIN jfel_invoice_state S ON F.id = S.idinvoice
WHERE S.idstato = @idstato
  AND S.id = F.idstatocorrente
  AND F.sequence_invoice % @number_service_installed = @idServizio
ORDER BY F.id DESC,
         F.idstatocorrente OFFSET 0 ROWS FETCH NEXT 1 ROWS ONLY;

Here is the query plan

https://www.brentozar.com/pastetheplan/?id=SyYL5JOeE

I can share my system properties privately.

Update: I made some modifications and it is better, but I think it could be improved further. Here is the new query:

SELECT MAX(F.id) AS id
FROM jfel_tagxml_invoicedigi F
     INNER JOIN jfel_invoice_state S ON F.id = S.idinvoice
WHERE S.idstato = @idstato
  AND S.id = F.idstatocorrente
  AND F.sequence_invoice % @number_service_installed = @idServizio;

And the new plan: https://www.brentozar.com/pastetheplan/?id=SJ-5GDqeE

Update: I made some more modifications and it is better, but I think it could still be improved. Here is the new query:

SELECT TOP 1 F.id AS id
FROM jfel_tagxml_invoicedigi AS F
     INNER JOIN jfel_invoice_state AS S ON F.idstatocorrente = S.id
WHERE S.idstato = 1
  AND S.id = F.idstatocorrente
  AND S.datastato > DATEADD(DAY, -5, GETDATE())
  AND F.progressivo_fattura % 1 = 0
ORDER BY S.datastato;

And the newest plan: https://www.brentozar.com/pastetheplan/?id=S1xRkL51S

Gangrel answered 21/12, 2018 at 12:1 Comment(11)
Considering this is only returning 1 row, why not just SELECT MAX(F.id) and bin the ORDER BY and OFFSET? A MAX of a MAX is still the MAX; I see no reason why you need to calculate the MAX value by idstato and then get the MAX value of those.Pincenez
Maybe it would help if you explained what you want to get with your query, and shared the table definitions. Maybe someone will have another idea how to solve your problem.Danseuse
I want the last row of table jfel_tagxml_invoicedigi given a particular @idstato (which, by the way, is always a fixed number). I will try to eliminate the OFFSET and ORDER BY.Sacramentalist
@Larnu I want the last row of table jfel_tagxml_invoicedigi given a particular idstato (which, by the way, is always a fixed number ) , I will try to eliminate OFFSET and order I changed like this: SELECT max(F.id) as id FROM jfel_tagxml_invoicedigi AS F INNER JOIN jfel_invoice_state AS S ON F.id = S.idinvoice WHERE S.idstato= idstato AND S.id = F.idstatocorrente AND F.sequence_invoice % number_service_installed = idServizioSacramentalist
@Dan Stef I want the last row of table jfel_tagxml_invoicedigi given a particular @idstato (which, by the way, is always a fixed number); just one row.Sacramentalist
Surely "AND F.progressivo_fattura % 1 = 0" is redundant as it will always be true, therefore you should remove this as this impacts the planHushhush
Sure, but I just replaced two variables with constants, so in some cases the expression can be false.Sacramentalist
It would help if you provided table & index details and some sample data.Hushhush
Yes Steve, sure, but privately, can we continue this conversation in a private room?Sacramentalist
Would you mind trying different join options (Merge and Hash)? learn.microsoft.com/en-us/sql/t-sql/queries/… In the last execution plan I can see a Nested Loops join, which is not the best strategy for joining large datasets. Merge and Hash joins are most efficient when proper indexing and filtering is used (not always applicable for specific filtering queries, but in most scenarios they can achieve significant performance gains).Hulbard
Thanks, I tried both, but unfortunately it went from CPU time = 375 ms, elapsed time = 594 ms to CPU time = 4907 ms, elapsed time = 824 ms.Sacramentalist

The date calculation is non-sargable; you could try using a variable instead, with OPTION (RECOMPILE):

DECLARE @d date = DATEADD(DAY, -5, GETDATE());

SELECT TOP 1 F.id AS id
FROM jfel_tagxml_invoicedigi AS F
     INNER JOIN jfel_invoice_state AS S ON F.idstatocorrente = S.id
WHERE S.idstato = 1
  AND S.id = F.idstatocorrente
  AND S.datastato > @d
  AND F.progressivo_fattura % 1 = 0
ORDER BY S.datastato
OPTION (RECOMPILE);
Hushhush answered 25/6, 2019 at 11:25 Comment(1)
Great suggestion, but it brought no improvement in time or disk performance for me.Sacramentalist

Filtering on calculated fields tends to hurt performance. Apply your other filters first, and do the calculated filter as the last step, so it has fewer rows to match. The intermediate record set may spill into tempdb; in that case either increase tempdb's size or use another method.
Here is your second query rewritten that way (you may need to adjust it, I just wrote it in Notepad++):

SELECT MAX(id) AS id
FROM (
    SELECT F.id, F.sequence_invoice % @number_service_installed AS idServizio
    FROM jfel_tagxml_invoicedigi F
         INNER JOIN jfel_invoice_state S ON F.id = S.idinvoice
    WHERE S.idstato = @idstato
        AND S.id = F.idstatocorrente
        -- AND F.sequence_invoice % @number_service_installed = @idServizio
) AS T  -- SQL Server requires an alias on a derived table
WHERE T.idServizio = @idServizio;

Instead of the subquery you can also try a temp table or a CTE; one of them may be the clear winner over the others, so all are worth a try if you want maximum performance.
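For illustration, here is the same rewrite as a CTE (a sketch only, using the table and column names from the question, untested against the real schema):

```sql
WITH Filtered AS (
    -- Inner step: apply the cheap, indexable filters first.
    SELECT F.id,
           F.sequence_invoice % @number_service_installed AS idServizio
    FROM jfel_tagxml_invoicedigi AS F
         INNER JOIN jfel_invoice_state AS S ON F.id = S.idinvoice
    WHERE S.idstato = @idstato
      AND S.id = F.idstatocorrente
)
-- Outer step: the non-sargable modulo filter runs on the reduced set.
SELECT MAX(id) AS id
FROM Filtered
WHERE idServizio = @idServizio;
```

Note that SQL Server normally inlines CTEs rather than materializing them, so a temp table (`SELECT ... INTO #tmp`) is the variant that actually forces the intermediate result to be stored.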

Alight answered 21/6, 2019 at 13:53 Comment(9)
Seems like the right choice, but besides speed I also want to take care of disk IO. I'm trying to figure out how to write your query, but it gives me an error... the error is: Operand type clash: uniqueidentifier is incompatible with intSacramentalist
uniqueidentifier can contain characters, so it cannot be stored as an int. Regarding disk IO, compare with the other solutions; they might also be high on it.Preconscious
I did as you suggested , but I had no significant improvement ... here is the new plan brentozar.com/pastetheplan/?id=SJn6ZOC1HSacramentalist
Check for equality only (=). SELECT the necessary fields and do some calculation in the SELECT if necessary, and put these filters in the outer query. Maybe you need another level of nesting, filter the date secondly, filter the idServizio thirdly.Preconscious
Ahhh... I can not edit my comment any more. The first sentence above should contain "in the inner query".Preconscious
I just re-checked, and the date filter was added only later. Why is that? What happens, if you remove this filter completely? If it is required, maybe that should be the last step.Preconscious
Can you write the query, please? I think I should understand more clearly what you meant, although filtering on the date reduces the data significantly and maybe should be the first step.Sacramentalist
Let us continue this discussion in chat.Sacramentalist
Sorry, I don't have the time for chat. I don't need to write the query I'm talking about; it is in my answer, without the date. If it gives you an incorrect result, then add the date in the outer query. Irrespective of that, if it is still slow, check whether the inner query is slow. If yes, an index should cover that query, with the columns in the index in the exact same order as in the WHERE clause. That inner query must run incredibly fast, so that the conditions that are slower to evaluate get fewer rows to process.Preconscious

I think you need a NONCLUSTERED INDEX for the query you describe above.

If you are not sure which fields of your table the NONCLUSTERED INDEX needs, simply generate an execution plan in SQL Server Management Studio; if SQL Server detects a missing index, it shows the missing-index details as green text above the plan.

Hover your mouse pointer over the missing-index text and Management Studio will show the T-SQL code required to create the missing index, or right-click the missing-index text and select Missing Index Details from the menu to see its full definition.

For more information, you can visit this article Create Missing Index From the Actual Execution Plan
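For example, a missing-index suggestion typically translates into a statement like the following (a hypothetical sketch: the key and included columns here are guessed from the question's WHERE clause and join, and should be replaced by what the plan actually suggests):

```sql
-- Hypothetical index to cover the state filter and the join back to the invoice table.
CREATE NONCLUSTERED INDEX IX_jfel_invoice_state_idstato_datastato
    ON jfel_invoice_state (idstato, datastato)
    INCLUDE (idinvoice);
```

The key columns should match the equality and range predicates, while INCLUDE columns only need to cover what the SELECT and join reference.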

I hope this solution helps you.

Skippie answered 21/6, 2019 at 11:20 Comment(6)
It's the first thing I did, no missing index, thanks anyway.Sacramentalist
Gabriele D'Onufrio, can you still please try to create a NONCLUSTERED INDEX on your table for the columns used in your WHERE condition? In my experience, around 80% of queries can be made faster just by writing the right INDEX.Skippie
Already did; in the update ("Update: I made some modifications and it is better, but I think it could be improved further. Here is the new query"), the query uses only NONCLUSTERED INDEXES, but it isn't fast enough. If you want, I can give you the exact execution plan... privatelySacramentalist
Yes, please: if you can share the execution plan generated from Management Studio, we can get a better idea.Skippie
here is the new excution plan brentozar.com/pastetheplan/?id=SJn6ZOC1HSacramentalist
Hi, can you create a NONCLUSTERED INDEX on the table jfel_fatture_stati on columns id and datastato, and also create an index on the table jfel_tagxml_fatturaelettronica on column id, then regenerate your execution plan?Skippie

Any window aggregation carries a big performance penalty. Taking the window-sliding mechanism outside the database (i.e. into your application's RAM) is a general way of optimizing it.

Otherwise, you may try to give more memory to each database session (in PostgreSQL you can tweak this via a parameter; in other databases you may or may not be able to).

The main reason it is slow is that it forces sorting and materialization of the sorted table.
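In this particular query the sort can be avoided entirely: since only the single highest id is needed, the window aggregate can be replaced by a plain MAX, which is essentially what the question's first update does (a sketch using the tables and parameters from the question):

```sql
-- Plain aggregate: no PARTITION BY / ORDER BY, so no per-partition sort or spool.
SELECT MAX(F.id) AS id
FROM jfel_tagxml_invoicedigi AS F
     INNER JOIN jfel_invoice_state AS S ON F.id = S.idinvoice
WHERE S.idstato = @idstato
  AND S.id = F.idstatocorrente;
```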

Sahara answered 28/6, 2019 at 4:22 Comment(0)
