I have a data load scenario where I create dynamic sql query to pull data and cache in our service. There is 1 table that contains all product data : ProductHistory (47 columns, 200,000 records + and will keep growing)
What I need: Get the latest products by using the maximum id, maximum version and maximum changeid.
First Attempt:
SELECT distinct Product.* FROM ProductHistory product
WHERE product.version =
(SELECT max(version) from ProductHistory p2 where product.Id = p2.Id
and product.changeId =
(SELECT max(changeid) from ProductHistory p3 where p2.changeId = p3.changeId))
This took more than 2.51 minutes.
Other Failed Attempt:
select distinct product.* from ProductHistory product
where CAST(CAST(id as nvarchar)+'0'+CAST(Version as nvarchar)+'0'+CAST(changeid as nvarchar) as decimal) =
(select MAX(CAST(CAST(id as nvarchar)+'0'+CAST(Version as nvarchar)+'0'+CAST(changeid as nvarchar) as decimal)) from ProductHistory p2
where product.Id = p2.Id)
It basically uses the same principle as when you order dates, concatenating the numbers ordered by relevance.
For example 11 Jun 2007 = 20070711
And in our case: Id = 4 , version = 127, changeid = 32 => 40127032
The zeros are there not to mix up the 3 different ids
But this one takes 3.10 minutes !!! :(
So, I basically need a way to make my first attempt query better by any chance. I was also wondering with such amount of data, is this the best speed of retrieval that I should expect ?
I ran sp_helpindex ProductHistory and found out the indexes as below :
PK_ProductHistoryNew - clustered, unique, primary key located on PRIMARY- Id, Version
I wrapped the first query in a SP but still no change.
So, wondering by what other means we can improve the performance of this operation ?
Thanks, Mani p.s : I am just running these queries in SQL management stuido to see the time.