Say I have a very long table (~35 million rows) called TimeCard with only 5 columns (tableID, CompanyID, UserID, ProjectID, DailyHoursWorked, entryDate). This is a pretty straight forward table that records employees' worked hours per day per project per company.
I now need to generate a report to find out the employees' total worked hours per month per project for any given company. Instead of performing the aggregation needed when the report runs, I want to build a table-like data structure that already have all the Company/Project/User data aggregated by month, so when the report runs, I can just query that data structure directly without performing any run-time aggregation since ~35million records can take a few mins.
So I have 2 different ways. One create an extra physical table with (CompanyID, UserID, ProjectID, MonthlyHoursWorked, Month) as my columns and just use trigger at the TimeCard table to modify the values at the extra table. Or I can create an Indexed View. So I tried both. I first tried the indexed view with the following code:
CREATE VIEW [dbo].[vw_myView] WITH SCHEMABINDING AS
SELECT
JobID,
ProjectID,
Sum(DailyHoursWorked) AS MonthTotal,
DATEADD( Month, DATEDIFF( Month, 0, entryDate), 0 ) AS entryMonth,
CompanyID,
COUNT_BIG(*) AS Counter
FROM
dbo.TimeCard
Group By DATEADD( Month, DATEDIFF( Month, 0, entryDate ), 0 ), JobID, ProjectID, CompanyID
Go
CREATE UNIQUE CLUSTERED INDEX [IX_someIndex] ON [dbo].[vw_myView]
(
[CompanyID] ASC,
[entryMonth] ASC,
[UserID] ASC,
[ProjectID] ASC
)
The indexed view created correctly and totaling with ~5 million rows total.
However, every time if I clear the SQL cache, and run the following query: *select * from vw_myView where companyID = 1*, it takes almost 3 minutes. If I go with the extra table route as I mentioned above, with my cache cleared, it takes around 4 seconds.
My questions are, is Indexed View a bad choice for this particular scenario? In particular I am interested to know if the entire indexed view gets re-calculated/re-aggregated every time when the underlying table (TimeCard) is changed or when a query is run against it?
Thanks!
entryMonth
, couldn't you just haveMONTH(entryDate)
and possiblyYEAR(entryDate)
as INTs ? Seems a lot easier to me (but then again - I don't know your exact requirements).... – Qualm