Denormalized access: views or materialized tables?

Asked 8/8, 2014 at 20:37 Answered 8/8, 2014 at 21:58

I'm looking to create denormalized access to data essentially for reporting purposes (and thus to avoid joins and gain performance). I have two solutions in mind, but I'm looking for (a) other potential solutions, and (b) what tradeoffs I should be considering. I'm using SQL Server 2008 R2.

In one solution, I could create an indexed view over the query which does that joins that I care about. My understanding is that this does materialize under the hood but is tricky and might not guarantee good performance (and there's a vociferous debate about the performance of views).

In another solution, I could build the machinery to create a table, populate it with the data I care about, and in a transaction swap it out for the existing table.

The former seems risky/magical to me; the latter seems janky, error prone, and likely to impact things like query plans. Can someone help shed light on this please?

Theresita answered 8/8, 2014 at 20:37 Comment(4)

"and there's a vociferous debate about the performance of views" Among experts, there is no such debate. Views are inlined into the query plan. They contribute zero to query performance. Beginners often find a performance problem and randomly decide that it must be because there's a view. – Hollington 8/8, 2014 at 20:40

What exactly are you trying to solve? Why do you think you need a view at all? – Pape 8/8, 2014 at 21:1

@Andrew, the first line of the question states the goal: "I'm looking to create denormalized access to data essentially for reporting purposes (and thus to avoid joins and gain performance)." Rather than JOINing a bunch of tables and doing calculations on the fly, he wants to persist the end results to improve performance. An indexed view is one form of persistence. – Issus 8/8, 2014 at 21:19

So have you tried the queries with the joins? Database are optimzed to use joins and denormalization can in fact decrease performance. – Colter 8/8, 2014 at 21:47

Views are inlined into the query plan at a very early stage in the optimization pipeline. Neither do they hurt nor do they improve performance.

Indexed views are also inlined. It doesn't matter whether you have written your query to reference a view or whether you have pasted the view definition. On Enterprise Edition, the optimizer tries to match parts of the query against indexed views later in the pipeline. This is indeed unreliable. For simple cases it works fine but I have seen it fail randomly. There is no guarantee.

There is a solution for that: WITH (NOINDEX) forces the optimizer to not inline the view but to use its index. This is 100% reliable.

If you can fit your query pattern into an indexed view and are able to use that hint (you can create a wrapper view that always contains that hint) then indexed views are a nice automatic and consistent solution.

The former seems risky/magical to me

With that hint, there is little surprise left in indexed views. If it helps: I have production experience with indexed views. They work fine.

the latter seems janky, error prone, and likely to impact things like query plans. Can someone help shed light on this please?

That is true. Every bug has the potential to cause data corruption. You better not forget any place where data is written, or else your denormalized data becomes inconsistent.

I recommend you use indexed views for these reasons if there is no good reason against using them.

Hollington answered 8/8, 2014 at 20:45 Comment(6)

I'd appreciate a comment regarding the downvote. There are many myth surrounding views. Maybe the downvote was due to one such myth?! – Hollington 8/8, 2014 at 20:53

-1 Answer is tangential to the question. OP isn't asking about myths of views, he's asking about pros/cons of using non-inlined indexed views vs using denormalization tables. – Fibrilliform 8/8, 2014 at 21:4

@Fibrilliform I get your point. I have edited the answer. Before I recommend something I must dispel those myth, though. – Hollington 8/8, 2014 at 21:8

I have seen views harm performance but only when someone is stupid enough to use views to call views to call other views etc. Joining to the same table 7 times instead of once can indeed cause a performance problem. – Colter 8/8, 2014 at 21:46

@Colter that is not a problem due to using a view. Had that person written the same query inline he'd have had the same problem. Views might cause confusion but they technically never cause performance problems. – Hollington 8/8, 2014 at 22:8

Ther person would likely not have wrtten the same query if he had not been using views that called views. Once you can see exactly how many time you are joining to something, you are likely to do something about it. – Colter 11/8, 2014 at 13:0

Firstly, indexed views have two gotchas:

1) An updated row in the base table must be able to propagate through to the indexed view without having to reference any other rows in the base table. This is why you can use SUM() and COUNT_BIG(), since those aggregates can be updated just by knowing what is in the changed rows, but you can't use MIN() or MAX(), since you've have to look back through the base table to make sure.

All the seemingly arbitrary restrictions on indexed views (no TVFs, no subqueries, no unions, etc) boil down the the above reason. The engine has to be able to maintain the index without constantly looking at the whole base table.

2) AFTER triggers on the base table still get processed before the indexed views are updated.

These limit the complexity of what you can accomplish with indexed views alone.

Secondly, test that denormalization will actually help. If you are just instantiating simple joins and you are already I/O bound, that will make the bottleneck worse. On the other hand, if you are precomputing large aggregates or taking vertical slices of extremely wide joins, it's more likely to be an improvement.

Thirdly, to use indexed views, use a pattern like this:

CREATE TABLE huge_data_table ( ... )
GO

CREATE VIEW huge_data_table_monthly_summary_index AS
SELECT
  YEAR(...) AS [year_...]
 ,MONTH(...) AS [month_...]
 ,SUM(...) AS [sum_...]
 ,COUNT_BIG(*) AS [count_...]
FROM huge_data_table
GROUP BY YEAR(...),MONTH(...)
GO

CREATE UNIQUE CLUSTERED INDEX UC__xyz
ON huge_data_table_monthly_summary_index
  ([year_...],[month_...])
GO

CREATE VIEW monthly_summary AS
SELECT
  [year_...]
 ,[month_...]
 ,[sum_...]
 ,[count_...]
FROM huge_data_table_monthly_summary_index WITH (NOEXPAND) --force use of the view's index
GO

Then you can query monthly_summary and get the aggregates without having to recompute them every time. The engine updates huge_data_table_monthly_summary_index automatically whenever huge_data_table changes.

It may seem like magic, but it works. Use it as far as you can in lieu of triggers/jobs/etc to synchronize tables. I have found that I can usually break down a complex reporting job into smaller, simpler pieces that can use indexed views, and joining the intermediate results on the fly at reporting time turns out to be fast enough to get the job done.

If that's not enough for your needs, you are probably in the realm of log shipping or mirroring to a separate reporting server. There's not much middle ground where syncing manually makes sense.

Fibrilliform answered 8/8, 2014 at 21:58 Comment(0)

Recommended topics

Hot tags