joining across multiple fact tables with a dimension in between
Asked Answered
W

2

10

What's a good approach to data warehouse design if requested reports require summarized information about the same dimensions (and at the same granularity) but the underlying data is stored in separate fact tables?

For example, a report showing total salary paid and total expenses reported for each employee for each year, when salary and expenses are recorded in different fact tables. Or a report listing total sales per month and inventory received per month for each SKU sold by a company, when sales comes from one fact table and receiving comes from another.

Solving this problem naively seems pretty easy: simply query and aggregate both fact tables in parallel, then stitch together the aggregated results either in the data warehouse or in the client app.

But I'm also interested in other ways to think about this problem. How have others solved it? I'm wondering both about data-warehouse schema and design, as well as making that design friendly for client tools to build reports like the examples above.

Also, does this "dimension sandwich" use-case have a name in canonical data-warehousing terminology? If yes that will make it easier to research via Google.

We're working with SQL Server, but the questions I have at this point are hopefully platform-neutral.

Windham answered 13/1, 2014 at 6:10 Comment(4)
You can also consider a consolidated fact table.Prostyle
Consolidated fact tables are also a mean to introduce useful accessory or derived data. For example there are many way to calculate inventory turn using inventory and sales data, it is often (not always) advisable to prepare the column to be used in ETL phase, driving business user toward using the same calculation.Fca
Consolidated fact tables work OK for some use-cases, but many of the our DW's business use-cases have both simple dimensions (unlike the inventory case above where calculation by experts is important) and are required to support ad-hoc filtering and aggregation before the results are defined.Windham
But, through the useful Kimball link that @MarekGrzenkowicz supplied, I was able to find Drilling Across which is exactly the problem I'm trying to solve.Windham
W
10

I learned today that this technique is called Drilling Across:

Drilling across simply means making separate queries against two or more fact tables where the row headers of each query consist of identical conformed attributes. The answer sets from the two queries are aligned by performing a sort-merge operation on the common dimension attribute row headers. BI tool vendors refer to this functionality by various names, including stitch and multipass query.

Sounds like the naive solution above (query multiple fact tables in parallel and stitch together the results) is also the suggested solution.

More info:

Many thanks to @MarekGrzenkowicz for pointing me in the right direction to find my own answer! I'm answering it here in case someone else is looking for the same thing.

Windham answered 13/1, 2014 at 22:23 Comment(1)
The second link is outdated but I found another blog mentioning the three different techniques: blog.chrisadamson.com/2011/12/three-ways-to-drill-across.htmlMarkhor
S
4

The "naive solution" you described is most of the times the preferred one.

A common exception is when you need to filter the detailed rows of one fact using another fact table. For example, "show me the capital-tieup (stock inventory) for the articles we have not sold this year". You cannot simply sum up the capital-tieup in one query. In this case a consolidated fact can be a solution, if you are able to express both measures on a common grain.

Symptom answered 17/1, 2014 at 16:41 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.