Microsoft Azure Data Warehouse: Flat Tables or Star Schema
Asked Answered
A

1

0

I am creating data warehouse model on numerous OLTP tables. a) I can either utilize a Star schema or b) Flat table model table.

Many people think dimensional star schema model table is not required; because most data can report itself in a single table. Additionally, star schema Kimball was created when performance and storage are an issue. Some claim with improved tech, data can be presented in a single table.

Should I still separate data into dimensions/facts tables or just use the flat tables directly in data warehouse?

In Microsoft Azure, are flat wide tables recommended or star schema?

In this question, I believe AWS Redshift employee preferred flat wide table. Performance of Flat Tables Vs Dimension and Facts

Acetophenetidin answered 2/1, 2019 at 5:38 Comment(0)
N
0

I think this question is best answered with, "It Depends on your business needs, your time and resources." I think there are reasons to support both, depending on your circumstance. However in my experience I would go with a Star Schema if you are building these tables to be consumed by a large number of reports and other analyses.

I am guessing the tables you are brining in are still in 3rd normal form? In both cases you are still de-normalizing, however assuming this is something you are creating for the long run, I think Star would better serve your purposes. Kimball didn't just suggest the dimension/fact relationship because of technical optimization reasons, there are also business reasons for it as well.

  1. Example: you have a products table that you build once, and you have a sales fact that you connect it to. In the next 6 months perhaps now someone wants all business metrics relating to inventory, or discounts, most likely both. You already have a product table that would fit with that. If you have a single flattened out table on sales with products included, you end up doing the same work again for inventory and discounts on products. When products are separated out, its a easier to just apply that one join to each of those three fact tables, and more that will surely come up in the future. Time spent on the Star is less in the long run as you are able to iterate with new measurable numbers to add.

  2. Maintaining that product table, or any dimension table (context for your measurable amounts) is much easier when you have on table to work with. Any time a new column to better categorize products for example

  3. Any modeling tool is easy to work with most of the time when you have a star schema to work with (SSAS and PowerPivot for example), same with drag and drop reports (like pivot tables connected to your models)

Numeration answered 2/1, 2019 at 20:41 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.