Is a star schema a denormalized schema?

Asked 25/8, 2013 at 21:40 Answered 26/10, 2018 at 20:36

An OLAP database consists of data in denormalized form. This means data redundancy, and that redundancy lets queries retrieve data with fewer joins, which makes retrieval faster.

But a popular design for OLAP databases is the fact-dimension model. The fact table stores numerical, fact-based entries (number of sales, etc.), while the dimension tables store "descriptive attributes" related to the fact, e.g. details of the customer to whom the sale was made.

My question is: this design does not seem denormalized at all, since the fact table has foreign key references to all the dimension tables. How is it different from an OLTP design?

Nemertean answered 25/8, 2013 at 21:40 Comment(0)

The denormalization is in the dimension tables of a star schema. E.g. in a product table, you explicitly have many columns, such as several levels of product category, in this one table, instead of having one table for each level and using foreign keys to reference those values.

This means you normalize with regard to the facts, but stop normalizing at the dimension tables.
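To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (dim_product, fact_sales), the columns and the sample rows are made up for illustration; they are not from the question. The point is that the category levels live as plain columns in the one dimension table, so a roll-up to the top level needs a single join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star-schema dimension (hypothetical): every category level is a column
-- of the same table, so 'Beverages' is repeated on several rows.
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    subcategory  TEXT,
    category     TEXT
);
-- Fact table: a foreign key to the dimension plus a numeric measure.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    units_sold INTEGER
);
INSERT INTO dim_product VALUES
    (1, 'Espresso beans', 'Coffee', 'Beverages'),
    (2, 'Green tea',      'Tea',    'Beverages'),
    (3, 'Oat biscuits',   'Snacks', 'Food');
INSERT INTO fact_sales VALUES (1, 10), (2, 5), (3, 7), (1, 3);
""")

# Rolling up to the top category level takes one join, because the
# category is stored directly on each product row.
units_by_category = dict(conn.execute("""
    SELECT d.category, SUM(f.units_sold)
    FROM fact_sales f
    JOIN dim_product d ON d.product_id = f.product_id
    GROUP BY d.category
""").fetchall())
print(units_by_category)
```

In a fully normalized (snowflake-style) layout, subcategory and category would each be their own table, and the same roll-up would need two more joins.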

Furthermore, you often do not even completely normalize the facts. A typical example would be this: in a completely normalized table, you would use only two columns 'number of units sold' and 'price per unit', but in an OLAP database, it may make sense to redundantly have another column for the 'sales value' which could easily be calculated by multiplying units sold and the price per unit.
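A small sketch of that redundant column, again with sqlite3 and invented names and numbers (fact_sales, units_sold, price_per_unit, sales_value are assumptions for illustration). The value is computed once when the row is loaded, so queries can sum the stored column instead of multiplying every time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fact_sales (
        units_sold     INTEGER,
        price_per_unit REAL,
        sales_value    REAL   -- redundant: units_sold * price_per_unit
    )
""")

# Hypothetical sample rows: (units sold, price per unit).
rows = [(3, 2.50), (10, 1.25), (4, 5.00)]

# The derived value is calculated at load time and stored with the row.
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(units, price, units * price) for units, price in rows],
)

# Reading the stored column ...
stored_total = conn.execute(
    "SELECT SUM(sales_value) FROM fact_sales"
).fetchone()[0]

# ... gives the same result as recomputing it in the query.
computed_total = conn.execute(
    "SELECT SUM(units_sold * price_per_unit) FROM fact_sales"
).fetchone()[0]
```

The trade-off is the usual one: the load process has to keep sales_value in step with the two columns it is derived from.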

Glasses answered 26/8, 2013 at 11:46 Comment(5)

@FrankPI if I have a normalized hierarchy like Survey, Question, SubQuestion, AnswerChoices - would you say 1 dimension table w/columns: SurveyID, QuestionID, SubQuestionID, AnswerChoiceID, ... [attributes of surveys, questions, subquestions and AnswerChoices]? This would be as opposed to tables DimSurvey, DimQuestion, DimSubQuestion, etc...? – Alpert 30/1, 2014 at 21:46

@jmsmcfrlnd This depends on how you would want to query it, i.e. which queries would be run? Possibly this also depends on the tool that you want to use and its query capabilities. – Glasses 30/1, 2014 at 21:49

@FrankPI The tool to consume this data is Cognos, using the Framework to build logical data models for the querying. We will be querying/analyzing the answers to the survey questions (in Fact table) - but analyzing the surveys themselves (e.g. which questions perform "better" over another, etc.) – Alpert 30/1, 2014 at 21:52

@FrankPI - correction to above comment: we will NOT be analyzing the surveys themselves. (couldn't edit my comment for some reason) – Alpert 30/1, 2014 at 21:58

@jmsmcfrlnd Maybe a conversation via comments is not a good way to discuss this. What about putting your problem into a question by itself? It would also be good to include the following info: What is it you are analyzing? Different types of answers for the same question? What would be the measures? – Glasses 30/1, 2014 at 22:21

You can see the difference if you first study "highly normalized schemas".

I will give you an example. Consider a "city" inside a "country" for a "person". All you need to store for a person is the "city", because that city resides in a "country" anyway, so you don't also have to store the "country" in the "person" table. This approach has the advantage of "minimal" storage. The disadvantage is that retrieving the "country" for a "person" is annoying, since you need an extra join through the city table to get there.

So, regarding your question: if your design stored both "city_id" and "country_code" in the "person" table, this would cause a little redundancy, but the advantage is that it becomes easier to get a person's country by directly joining the two tables "Countries" and "person".
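Here is a rough sketch of the two variants side by side, using Python's sqlite3 module. The table and column names (person_normalized, person_denormalized, cities, countries) and the sample row are my own assumptions for illustration. The normalized lookup goes person, then city, then country; the denormalized one joins person to country directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (country_code TEXT PRIMARY KEY, name TEXT);
CREATE TABLE cities (
    city_id      INTEGER PRIMARY KEY,
    name         TEXT,
    country_code TEXT REFERENCES countries(country_code)
);
-- Normalized variant: a person stores only the city.
CREATE TABLE person_normalized (
    person_id INTEGER PRIMARY KEY,
    name      TEXT,
    city_id   INTEGER REFERENCES cities(city_id)
);
-- Denormalized variant: the country is stored redundantly next to the city.
CREATE TABLE person_denormalized (
    person_id    INTEGER PRIMARY KEY,
    name         TEXT,
    city_id      INTEGER REFERENCES cities(city_id),
    country_code TEXT REFERENCES countries(country_code)
);
INSERT INTO countries VALUES ('FR', 'France'), ('DE', 'Germany');
INSERT INTO cities VALUES (1, 'Paris', 'FR'), (2, 'Berlin', 'DE');
INSERT INTO person_normalized VALUES (1, 'Alice', 1);
INSERT INTO person_denormalized VALUES (1, 'Alice', 1, 'FR');
""")

# Normalized: person -> cities -> countries (two joins).
country_via_city = conn.execute("""
    SELECT co.name
    FROM person_normalized p
    JOIN cities ci    ON ci.city_id = p.city_id
    JOIN countries co ON co.country_code = ci.country_code
    WHERE p.person_id = 1
""").fetchone()[0]

# Denormalized: person -> countries (one join).
country_direct = conn.execute("""
    SELECT co.name
    FROM person_denormalized p
    JOIN countries co ON co.country_code = p.country_code
    WHERE p.person_id = 1
""").fetchone()[0]
```

Both queries return the same country; the second one just skips the hop through the city table.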

The main purpose of normalization is to remove redundancy and thereby achieve data consistency. For example, in your OLAP case, a developer can make a mistake by inserting the correct "city_id" but the wrong "country_code": he can insert "Paris" as the city and, by mistake, "Germany" as the country, which is wrong. If the schema is fully normalized, this can never happen, since the "person" table stores only the "city_id" of "Paris" and does not store a country at all.
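The Paris/Germany mistake can be reproduced in a few lines. This is only a sketch with invented tables and rows (person, cities, 'Alice', 'Bob'), using sqlite3. The denormalized table accepts the contradictory row without complaint, and the only way to spot it is to compare the redundant column against the city table after the fact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cities (
    city_id      INTEGER PRIMARY KEY,
    name         TEXT,
    country_code TEXT
);
CREATE TABLE person (
    person_id    INTEGER PRIMARY KEY,
    name         TEXT,
    city_id      INTEGER,
    country_code TEXT   -- redundant copy of cities.country_code
);
INSERT INTO cities VALUES (1, 'Paris', 'FR'), (2, 'Berlin', 'DE');
INSERT INTO person VALUES
    (1, 'Alice', 1, 'FR'),   -- consistent
    (2, 'Bob',   1, 'DE');   -- Paris stored with Germany's code: accepted
""")

# Find rows whose redundant country disagrees with the city's own country.
mismatches = conn.execute("""
    SELECT p.name, ci.name, p.country_code
    FROM person p
    JOIN cities ci ON ci.city_id = p.city_id
    WHERE p.country_code <> ci.country_code
""").fetchall()
print(mismatches)
```

With only city_id in the person table there would be no second column to get wrong, which is the consistency argument for normalizing.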

So yes, OLAP is denormalized, since it allows data redundancy and, with it, developer (application) mistakes, if any.

Sanferd answered 26/10, 2018 at 20:36 Comment(0)
