Database schema-related problem

Asked 26/11, 2010 at 9:4 Answered 26/11, 2010 at 9:23

I have a kind of theoretical question about databases. To make it more concrete I have thought up an example case.

Suppose I have a store with products. I have lots of different products. Not every product has the same applicable properties. For instance, I could define the size of a harddisk in gigabytes, but cannot use that same property on a CPU, simply because it does not apply. What I want is a database where I can add properties to products dynamically. The only thing I can come up with is the following:

One product table with an ID, a Name and a Description.

One properties table with an ID, Product_ID, Property and a Value.

This way I would potentially get a gigantic, I-suppose-not-so-efficient, table of properties. This has been bugging me for a long time now. Does anyone know a better solution to my problem?

Magus answered 26/11, 2010 at 9:4 Comment(2)

If you have absolutely no idea what the set of possible properties looks like then your dynamic scheme does work. It is problematic as you need to keep the possible properties consistent (so you don't end up with multiple properties for the same thing, for example "memory_size", "gigabytes", "bytes", ...) The alternative, if you do know what the properties could be is to have a series of specialised tables, e.g. StorageProducts, ProcessingProducts, etc that hold the properties that apply to various classes of product. – Quintus 26/11, 2010 at 9:11

@John Pickup: why don't you put that nice answer into..an answer ? I would vote for it, it's a nice complement to PerformanceDBA's answer. – Ambitious 26/11, 2010 at 9:49

This is actually moving towards Sixth Normal Form, it is just that people like you who do not have the academic or experiential background do not know the (a) name for it and (b) the rules and the caveats. Such people have implemented what is commonly know as Entity-Attribute-Value or EAV. If it is done properly, it is fine, and there are many thousands of medical system out there carrying diagnostic and dosage info in such tables. If it is not, then it is one dog's breakfast to use and maintain.

First make sure you have Product in true and full 5NF.
Always use full Declarative Referential Integrity; CHECK constraints and RULES.
Never put all that into one table with a VARCHAR() for Value. Always use the correct (applicable) DataTypes. That means you will have several tables, one each per DataType, and there is no loss of control or integrity.
Likewise any Associative tables (where there is a multiple reference to another table [eg. Vendor] ) must be separate.
- I am providing a Data Model which has the full control discussed; it includes a simple catalogue which can be used for verification as well as navigation. You need to add every CHECK Constraint and RULE to ensure that the data and referential Integrity is not lost. That means, eg:
  - for the CPUSpeed column, which is stored in ProductDecimal, CHECK that it is in the proper range of values
  - for each sub-Product table CHECK that the DataType is correct for the ProductType-ColumnNo combination
- This structure is way better than most EAV, and not quite the full 6NF.
  .
Keep all the mandatory columns in Product; use the sub-Product tables for optional columns only.
For each such (eg Product) table, you need to create a View (dotted line), which will construct the 5NF rows from the EAV/6NF tables. You may have several Views: Product_CPU, Product_Disk.
Do not update via the View. Keep all your updates transactional, in a stored proc, and insert or update each of the columns (ie. the Product and sub-Product tables which are applicable, for each particular ProductType) together.
Gigantic ? Commercial databases (not the freeware) have no problems with large tables or joins. This is actually a very efficient structure, and allows very fast searches, because the tables are in fact column-oriented (not row-oriented). If the population is gigantic, then it is gigantic, do your own arithmetic.
You need one more table, a Lookup table for Property (or Attribute). That is part of the catalogue, and based on ProductType

The better solution is to go for full, formal Sixth Normal Form. Not necessary if you have only one or a few tables that require optional columns.

To be clear:

Sixth Normal Form is The Row consists of the Primary Key and, at most, one Attribute.
This is 6NF (for at least the Product table cluster), then Normalised again (Not in the Normal Form sense) by DataType, to reduce the no of tables (otherwise you would have one table per Attribute).
This retains full Rdb control (FKs, constraints, etc); whereas the common EAV types don't bother with the DRI and control.
This also has the rudiments of a catalogue.

Link to Product Cluster Data Model

Link to IDEF1X Notation for those who are unfamiliar with the Relational Modelling Standard.

Update

You might be interested in this ▶5NF 6NF Discussion◀. I will write it up at some point.

Deport answered 26/11, 2010 at 9:23 Comment(3)

I am stunned by your elaborate answer. Thanks a lot! I am noticing a huge gap in my knowledge as a web developer. As an IT-student I would like to know more about this subject. Do you know of any good literature for me to dive into? – Magus 26/11, 2010 at 12:8

@Bram. Thanks, and you're most welcome. After college I had good mentors, that's the fastest way. Rdb design is dfifferent to app design. Follow someone with actual experience. Buy the best textbooks. The web is full of misinformation and shallow information, many responders read that, and post from that. Make sure you make the distinction. Keep an open mind, but not so open that it fills with junk. Be rigid about Standards, they were created by minds greater than ours. If you go to the Directories in my links, you can look at other Answers. Ask good questions. – Deport 26/11, 2010 at 23:47

@Bram. For comparison and further understanding, please check this question/answer for a different 6NF rendition. – Deport 30/11, 2010 at 12:57

Initially I'd have suggested you have a productproperty table to model the relationship between products and properties. This would allow you to associate many products with a particular property.

However, I'm not keen on the idea of having a value stored alongside each property as a 1:1. It might be best if you have a propertyvalue table that associates a property with a value. Then, you'd ditch the productproperty table in favour of having a richer productpropertyvalue table which could fully describe the relationship between a product, it's properties and their values.

Perhaps then you could have the following:

product => (ID (unique key), Name, Description)
property => (ID (unique key), Description)
propertyvalue => (ID (unique key), propertyID (foreign key), value)
productpropertyvalue => (ID (unique key), productID (foreign key), propertyValueID (foreign key))

Of course, property values could be complex rather than simple strings or integers, but hopefully this takes you in the right direction.

Frater answered 26/11, 2010 at 9:18 Comment(0)

Update

Recommended topics

Hot tags