Current situation in SQL Server database
There is a table Entry with the following columns:
- EntryID (int)
- EntryName (nvarchar)
- EntrySize (int)
- EntryDate (datetime)
Further there should be the possibility to save additional metadata for an Entry. Names and values of these metadata should be free to choose and there should be the possibility to dynamically add those without changing the table structure of the database. Each metadata key can be one of the following data types:
- Text
- Numeric value
- DateTime
- Boolean value (True/False)
Thus there is a table DataKey to represent the metadata names and datatypes with the following columns:
- DataKeyID (int)
- DataKeyName (nvarchar)
- DataKeyType (smallint) 0: Text; 1: Numeric; 2: DateTime; 3: Bit
In table DataValue for each combination of Entry and DataKey values can be inserted depending on the data type of the metadata key. For each data type there is one nullable value column. This table has the following columns:
- DataValueID (int)
- EntryID (int) Foreign-Key
- DataKeyID (int) Foreign-Key
- TextValue (nvarchar) Nullable
- NumericValue (float) Nullable
- DateValue (datetime) Nullable
- BoolValue (bit) Nullable
Image of the database structure:
TARGET
Target is to retrieve a list of entries fulfilling the specifications like in a WHERE clause. Like the following example:
Assumption:
- Meta data key KeyName1 is text
- Meta data key KeyName2 is DateTime
- Meta data key KeyName3 is numeric
- Meta data key KeyName4 is Boolean
Query:
... WHERE (KeyName1 = „Test12345“ AND KeyName2 BETWEEN ’01.09.2012 00:00:00’ AND
’01.04.2013 23:59:00’) OR (KeyName3 > 15.3 AND KeyName4 = True)
Target is to do these queries in a very efficient way, also with a large amount of data like
- Number of entries > 2.000.000
- Number of data keys between 50 und 100 or maybe > 100
- Per entry at least a subset of values specified or maybe also a value for each key (2.000.000 * 100)
PROBLEM
The first problem arises when building the query. Normally queries require to have sets with columns that can be used in the WHERE clause. In this case the columns used in the queries are entries in table DataKey as well to be able to dynamically add metadata without having to change the database table structure. During research a solution has been found using PIVOT table techniques at runtime. But it turned out that this solution is very slow when there is a large set of data in the database.
QUESTIONS
- Is there a more efficient way or structure to save the data for this purpose?
- How can the requirements listed above be fulfilled, also with regard to performance and time consumption when querying?
Here is a sql fiddle with the discribed database structure and some sample data: http://www.sqlfiddle.com/#!3/d1912/3
sparse columns
(see technet.microsoft.com/en-us/library/cc280604.aspx) for the additional properties in the main table would be an option? You would then not need any joins in your queries, and could have many, many property columns. – Horsetail