SQL Database storing different types of values (in or simulated as a single field)

In a database I want to be able to assign values of a varying type to variables in a variable table. So do I need a separate value table for each value type? If so, I am not sure how you would actually link Values to the right table and thus right value. How can I achieve what I am after?

Variables
    ID
    Name

VariableValuesLink
    ID
    IDVars
    IDVals

Values
    IDvals

ValuesValueLink
    ID
    IDvals
    IDval

ValuesInt
    IDval
    IntVal

ValuesFloat
    IDval
    FloatVal

ValuesDouble
    IDval
    DoubleVal

etc...
etc...
etc...
etc...

The aim is for me to get something like this:

Variable: 
    ezas123
Values:
    1 (Int)
    2.0 (Float)
    3.0 (Double)

Variable:
    QuickFox
Values:
    The (TinyText)
    Quick (TinyText)
    Brown (TinyText)
    Fox (TinyText)
    Jumped (TinyText)
    Over (TinyText)
    The (TinyText)
    Lazy (TinyText)
    Dog (TinyText)

Variable:
    Pangrams
Values:
    The Quick Brown Fox Jumped Over The Lazy Dog (Text)
    How quickly daft jumping zebras vex (Text)

So when I query the DB I would be able to get back this set of results (where the values are of a varying type)

Variable    Value
ezas123     1
ezas123     2.0
ezas123     3.0
QuickFox    The
QuickFox    Quick
QuickFox    Brown
QuickFox    Fox
QuickFox    Jumped
QuickFox    Over
QuickFox    The
QuickFox    Lazy
QuickFox    Dog
Pangrams    The Quick Brown Fox Jumped Over The Lazy Dog
Pangrams    How quickly daft jumping zebras vex
Curcuma answered 28/4, 2011 at 11:52 Comment(3)
I guess if worst comes to worst I can simply declare the value as a VARCHAR(255) and have another column to declare the type of the variable, so when I deal with it later I can convert it to whatever it needs to be. Curcuma
I've done it that way too. They both have their advantages -- I think varchar(max) or nvarchar(max) is probably easier most of the time. If you have a lot of user input it can get hairy, as there will be datatype conversion issues after you have data in the dictionary (as opposed to before you put it in). Canopy
Which DBMS are you using? Alate
H
4

A couple of points:

  • In your example, the variable ezas123 has three values with different data types, meaning that the variable itself doesn't actually have a defined data type. This will probably cause problems downstream and is likely to indicate that the data is pretty poorly defined. I'd look at including a restriction that all of the values for a given variable must have the same data type.

  • Hogan's SQL query makes the point that whenever you list the values in the way you requested (i.e. across variables with different data types) you'll have to cast the result to varchar or similar to display it (since you can't have values with different data types in the same output column). With that in mind, do you really need different data types, or would a varchar type work well for all the data you're dealing with?

If the different types are needed, I'd look at putting all of the different IntVal, FloatVal, DoubleVal, ... columns into one table. Your table definitions could then look something like:

Variables
      ID          NOT NULL
     ,Name        NOT NULL
     ,DataType    NOT NULL CHECK (DataType IN ('INT','FLOAT','DOUBLE','TEXT'))  
   ,CONSTRAINT PK_Variables PRIMARY KEY (ID)
   ,CONSTRAINT UQ_Variables_1 UNIQUE (Name)
   ,CONSTRAINT UQ_Variables_2 UNIQUE  (ID,DataType)

Values
      IDvals      NOT NULL
     ,ID          NOT NULL
     ,DataType    NOT NULL CHECK (DataType IN ('INT','FLOAT','DOUBLE','TEXT'))
     ,IntVal      NULL
     ,FloatVal    NULL
     ,DoubleVal   NULL
     ,TextVal     NULL
   ,CONSTRAINT PK_Values PRIMARY KEY (IDvals)
   ,CONSTRAINT FK_Values_Variable FOREIGN KEY (ID,DataType) REFERENCES Variables(ID,DataType)
   ,CONSTRAINT CH_Values CHECK ( NOT(DataType <> 'INT'    AND IntVal     IS NOT NULL)  AND
                                 NOT(DataType <> 'FLOAT'  AND FloatVal   IS NOT NULL)  AND
                                 NOT(DataType <> 'DOUBLE' AND DoubleVal  IS NOT NULL)  AND
                                 NOT(DataType <> 'TEXT'   AND TextVal    IS NOT NULL)
                                )
  • The UNIQUE constraint on Variables(ID,DataType) will probably be required (depending on the DBMS) to allow you to make it the target of a FK;
  • The CHECK constraints ensure that only valid data types are used and that only the correct value column can be populated;
  • Having DataType in Values as well as Variables means that a combination of FK and CHECK can be used to ensure that all values for a given variable have the same data type, rather than having to use triggers or application logic.
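
As a rough illustration of how the FK and CHECK constraints above work together, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only an assumption here (the question does not say which DBMS is in use), the column types are guesses, and the table is named VariableValues because VALUES is a reserved word. The try_insert helper is hypothetical and exists only for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is enabled
conn.executescript("""
CREATE TABLE Variables (
    ID       INTEGER NOT NULL,
    Name     TEXT    NOT NULL,
    DataType TEXT    NOT NULL CHECK (DataType IN ('INT','FLOAT','DOUBLE','TEXT')),
    CONSTRAINT PK_Variables   PRIMARY KEY (ID),
    CONSTRAINT UQ_Variables_1 UNIQUE (Name),
    CONSTRAINT UQ_Variables_2 UNIQUE (ID, DataType)
);
CREATE TABLE VariableValues (  -- assumed name, stands in for "Values"
    IDvals    INTEGER NOT NULL,
    ID        INTEGER NOT NULL,
    DataType  TEXT    NOT NULL,
    IntVal    INTEGER,
    FloatVal  REAL,
    DoubleVal REAL,
    TextVal   TEXT,
    CONSTRAINT PK_Values PRIMARY KEY (IDvals),
    CONSTRAINT FK_Values_Variable FOREIGN KEY (ID, DataType)
        REFERENCES Variables (ID, DataType),
    CONSTRAINT CH_Values CHECK (
        NOT (DataType <> 'INT'    AND IntVal    IS NOT NULL) AND
        NOT (DataType <> 'FLOAT'  AND FloatVal  IS NOT NULL) AND
        NOT (DataType <> 'DOUBLE' AND DoubleVal IS NOT NULL) AND
        NOT (DataType <> 'TEXT'   AND TextVal   IS NOT NULL)
    )
);
""")


def try_insert(sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


try_insert("INSERT INTO Variables (ID, Name, DataType) VALUES (?, ?, ?)",
           (1, "ezas123", "INT"))

# An INT value for an INT variable is accepted.
ok = try_insert(
    "INSERT INTO VariableValues (IDvals, ID, DataType, IntVal) VALUES (?, ?, ?, ?)",
    (1, 1, "INT", 1))

# Populating TextVal on an INT row violates the CHECK constraint.
wrong_column = try_insert(
    "INSERT INTO VariableValues (IDvals, ID, DataType, TextVal) VALUES (?, ?, ?, ?)",
    (2, 1, "INT", "oops"))

# Declaring the row as FLOAT violates the FK, because variable 1 is an INT.
wrong_type = try_insert(
    "INSERT INTO VariableValues (IDvals, ID, DataType, FloatVal) VALUES (?, ?, ?, ?)",
    (3, 1, "FLOAT", 2.0))

print(ok, wrong_column, wrong_type)
```

Note that the CHECK as written still accepts a row in which every value column is NULL; if that should be rejected, an extra condition would be needed.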

A query against the tables then looks something like:

SELECT v.name as Variable,
       COALESCE(cast(a.IntVal       as varchar(max)),
                cast(a.FloatVal     as varchar(max)),
                cast(a.DoubleVal    as varchar(max)),
                cast(a.TextVal      as varchar(max)),
                '') as Value
FROM 
Variables V
JOIN Values a on V.ID = a.ID AND v.DataType = a.DataType

This could also be written (probably more correctly) with a CASE expression on Variables.DataType that chooses the relevant column.
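
A minimal sketch of that CASE-based variant, again assuming SQLite via Python's sqlite3 module (so the cast target is TEXT rather than varchar(max)). The VariableValues table name, the simplified column types and the sample variable Counter are assumptions made for the example, and the constraints are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Variables (
    ID INTEGER PRIMARY KEY, Name TEXT NOT NULL, DataType TEXT NOT NULL
);
CREATE TABLE VariableValues (  -- assumed name, stands in for "Values"
    IDvals INTEGER PRIMARY KEY, ID INTEGER NOT NULL, DataType TEXT NOT NULL,
    IntVal INTEGER, FloatVal REAL, DoubleVal REAL, TextVal TEXT
);
INSERT INTO Variables VALUES (1, 'Counter', 'INT'), (2, 'Pangrams', 'TEXT');
INSERT INTO VariableValues (IDvals, ID, DataType, IntVal)
    VALUES (1, 1, 'INT', 1), (2, 1, 'INT', 2);
INSERT INTO VariableValues (IDvals, ID, DataType, TextVal)
    VALUES (3, 2, 'TEXT', 'How quickly daft jumping zebras vex');
""")

# The variable's declared type decides which value column is read.
rows = conn.execute("""
SELECT v.Name AS Variable,
       CASE v.DataType
            WHEN 'INT'    THEN CAST(a.IntVal    AS TEXT)
            WHEN 'FLOAT'  THEN CAST(a.FloatVal  AS TEXT)
            WHEN 'DOUBLE' THEN CAST(a.DoubleVal AS TEXT)
            WHEN 'TEXT'   THEN a.TextVal
       END AS Value
FROM Variables v
JOIN VariableValues a ON v.ID = a.ID AND v.DataType = a.DataType
ORDER BY a.IDvals
""").fetchall()

for variable, value in rows:
    print(variable, value)
```

Compared with COALESCE, the CASE form does not depend on the unused columns being NULL, which is presumably why it is the more correct option.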

Having all of the values in one table means fewer tables, constraints and indexes in the database, and extending the solution to hold new data types just means adding new columns to the Values table (and modifying the constraints) rather than adding new tables.

Hierarch answered 2/5, 2011 at 17:33 Comment(1)
Very good points. The real issue is how the DB is going to be used and updated. Depending on the use case, one big table with all types or many little tables of different types might be the better choice. Canopy
C
7

Quick point: you can simplify your design by having each of the Values tables point back at the Variable table. There is no need for the linking tables. The only reason for a linking table I can think of is if you want an "easier" way to have a sequence across all variable types. If this is not needed, use the design below:

Variable
    ID
    Name

ValuesInt
    IDvariable
    IntVal

ValuesFloat
    IDvariable
    FloatVal

ValuesDouble
    IDvariable
    DoubleVal

etc...
etc...
etc...

Now your SQL is easy:

-- LEFT JOINs keep a variable even when it has no row in some of the value tables
select v.name as Variable,
       coalesce(cast(vi.IntVal as varchar(max)),
                cast(vf.FloatVal as varchar(max)),
                cast(vd.DoubleVal as varchar(max)),
                '') as Value
From Variable V
LEFT JOIN ValuesInt vi on V.ID = vi.IDvariable
LEFT JOIN ValuesFloat vf on V.ID = vf.IDvariable
LEFT JOIN ValuesDouble vd on V.ID = vd.IDvariable
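
Because a variable's values may sit in only some of the per-type tables, or be spread over several of them, another way to list them is one SELECT per value table combined with UNION ALL. The following is only a sketch of that alternative, not the query above: it assumes SQLite through Python's sqlite3 module (hence CAST ... AS TEXT), and the column types and sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Variable     (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE ValuesInt    (IDvariable INTEGER NOT NULL REFERENCES Variable(ID), IntVal    INTEGER);
CREATE TABLE ValuesFloat  (IDvariable INTEGER NOT NULL REFERENCES Variable(ID), FloatVal  REAL);
CREATE TABLE ValuesDouble (IDvariable INTEGER NOT NULL REFERENCES Variable(ID), DoubleVal REAL);
INSERT INTO Variable     VALUES (1, 'ezas123');
INSERT INTO ValuesInt    VALUES (1, 1);
INSERT INTO ValuesFloat  VALUES (1, 2.0);
INSERT INTO ValuesDouble VALUES (1, 3.0);
""")

# One SELECT per value table; each value becomes its own output row.
rows = conn.execute("""
SELECT v.Name AS Variable, CAST(vi.IntVal AS TEXT) AS Value
FROM Variable v JOIN ValuesInt vi ON v.ID = vi.IDvariable
UNION ALL
SELECT v.Name, CAST(vf.FloatVal AS TEXT)
FROM Variable v JOIN ValuesFloat vf ON v.ID = vf.IDvariable
UNION ALL
SELECT v.Name, CAST(vd.DoubleVal AS TEXT)
FROM Variable v JOIN ValuesDouble vd ON v.ID = vd.IDvariable
""").fetchall()

for variable, value in rows:
    print(variable, value)
```

Without an ORDER BY the row order is not guaranteed, so a sequence column would be needed if the values must come back in a particular order.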
Canopy answered 28/4, 2011 at 12:2 Comment(0)