Data Model for Boolean Expressions
Asked Answered
R

8

18

Do you know a way to organize boolean expressions in a database while allowing infinite nesting of the expressions?

Example:

a = 1 AND (b = 1 OR b = 2)

The expression as a whole shouldn't be stored as varchar to preserve data integrity.

Resolved answered 4/11, 2008 at 12:28 Comment(5)
Please clarify: You want to store the result of the expression or to be able to reconstruct the expression from native DB column types?Streptokinase
I like to reconstruct the expression.Resolved
Is there a requirement that the database be SQL/relational? Can you use an OODBMS?Dematerialize
Nope, database must be relational.Resolved
See also cs.stackexchange.com/questions/104311/…Ligule
U
15

Option 1 would be to use a nested table (a tree with id / parent_id structure), like Gamecat suggested. This is relatively expensive to do, and requires issuing SQL queries repetitively to build the equivalent of a single nested expression.

Option 2 would be to use a serialized object and store it into a varchar column. For example, JSON would be a good choice. It is not white-space sensitive, can be created and parsed in a vast number of languages, and it retains data integrity.

As soon as you have parsed your expression string into a tree object in memory, you can serialize it and store it. If there was no need to manipulate the expression on the database level, I guess I would go that route.

Unassuming answered 4/11, 2008 at 12:39 Comment(1)
You are making a very good point here: one should not be using option 1 just because it can be used; there must be certain requirements in place to justify option 1. If such requirements are not imminent, I'd probably start from option 2.Pachston
T
9

An expression is a treelike structure. So you need a way to present the tree in a table.

You can for example use the fields:

  • ID
  • TypeExpression (and, or etc...)
  • FirstChildID
  • SecondChildID

In this case, you have the following types:

  1. AND, Children point to other expression.
  2. OR, Children point to other expression.
  3. Equal, Children point to other expression.
  4. Literal, FirstChild points to an entry in a literal table.
  5. VariableLookup, FirstChild points to an entry in a varable table.

But I think there are better ways to organise expression. I once made a simple expression evaluator that accepts a string and produces a numeric result.

Tramroad answered 4/11, 2008 at 12:36 Comment(0)
D
4

I would store the expression in polish form, in a varchar/text column. An expression in polish form (operand before operands, no brackets) is much easier to parse with a recursive function (or a stack of course).

a = 1 AND (b = 1 OR b = 2)

in polish form shows like this:

AND = a 1 OR = b 1 = b 2

Dorthydortmund answered 30/4, 2010 at 12:18 Comment(0)
S
3

This type of expression is most usually expressed as a tree (a hierarchy), which are notoriously annoying to query in SQL.

We'll assume that a and b are numeric for the moment and that literals ('1', '2') are distinguished from variables.

Table Nodes
id
type (Variable|Literal)
name (nullable for literal)
value

Table Operators
id
name (=, AND, OR, NOT)
leftNodeId
rightNodeId

This structure is very flexible, but querying it to retrieve a complex expression is going to be "fun" (read that "challenging").

And you still have to parse the structure to begin with and evaluate the expression after it has been reconstructed.

Streptokinase answered 4/11, 2008 at 12:43 Comment(0)
L
2

This is going to be difficult to represent relationally, because by its nature it is both hierarchical and polymorphic (the leaves of your tree can be either variables or constants).

Lucerne answered 4/11, 2008 at 12:41 Comment(0)
M
2

The traditional way to model Boolean functions is to use Binary Decision Diagrams, especially Reduced Order Binary Decision Diagrams. It's possible you can find an extension for your DBMS that provides good support for the concept.

UPDATE: Alternately, if you don't need to query on the Boolean logic, you could use a BDD library and just serialize the BDD into a BLOB or equivalent. It beats using a varchar field because the BDD library would ensure the data was valid.

Malonylurea answered 4/11, 2008 at 14:12 Comment(0)
B
1

Ok withe the answers but what if there are more than 2 expressions in an expression group ?

a = 1 AND (b = 1 OR b = 2 OR b = 3)

I propose this :

-------------------
|    Condition    |
-------------------
| - id            |
| - value1        |
| - value2        |
| - operation     |
-------------------
    |(1)
    |        --------------
    |       |              |
    |       |(*)           |
-------------------        |
| ConditionGroup  |        |
-------------------        |
| - id            |--------
| - groupType     |
| - condition     |
| - subConditionGroups
-------------------
  • value1, value2 and operation model a final comparison like 'b = 3' (in my case value1 = 'b', value2 = '3' and operation = 'EQUALS')
  • groupeType can be 'AND' or 'OR'
  • a ConditionGroup can have either a sub-list of ConditionGroups OR a final Condition but not both.
  • Now your start expression is a ConditionGroup, recursively dig into its subConditionGroups until you find a final condition, then return the value and apply the proper condition.

Actually that is what I'm about to try.

Boron answered 13/11, 2019 at 11:3 Comment(0)
H
0

Adding to @Gamechat answer

I think it should be like this

ID

TypeExpression (and, or etc...)

FirstChildID --This can be a leaf node or a pointer to another row in same table

SecondChildID --This can be a leaf node or a pointer to another row in same table

isFirstChildLeaf

isSecondChildLeaf

Hardiness answered 5/11, 2008 at 9:33 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.