Do you know a way to organize boolean expressions in a database while allowing infinite nesting of the expressions?
Example:
a = 1 AND (b = 1 OR b = 2)
The expression as a whole shouldn't be stored as varchar to preserve data integrity.
Do you know a way to organize boolean expressions in a database while allowing infinite nesting of the expressions?
Example:
a = 1 AND (b = 1 OR b = 2)
The expression as a whole shouldn't be stored as varchar to preserve data integrity.
Option 1 would be to use a nested table (a tree with id / parent_id structure), like Gamecat suggested. This is relatively expensive to do, and requires issuing SQL queries repetitively to build the equivalent of a single nested expression.
Option 2 would be to use a serialized object and store it into a varchar column. For example, JSON would be a good choice. It is not white-space sensitive, can be created and parsed in a vast number of languages, and it retains data integrity.
As soon as you have parsed your expression string into a tree object in memory, you can serialize it and store it. If there was no need to manipulate the expression on the database level, I guess I would go that route.
An expression is a treelike structure. So you need a way to present the tree in a table.
You can for example use the fields:
In this case, you have the following types:
But I think there are better ways to organise expression. I once made a simple expression evaluator that accepts a string and produces a numeric result.
I would store the expression in polish form, in a varchar/text column. An expression in polish form (operand before operands, no brackets) is much easier to parse with a recursive function (or a stack of course).
a = 1 AND (b = 1 OR b = 2)
in polish form shows like this:
AND = a 1 OR = b 1 = b 2
This type of expression is most usually expressed as a tree (a hierarchy), which are notoriously annoying to query in SQL.
We'll assume that a
and b
are numeric for the moment and that literals ('1', '2') are distinguished from variables.
Table Nodes
id
type (Variable|Literal)
name (nullable for literal)
value
Table Operators
id
name (=, AND, OR, NOT)
leftNodeId
rightNodeId
This structure is very flexible, but querying it to retrieve a complex expression is going to be "fun" (read that "challenging").
And you still have to parse the structure to begin with and evaluate the expression after it has been reconstructed.
This is going to be difficult to represent relationally, because by its nature it is both hierarchical and polymorphic (the leaves of your tree can be either variables or constants).
The traditional way to model Boolean functions is to use Binary Decision Diagrams, especially Reduced Order Binary Decision Diagrams. It's possible you can find an extension for your DBMS that provides good support for the concept.
UPDATE:
Alternately, if you don't need to query on the Boolean logic, you could use a BDD library and just serialize the BDD into a BLOB
or equivalent. It beats using a varchar
field because the BDD library would ensure the data was valid.
Ok withe the answers but what if there are more than 2 expressions in an expression group ?
a = 1 AND (b = 1 OR b = 2 OR b = 3)
I propose this :
-------------------
| Condition |
-------------------
| - id |
| - value1 |
| - value2 |
| - operation |
-------------------
|(1)
| --------------
| | |
| |(*) |
------------------- |
| ConditionGroup | |
------------------- |
| - id |--------
| - groupType |
| - condition |
| - subConditionGroups
-------------------
value1
, value2
and operation
model a final comparison like 'b = 3
' (in my case value1 = 'b'
, value2 = '3'
and operation = 'EQUALS'
)groupeType
can be 'AND
' or 'OR
'ConditionGroup
can have either a sub-list of ConditionGroup
s OR a final Condition
but not both.ConditionGroup
, recursively dig into its subConditionGroups
until you find a final condition, then return the value and apply the proper condition
.Actually that is what I'm about to try.
Adding to @Gamechat answer
I think it should be like this
ID
TypeExpression (and, or etc...)
FirstChildID --This can be a leaf node or a pointer to another row in same table
SecondChildID --This can be a leaf node or a pointer to another row in same table
isFirstChildLeaf
isSecondChildLeaf
© 2022 - 2024 — McMap. All rights reserved.