I'm working on an application that accepts 100k+ unique inputs. For simplicity, let's assume each input is a floating-point value (a, b, c, etc.), though they could also be strings. The application has many rules/events/triggers that relate to these inputs.
Example1:
Rule[(a > b) and (c <= d)] --> [execute EventX]
Definition1: The above rule says: when the value of input 'a' is greater than that of 'b', and the value of input 'c' is less than or equal to that of 'd', execute EventX.
Example2:
Rule[x != x.prev] --> [execute EventZ]
Definition2: The above rule says: if, after a value update, the current value of input 'x' is not equal to its previous value (i.e. the value has changed), execute EventZ.
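To make the examples concrete, here is a minimal sketch (illustrative only, not my actual code) of how such rules could be represented as predicates over a value store that also remembers each input's previous value — the names `values`, `prev` and `rules` are just placeholders:

```python
# Illustrative sketch only: rules as predicates over the current values,
# with access to each input's previous value for rules like Example2.
values = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 3.0, "x": 5.0}   # current values
prev = dict(values)                                            # previous values

rules = {
    # Example1: Rule[(a > b) and (c <= d)] --> EventX
    "EventX": lambda v, p: (v["a"] > v["b"]) and (v["c"] <= v["d"]),
    # Example2: Rule[x != x.prev] --> EventZ
    "EventZ": lambda v, p: v["x"] != p["x"],
}
```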
What I'm finding is that as the number of inputs and the number of rules increase, the total amount of computation required to evaluate the affected rules and determine whether an event should be triggered is getting out of control. Throwing faster and more hardware at the problem isn't scaling as expected.
At the moment, upon every input update, the associated input/variable is looked up in a hash table that maps each variable to the rules that contain it. Each of those rules is then evaluated, and if the result is true/actionable, the relevant event is triggered asynchronously.
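In rough code (again just a sketch with illustrative names, not my production implementation), the current flow looks like this:

```python
# Rough sketch of the current approach: an index from each input to the rules
# that mention it; every update re-evaluates all of those rules in full.
from collections import defaultdict

rules_by_input = defaultdict(set)        # variable name -> set of rule/event ids
for event, inputs in [("EventX", ["a", "b", "c", "d"]), ("EventZ", ["x"])]:
    for name in inputs:
        rules_by_input[name].add(event)

def on_update(name, new_value, values, prev, rules, trigger):
    prev[name] = values.get(name)        # remember the previous value
    values[name] = new_value
    for event in rules_by_input[name]:   # only rules that mention this input...
        if rules[event](values, prev):   # ...but each is fully re-evaluated
            trigger(event)               # fired asynchronously in practice
```

The inner loop is what blows up: a hot input that appears in most of the rules forces nearly everything to be re-evaluated on every tick.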
This problem is in the realm of Complex Event Processing (CEP); unfortunately, most of the architectures in this realm use the same brute-force approach described above, perhaps with a few refinements to how frequently things are evaluated/re-evaluated. The idea is to have a system that can react in near real-time. Distributing the rules and inputs over multiple nodes doesn't seem to work well either, because in some situations a small minority of inputs appears in over 95% of the active rules.
I was hoping there are some fellow SO'ers out there who know of better methods to solve this problem: data structures or algorithms.
A simple idea I have in mind is to construct a set of dependent logical inferences between rules.
If there are two rules that are like so:
Rule[a < b] --> [exec E1]
Rule[b >= a] --> [exec E2]
Then an update on either 'a' or 'b' should not require evaluating both predicates independently, since a < b implies b >= a (and, by contraposition, b < a implies a >= b). However, I find building such logical inference structures highly complex and error-prone, and difficult to test completely and rigorously.
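The closest I've come to making this concrete is something like the following (a sketch of the direction, not something I have built or tested): canonicalize each atomic comparison to an ordered variable pair, so that one comparison of the two current values settles every predicate on that pair, including logically implied ones like the two rules above.

```python
# Sketch of the inference idea: 'a < b' and 'b >= a' both reduce to the
# variable pair ('a', 'b'); comparing the pair's current values once answers
# every operator on that pair, so both rules are settled together.

FLIP = {"<": ">", ">": "<", "<=": ">=", ">=": "<=", "==": "==", "!=": "!="}

def canonical(lhs, op, rhs):
    """Order the operands lexically so 'a < b' and 'b > a' share one key."""
    if lhs > rhs:
        lhs, rhs, op = rhs, lhs, FLIP[op]
    return (lhs, rhs), op

def evaluate_pair(lhs_val, rhs_val):
    """One evaluation of the pair answers all six comparison operators."""
    return {
        "<":  lhs_val <  rhs_val, "<=": lhs_val <= rhs_val,
        ">":  lhs_val >  rhs_val, ">=": lhs_val >= rhs_val,
        "==": lhs_val == rhs_val, "!=": lhs_val != rhs_val,
    }

pair1, op1 = canonical("a", "<",  "b")        # -> ('a', 'b'), '<'   (rule for E1)
pair2, op2 = canonical("b", ">=", "a")        # -> ('a', 'b'), '<='  (rule for E2)
truth = evaluate_pair(3.0, 5.0)               # evaluate the pair once: a=3.0, b=5.0
fires_e1, fires_e2 = truth[op1], truth[op2]   # both rules settled together
```

This handles trivially related predicates; the part that worries me is building and verifying the same kind of sharing for arbitrary compound rules.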
Inputs could represent things such as stock prices, temperature sensor readings, etc.
One further note: some of the inputs are temporally constrained, meaning a rule may require a variable to have been in a specific state for some amount of time (e.g. the price of MSFT > $20 for the last 30 sec). Currently this is accomplished using a 'representing variable' (facade) whose value is either 0 or 1 / false or true; the value of that variable is determined by a separate mechanism, usually the result of another rule being triggered.
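For reference, the facade mechanism is roughly equivalent to something like this (illustrative sketch with made-up names; in my system the facade's value is actually driven by other rules firing):

```python
# Minimal sketch of a 'representing variable' (facade) for a temporal
# constraint: it reads 1 only once its condition has held for `duration` sec.
import time

class HeldFor:
    def __init__(self, duration_sec):
        self.duration = duration_sec
        self.since = None                 # when the condition last became true

    def update(self, condition_true, now=None):
        now = time.monotonic() if now is None else now
        if not condition_true:
            self.since = None             # condition broken: reset the timer
            return 0
        if self.since is None:
            self.since = now              # condition just became true
        return 1 if (now - self.since) >= self.duration else 0

# e.g. a facade input for "price of MSFT > $20 for the last 30 sec":
msft_over_20_for_30s = HeldFor(30.0)
# on every MSFT tick:
#   values["msft_over_20_for_30s"] = msft_over_20_for_30s.update(price > 20.0)
```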
It should also be noted that, due to the near real-time constraints and the amount of data produced per second, options using a DB with triggers and stored procedures are out of the question.