Inference engine to calculate matching set according to internal rules
Asked Answered
O

6

7

I have a set of objects with attributes and a bunch of rules that, when applied to the set of objects, provides a subset of those objects. To make this easier to understand I'll provide a concrete example.

My objects are persons and each has three attributes: country of origin, gender and age group (all attributes are discrete). I have a bunch of rules, like "all males from the US", which correspond with subsets of this larger set of objects.

I'm looking for either an existing Java "inference engine" or something similar, which will be able to map from the rules to a subset of persons, or advice on how to go about creating my own. I have read up on rule engines, but that term seems to be exclusively used for expert systems that externalize the business rules, and usually doesn't include any advanced form of inferencing. Here are some examples of the more complex scenarios I have to deal with:

  1. I need the conjunction of rules. So when presented with both "include all males" and "exclude all US persons in the 10 - 20 age group," I'm only interested in the males outside of the US, and the males within the US that are outside the 10 - 20 age group.

  2. Rules may have different priorities (explicitly defined). So a rule saying "exclude all males" will override a rule saying "include all US males."

  3. Rules may be conflicting. So I could have both an "include all males" and an "exclude all males" in which case the priorities will have to settle the issue.

  4. Rules are symmetric. So "include all males" is equivalent to "exclude all females."

  5. Rules (or rather subsets) may have meta rules (explicitly defined) associated with them. These meta rules will have to be applied in any case that the original rule is applied, or if the subset is reached via inferencing. So if a meta rule of "exclude the US" is attached to the rule "include all males", and I provide the engine with the rule "exclude all females," it should be able to inference that the "exclude all females" subset is equivalent to the "include all males" subset and as such apply the "exclude the US" rule additionally.

I can in all likelihood live without item 5, but I do need all the other properties mentioned. Both my rules and objects are stored in a database and may be updated at any stage, so I'd need to instantiate the 'inference engine' when needed and destroy it afterward.

Oulu answered 24/5, 2010 at 13:56 Comment(8)
Is the data in a RDBMS ? why not attempt this at the data level? It sounds more like you need a reporting tool to me.Disinterested
The data is in a RDBMS, yes, but I'm not sure that a reporting tool would suffice. I have very little experience, but it seems that the complexity combined with this being used real-time rather than for reports would make it infeasible?Oulu
I think you can do some more work on the Business Analysis side. From your description, I see poor translation on business requirements to technical terms - example: "So a rule saying "exclude all males" will override a rule saying "include all US males." -- this can be construed as a single rule "exclude all males outside of US" and is within the powers of any run off the mill rule engine. I'd suggest you read a bit on NLP and formal logic. One final thing - avoid priorities and like a plague, this is the easiest way to end up with unmaintainable mess. Ordering is also bad, but not as bad.Foregather
Obviously it's hard to relay the business requirements in a way that is easily understandable on StackOverflow, but I believe we have a good understanding of what is required. Sure, any rule engine can handle a rule "exclude all males outside of US," but which rule engines can combine "exclude all males" and "include all US males?" NLP isn't relevant as these rules are not captured in a natural language, and I have a good understanding of formal logic, which can definitely solve this, but I do not know of any java implementation which is expressive enough to inference across these rules.Oulu
Is it me or you are looking for a java implementation of a PROLOG-like inference engine?Bride
Yes, to a degree. I had a look at some of the bridging libraries, but none of them seem particularly reliable/performant/complete.Oulu
I remember some time ago a demo from Erudine some time ago, who use Ripple Down Rules to infer knowledge from examples. You start with "exclude all males", find the sample that does not match and add "except the ones living in US", etc. erudine.com en.wikipedia.org/wiki/Ripple_down_rulesForegather
@Zacrates: Still looking for an answer? SO has rather let you down...Decima
D
3

There are a bunch of embedded Prolog-like SLD solvers for Java; my favourite approach is to use mini-Kanren for Scala, since that is clean and allows you to use Scala to lazily handle the results of queries, but I have not used it in any depth. See Embedded Prolog Interpreter/Compiler for Java for other options, as well as Ross' answer.

SLD solvers handle all of your criteria, provided they have some extra features that Prolog has:

  1. Conjunction of rules: Basic SLD goal processing;
  2. Rules may have different priorities: Prolog's cut rule allows representation of negation, provided the queries are decidable;
  3. Rules may be conflicting: Again, with cut you can ensure that lower priority clauses are not applied if higher priority goals are satisfied. There are a few ways to go about doing this.
  4. Rules are symmetric: With cut, this is easily ensured for decidable predicates.
  5. Rules (or rather subsets) may have meta rules (explicitly defined) associated with them: your example seems to suggest this is equivalent to 4, so I'm not sure I get what you are after here.

The advantages and disadvantages of SLD solvers over description logic-based tools are:

  1. Programmatic power, flexibility: you can generally find programming solutions to modelling difficulties, where description logics might require you to rethink your models. But of course absence of duct-tape means that description logic solutions force you to be clean, which might be a good discipline.
  2. Robustness: SLD solvers are a very well understood technology, while description logic tools are often not many steps from their birth in a PhD thesis.
  3. Absence of semantic tools: description logic has nice links with first-order logic and model logic, and gives you a very rich set of techniques to reason about them. The flexibility of Prolog typically makes this very hard.

If you do not have special expertise in description logic, I'd recommend an SLD solver.

Decima answered 22/10, 2010 at 9:18 Comment(0)
E
2

For the case you're describing I think you'll want to use backwards-chaining, rather than forward chaining (RETE systems like Drools are forward-chaining, in their default behavior).

Check out tuProlog. Easy to bind with Java, 100% pure Java, and can definitely do the inferencing you want. You'll need to understand enough about Prolog to characterize your rule set.

Prova can also do inferencing and handle complex rule systems.

Ethereal answered 19/10, 2010 at 14:39 Comment(2)
+1, embedded SLD solver. tu-Prolog seems to be the accepted answer on #1817510Decima
Thanks, I'll have a look. I'm away for training currently, but I'll respond as soon as I have enough time to evaluate it properly.Oulu
G
0

This pretty much sounds like description logic and knowledge bases to me. You have concepts, roles and individuums.

If you want to roll out your problem as description logic-based reasoning, you should be fine modeling your problem and execute a reasoner on it.

There are some free reasoners availaibe, a list can be found here.

Note however, that this is rather a complex yet powerful approach.

You might want to have a special look at KAON2 and DIG when using Java.

Gebler answered 24/5, 2010 at 14:8 Comment(3)
I had a look at description logics and reasoners, but I have a few qualms. Firstly is the complexity of the solution. Secondly is the need for a fairly expressive description logic, which will affect reasoning time or maybe even make it impossible. Lastly is that the priority requirement isn't met out of the box by any description logic I know of, which would require me developing my own description logic variant. Care to comment?Oulu
Descriptive logics don't seem to do a very good job with "fuzzy". Your requirements state that there can be conflicting rules. An approach based on something like OWL is supposed to be "open-world", which means that it can represent the conflict. The question is, what does the reasoner (Pellet, etc) do with the conflict? Offhand I think you could add additional statements to the model that would disambiguate. Maybe someone who's done it knows the exact answer.Ethereal
@Ross - Item 2 in my question mentions priorities, which will be manually assigned and will be used to determine what to do with conflicts.Oulu
T
0

I believe that you could use sort of ID3 algorithm to extract a set of rules from the initial state of your objects. I don't know any concrete Java implementation, although Wikipedia points to different implementations from Ruby to C (I can't post more than one hyperlink :-)), but it's not a hard algorithm to learn.

Once it builds the decision tree, that can be expressed in rule format, you could use it to see to which class your objects belongs: to all males from the US, to all females between 10 and 20,... and when someone updates your objects in the database, you can rebuild the decision tree.

Tarango answered 23/6, 2010 at 10:26 Comment(1)
Problems I've noticed. This will only help with classification, not with the inverse (given a set of rules, find the matching population). This is based on machine learning and as such is not 100% accurate. The rules are of such a nature that it is possible for 100% accurate inferences to be made. There is no initial population, I'd have to manually create and classify them to serve as training data.Oulu
S
0

Ok, this may be a dumb answer. But i will try anyway.... You could use BeanShell to make things like this:

  • Create a simple selector command (something like select(inicialSet, paramName, paramValue) ) that return a set elements which is in inicialSet and match your params.

  • Create some constants that can help you write nice BeanShell scripts.

This way you can write your rules as simple scripts AND nest rules.

So, this

I need the conjunction of rules. So when presented with both "include all males" and "exclude all US persons in the 10 - 20 age group," I'm only interested in the males outside of the US, and the males within the US that are outside the 10 - 20 age group.

will became a script like this:

originalSet = getOriginalSet(); //Get all from DB

//elements in originalSet that have gender=male
males = select(originalSet, PARAM.GENDER, GENDER.MALE);

//elements in males that have age in [10,20]
youngMaleGuys = select(males, PARAM.AGE, AGE.10_20);

//Exclude from
males.removeAll(youngMaleGuys);

notYoungUSMaleGuys = select(males, PARAM.COUNTRY, COUNTRY.US);

males.removeAll(notYoungUSMaleGuys);

return resp;

Of course, this sucks. I made it now. But you can write very nice commands and find a way to make this more readable.

Not so fast, but easy to mantain and to read (I think). And you do not have to worrie about ordering

Thats it. I tried. :)

Specular answered 16/9, 2010 at 19:48 Comment(0)
C
0

One of the most powerful Java-based production rules engines (inference engine) is JBoss DROOLS.

http://jboss.org/drools

I'll be honest though, unless your application get a LOT more complicated, using a rules engine is WAY overkill. On the other hand, if you application gets too big and has too many conflicting rules, then it will fail to provide a result.

If you can control your customer or problem domain better, it would be better to avoid inference engines altogether.

Cristie answered 23/9, 2010 at 12:27 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.