Creating a DSL expressions parser / rules engine
I'm building an app which has a feature for embedding expressions/rules in a config yaml file. For example, a user can reference a variable defined in the yaml file like ${variables.name == 'John'} or ${is_equal(variables.name, 'John')}. I can probably get by with simple expressions, but I want to support complex rules/expressions such as ${variables.name == 'John'} and (${variables.age > 18} OR ${variables.adult == true}).

I'm looking for a parsing/DSL/rules-engine library that can support these types of expressions and normalize them. I'm open to using Ruby, JavaScript, Java, or Python if anyone knows of a library for those languages.

One option I thought of was to just support JavaScript as conditions/rules and basically pass it through eval with the right context set up, with access to variables and other referenceable vars.

Baskerville answered 29/6, 2020 at 16:58 Comment(7)
Please read "What topics can I ask about here?". This question is probably more appropriate for another site, such as Software Recommendations. - Lyonnaise
You might want to investigate using ANTLR4. I've created many DSLs using it. - Spline
Look at github.com/antlr/antlr4/blob/master/doc/javascript-target.md on getting to use ANTLRv4 with JavaScript. - Stovepipe
In JavaScript I would consider Nearley.js. See this answer of mine in case it's relevant or helpful. - Zwolle
For DSLs that are largely expression oriented, you can code your own parser pretty easily, even in JS. See my answer on how to build recursive descent parsers by hand: https://mcmap.net/q/205973/-writing-a-parser-like-flex-bison-that-is-usable-on-8-bit-embedded-systems - Arboretum
Java and Spring have powerful Expression Languages you could just start to use. - Platitude
What operating system do you have in mind? Things would be different on Linux, Windows or FreeBSD.... - Supposition

I don't know whether you use Go, but if you do, I recommend https://github.com/antonmedv/expr.

I have used it to parse strategies for a stock options bot. This is from my unit test:

import (
    "regexp"
    "strings"
    "testing"
)

func TestPattern(t *testing.T) {
    a := "pattern('asdas asd 12dasd') && lastdigit(23asd) < sma(50) && sma(14) > sma(12) && ( macd(5,20) > macd_signal(12,26,9) || macd(5,20) <= macd_histogram(12,26,9) )"

    // Find every call-like token, e.g. pattern('...'), sma(50), macd(5,20).
    r := regexp.MustCompile(`(\w+)(\s+)?[(]['\d.,\s\w]+[)]`)
    indicator := r.FindAllString(a, -1)
    t.Logf("%v\n", indicator)
    t.Logf("%v\n", len(indicator))

    for _, i := range indicator {
        t.Logf("%v\n", i)
        if strings.HasPrefix(i, "pattern") {
            // Extract the quoted argument of pattern('...').
            r = regexp.MustCompile(`pattern(\s+)?\('(.+)'\)`)
            check1 := r.ReplaceAllString(i, "$2")
            t.Logf("%v\n", check1)
            // Collect every character other than 'd' and 'u'.
            r = regexp.MustCompile(`[^du]`)
            check2 := r.FindAllString(check1, -1)
            t.Logf("%v\n", len(check2))
        } else if strings.HasPrefix(i, "lastdigit") {
            // Extract the argument of lastdigit(...).
            r = regexp.MustCompile(`lastdigit(\s+)?\((.+)\)`)
            args := r.ReplaceAllString(i, "$2")
            // Collect the non-digit characters of that argument.
            r = regexp.MustCompile(`[^\d]`)
            parameter := r.FindAllString(args, -1)
            t.Logf("%v\n", parameter)
        }
    }
}

Combine it with regex and you have a good (if not great) string translator.

And for Java, I personally use https://github.com/ridencww/expression-evaluator, but not in production. It has features similar to the library linked above.

It supports many conditions, and you don't have to worry about parentheses and brackets.

Assignment  =
Operators   + - * / DIV MOD % ^ 
Logical     < <= == != >= > AND OR NOT
Ternary     ? :  
Shift       << >>
Property    ${<id>}
DataSource  @<id>
Constants   NULL PI
Functions   CLEARGLOBAL, CLEARGLOBALS, DIM, GETGLOBAL, SETGLOBAL
            NOW PRECISION

Hope it helps.

Nuggar answered 2/7, 2020 at 19:38 Comment(0)

You might be surprised to see how far you can get with a syntax parser and 50 lines of code!

Check this out. The Abstract Syntax Tree (AST) on the right represents the code on the left in nice data structures. You can use these data structures to write your own simple interpreter.

I wrote a little example of one: https://codesandbox.io/s/nostalgic-tree-rpxlb?file=/src/index.js

Open up the console (button in the bottom), and you'll see the result of the expression!

This example can only handle (||) and (>), but looking at the code (line 24), you can see how you could make it support any other JS operator. Just add a case to the branch, evaluate the sides, and do the calculation in JS.

Parentheses and operator precedence are all handled by the parser for you.
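
For readers who prefer Python, here is a rough sketch of the same idea. It is not the code from the sandbox link: the standard-library ast module does the parsing, and the function name evaluate, the COMPARE_OPS table, and the sample variables are made up for illustration. Only and/or, a few comparisons, names, and literals are handled; everything else is rejected.

```python
import ast
import operator

# Comparison operators this sketch understands; extend the table to add more.
COMPARE_OPS = {
    ast.Gt: operator.gt,
    ast.Lt: operator.lt,
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
}


def evaluate(node, variables):
    """Recursively evaluate a tiny subset of Python expression nodes."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, variables)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return variables[node.id]
    if isinstance(node, ast.BoolOp):
        # The generator keeps short-circuit behaviour for and/or.
        values = (evaluate(v, variables) for v in node.values)
        return any(values) if isinstance(node.op, ast.Or) else all(values)
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        op = COMPARE_OPS.get(type(node.ops[0]))
        if op is not None:
            left = evaluate(node.left, variables)
            right = evaluate(node.comparators[0], variables)
            return op(left, right)
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


tree = ast.parse("age > 18 or adult", mode="eval")
result = evaluate(tree, {"age": 12, "adult": True})
print(result)
```

Supporting another operator means adding one entry to COMPARE_OPS, which mirrors the "add a case to the branch" step described above.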

I'm not sure if this is the solution for you, but it will for sure be fun ;)

Hydrodynamic answered 5/7, 2020 at 23:16 Comment(0)

One option I thought of was to just support JavaScript as conditions/rules and basically pass it through eval with the right context set up, with access to variables and other referenceable vars.

I would personally lean towards something like this. If you are getting into complexities such as logic comparisons, a DSL can become a beast, since you are basically writing a compiler and a language at that point. You might want to drop the config format altogether and have the configurable file just be JavaScript (or whatever language) that is evaluated and then loaded. Then whoever your target audience is for this "config" file can just supply logical expressions as needed.

The only reason I would not do this is if this configuration file was being exposed to the public or something, but in that case security for a parser would also be quite difficult.
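
To make the "eval with the right context" idea concrete, here is a minimal sketch in Python rather than JavaScript. The function name evaluate_rule and the sample data are hypothetical, and the approach is only reasonable when the rule text comes from a trusted author.

```python
from types import SimpleNamespace


def evaluate_rule(expression, variables):
    """Evaluate a trusted expression with `variables` exposed as attributes."""
    context = {"variables": SimpleNamespace(**variables)}
    # An empty __builtins__ hides open(), __import__() and friends from the
    # expression, but this is NOT a sandbox: it only trims the namespace.
    return eval(expression, {"__builtins__": {}}, context)


rule = "variables.name == 'John' and (variables.age > 18 or variables.adult == True)"
result = evaluate_rule(rule, {"name": "John", "age": 17, "adult": True})
print(result)
```

The rule author gets the full expression syntax of the host language for free, which is the appeal of this route; the cost is that the rule file has to be treated as code.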

Affixation answered 3/7, 2020 at 0:0 Comment(0)

I'm building an app which has a feature for embedding expressions/rules in a config yaml file.

I'm looking for a parsing/DSL/rules-engine library that can support these types of expressions and normalize them. I'm open to using Ruby, JavaScript, Java, or Python if anyone knows of a library for those languages.

One possibility might be to embed a rule interpreter such as ClipsRules inside your application. You could then code your application in C++ (perhaps inspired by my clips-rules-gcc project) and link to it some C++ YAML library such as yaml-cpp.

Another approach could be to embed some Python interpreter inside a rule interpreter (perhaps the same ClipsRules) and some YAML library.

A third approach could be to use Guile (or SBCL or Javascript v8) and extend it with some "expert system shell".

Before starting to code, be sure to read several books such as the Dragon Book, the Garbage Collection handbook, Lisp In Small Pieces, Programming Language Pragmatics. Be aware of various parser generators such as ANTLR or GNU bison, and of JIT compilation libraries like libgccjit or asmjit.

You might need to contact a lawyer about legal compatibility of various open source licenses.

Supposition answered 8/7, 2020 at 8:55 Comment(0)

Some thoughts and things you should consider.

1. Unified Expression Language (EL)

Another option is EL, specified as part of the JSP 2.1 standard (JSR-245). Official documentation.

They have some nice examples that can give you a good overview of the syntax. For example:

   EL Expression: `${100.0 == 100}`  Result: `true`
   EL Expression: `${4 > 3}`         Result: `true`

You can use this to evaluate small script-like expressions. And there are some implementations: Juel is one open source implementation of the EL language.

2. Audience and Security

All the answers recommend different interpreters and parser generators, and all are valid ways to add functionality for processing complex data. But I would like to add an important note here.

Every interpreter has a parser, and injection attacks target those parsers, tricking them into interpreting data as commands. You should have a clear understanding of how the interpreter's parser works, because that is the key to reducing the chances of a successful injection attack. Real-world parsers have many corner cases and flaws that may not match the specs, so be clear about the measures you will take to mitigate possible flaws.

Even if your application is not public-facing, external or internal actors can still abuse this feature.
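
One possible mitigation, sketched in Python under the assumption that rules are written as Python-style expressions: parse the text first and refuse anything outside a small whitelist of syntax nodes before it gets near an evaluator. The names ALLOWED_NODES and is_allowed are illustrative only, and a check like this reduces the attack surface without guaranteeing safety.

```python
import ast

# Node types a rule may contain; everything else (calls, attribute access,
# subscripts, lambdas, ...) is refused before any evaluation happens.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.Name, ast.Load, ast.Constant,
)


def is_allowed(expression):
    """Return True only if the text parses and uses whitelisted nodes."""
    try:
        tree = ast.parse(expression, mode="eval")
    except SyntaxError:
        return False
    return all(isinstance(node, ALLOWED_NODES) for node in ast.walk(tree))


print(is_allowed("age > 18 and not banned"))
print(is_allowed("__import__('os').system('echo pwned')"))
```

The first rule passes and the second is refused, because it contains a call and an attribute access that are not on the list.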

Cotquean answered 8/7, 2020 at 15:36 Comment(0)

I did something like that once, you can probably pick it up and adapt it to your needs.

TL;DR: thanks to Python's eval, doing this is a breeze.

The problem was to parse dates and durations in textual form. What I did was create a YAML file mapping regex patterns to results. Each mapping was a Python expression that would be evaluated with the match object, and had access to other functions and variables defined elsewhere in the file.

For example, the following self-contained snippet would recognize dates like "l'11 agosto del 1993" (Italian for "August 11th, 1993").

__meta_vars__:
  month: (gennaio|febbraio|marzo|aprile|maggio|giugno|luglio|agosto|settembre|ottobre|novembre|dicembre)
  prep_art: (il\s|l\s?'\s?|nel\s|nell\s?'\s?|del\s|dell\s?'\s?)
  schema:
    date: http://www.w3.org/2001/XMLSchema#date

__meta_func__:
  - >
    def month_to_num(month):
        """ gennaio -> 1, febbraio -> 2, ..., dicembre -> 12 """
        try:
            return index_in_or(meta_vars['month'], month) + 1
        except ValueError:
            return month

Tempo:
  - \b{prep_art}(?P<day>\d{{1,2}}) (?P<month>{month}) {prep_art}?\s*(?P<year>\d{{4}}): >
      '"{}-{:02d}-{:02d}"^^<{schema}>'.format(match.group('year'),
                                              month_to_num(match.group('month')),
                                              int(match.group('day')),
                                              schema=schema['date'])

__meta_func__ and __meta_vars__ (not the best names, I know) define functions and variables that are accessible to the match transformation rules. To make the rules easier to write, the pattern is formatted using the meta-variables, so that {month} is replaced with the pattern matching all months. The transformation rule calls the meta-function month_to_num to convert the month to a number from 1 to 12, and reads from the schema meta-variable. In the example above, the match results in the string "1993-08-11"^^<http://www.w3.org/2001/XMLSchema#date>, but some other rules would produce a dictionary.
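
The pattern-formatting step can be tried in isolation. This is a reduced sketch, not the project's code: the meta_vars dictionary below is a hypothetical, trimmed copy of the YAML values, and the template is the Tempo pattern from above.

```python
import re

# Hypothetical, trimmed-down meta-variables modelled on the YAML above.
meta_vars = {
    "month": "(gennaio|febbraio|agosto)",
    "prep_art": r"(il\s|l\s?'\s?|del\s)",
}

# Doubled braces survive str.format as regex quantifiers: {{4}} -> {4}.
template = (r"\b{prep_art}(?P<day>\d{{1,2}}) (?P<month>{month}) "
            r"{prep_art}?\s*(?P<year>\d{{4}})")
pattern = re.compile(template.format(**meta_vars))

match = pattern.search("l'11 agosto del 1993")
print(match.group("day"), match.group("month"), match.group("year"))
```

This is why the YAML pattern writes \d{{4}}: single braces are reserved for the meta-variable placeholders.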

Doing this is quite easy in Python, as you can use exec to evaluate strings as Python code (obligatory warning about security implications). The meta-functions and meta-variables are evaluated and stored in a dictionary, which is then passed to the match transformation rules.
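
Stripped of the YAML loading, the exec/eval mechanism looks roughly like this in Python 3. The function source, the MONTHS list, and the rule string are hypothetical stand-ins for what would come from the file.

```python
# Hypothetical stand-ins for what would be loaded from the YAML file.
meta_func_sources = [
    "def month_to_num(month):\n"
    "    return MONTHS.index(month) + 1\n",
]
meta_vars = {"MONTHS": ["gennaio", "febbraio", "marzo"]}
rule = "'{:02d}'.format(month_to_num(name))"

# exec() defines each meta-function inside the namespace dictionary, so the
# functions can see the meta-variables stored in the same dictionary.
namespace = dict(meta_vars)
for source in meta_func_sources:
    exec(source, namespace)

# eval() then runs the rule with that namespace as globals and the
# per-match values as locals. Only do this with trusted rule files.
result = eval(rule, namespace, {"name": "marzo"})
print(result)
```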

The code is on github, feel free to ask any questions if you need clarifications. Relevant parts, slightly edited:

class DateNormalizer:
    def _meta_init(self, specs):
        """ Reads the meta variables and the meta functions from the specification
        :param dict specs: The specifications loaded from the file
        :return: None
        """
        self.meta_vars = specs.pop('__meta_vars__')

        # compile meta functions in a dictionary
        self.meta_funcs = {}
        for f in specs.pop('__meta_func__'):
            # function-call form of exec: works on Python 2 and Python 3
            exec(f, self.meta_funcs)

        # make meta variables available to the meta functions just defined
        self.meta_funcs['__builtins__']['meta_vars'] = self.meta_vars

        self.globals = self.meta_funcs
        self.globals.update(self.meta_vars)

    def normalize(self, expression):
        """ Find the first matching part in the given expression
        :param str expression: The expression in which to search the match
        :return: Tuple with (start, end), category, result
        :rtype: tuple
        """
        expression = expression.lower()
        # self.regexes is built outside this excerpt
        for category, regexes in self.regexes.items():
            for regex, transform in regexes:
                match = regex.search(expression)
                if match:
                    # the rule sees the meta functions/variables plus `match`
                    result = eval(transform, self.globals, {'match': match})
                    start, end = match.span()
                    # first_position is not defined in this excerpt
                    return (first_position + start, first_position + end), category, result
Bosun answered 8/7, 2020 at 15:49 Comment(0)

Here are some categorized Ruby options and resources:

Insecure

  1. Pass expression to eval in the language of your choice.

It must be mentioned that eval is technically an option, but extraordinary trust must exist in its inputs and it is safer to avoid it altogether.

Heavyweight

  1. Write a parser for your expressions and an interpreter to evaluate them

A cost-intensive solution would be implementing your own expression language. That is, to design a lexicon for your expression language, implement a parser for it, and an interpreter to execute the code that's parsed.
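
To give a feel for the size of this route, here is a sketch of a hand-written tokenizer plus recursive descent evaluator in Python. Everything in it is an assumption made for illustration: the grammar (or, then and, then comparison), the names RuleParser and evaluate, and the decision to treat the ${...} wrappers as already stripped from the rule text.

```python
import operator
import re

# One token per match: number, quoted string, comparison operator,
# parenthesis, a (possibly dotted) name, or any other stray character.
TOKEN = re.compile(r"""\s*(?:
    (?P<num>\d+)
  | '(?P<str>[^']*)'
  | (?P<op>==|!=|>=|<=|>|<)
  | (?P<paren>[()])
  | (?P<name>[A-Za-z_][\w.]*)
  | (?P<bad>\S)
)""", re.VERBOSE)
OPS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
       "<": operator.lt, ">=": operator.ge, "<=": operator.le}


class RuleParser:
    """Grammar: or_expr > and_expr > comparison > operand."""

    def __init__(self, text, variables):
        self.tokens = [(m.lastgroup, m.group(m.lastgroup))
                       for m in TOKEN.finditer(text)]
        self.pos = 0
        self.variables = variables

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else (None, None)

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def at_keyword(self, word):
        kind, value = self.peek()
        return kind == "name" and value.lower() == word

    def or_expr(self):
        result = self.and_expr()
        while self.at_keyword("or"):
            self.take()
            right = self.and_expr()  # always parse the right-hand side
            result = bool(result) or bool(right)
        return result

    def and_expr(self):
        result = self.comparison()
        while self.at_keyword("and"):
            self.take()
            right = self.comparison()
            result = bool(result) and bool(right)
        return result

    def comparison(self):
        left = self.operand()
        kind, value = self.peek()
        if kind == "op":
            self.take()
            return OPS[value](left, self.operand())
        return left

    def operand(self):
        kind, value = self.take()
        if (kind, value) == ("paren", "("):
            result = self.or_expr()
            if self.take() != ("paren", ")"):
                raise SyntaxError("expected ')'")
            return result
        if kind == "num":
            return int(value)
        if kind == "str":
            return value
        if kind == "name" and value.lower() in ("true", "false"):
            return value.lower() == "true"
        if kind == "name":
            current = self.variables
            for part in value.split("."):
                current = current[part]  # dotted path -> nested dict lookup
            return current
        raise SyntaxError(f"unexpected token {value!r}")


def evaluate(text, variables):
    parser = RuleParser(text, variables)
    result = parser.or_expr()
    if parser.peek() != (None, None):
        raise SyntaxError("unexpected trailing input")
    return result


rule = "variables.name == 'John' and (variables.age > 18 OR variables.adult == true)"
data = {"variables": {"name": "John", "age": 17, "adult": True}}
print(evaluate(rule, data))
```

Even this small subset runs to roughly ninety lines, and it still lacks function calls, negation, and useful error positions, which is why this option is filed under heavyweight.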

Some Parsing Options (ruby)

Medium Weight

  1. Pick an existing language to write expressions in and parse / interpret those expressions.

This route assumes you can pick a known language to write your expressions in. The benefit is that a parser likely already exists for that language to turn it into an Abstract Syntax Tree (data structure that can be walked for interpretation).

A ruby example with the Parser gem

require 'parser/current'

class MyInterpreter
  # https://whitequark.github.io/ast/AST/Processor/Mixin.html
  include ::AST::Processor::Mixin

  def on_str(node)
    node.children.first
  end

  def on_int(node)
    node.children.first.to_i
  end

  def on_if(node)
    expression, truthy, falsey = *node.children
    if process(expression)
      process(truthy)
    else
      process(falsey)
    end
  end

  def on_true(_node)
    true
  end

  def on_false(_node)
    false
  end

  def on_lvar(node)
    # lookup a variable by name=node.children.first
  end

  def on_send(node, &block)
    # allow things like ==, string methods? whatever
    # (bare names such as `name` also arrive here as sends with a nil receiver)
  end

  # ... etc (for example on_and for the && in the expression below)
end

ast = Parser::CurrentRuby.parse(<<~RUBY)
  name == 'John' && adult
RUBY
MyInterpreter.new.process(ast)
# => true, once on_send, on_and and the variable lookup are filled in

The benefit here is that the parser and syntax are predetermined, and you can interpret only what you need to (and prevent malicious code from executing by controlling what on_send and on_const allow).
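
The same structure carries over to other languages with a built-in parser. As a sketch in Python (not part of the Ruby example; RuleInterpreter, the functions whitelist, and is_equal are hypothetical names), ast.NodeVisitor plays the role of the processor mixin, visit_Call is the counterpart of on_send, and anything without an explicit handler is refused:

```python
import ast


class RuleInterpreter(ast.NodeVisitor):
    """Evaluates a small expression subset; handlers mirror the on_* methods."""

    def __init__(self, variables, functions):
        self.variables = variables
        self.functions = functions  # whitelist: name -> callable

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_Constant(self, node):
        return node.value

    def visit_Name(self, node):
        return self.variables[node.id]

    def visit_Attribute(self, node):
        # variables.name is treated as a dictionary lookup, never getattr().
        return self.visit(node.value)[node.attr]

    def visit_Call(self, node):
        # Only plain calls to whitelisted names are allowed.
        if not isinstance(node.func, ast.Name) or node.func.id not in self.functions:
            raise ValueError("function not allowed")
        if node.keywords:
            raise ValueError("keyword arguments not allowed")
        args = [self.visit(arg) for arg in node.args]
        return self.functions[node.func.id](*args)

    def generic_visit(self, node):
        # Anything without an explicit handler is rejected.
        raise ValueError(f"unsupported syntax: {type(node).__name__}")


interpreter = RuleInterpreter(
    variables={"variables": {"name": "John"}},
    functions={"is_equal": lambda a, b: a == b},
)
tree = ast.parse("is_equal(variables.name, 'John')", mode="eval")
result = interpreter.visit(tree)
print(result)
```

Because unknown nodes raise instead of being silently skipped, adding a feature to the rule language always means adding a handler on purpose.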

Templating

This is more markup-oriented and possibly doesn't apply, but you could find some use in a templating library, which parses expressions and evaluates for you. Control and supplying variables to the expressions would be possible depending on the library you use for this. The output of the expression could be checked for truthiness.

Shandrashandrydan answered 8/7, 2020 at 20:49 Comment(0)
