math syntax checker written in python

All I need is to check, using python, if a string is a valid math expression or not.

For simplicity, let's say I just need the + - * / operators (+ and - as unary operators too) with numbers and nested parentheses. I also add simple variable names for completeness.

So I can test this way:

test("-3 * (2 + 1)") #valid
test("-3 * ")        #NOT valid

test("v1 + v2")      #valid
test("v2 - 2v")      #NOT valid ("2v" not a valid variable name)

I tried pyparsing, but the example "simple algebraic expression parser, that performs +,-,*,/ and ^ arithmetic operations" accepts invalid input. Even after trying to fix it, I still get wrong syntax being parsed without any exception being raised.

just try:

>>>test('9', 9)
9 qwerty = 9.0 ['9'] => ['9']
>>>test('9 qwerty', 9)
9 qwerty = 9.0 ['9'] => ['9']

Both tests pass... o_O

Any advice?

Seger answered 3/2, 2011 at 14:50 Comment(0)

This is because the pyparsing code allows functions. (And by the way, it does a lot more than what you need, i.e. create a stack and evaluate that.)

For starters, you could remove pi and ident (and possibly something else I'm missing right now) from the code to disallow characters.

The reason is different: PyParsing parsers won't try to consume the whole input by default. You have to add + StringEnd() (and import it, of course) to the end of expr to make it fail if it can't parse the whole input. In that case, pyparsing.ParseException will be raised. (Source: http://pyparsing-public.wikispaces.com/FAQs)
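
The difference between "parse as much as possible" and "parse the whole input" can be illustrated with the standard library alone. The sketch below is only an analogy using the re module, not pyparsing code; the helper names accepts_prefix and accepts_whole are made up for this example.

```python
import re

# A stand-in "grammar": one or more digits.
NUMBER = re.compile(r"\d+")

def accepts_prefix(text):
    # Succeeds as soon as the start of the text matches;
    # anything after the match is silently ignored.
    return NUMBER.match(text) is not None

def accepts_whole(text):
    # Succeeds only if the pattern covers the entire text.
    return NUMBER.fullmatch(text) is not None

print(accepts_prefix("9 qwerty"))  # True
print(accepts_whole("9 qwerty"))   # False
print(accepts_whole("9"))          # True
```

A parser that stops at the first token it cannot handle behaves like accepts_prefix, which is why '9 qwerty' appears to pass.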

If you care to learn a bit of parsing, what you need can probably be built in less than thirty lines with any decent parsing library (I like LEPL).

Leathaleather answered 3/2, 2011 at 15:9 Comment(5)
not true since pi... is pi and not qwerty, and ident comes only followed by parentheses... Of course if I could get pyparsing to work as a valid syntax checker I'd like it. I'll give LEPL a chance too.Seger
@neuriono: Then either the source code is misleading and the grammar is actually different, or pyparsing is broken (edit: one explanation I can think of, which would be in the category "pyparsing is broken": It doesn't consume the whole string but rather exits and returns what it parsed so far if the remaining input fails to parse).Leathaleather
well this is quite obvious, but if you look at the part of code that builds the parser (def BNF()) is quite simple and even removing things like exponentiation part making it even simpler it still fails so I guess pyparsing is not good in checking syntax.Seger
@neuriono: My guess was right. Added cause and fix to the answer.Leathaleather
or add parseAll=True... Thanks for pointing this out, I'll see if I can really get it to check my syntax and give you the best answerSeger

Adding parseAll=True to the call to parseString will convert this parser into a validator.

Promethium answered 3/2, 2011 at 23:14 Comment(1)
Well, a bit late... (see my last comment to selected answer), thanks anyway.Seger

Why not just evaluate it and catch the syntax error?

from math import *

def validSyntax(expression):
  # Global names visible to the expression. The empty '__builtins__' dict
  # stops eval() from inserting Python's built-in functions automatically.
  variables = {'__builtins__': {}, 'e': e, 'pi': pi}

  # Math functions the expression is allowed to call.
  functions = {'acos': acos,
               'asin': asin,
               'atan': atan,
               'atan2': atan2,
               'ceil': ceil,
               'cos': cos,
               'cosh': cosh,
               'degrees': degrees,
               'exp': exp,
               'fabs': fabs,
               'floor': floor,
               'fmod': fmod,
               'frexp': frexp,
               'hypot': hypot,
               'ldexp': ldexp,
               'log': log,
               'log10': log10,
               'modf': modf,
               'pow': pow,
               'radians': radians,
               'sin': sin,
               'sinh': sinh,
               'sqrt': sqrt,
               'tan': tan,
               'tanh': tanh}

  try:
    eval(expression, variables, functions)
  except (SyntaxError, NameError, ZeroDivisionError):
    return False
  else:
    return True

Here are some samples:

> print(validSyntax('a+b-1')) # a, b are undefined, so a NameError arises.
> False

> print(validSyntax('1 + 2'))
> True

> print(validSyntax('1 - 2'))
> True

> print(validSyntax('1 / 2'))
> True

> print(validSyntax('1 * 2'))
> True

> print(validSyntax('1 +/ 2'))
> False

> print(validSyntax('1 + (2'))
> False

> print(validSyntax('import os'))
> False

> print(validSyntax('print "asd"'))
> False

> print(validSyntax('import os; os.delete("~\test.txt")'))
> False # And the file was not removed

It's restricted to only mathematical operations, so it should work a bit better than a crude eval.
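
If evaluating the expression is undesirable (for example, an undefined variable raises NameError even though the syntax is fine), a variation is to parse the string without running it. The following is a minimal sketch, assuming Python 3.8 or later and the standard ast module; the name is_valid_expression and the whitelist are choices made for this example. Note that it accepts Python's own notion of numbers and identifiers, so literals such as 1e3 or 0x1F also pass.

```python
import ast

# Node types allowed in an expression built from numbers, names,
# + - * / (binary), + - (unary) and parentheses.
_ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Name, ast.Load, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.UAdd, ast.USub,
)

def is_valid_expression(expression):
    """Return True if the string parses as arithmetic; nothing is evaluated."""
    try:
        # mode="eval" accepts a single expression only, so statements
        # such as "import os" are rejected by the parser itself.
        tree = ast.parse(expression.strip(), mode="eval")
    except (SyntaxError, ValueError):
        return False
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            return False  # e.g. function calls, **, comparisons
        if isinstance(node, ast.Constant) and type(node.value) not in (int, float):
            return False  # e.g. strings, True, None, complex literals
    return True

print(is_valid_expression("-3 * (2 + 1)"))  # True
print(is_valid_expression("-3 * "))         # False
print(is_valid_expression("v1 + v2"))       # True
print(is_valid_expression("v2 - 2v"))       # False
```

Because the tree is only inspected and never executed, undefined variables and division by zero do not affect the result.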

Hogen answered 3/2, 2011 at 15:14 Comment(15)
This is much worse than the first (now deleted) answer, which at least checked if the answer consists of only numbers and operators. Yours allows arbitrary code :(Leathaleather
literal_eval is not the answer, as you want to allow math operators and parens.Leathaleather
One more update: I've changed the source of literal_eval so that it only accepts binary and unary operations (hopefully it's clean now).Hogen
So much work and code... ever thought about just going that other guy's way (checking for numbers+ops) or doing it properly and building a parser?Leathaleather
Your first sample is still wrong for my needs... the expression is valid syntax, if variables are not defined that's a NameError, not a SyntaxError...Seger
Seven lines and -20 from the source? That's nothing! And who knows, maybe my solution will work better in the long run if you are planning on adding more complex mathematical syntax checking.Hogen
I'm working on it, sheesh. You're acting like I'm doing a job wrong. Be happy that you're even getting help.Hogen
@Blender: Even if requirements expand, a solution using a parsing library will be adjusted easily. No need to hand-roll the solution.Leathaleather
@Blender: I'm glad for everybody's help, but I take it as a given that handing syntax to an evaluator is the wrong and possibly insecure way. Thanks for your efforts anyway.Seger
Okay, this update should work. Correct me if I'm wrong, but I think it's pretty safe to use an eval() in this case.Hogen
Perhaps safe, but still way too much code. Even debatable if we were going to evaluate the expression, but absolutely overkill (and also still dirty) if the problem is syntax checking.Leathaleather
I can format it a bit better, but why is it dirty? Find me a hole and I'll be happy.Hogen
You know that trying eval("1000**1000**1000") will hang your Python for a long time? I hope this code is not running on a web server...Seger
I see what you mean... Let me see what I can do.Hogen
I think this would be vulnerable to race conditions if multithreaded.Wildwood

You could try building a simple parser yourself to tokenize the string of the arithmetic expression and then build an expression tree. If the tree is valid (the leaves are all operands and the internal nodes are all operators), then you can say that the expression is valid.

The basic concept is to make a few helper functions to create your parser.

def extract() will get the next character from the expression
def peek() similar to extract but used if there is no whitespace to check the next character
get_expression()
get_next_token()

Alternatively if you can guarantee whitespace between characters you could use split() to do all the tokenizing.

Then you build your tree and check whether it's structured correctly.
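
As a rough illustration of the hand-written approach, here is a minimal recursive-descent sketch using only the standard library. It validates the token sequence directly instead of building a tree, and it works on tokens rather than single characters. The names peek and extract are borrowed from the list above; the grammar, the token pattern, and every other name are assumptions made for this example.

```python
import re

# One token: a number, a variable name, or an operator/parenthesis.
TOKEN = re.compile(
    r"\s*(?:(?P<num>\d+(?:\.\d+)?)|(?P<name>[A-Za-z_]\w*)|(?P<op>[-+*/()]))"
)

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected character at position %d" % pos)
        kind = match.lastgroup
        tokens.append((kind, match.group(kind)))
        pos = match.end()
    return tokens

class Validator:
    # Grammar:
    #   expression := term (('+' | '-') term)*
    #   term       := factor (('*' | '/') factor)*
    #   factor     := ('+' | '-') factor | NUMBER | NAME | '(' expression ')'
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Look at the next token without consuming it.
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return (None, None)

    def extract(self):
        # Consume and return the next token.
        token = self.peek()
        self.pos += 1
        return token

    def expression(self):
        self.term()
        while self.peek()[1] in ("+", "-"):
            self.extract()
            self.term()

    def term(self):
        self.factor()
        while self.peek()[1] in ("*", "/"):
            self.extract()
            self.factor()

    def factor(self):
        kind, value = self.extract()
        if value in ("+", "-"):  # unary sign
            self.factor()
        elif value == "(":
            self.expression()
            if self.extract()[1] != ")":
                raise ValueError("expected ')'")
        elif kind not in ("num", "name"):
            raise ValueError("expected a number, a name or '('")

def is_valid(text):
    try:
        parser = Validator(tokenize(text))
        parser.expression()
        if parser.peek() != (None, None):  # leftover tokens, e.g. the "v" in "2v"
            raise ValueError("unexpected trailing input")
    except ValueError:
        return False
    return True

print(is_valid("-3 * (2 + 1)"))  # True
print(is_valid("-3 * "))         # False
print(is_valid("v1 + v2"))       # True
print(is_valid("v2 - 2v"))       # False
```

The final check for leftover tokens is what rejects inputs such as "9 qwerty"; without it the validator would accept any string that merely starts with a valid expression.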

Try this for more info: http://effbot.org/zone/simple-top-down-parsing.htm

Purgatory answered 3/2, 2011 at 15:21 Comment(7)
Building all this yourself is sooo last century... these days, you use a parsing library which takes care of all the nasty bureaucracy.Leathaleather
@delnan I added the fact that if there is whitespace you can just use split(), also if there is no such library out there that meets your needs (functional but not too big etc...), what then?Purgatory
@Yoel: Then you're out of luck and probably have too high standards.Leathaleather
@delnan I don't understand someone had to write the library LEPL that you said you liked to use. I guess call me old fashioned for wanting to do things that are "sooo last century" ;)Purgatory
@Yoel: I assume you already parsed something nontrivial (CSV is at the borderline between trivial and simple) by hand? I once was about to, but halfway through I realized that I was writing helper functions, utilities, etc. that parsing libraries already provide (not to mention that my code was still buggy while theirs worked flawlessly).Leathaleather
@delnan: +1 for not letting this chain of comments get aggressive. I understand where you're coming from, and yeah, using libraries that are tried and true is very useful. I just felt that for the specific application @Seger needed, it might make sense to go the custom route. I guess I was jumping to conclusions about what constraints there were on space and dependencies.Purgatory
@Yoel: Well, since OP presented a pyparsing solution he wanted to get working, dependencies seem fine. But nevermind.Leathaleather

If you are interested in modifying a custom math evaluator engine written in Python so that it is a validator instead, you could start out with Evaluator 2.0 (Python 3.x) and Math_Evaluator (Python 2.x). They are not ready-made solutions but would allow you to fully customize whatever it is you are trying to do exactly using (hopefully) easy-to-read Python code. Note that "and" & "or" are treated as operators.

Ildaile answered 4/2, 2011 at 2:31 Comment(0)
