Restricting Python's syntax to execute user code safely. Is this a safe approach?

Asked 18/5, 2012 at 23:46 Answered 20/5, 2012 at 5:37

Solved python security abstract-syntax-tree

Original question:

Executing mathematical user code on a python web server, what is the simplest secure way?

I want to be able to run user-submitted code on a Python web server. The code will be simple and mathematical in nature.

Since only a small subset of Python is required, my current approach is to whitelist allowable syntax by traversing Python's abstract syntax tree. Functions and names get special treatment: only explicitly whitelisted functions may be called, and a name is accepted only if it is not already defined or is explicitly whitelisted.

import ast

allowed_functions = set([
    #math library
    'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh',
    'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf',
    'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod',
    'frexp', 'fsum', 'gamma', 'hypot', 'isinf', 'isnan', 'ldexp',
    'lgamma', 'log', 'log10', 'log1p', 'modf', 'pi', 'pow', 'radians',
    'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'trunc',
    #builtins
    'abs', 'max', 'min', 'range', 'xrange'
    ])

allowed_node_types = set([
    #Meta
    'Module', 'Assign', 'Expr',
    #Control
    'For', 'If', 'Else',
    #Data
    'Store', 'Load', 'AugAssign', 'Subscript',
    #Datatypes
    'Num', 'Tuple', 'List',
    #Operations
    'BinOp', 'Add', 'Sub', 'Mult', 'Div', 'Mod', 'Compare'
    ])

safe_names = set([
    'True', 'False', 'None'
    ])


class SyntaxChecker(ast.NodeVisitor):

    def check(self, syntax):
        tree = ast.parse(syntax)
        self.visit(tree)

    def visit_Call(self, node):
        if node.func.id not in allowed_functions:
            raise SyntaxError("%s is not an allowed function!"%node.func.id)
        else:
            ast.NodeVisitor.generic_visit(self, node)

    def visit_Name(self, node):
        try:
            eval(node.id)
        except NameError:
            ast.NodeVisitor.generic_visit(self, node)
        else:
            if node.id not in safe_names and node.id not in allowed_functions:
                raise SyntaxError("%s is a reserved name!"%node.id)
            else:
                ast.NodeVisitor.generic_visit(self, node)

    def generic_visit(self, node):
        if type(node).__name__ not in allowed_node_types:
            raise SyntaxError("%s is not allowed!"%type(node).__name__)
        else:
            ast.NodeVisitor.generic_visit(self, node)

if __name__ == '__main__':
    x = SyntaxChecker()
    while True:
        try:
            x.check(raw_input())
        except Exception as e:
            print e

This seems to accept the required syntax, but I am reasonably new to programming and could be missing any number of gaping security holes.

So my questions are: Is this secure, is there a better approach, and are there any other precautions I should be taking?

Census answered 18/5, 2012 at 23:46 Comment(6)

It looks pretty safe to me... But one side note: the names in your script are somewhat leaking into the sandbox. If I test x it says x is a reserved name but if I test y it says name 'y' not defined. – Doxia 19/5, 2012 at 0:0

What kind of safety mechanisms does the webserver have for running python scripts? – Acrylonitrile 19/5, 2012 at 0:7

@rodrigo: True! In deployment I should hopefully have it running in its own thread to isolate it from the other names. – Census 19/5, 2012 at 0:23

@Joel: No other safety measures, it is a very very basic project written in web.py (hence why I wanted a simple pythonic solution to running the scripts safely) – Census 19/5, 2012 at 0:25

So, I could do something like "factorial(1000000)"? – Acrylonitrile 19/5, 2012 at 0:29

Ah, sorry the thread will have a timeout. – Census 19/5, 2012 at 0:34

Two points I noticed that you could still improve:

You should always escape any output that can be derived from user input. In your example, the disallowed identifiers are echoed back unmodified in the error message. This could be exploited, for example through cross-site scripting (XSS), so I would also escape every such error message before displaying it.
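A minimal sketch of the escaping step, assuming Python 3 (where the standard library provides html.escape) and that the error text ends up in an HTML page. The helper name format_error is hypothetical and not part of the question's code:

```python
import html


def format_error(exc):
    """Return the exception message with HTML special characters escaped."""
    # quote=True also escapes single and double quotes, which matters
    # if the message is ever placed inside an HTML attribute.
    return html.escape(str(exc), quote=True)


if __name__ == "__main__":
    err = SyntaxError("<script> is not an allowed function!")
    print(format_error(err))
```

The idea is to pass every message through such a helper just before it is written to the response, whatever the checker is believed to have filtered out already.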

Another thing you need to be aware of is denial-of-service attacks. Imagine someone whips up an Ackermann function and a script to submit it a couple of thousand times to your server... To prevent this, you should put a time limit on the execution of any submitted code. This is essential, because this type of "attack" often happens unintentionally: someone simply manages to produce an infinite loop.
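A minimal sketch of such a time limit, assuming Python 3.7+ and that starting a child process per submission is acceptable. The helper name run_with_timeout is hypothetical. A child process is used here because the threading module has no way to forcibly stop a running thread, whereas a process can be killed. This only limits run time; it is not a sandbox, so the source should already have passed the syntax check:

```python
import subprocess
import sys


def run_with_timeout(source, seconds=2.0):
    """Run `source` in a separate Python process.

    Returns (ok, output). The child is killed if it runs longer
    than `seconds`.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the timeout.
        return False, "timed out"
    if proc.returncode != 0:
        return False, proc.stderr
    return True, proc.stdout


if __name__ == "__main__":
    print(run_with_timeout("print(2 + 3)"))
    print(run_with_timeout("while True: pass", seconds=0.5))
```

A limit on memory would need additional, platform-specific measures that this sketch does not cover.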

Edit:

Finally, I would also recommend updating your Python version to prevent a "hashDoS" attack.

Izolaiztaccihuatl answered 19/5, 2012 at 0:23 Comment(6)

Thanks, I'm happy you could only see two! When the code is checked and executed it will be in its own thread with a timeout. I had no idea about cross site scripting though, I'll read up on it. – Census 19/5, 2012 at 0:30

I tried looking this up but didn't find much. Are python hash collisions a realistic exploit? – Acrylonitrile 19/5, 2012 at 0:32

Ah yes, cross site scripting is a vulnerability - will fix – Census 19/5, 2012 at 0:46

Cross site scripting shouldn't be a vulnerability, as ast.parse will only parse valid names (i.e. letters, numbers and underscores), so any attempt at scripting should result in a "Syntax error at line x" error :) - I'll be sure to update for the hash collisions – Census 19/5, 2012 at 1:45

Still, I would escape the output. I've seen very clever pieces of code that managed to circumvent assumptions that were made. It's dangerous and anyone will tell you the same: don't make assumptions about what an attacker can or can't do - always go the safe route. Better safe than sorry :) – Izolaiztaccihuatl 19/5, 2012 at 1:49

@Izolaiztaccihuatl Novice question: Once you run an equation string through the function checker, what's the best way to actually solve the equation? Is the string sanitized enough to just use eval on it or is there a safer method? – Conti 29/9, 2018 at 23:50

Have you looked at PyPy's sandboxing features? It is reputedly much safer than any CPython sandboxing efforts. You can even limit the heap size and CPU execution time to prevent denial of service.

Underbodice answered 20/5, 2012 at 3:5 Comment(2)

I have thanks, see the original question. In this case I was seeking a simple, pythonic solution over the best possible security. – Census 20/5, 2012 at 3:18

Aaah, I see now. I did look at the original question but only checked the answers -- looks like someone recommended PyPy in the comments instead. All right, as long as you know that the PyPy option is available, I wish you luck with the whitelisting approach! – Underbodice 20/5, 2012 at 3:23

OpenERP's source code contains a safe_eval.py that does a similar thing, but instead of checking the AST of the source, it restricts the bytecode that is allowed to execute. You may want to have a look at it :)
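To give a rough idea of what restricting bytecode can look like, here is a small sketch using the standard dis module, assuming Python 3.4+. This is not OpenERP's code; the names opnames, uses_only and ALLOWED are made up for illustration. Opcode names and their granularity differ between CPython versions, so the whitelist is derived from sample expressions instead of being hard-coded:

```python
import dis


def opnames(source):
    """Return the set of opcode names in the compiled expression."""
    code = compile(source, "<user>", "eval")
    return {instr.opname for instr in dis.get_instructions(code)}


def uses_only(source, allowed):
    """True if the expression compiles to opcodes in `allowed` only."""
    return opnames(source) <= allowed


# Whitelist built from expressions that are considered acceptable
# (an assumption of this sketch: name loads and basic arithmetic).
ALLOWED = opnames("a + b") | opnames("a * b - c")

if __name__ == "__main__":
    print(uses_only("x + y", ALLOWED))
    print(uses_only("f(x)", ALLOWED))
```

Only the opcodes of the outermost code object are inspected here. Because nothing is executed during the check, a rejected expression never runs.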

Inspection answered 20/5, 2012 at 5:37 Comment(0)
