The reason eval
and exec
are so dangerous is that the default compile
function will generate bytecode for any valid python expression, and the default eval
or exec
will execute any valid python bytecode. All the answers to date have focused on restricting the bytecode that can be generated (by sanitizing input) or building your own domain-specific-language using the AST.
Instead, you can easily create a simple eval
function that is incapable of doing anything nefarious and can easily have runtime checks on memory or time used. Of course, if it is simple math, than there is a shortcut.
c = compile(stringExp, 'userinput', 'eval')
if c.co_code[0]==b'd' and c.co_code[3]==b'S':
return c.co_consts[ord(c.co_code[1])+ord(c.co_code[2])*256]
The way this works is simple, any constant mathematic expression is safely evaluated during compilation and stored as a constant. The code object returned by compile consists of d
, which is the bytecode for LOAD_CONST
, followed by the number of the constant to load (usually the last one in the list), followed by S
, which is the bytecode for RETURN_VALUE
. If this shortcut doesn't work, it means that the user input isn't a constant expression (contains a variable or function call or similar).
This also opens the door to some more sophisticated input formats. For example:
stringExp = "1 + cos(2)"
This requires actually evaluating the bytecode, which is still quite simple. Python bytecode is a stack oriented language, so everything is a simple matter of TOS=stack.pop(); op(TOS); stack.put(TOS)
or similar. The key is to only implement the opcodes that are safe (loading/storing values, math operations, returning values) and not unsafe ones (attribute lookup). If you want the user to be able to call functions (the whole reason not to use the shortcut above), simple make your implementation of CALL_FUNCTION
only allow functions in a 'safe' list.
from dis import opmap
from Queue import LifoQueue
from math import sin,cos
import operator
globs = {'sin':sin, 'cos':cos}
safe = globs.values()
stack = LifoQueue()
class BINARY(object):
def __init__(self, operator):
self.op=operator
def __call__(self, context):
stack.put(self.op(stack.get(),stack.get()))
class UNARY(object):
def __init__(self, operator):
self.op=operator
def __call__(self, context):
stack.put(self.op(stack.get()))
def CALL_FUNCTION(context, arg):
argc = arg[0]+arg[1]*256
args = [stack.get() for i in range(argc)]
func = stack.get()
if func not in safe:
raise TypeError("Function %r now allowed"%func)
stack.put(func(*args))
def LOAD_CONST(context, arg):
cons = arg[0]+arg[1]*256
stack.put(context['code'].co_consts[cons])
def LOAD_NAME(context, arg):
name_num = arg[0]+arg[1]*256
name = context['code'].co_names[name_num]
if name in context['locals']:
stack.put(context['locals'][name])
else:
stack.put(context['globals'][name])
def RETURN_VALUE(context):
return stack.get()
opfuncs = {
opmap['BINARY_ADD']: BINARY(operator.add),
opmap['UNARY_INVERT']: UNARY(operator.invert),
opmap['CALL_FUNCTION']: CALL_FUNCTION,
opmap['LOAD_CONST']: LOAD_CONST,
opmap['LOAD_NAME']: LOAD_NAME
opmap['RETURN_VALUE']: RETURN_VALUE,
}
def VMeval(c):
context = dict(locals={}, globals=globs, code=c)
bci = iter(c.co_code)
for bytecode in bci:
func = opfuncs[ord(bytecode)]
if func.func_code.co_argcount==1:
ret = func(context)
else:
args = ord(bci.next()), ord(bci.next())
ret = func(context, args)
if ret:
return ret
def evaluate(expr):
return VMeval(compile(expr, 'userinput', 'eval'))
Obviously, the real version of this would be a bit longer (there are 119 opcodes, 24 of which are math related). Adding STORE_FAST
and a couple others would allow for input like 'x=5;return x+x
or similar, trivially easily. It can even be used to execute user-created functions, so long as the user created functions are themselves executed via VMeval (don't make them callable!!! or they could get used as a callback somewhere). Handling loops requires support for the goto
bytecodes, which means changing from a for
iterator to while
and maintaining a pointer to the current instruction, but isn't too hard. For resistance to DOS, the main loop should check how much time has passed since the start of the calculation, and certain operators should deny input over some reasonable limit (BINARY_POWER
being the most obvious).
While this approach is somewhat longer than a simple grammar parser for simple expressions (see above about just grabbing the compiled constant), it extends easily to more complicated input, and doesn't require dealing with grammar (compile
take anything arbitrarily complicated and reduces it to a sequence of simple instructions).