Detect SQL injections in the source code

Asked 10/12, 2014 at 13:37 Answered 11/12, 2014 at 3:46

Solved python sql security sql-injection static-code-analysis

Consider the following code snippet:

import MySQLdb

def get_data(id):
    db = MySQLdb.connect(db='TEST')
    cursor = db.cursor()
    cursor.execute("SELECT * FROM TEST WHERE ID = '%s'" % id)

    return cursor.fetchall()

print(get_data(1))

There is a major problem in the code - it is vulnerable to SQL injections attacks since the query is not parameterized through DB API and is constructed via string formatting. If you call the function this way:

get_data("'; DROP TABLE TEST -- ")

the following query would be executed:

SELECT * FROM TEST WHERE ID = ''; DROP TABLE TEST --

Now, my goal is to analyze the code in the project and detect all places potentially vulnerable to SQL injections. In other words, where the query is constructed via string formatting as opposed to passing query parameters in a separate argument.

Is it something that can be solved statically, with the help of pylint, pyflakes or any other static code analysis packages?

I'm aware of sqlmap popular penetration testing tool, but, as far as I understand, it is working against a web resource, testing it as a black-box through HTTP requests.

Latisha answered 10/12, 2014 at 13:37 Comment(0)

There is a tool that tries to solve exactly what the question is about, py-find-injection:

py_find_injection uses various heuristics to look for SQL injection vulnerabilities in python source code.

It uses ast module, looks for session.execute() and cursor.execute() calls, and checks whether the query inside is formed via string interpolation, concatenation or format().

Here is what it outputs while checking the snippet in the question:

$ py-find-injection test.py
test.py:6   string interpolation of SQL query
1 total errors

The project, though, is not actively maintained, but could be used as a starting point. A good idea would be to make a pylint or pyflakes plugin out of it.

Latisha answered 10/12, 2014 at 13:50 Comment(1)

Can this library also scan input strings for SQL Injection attacks? – Lozier 24/2, 2021 at 21:29

Not sure how this will compare with the other packages, but to a certain extent you need to parse the arguments being passed to cursor.execute. This bit of pyparsing code looks for:

arguments using string interpolation
arguments using string concatenation with variable names
arguments that are just variable names

But sometimes arguments use string concatenation just to break up a long string into - if all the strings in the expression are literals being added together, there is no risk of SQL injection.

This pyparsing snippet will look for calls to cursor.execute, and then look for the at-risk argument forms:

from pyparsing import *
import re

identifier = Word(alphas, alphanums+'_')
integer = Word(nums)
LPAR,RPAR,PLUS,PERCENT = map(Literal, '()+%')

stringInterpRE = re.compile(r"%-?\d*\*?\.?\d*\*?s")
def containsStringInterpolation(s,l,tokens):
    if not stringInterpRE.search(tokens[0]):
        raise ParseException(s,l,"No string interpolation")
tupleContents = identifier | integer
tupleExpr = LPAR + delimitedList(tupleContents) + RPAR
stringInterpArg = identifier | tupleExpr        
interpolatedString = originalTextFor(quotedString.copy().setParseAction(containsStringInterpolation) + 
                                    PERCENT + stringInterpArg)

stringTerm = interpolatedString | OneOrMore(quotedString.copy()) | identifier
stringTerm.setName("stringTerm")

unsafeStringExpr = (stringTerm + OneOrMore(PLUS + stringTerm)) | identifier | interpolatedString
def unsafeExpr(s,l,tokens):
    if not any(term == interpolatedString or term == identifier
                for term in tokens):
        raise ParseException(s,l,"No unsafe string terms")
unsafeStringExpr.setParseAction(unsafeExpr)
unsafeStringExpr.setName("unsafeExpr")

func = Literal("cursor.execute")
statement = func + LPAR + unsafeStringExpr + RPAR
statement.setName("execute stmt")
#statement.ignore(pythonComment)

for tokens in statement.searchString(sample):
    print ' '.join(tokens.asList())

This will scan through the following sample:

sample = """
import MySQLdb

def get_data(id):
    db = MySQLdb.connect(db='TEST')
    cursor = db.cursor()
    cursor.execute("SELECT * FROM TEST WHERE ID = '%s' -- UNSAFE" % id)
    cursor.execute("SELECT * FROM TEST WHERE ID = '" + id + "' -- UNSAFE")
    cursor.execute(sqlVar + " -- UNSAFE")
    cursor.execute("SELECT * FROM TEST WHERE ID = 'FRED' -- SAFE")
    cursor.execute("SELECT * FROM TEST WHERE ID = " + 
                        "'FRED' -- SAFE")
    cursor.execute("SELECT * FROM TEST "
                        "WHERE ID = "
                        "'FRED' -- SAFE")
    cursor.execute("SELECT * FROM TEST "
                        "WHERE ID = " +
                        "'%s' -- UNSAFE" % name)
    return cursor.fetchall()

print(get_data(1))"""

and report these unsafe statements:

cursor.execute ( "SELECT * FROM TEST WHERE ID = '%s' -- UNSAFE" % id )
cursor.execute ( "SELECT * FROM TEST WHERE ID = '" + id + "' -- UNSAFE" )
cursor.execute ( sqlVar + " -- UNSAFE" )
cursor.execute ( "SELECT * FROM TEST " "WHERE ID = " + "'%s' -- UNSAFE" % name )

You can also have pyparsing report the location of the found lines, using scanString instead of searchString.

Sporocyte answered 11/12, 2014 at 3:46 Comment(3)

Wow, this is amazing, haven't thought of pyparsing at all. I'll definitely study the code to completely understand it. Thank you very much for the option! – Latisha 11/12, 2014 at 3:56

The advent of new string formatting methods (.format, .format_map, and f-strings) will require some additional work to adapt this solution to current Python technologies. – Sporocyte 10/12, 2018 at 17:57

I was recently, looking into this. This is quite helpful. – Louque 20/9, 2023 at 4:8

About the best that I can think you'd get would be grep'ing through your codebase, looking for cursor.execute() statements being passed a string using Python string interpolation, as in your example:

cursor.execute("SELECT * FROM TEST WHERE ID = '%s'" % id)

which of course should have been written as a parameterized query to avoid the vulnerability:

cursor.execute("SELECT * FROM TEST WHERE ID = '%s'", (id,))

That's not going to be perfect -- for instance, you might have a hard time catching code like this:

query = "SELECT * FROM TEST WHERE ID = '%s'" % id
# some stuff
cursor.execute(query)

But it might be about the best you can easily do.

Asperity answered 10/12, 2014 at 13:48 Comment(2)

Greping is an option. And thank you for the last example - I agree - it can be a challenge to find. Though, I guess, there is no "silver bullet" here, I just want to do the best I can. Thanks again, definitely plus one. – Latisha 10/12, 2014 at 13:54

Btw, take a look at my answer - py-find-injection works for your last example also, detects the string interpolation of the query. – Latisha 10/12, 2014 at 13:56

-1

It's a good thing that you're already aware of the problem and trying to resolve it.

As you may already know, the best practices to execute SQL in any DB is to use prepared statements or stored procedures if these are available.

In this particular case, you can implement a prepared statement by "preparing" the statement and then executing.

e.g:

cursor = db.cursor()
query = "SELECT * FROM TEST WHERE ID = %s"  
cur.execute(query, "2")

Saad answered 10/12, 2014 at 15:3 Comment(4)

For future reference: SQL injection is prevented while coding the application and never on compiling, running time or using an external tool to check for flaws. – Saad 10/12, 2014 at 15:5

Thank you for the point, I understand the best practices and follow them; the focus of the topic is about static code analysis. – Latisha 10/12, 2014 at 15:8

As should be clear from the original post, alecxe well understands the nature of SQL injection and how to prevent it. It sounds like he or she has inherited an application and wants to test the source code base to find possible injection flaws. – Lopes 10/12, 2014 at 15:25

Well with that in mind I'd refer to @JoshKupershmidt , and just search the code for execute statements. – Saad 10/12, 2014 at 17:56

Recommended topics

Hot tags