Reusing MySQL parser
Asked Answered
H

2

7

I'm working on SQL intrusion detection system (IDS) and I need do parse incoming SQL queries. Writing own SQL parser is a long term task and it will never exactly reflect the logic used in native parser. I found out that MySQL has a lexical analyzer with main source file sql/sql_lex.cc and a syntax analyzer built with bison from sql/sql_yacc.y. I am really interested in reusing this robust solutions. I am building my IDS in C/C++, so I am looking for some way to connect MySQL parser with my detection system.

I was wondering if It is possible to reuse the MySQL parser (lexical+syntax analyzer) to get the structure of SQL query in some logical form e.g. syntax tree. Would it be possible? Are there some related text, tutorials or projects?

Thanks

Haematic answered 13/4, 2014 at 2:58 Comment(2)
Just curious: for your system to work - does it need a server to log all the queries performed? Or will it just act as a mysql proxy?Giza
The system should act as a mysql proxy. I dont plan to touch the SQL server in any way. But it is not so important where do the queries come from at this point. The main problem to solve is how to tranform SQL string to syntax tree.Haematic
H
9

I have finished the first version of my IDS as a part of my bachelor project. It is implemented as plugin for MySQL.

I will list my main sources for understanding the MySQL internals bellow. Then I shortly describe the approach I used in my IDS.

MySQL documentation texts

  • I found books Expert MySQL by Charles Bell and Understanding MySQL Internals by Sasha Pachev (as user3822447 wrote) to be very good entry point for understanding the internals of MySQL.
  • MySQL 5.1 Plugin Development by Andrew Hutchings & Sergei Golubchik is also very useful.
  • The MySQL Internals Manual also contains some basic information good to start with.
  • After all reading I did som debugging (using VS) and discovered how the query tree structure looks like.

My solution for IDS

The source code of my solution can be found at sourceforge. I'm planning to document it little more in its wiki.

The main entry point is the audit_ids_notify() function in audit_ids.cc. The plugin takes query tree generated by internal MySQL parser a makes simplified version of it (to save memory). Then it does anomally detection - it has a list of known query tree structures and keeps some statistical information about each parametrizable part of each query tree structure. The output is written into special log file in the MySQL data directory.

I tried to make the solution modular and extendable. The initial version is kind of demostration and the performance is not optimized especially in SQL storage module.

MySQL plugin type

I identified 2 possible approaches and used the first one.

  1. audit plugin
    • The type of wrapper in my solution plugin is audit plugin.
    • I used this type of plugins despite being used to report server operations (e.g. to log queries or errors).
    • I chose this type of plugin because I found out that this is the only native supported plugin which is called when the query tree is after the completed (i.e. parsed) and before it is freed from the memory (for MySQL 5.6.17).
    • Disadvantage: the above is not fully guaranteed in future versions of MySQL, but in my opinion this shouldn't change in the close future.
    • Advantage: the MySQL does not need to recompiled. It is enough to build and install the plugin.

  1. query-rewrite plugin
    • There is also an alternative approach doing this using non-native plugin type query-rewrite. It provised plugin API for modifying the query thus also for reading it.
    • Disadvantage: to support this plugin API the MySQL server must be recompiled with the API. I think may become a part of MySQL production distribution.
    • Advantage: plugin type designed for reading/rewriting the internal query tree.

If there are some questions/problems related to this topic I could answer feel free to ask ;)

Haematic answered 23/10, 2014 at 19:36 Comment(1)
Hi bittomix, thanks for the information. Is it correct to interpret your answer as by implementing the IDS as a MySQL plugin, you don't need to reuse the MySQL parser as described in your question?Smile
G
1

I believe that it is possible. Try an advanced MySQL internals book such as "Expert MySQL" by Charles Bell or "Understanding MySQL Internals" by Sasha Pachev. MySQL uses a custom hand-built lexer and a generic Bison-compatible parser with which their lexer is compatible.

Aside from that, you may find a simpler solution than parsing the query, for example:

  • Strategy #1: Throw away the query and just look at the contents of the strings inside the query. Look for possible attack vectors such as SQL keywords. This could detect attack attempts.
  • Strategy #2: Throw away all user input and make a list of the rest of the query contents. Make a list of all your query patterns of keywords and compare them against each other. Look for queries with anomalous structure that indicate that someone successfully modified the query.

I am no SQL guru but the most basic strategy is simply to use parameterized queries and ignore penetration attempts. Most such attempts on the Internet are generic, random queries designed to probe for obvious weakness and can be safely ignored if you follow basic security practice everywhere.

Golden answered 14/9, 2014 at 21:19 Comment(1)
Thank you for you response. The first version of my IDS was finished in May (as part of my bachelor project). I used the same two books as you mentioned and I can recommend them, too. I also agree about the usage of parametrized queries and I'd found a way how get the internal query tree generated by MySQL internal parser using MySQL audit module (I will post this in answer).Haematic

© 2022 - 2024 — McMap. All rights reserved.