How to parse / tokenize an SQL statement in Node.js [closed]
Asked Answered
S

4

21

I'm looking for a way to parse / tokenize SQL statement within a Node.js application, in order to:

  • Tokenize all the "basics" SQL keywords defined in the ISO/IEC 9075 standard or here.
  • Validate the SQL syntax.
  • Find out what the query is gonna do (e.g. read or write?).

Do you have any solution or advises peeps?

Linked: Any Javascript/Jquery Library To validate SQL statment?


I've done research and I found out some ways to do it:

Using existing node.js libraries

I did a Google search and I didn't found a consensual and popular library to use. I found those ones:

  • simple-sql-parser (22 stars on github, 16 daily download on npm)
    • Supports only SELECT, INSERT, UPDATE and DELETE
    • There is a v2 branch on the road
  • sql-parser (90 stars on github, 6 daily downloads on npm)
    • Only supports basic SELECT statements
    • Based on jison
  • sqljs (17 stars on github, 5 daily downloads on npm)
    • v0.0.0-3, under development... No documentation at all :)

Unfortunately, none of those libraries seams to be complete and trustful.

Doing it myself based on a node.js low level tokenizer library

I can do it my self with a low level tokenizer library like:

Doing it myself based on existing Javascript code beautifier

CodeMirror is a pretty cool Javascript library (browser side) that can recognize SQL keywords, strings, etc. Check ou the demo.

I can build a node.js library tokenizer based on CodeMirror. The SQL mode is here on github, I can maybe adapt it to get tokens within a node application.

PS: CodeMirror have 5,046 stars on github and is well maintained.


I figured out that there are 2 distinct problems: Tokenization and Syntax validation (which is related to tokenization).

I made myself a SQL tokenizer for Node.js based on the SQL mode of the excellent CodeMirror (5,046 stars on github, well maintained). CodeMirror's SQL mode take in charge "generic" SQL and some SQL particularities like MSSQL, MySQL, PL/SQL, Cassandra, Hive and MariaDB.

When my project will be mature enough, I will (probably) put it public on GitHub and let you know.

About the SQL syntax validation, I found no JavaScript tool (or open source project to adapt in JS) yet...

Sick answered 6/8, 2014 at 9:16 Comment(1)
the links to the simple-sql-parser do not go where you would expect....Whalebone
V
2

I recently published the sql-ast package, which parses SQL scripts and returns an array of AST nodes. (I couldn't find anything similar that's maintained)

It's very limited at the moment. I've only implemented what I need from it. If you have time, please contribute. I've tried hard to make the codebase understandable.

Give it a star if you're interested in seeing it developed further. I will update this answer as the library is improved.

Vestibule answered 11/8, 2018 at 0:32 Comment(0)
I
0

Good luck.

You didn't mention which SQL, but most of the production SQL languages are huge (check out PL/SQL ignoring the Ada part), complicated and not the same so you'll have to worry about dialect variants, too. You are facing building a full SQL front end to do what you want; the incompleteness of other parsers you found is a hint about the level of effort it takes to do this.

After you get the parser part right, then you'll have to do a symbol table and type analysis (meaning of every symbol) before you can begin to find out what a SQL query reads or writes (consider determining columns read by SELECT * from .. ...).

I understand there are commercial SQL parsers out there. You might consider using one of those.

Interviewer answered 6/8, 2014 at 10:51 Comment(5)
I "only" need the most common SQL (ANSI/ISO) like described on w3schools. And yeah I know SQL parsing is not and easy thing. If I found any UPDATE, DELETE, INSERT, DROP, etc. I can consider the statement will modify.Sick
I would strongly recommend contributing to one of the existing tools rather than starting from scratch. Also, you didn't really cover why you need this. Could sequelize or another node orm do the job you need?Diagnostician
Why don't you let a DBMS do the job? Mysql-Explain makes a syntax check even for sub querys and vor postgres there seems to be this: github.com/markdrago/pgsanity. Beside that: just grep for UPDATE, DELETE, INSERT, DROP and you might have a 99% hit rate.Hirsch
@CFrei: If all OP wants to do is decide if "read or write", looking at the prefix of the query for UPDATE, DELETE, INSERT, DROP, ... would tell him that, yes. Usually people that want to process SQL want to know precisely what tables and columns are affected, and how.Interviewer
@Hirsch it is useful for devops things to be able to parse in code without having a sql database availableWhalebone
A
0

If you want to develop your own SQL parser, I'll recommend a PEG design parser. I've used a PEG parser for a compile-to-js/compile-to-c language, and it resulted into a much clear and easy to mantain code. Check: https://github.com/luciotato/LiteScript

You can start from LiteScript parser if: a) this parser is a important part of your application, b) eventually you'll need native-exe-speed parsing.

But if this is not a important part of the application you're developing, contributing to a existent specific sql parser could be the best choice.

Acroterion answered 16/8, 2014 at 20:17 Comment(1)
Doesn't look like it uses PEG anymore :)Vestibule
B
0

You can look at the SQLite/WebSQL JavaScript parser and Jison grammar file, which can be used for verification tool.

Now it supports full SQLite/WebSQL syntax, and can be modified for other SQL syntax.

Byrnie answered 16/2, 2015 at 18:58 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.