split sql statements in php on semicolons (but not inside quotes)
Asked Answered
B

2

6

I have a system that is causing errors when users use a semicolon in a free format field. I have traced it down to a simple explode statement:

$array = explode( ";", $sql );

Because this line is in a subroutine that is called from all over the system I would like to replace this line with something that will split things properly, without breaking the rest of the system. I thought I was onto a winner with str_getcsv, but that isn't sophisticated enough either. Look at the following example

$sql = "BEGIN;INSERT INTO TABLE_A (a, b, c) VALUES('42', '12', '\'ab\'c; DEF');INSERT INTO TABLE_B (d, e, f) VALUES('42', '43', 'XY\'s Z ;uvw') ON DUPLICATE KEY UPDATE f='XY\'s Z ;uvw';COMMIT;";

$array = str_getcsv($sql, ";", "'");
foreach( $array as $value ) {
    echo $value . "<br><br>";
}

When I run this is outputs the following:

BEGIN

INSERT INTO TABLE_A (a, b, c) VALUES('42', '12', '\'ab\'c

DEF')

INSERT INTO TABLE_B (d, e, f) VALUES('42', '43', 'XY\'s Z

uvw') ON DUPLICATE KEY UPDATE f='XY\'s Z

uvw'

COMMIT

So it doesn't notice the semicolons are inside quotes. (As far as I can see the quoted strings from different places in the system are always in single quotes, but it is possible that at times they are double quotes, I am not sure about that.)

Can anyone tell me how to do this? I suspect I can do this with a very complicated regex, but this is over my head.

Blastocoel answered 26/6, 2014 at 5:50 Comment(1)
Hey there, following up on this. Did one of the answers solve it for you, or is the question still there? Please give us some feedback. :)Slot
S
8

(*SKIP)(*FAIL) Magic

This live PHP demo shows you the output of the two options below (with or without the semi-colon).

This is what you need:

$splits = preg_split('~\([^)]*\)(*SKIP)(*F)|;~', $sql);

See demo to see that we are splitting on the right semi-colons.

Output:

[0] => BEGIN
[1] => INSERT INTO TABLE_A (a, b, c) VALUES('42', '12', '\'ab\'c; DEF')
[2] => INSERT INTO TABLE_B (d, e, f) VALUES('42', '43', 'XY\'s Z ;uvw')
[3] => COMMIT
[4] =>

The empty item #4 is the match on the other side of the final ;. The other option is to keep the semi-colons (see below).

Option 2: Keep the Semi-Colons

If you want to keep the semi-colons, go with this:

$splits = preg_split('~\([^)]*\)(*SKIP)(*F)|(?<=;)(?![ ]*$)~', $sql);

Output:

[0] => BEGIN;
[1] => INSERT INTO TABLE_A (a, b, c) VALUES('42', '12', '\'ab\'c; DEF');
[2] => INSERT INTO TABLE_B (d, e, f) VALUES('42', '43', 'XY\'s Z ;uvw');
[3] => COMMIT;

Explanation

This problem is a classic case of the technique explained in this question to "regex-match a pattern, excluding..."

In left side of the alternation |, the regex \([^)]*\) matches complete (parentheses) then deliberately fails, after which the engine skips to the next position in the string. The right side matches the ; word you want, and we know they are the right ones because they were not matched by the expression on the left. It is now safe to split on it.

In Option 2, where we keep the semi-colons, our match on the right matches a position, but no characters. That position is asserted by the lookbehind (?<=;), which asserts that a ; immediately precedes the position, and the negative lookahead (?![ ]*$), which asserts that what follows is not optional spaces then the end of the string (so we avoid a last empty match).

Sample Code

Please examine the live PHP demo.

Reference

Slot answered 26/6, 2014 at 5:58 Comment(12)
This is just what I am looking for. It works a treat. Thanks.Will need to watch your live php demo sometime soon.Blastocoel
Just came across an other case where this doesn't work, namely when an SQL statement does have an 'ON DUPLICATE' section, in which case the arguments are not in brackets. Have updated the question to reflect this.Blastocoel
OK, I worked it out, the links you provided prove very helpful. If I change the regex to '~([^)]*)(*SKIP)(*F)|\'(?:\\\'|[^\'])*\'(*SKIP)(*F)|;~' it works. Maybe you can update it in your answer for other people looking at this answer.Blastocoel
OK,, hopefully the last update. The above regex works in regex101.com, but in actual php code it should have two extra slashes: '~([^)]*)(*SKIP)(*F)|\'(?:\\\\\'|[^\'])*\'(*SKIP)(*F)|;~'Blastocoel
For anyone else reading this who is not familiar with the (*SKIP)(*FAIL) dark magic, you can read about it in the PCRE documentation.Tribade
@EdCottrell Thanks, Ed! And even more directly, see Special Backtracking Control Verbs in the perl doc. Added to the references, great idea, thank you. :)Slot
@Blastocoel Normally in PHP code, you're going to need way fewer backslashes. Regex101 forces escaping on / because they use it as a delimiter (lame) :) You can see on the actual php demo that the amount of escaping is minimal, right? I'm not able to look at backslash soup (can't focus) but if you give me your raw regex I'll tell you if anything needs escaping. :)Slot
My actual test script looks like this, and without the five slashes (which I don't like either) it doesn't work for me: $sql = "BEGIN;INSERT INTO TABLE_A (a, b, c) VALUES('42', '12', '\'ab\'c; DEF') ON DUPLICATE KEY UPDATE c='\'ab\'c; DEF';INSERT INTO TABLE_B (d, e, f) VALUES('42', '43', 'XY\'s Z ;uvw') ON DUPLICATE KEY UPDATE f='XY\'s Z ;uvw';COMMIT;"; $array = preg_split('~([^)]*)(*SKIP)(*F)|\'(?:\\\\\'|[^\'])*\'(*SKIP)(*F)|;~', $sql); echo $sql . "<br><br>"; foreach( $array as $value ) { echo $value . "<br><br>"; }Blastocoel
@Slot What about statements that include delimiters? I've been trying to fork your regex to account for that, but I'm having difficulty implementing it with my novice experience with variation (*SKIP)(*F) expressions. DELIMITER $$ SELECT NOW()$$ DELIMITER ; -- etc.Udela
This works most of the time, but fails with SQL comments containing either ; or ', which is present in e.g. SQL output from pgAdmin - for the most part, stripping lines starting with -- before splitting will help, but things like comments on the same line, or block comments, will fail.Sewellyn
the first regex from the answer fails on: "create table test ( id bigint auto_increment primary key, test varchar(255) null comment 'NULL = all; otherwise use test names' )" - splits on a semicolon inside comment ideone.com/bb4albKherson
In PHP 7.4 this splits inside of quotes: phpliveregex.com/p/Lf2#tab-preg-replaceFrightened
C
1

The answer @zx81 works fine with simple queries, but not with complex ones. For example, the script breaks down on separating queries containing functions or triggers, so I decided to publish my solution that will work with the syntax of most popular RDBMS. The solution doesn't look quite pretty, but it covers a lot of SQL dialect constructs.

This solution is used in projects like adminer.php (where I borrowed the code and modified it) and sqldaddy.io

    function splitSqlQueriesIntoArray(string $driver, string $query): array
    {

        $array_queries = [];
        $space = "(?:\\s|/\\*[\s\S]*?\\*/|(?:#|-- )[^\n]*\n?|--\r?\n)";
        $delimiter = ";";
        $offset = 0;
        $fp = false;
        $parse = '[\'"' . ($driver == "sql" ? '`#' : ($driver == "sqlite" ? '`[' : ($driver == "sqlsrv" ? '[' : ''))) . ']|/\*|-- |$' . ($driver == "pgsql" ? '|\$[^$]*\$' : '');

        while ($query != "") {
            if (!$offset && preg_match("~^$space*+DELIMITER\\s+(\\S+)~i", $query, $match)) {
                $delimiter = $match[1];
                $query = substr($query, strlen($match[0]));
            } else {
                preg_match('(' . preg_quote($delimiter) . "\\s*|$parse)", $query, $match, PREG_OFFSET_CAPTURE, $offset); // should always match
                list($found, $pos) = $match[0];
                if (!$found && $fp && !feof($fp)) {
                    $query .= fread($fp, 1e5);
                } else {
                    if (!$found && rtrim($query) == "") {
                        break;
                    }
                    $offset = $pos + strlen($found);
                    if ($found && rtrim($found) != $delimiter) { // find matching quote or comment end
                        while (preg_match('(' . ($found == '/*' ? '\*/' : ($found == '[' ? ']' : (preg_match('~^-- |^#~', $found) ? "\n" : preg_quote($found) . "|\\\\."))) . '|$)s', $query, $match, PREG_OFFSET_CAPTURE, $offset)) { //! respect sql_mode NO_BACKSLASH_ESCAPES
                            $s = $match[0][0];
                            if (!$s && $fp && !feof($fp)) {
                                $query .= fread($fp, 1e5);
                            } else {
                                $offset = $match[0][1] + strlen($s);
                                if ($s[0] != "\\") {
                                    break;
                                }
                            }
                        }

                    } else { // end of a query
                        $array_queries[] = substr($query, 0, $pos);
                        $query = substr($query, $offset);
                        $offset = 0;
                    }
                }
            }
        }
        return  $array_queries;
    } 
Chemurgy answered 13/2, 2023 at 17:36 Comment(0)

© 2022 - 2025 — McMap. All rights reserved.