Perl, DBI, and SQL: Why NOT does not work in some SQL queries?
Asked Answered
T

3

6

I have the weirdest problem and my extremely basic knowledge of SQL must be terribly wrong, but I cannot make sense of the behaviour illustrated below.

I have this file test.csv

id,field
A,0
B,1
C,2
D,"0"
E,"1"
F,"2"
G,
H,""
I," "

And this test code:

#! /usr/bin/perl

use strict;
use warnings;
use DBI;
use Devel::VersionDump qw(dump_versions);

my $dbh = DBI->connect ("dbi:CSV:");
$dbh->{RaiseError} = 1;
$dbh->{TraceLevel} = 0;
my $i = 0;

foreach my $cond ("TRUE",
                  "field <> 0 AND field <> 1",
                  "field = 0 OR field = 1",
                  "NOT (field = 0 OR field = 1)",
                  "NOT field = 0 OR field = 1",
                  "field <> 0",
                  "NOT field <> 0",
                  ) {
  print "Condition #" . $i++ . " is $cond:\n";
  my $sth = $dbh->prepare("SELECT * FROM test.csv WHERE $cond");
  $sth->execute();
  $sth->dump_results();
};

print "\n\n";
dump_versions();

When run, this is the output:

Condition #0 is TRUE:
'A', '0'
'B', '1'
'C', '2'
'D', '0'
'E', '1'
'F', '2'
'G', ''
'H', ''
'I', ' '
9 rows
Condition #1 is field <> 0 AND field <> 1:
'C', '2'
'F', '2'
'G', ''
'H', ''
'I', ' '
5 rows
Condition #2 is field = 0 OR field = 1:
'A', '0'
'B', '1'
'D', '0'
'E', '1'
4 rows
Condition #3 is NOT (field = 0 OR field = 1):
'A', '0'
'B', '1'
'D', '0'
'E', '1'
4 rows
Condition #4 is NOT field = 0 OR field = 1:
'B', '1'
'C', '2'
'E', '1'
'F', '2'
'G', ''
'H', ''
'I', ' '
7 rows
Condition #5 is field <> 0:
'B', '1'
'C', '2'
'E', '1'
'F', '2'
'G', ''
'H', ''
'I', ' '
7 rows
Condition #6 is NOT field <> 0:
'A', '0'
'D', '0'
2 rows


Perl version: v5.16.3 on MSWin32 (C:\Program Files\Perl64\bin\perl.exe)
ActivePerl::Config                                     -  Unknown
ActiveState::Path                                      -     1.01
AutoLoader                                             -     5.73
C:::Program Files::Perl64::site::lib::sitecustomize.pl -  Unknown
Carp                                                   -     1.26
Class::Struct                                          -     0.63
Clone                                                  -     0.34
Config                                                 -  Unknown
Config_git.pl                                          -  Unknown
Config_heavy.pl                                        -  Unknown
Cwd                                                    -     3.40
DBD::CSV                                               -     0.41
DBD::File                                              -     0.42
DBI                                                    -    1.631
DBI::DBD::SqlEngine                                    -     0.06
DBI::SQL::Nano                                         - 1.015544
Data::Dumper                                           -    2.139
Devel::VersionDump                                     -     0.02
DynaLoader                                             -     1.14
Encode                                                 -     2.49
Encode::Alias                                          -     2.16
Encode::Config                                         -     2.05
Encode::Encoding                                       -     2.05
Errno                                                  -     1.15
Exporter                                               -     5.67
Exporter::Heavy                                        -     5.67
Fcntl                                                  -     1.11
File::Basename                                         -     2.84
File::Spec                                             -     3.40
File::Spec::Unix                                       -     3.40
File::Spec::Win32                                      -     3.40
File::stat                                             -     1.05
IO                                                     -  1.25_06
IO::Dir                                                -      1.1
IO::File                                               -     1.16
IO::Handle                                             -     1.33
IO::Seekable                                           -      1.1
List::Util                                             -     1.27
Math::BigFloat                                         -    1.997
Math::BigInt                                           -    1.998
Math::BigInt::Calc                                     -    1.997
Math::Complex                                          -     1.59
Math::Trig                                             -     1.23
Params::Util                                           -     1.07
SQL::Dialects::AnyData                                 -    1.405
SQL::Dialects::Role                                    -    1.405
SQL::Eval                                              -    1.405
SQL::Parser                                            -    1.405
SQL::Statement                                         -    1.405
SQL::Statement::Function                               -    1.405
SQL::Statement::Functions                              -    1.405
SQL::Statement::Operation                              -    1.405
SQL::Statement::Placeholder                            -    1.405
SQL::Statement::RAM                                    -    1.405
SQL::Statement::Term                                   -    1.405
SQL::Statement::TermFactory                            -    1.405
SQL::Statement::Util                                   -    1.405
Scalar::Util                                           -     1.27
SelectSaver                                            -     1.02
Symbol                                                 -     1.07
Text::CSV_XS                                           -     1.07
Tie::Hash                                              -     1.04
Time::HiRes                                            -   1.9725
Win32                                                  -     0.47
XSLoader                                               -     0.16
base                                                   -     2.18
bytes                                                  -     1.04
constant                                               -     1.25
integer                                                -     1.00
overload                                               -     1.18
overloading                                            -     0.02
sort                                                   -     2.01
strict                                                 -     1.07
unicore::Heavy.pl                                      -  Unknown
unicore::lib::Perl::Word.pl                            -  Unknown
unicore::lib::Perl::_PerlIDS.pl                        -  Unknown
utf8                                                   -     1.09
utf8_heavy.pl                                          -  Unknown
vars                                                   -     1.02
warnings                                               -     1.13
warnings::register                                     -     1.02

Condition #0 shows the complete dataset and is fine.

Condition #1 is just some compound condition and works fine.

Condition #2 is the opposite condition (basic logic rules used to invert it), and works fine too.

Yet, condition #3 should be the opposite of #2 and thus equal to #1, but the result is the same as #2: I cannot make any sense of this.

Condition #4 shows that, omitting the parentheses, NOT does work fine, but of course this query is different from any of the previous ones.

Conditions #5 and #6 show a situation where NOT acts exactly as one would expect.

So, why NOT on a compound condition acts as if the NOT were not specified at all?!

By the way, I read this scary post Perl DBD::CSV - SQL Syntax - "AND" clause is not working properly and added Devel::VersionDump to check whether I have a similar issue but it seems to me that all relevant packages are the newest available. Hence, I really have no clue about this.

Tiros answered 6/5, 2014 at 7:41 Comment(0)
G
6

I confirm it's a bug of SQL::Parser:

'where_clause' => HASH(0x7f9686737480)
  'arg1' => HASH(0x7f9686808248)
     'arg1' => HASH(0x7f96866b50f8)
        'fullorg' => 'field'
        'type' => 'column'
        'value' => 'field'
     'arg2' => HASH(0x7f968588dfe0)
        'fullorg' => 0
        'type' => 'number'
        'value' => 0
     'neg' => 0
     'nots' => HASH(0x7f96866b55d8)
          empty hash
     'op' => '='
  'arg2' => HASH(0x7f9684498ce0)
     'arg1' => HASH(0x7f96845fb798)
        'fullorg' => 'field'
        'type' => 'column'
        'value' => 'field'
     'arg2' => HASH(0x7f96866b5158)
        'fullorg' => 1
        'type' => 'number'
        'value' => 1
     'neg' => 0
     'nots' => HASH(0x7f96866b55a8)
          empty hash
     'op' => '='
  'neg' => 0
  'nots' => HASH(0x7f9686808320)
       empty hash
  'op' => 'OR'

The top-most "neg" should be 1. Please open a ticket at https://rt.cpan.org/Dist/Display.html?Name=SQL-Statement - when you refer this thread, the test case is proven :)

Cheers, Jens

Gesner answered 7/5, 2014 at 8:41 Comment(2)
Many thanks, Jens! I had contacted this morning Merijn and he suggested to contact you directly but I had not time yet to put an e-mail together. Little did I know that you were active on this already. Thanks!Tiros
Jens, ticket opened. I was wondering if there is an easy way to describe what kind of logic constructs expose the bug. The point is that I (as perhaps a few other people out there) have a bunch of SQL commands in my program, and now I would like to circumvent the bug by rewriting them in a way they work fine. For instance, it looks like writing NOT a AND NOT b could be safer than NOT (a OR b) even when the second looks more natural. Can you think more than "avoid applying NOT on anything else than a single condition" or, somehow equivalently, "avoid NOT of a parenthesized expression"?Tiros
D
2

SQL logic for DBD::CSV is NOT contained in DBD::CSV, which is just a thin glue layer between Text::CSV_XS and DBI.

All SQL knowledge is dealt with by SQL::Statement. If you think you found a real bug, please try to dig in that module and find the cause, create a patch and post the issue with the patch on RT :)

Dentilingual answered 7/5, 2014 at 6:58 Comment(1)
Understood. Title changed to better reflect the topic.Tiros
H
1

If this indeed happens to be a bug with the distributivity of NOT over parentheses you could fix it quick and dirty by replacing

NOT (A OR B)

with

NOT A AND NOT B

which is equivalent to the former in a logical sense. That probably does not answer the question why your code fails, but if this works and the other does not, than i would assume that to be a bug (or maybe distributivity just is not implemented, no idea what is advertised as supposed to be working and what is not).

Hillard answered 6/5, 2014 at 10:41 Comment(7)
Indeed, the issue is not to find a workaround (Condition #1 would work fine, then) but what is wrong with my understanding of SQL or, if that is the case, with the module I use (but I find it hard to believe that a module which has been around for so long has a bug of this sort...). The problem is that, unless this is truly clarified, I cannot ever count on my SQL code to achieve what I mean--and then I am in deep trouble.Tiros
Well, if one takes the explanation for WHERE on Wikipedia literaly word by word it mentions that predicates may be enclosed in parentheses and that several predicates may be joined by either AND or OR to form new predicates. So that would certainly imply that they work with parens as expected. But neither on Wikipedia nor else did i find an example or mention that NOT is able to negate parens. But you're right, one would certainly expect that to either work or to be warned about on many sites with any tool around for so long. However i can't see a reason why #1 and #3 should differ.Hillard
Did you try NOT field = 0 AND NOT field = 1? I know it should probably be the same as #1 but if it works, than you know for sure that the problem lies within whatever interprets your predicates as it is exactly equivalent to your #3. Otherwise it might still have something to do with it not understanding your entries as integers or whatever.Hillard
Re "can one really use NOT(...)?", I had read some other answer on this site where someone apparently competent in SQL was trying to explain basic logic to a user, and he/she was showing examples exactly like mine. It was not running code, though, and it might even not have been in Perl--cannot locate it anymore now....Tiros
I tried the alternate condition you indicate and the result is as one would expect (that is, identical to Condition #1). Condition #7 is NOT field = 0 AND NOT field = 1: 'C', '2' 'F', '2' 'G', '' 'H', '' 'I', ' ' 5 rowsTiros
Of course one can do NOT(A) in logic. The question is if it is implemented. Since one can easily avoid ever having to use it, it might possibly not be. But one would expect that to be pointed out indeed. However, when it comes to the basic logic, then NOT (A OR B) and NOT A AND NOT B is completely equivalent, so it has to be either a bug or the basic logic is not properly implemented here. And the part that i would suspect for that is certainly the NOT().Hillard
By the way, do not accept my answer. Maybe we are lucky and someone more knowledgeable about this module drops by.Hillard

© 2022 - 2024 — McMap. All rights reserved.