SQL: tuple comparison
Asked Answered
S

2

18

In my current application, I need to be able to do this type of query:

SELECT MIN((colA, colB, colC)) 
FROM mytable
WHERE (colA, colB, colC) BETWEEN (200, 'B', 'C') AND (1000, 'E', 'F')

and get the answer of (333, 'B', 'B'), given this data:

+------+------+------+
| colA | colB | colC |
+------+------+------+
|   99 | A    | A    |
|  200 | A    | Z    |
|  200 | B    | B    |
|  333 | B    | B    |
|  333 | C    | D    |
|  333 | C    | E    |
|  333 | D    | C    |
| 1000 | E    | G    |
| 1000 | F    | A    |
+------+------+------+

What is the most efficient way to accomplish this in real SQL? Please keep in mind that this is a toy example, and that my actual application has tables with varying columns and data types, and hundreds of million of rows. I use MySQL, if that helps. You can also assume that these columns have a PRIMARY or UNIQUE index on them.

If the solution is easily extensible to more/less columns, that's even better.


Tuple Comparison:

Several have asked so I should put this in the question. Tuples are ordered lexicographically, meaning that the sequences are ordered the same as their first differing elements. For example, (1,2,x) < (1,2,y) returns the same as x < y.

It's worth noting that SQL (or at least mysql) implements this correctly:

mysql> select (200, 'B', 'C') < (333, 'B', 'B') and (333, 'B', 'B') < (1000, 'E', 'F');
+--------------------------------------------------------------------------+
| (200, 'B', 'C') < (333, 'B', 'B') and (333, 'B', 'B') < (1000, 'E', 'F') |
+--------------------------------------------------------------------------+
|                                                                        1 |
+--------------------------------------------------------------------------+
1 row in set (0.00 sec)

Here's the necessary SQL to create the example:

create table mytable select 333 colA, 'B' colB, 'B' colC;
insert into mytable values (200, 'B', 'B'), (333, 'C', 'D'), (1000, 'E', 'G'), 
    (200, 'A', 'Z'), (1000, 'F', 'A'), (333, 'C', 'E'), (333, 'D', 'C'),
    (99, 'A', 'A');
alter table mytable add unique index myindex (colA, colB, colC);

Adding this index seems to cause the table to be sorted lexicographically, which is interesting. This isn't true in our production system.

Selfsupport answered 31/3, 2011 at 21:47 Comment(4)
How do you define tuple ordering?Revile
For the record, a row with ('B', 'K', 'K') would not staisfy the BETWEEN conditions, right?Resolution
@ypercube: yes it would. (B,K,K) sorts between (A,B,C) and (D,E,F)Selfsupport
@John Douthat: Lexicographically, meaning that the sequences are ordered the same as their first differing elements. For example, (1,2,x) < (1,2,y) returns the same as x < y.Selfsupport
S
10

Just do:

SELECT colA
     , colB
     , colC
FROM mytable
WHERE ( ('A',  'B',  'C') <= (colA, colB, colC ) )
  AND ( (colA, colB, colC) <= ('D',  'E',  'F' ) )
ORDER BY colA, colB, colC
LIMIT 1
;

It works just fine. And I suspect is should be pretty fast, too.


This is equivalent but it may have better performance, depending on your tables:

SELECT m.colA
     , m.colB
     , m.colC
FROM mytable m
WHERE ( ('A',  'B',  'C') <= (m.colA, m.colB, m.colC) )
  AND ( (m.colA, m.colB, m.colC) <= ('D',  'E',  'F') )
  AND NOT EXISTS
  ( SELECT 1
    FROM mytable b
    WHERE (b.colA, b.colB, b.colC) < (m. colA, m.colB, m.colC)
      AND ( ('A',  'B',  'C') <= (b.colA, b.colB, b.colC) )
  );
Shoshanashoshanna answered 1/4, 2011 at 0:21 Comment(3)
perfect! You should remove your other answer.Selfsupport
If <= works on tuples, it seems like BETWEEN should work, because it's supposed to simply be equivalent to that operation.Boyt
@Cade: my point exactly, "since BETWEEN did not work, I hadn't tried <= at first".Resolution
S
3

---EDIT---: (Previous wrong trials removed)

2nd try (not really relational algebra).

This works but only when the fields are char(1):

SELECT colA, colB, colC
FROM mytable
WHERE CONCAT(colA, colB, colC)
      BETWEEN CONCAT('A', 'B', 'C')
      AND CONCAT('D', 'E', 'F')
ORDER BY colA, colB, colC
LIMIT 1 ; 

I thought that a view that shows all combinations of tuples from mytable that are less than or equal to tuples of the same table might be helpful, as it can be used for other comparisons:

CREATE VIEW lessORequal AS
( SELECT a.colA AS smallA
       , a.colB AS smallB
       , a.colC AS smallC
       , b.colA AS largeA
       , b.colB AS largeB
       , b.colC AS largeC
  FROM mytable a
    JOIN mytable b
      ON (a.colA < b.colA)
         OR ( (a.colA = b.colA)
               AND ( (a.colB < b.colB)
                     OR (a.colB = b.colB
                        AND a.colC <= b.colC)
                   )
            )
  ) ;

Using similar technique, this solves the question. It works with any kind of fields (int, float, char of any length). It's going to be kind of awkard and complicated though if one tries to add more fields.

SELECT colA, colB, colC
FROM mytable m
WHERE ( ('A' < colA)
        OR ( ('A' = colA)
              AND ( ('B' < colB)
                    OR ('B' = colB
                       AND 'C' <= colC)
                  )
           )
      )
  AND ( (colA < 'D')
         OR ( (colA = 'D')
              AND ( (colB < 'E')
                    OR (colB = 'E'
                       AND colC <= 'F')
                  )
            )
      )
ORDER BY colA, colB, colC
LIMIT 1 ; 

One also define a function:

CREATE FUNCTION IslessORequalThan( lowA CHAR(1)
                                 , lowB CHAR(1)
                                 , lowC CHAR(1)
                                 , highA CHAR(1)
                                 , highB CHAR(1)
                                 , highC CHAR(1)
                                 )
RETURNS boolean
RETURN ( (lowA < highA)
         OR ( (lowA = highA)
               AND ( (lowB < highB)
                     OR ( (lowB = highB)
                          AND (lowC <= highC)
                        )
                   )
            )
       );

and use it to solve the same or similar problems. This solves the question again. The query is elegant but a new function has to be created if the type or number of fields is changed.

SELECT colA
     , colB
     , colC
FROM mytable 
WHERE IslessORequalThan(  'A',  'B',  'C', colA, colB, colC )
  AND IslessORequalThan( colA, colB, colC,  'D',  'E',  'F' )
ORDER BY colA, colB, colC
LIMIT 1;

Until then, and because the condition

(colA, colB, colC) BETWEEN ('A', 'B', 'C') AND ('D', 'E', 'F')

was not allowed in MySQL, I thought that

('A', 'B', 'C') <= (colA, colB, colC)

was not allowed as well. But I was wrong.

Shoshanashoshanna answered 31/3, 2011 at 21:52 Comment(4)
Count all rows that satisfy the same criteria?Resolution
This answer works, but in my experience, OR's are a performance disaster. Do you think there's a better way?Selfsupport
I think you should test it to find out if it's disaster or not. I can't think of anything else at the moment.Resolution
Oh boy, I justtried something in MySQL and it works. I'll post a new answer!Resolution

© 2022 - 2024 — McMap. All rights reserved.