In my current application, I need to be able to do this type of query:
SELECT MIN((colA, colB, colC))
FROM mytable
WHERE (colA, colB, colC) BETWEEN (200, 'B', 'C') AND (1000, 'E', 'F')
and get the answer of (333, 'B', 'B')
, given this data:
+------+------+------+
| colA | colB | colC |
+------+------+------+
| 99 | A | A |
| 200 | A | Z |
| 200 | B | B |
| 333 | B | B |
| 333 | C | D |
| 333 | C | E |
| 333 | D | C |
| 1000 | E | G |
| 1000 | F | A |
+------+------+------+
What is the most efficient way to accomplish this in real SQL? Please keep in mind that this is a toy example, and that my actual application has tables with varying columns and data types, and hundreds of million of rows. I use MySQL, if that helps. You can also assume that these columns have a PRIMARY or UNIQUE index on them.
If the solution is easily extensible to more/less columns, that's even better.
Tuple Comparison:
Several have asked so I should put this in the question. Tuples are ordered lexicographically, meaning that the sequences are ordered the same as their first differing elements. For example, (1,2,x) < (1,2,y) returns the same as x < y.
It's worth noting that SQL (or at least mysql) implements this correctly:
mysql> select (200, 'B', 'C') < (333, 'B', 'B') and (333, 'B', 'B') < (1000, 'E', 'F');
+--------------------------------------------------------------------------+
| (200, 'B', 'C') < (333, 'B', 'B') and (333, 'B', 'B') < (1000, 'E', 'F') |
+--------------------------------------------------------------------------+
| 1 |
+--------------------------------------------------------------------------+
1 row in set (0.00 sec)
Here's the necessary SQL to create the example:
create table mytable select 333 colA, 'B' colB, 'B' colC;
insert into mytable values (200, 'B', 'B'), (333, 'C', 'D'), (1000, 'E', 'G'),
(200, 'A', 'Z'), (1000, 'F', 'A'), (333, 'C', 'E'), (333, 'D', 'C'),
(99, 'A', 'A');
alter table mytable add unique index myindex (colA, colB, colC);
Adding this index seems to cause the table to be sorted lexicographically, which is interesting. This isn't true in our production system.
('B', 'K', 'K')
would not staisfy theBETWEEN
conditions, right? – Resolution(1,2,x) < (1,2,y)
returns the same asx < y
. – Selfsupport