use mysql SUM() in a WHERE clause
Asked Answered
C

4

63

suppose I have this table

id | cash 
1    200
2    301
3    101
4    700

and I want to return the first row in which the sum of all the previous cash is greater than a certain value:

So for instance, if I want to return the first row in which the sum of all the previous cash is greater than 500, is should return to row 3

How do I do this using mysql statement?

using WHERE SUM(cash) > 500 doesn't work

Cordle answered 19/7, 2010 at 19:40 Comment(3)
You want to select id=3 because 200 + 300 >= 500 or because 501 > 500?Nunhood
Are you including an ORDER BY clause? You can't really say get the "next" record because the rows aren't stored in any particular order.Jupon
it's because 200+301 >= 500... yea i tried the order by and having clauses as well but in this scenario they would return row 4 because row 4 is >= 500, not row 3 as intendedCordle
A
131

You can only use aggregates for comparison in the HAVING clause:

GROUP BY ...
  HAVING SUM(cash) > 500

The HAVING clause requires you to define a GROUP BY clause.

To get the first row where the sum of all the previous cash is greater than a certain value, use:

SELECT y.id, y.cash
  FROM (SELECT t.id,
               t.cash,
               (SELECT SUM(x.cash)
                  FROM TABLE x
                 WHERE x.id <= t.id) AS running_total
         FROM TABLE t
     ORDER BY t.id) y
 WHERE y.running_total > 500
ORDER BY y.id
   LIMIT 1

Because the aggregate function occurs in a subquery, the column alias for it can be referenced in the WHERE clause.

Appealing answered 19/7, 2010 at 19:41 Comment(5)
I think his edit clarified what he meant and made this no longer trueKaolin
@Michael Mrozek: Thx for the heads up, think I got it.Appealing
hi there, thanks for the help I'm wondering on whether or not this code would perform well on a large table though Any comments on the performance costs of using this approach on a large database?Cordle
@user380714: You could add indexes if there was a need, but being a calculated column there's no way to specify an index for the running_total column. And the more the data can be limited - say a particular account - the better.Appealing
+1 for using a derived table. I had a similar problem using HAVING on a normal grouped select statement with a counter and it seemed to create a temp table before applying the having. On a table with millions of rows, without the HAVING condition (but a limit 10 for testing) the data would be back in half a sec, with the HAVING it was 2 mins. A derived table as shown above solved my problem, and the query over the millions of rows now takes 0.4sec.Cup
Q
7

Not tested, but I think this will be close?

SELECT m1.id
FROM mytable m1
INNER JOIN mytable m2 ON m1.id < m2.id
GROUP BY m1.id
HAVING SUM(m1.cash) > 500
ORDER BY m1.id
LIMIT 1,2

The idea is to SUM up all the previous rows, get only the ones where the sum of the previous rows is > 500, then skip one and return the next one.

Quandary answered 19/7, 2010 at 19:51 Comment(1)
This does not work as it returns 2. Using SELECT m2.id instead of SELECT m1.id returns 3 - the expected output. But what if the ids are not in order. Should using ORDER BY m2.id instead of ORDER BY m1.id be the solution ? But the ORDER BY clause makes the ordering while showing the output, right ? So can you explain how the ids can be sorted in ASC order when the JOIN relation works instead of when the result is being presented ?Hollister
F
6

In general, a condition in the WHERE clause of an SQL query can reference only a single row. The context of a WHERE clause is evaluated before any order has been defined by an ORDER BY clause, and there is no implicit order to an RDBMS table.

You can use a derived table to join each row to the group of rows with a lesser id value, and produce the sum of each sum group. Then test where the sum meets your criterion.

CREATE TABLE MyTable ( id INT PRIMARY KEY, cash INT );

INSERT INTO MyTable (id, cash) VALUES
  (1, 200), (2, 301), (3, 101), (4, 700);

SELECT s.*
FROM (
  SELECT t.id, SUM(prev.cash) AS cash_sum
  FROM MyTable t JOIN MyTable prev ON (t.id > prev.id)
  GROUP BY t.id) AS s
WHERE s.cash_sum >= 500
ORDER BY s.id
LIMIT 1;

Output:

+----+----------+
| id | cash_sum |
+----+----------+
|  3 |      501 |
+----+----------+
Finalist answered 19/7, 2010 at 19:54 Comment(8)
shows error : '#1052 - Column 'cash' in field list is ambiguous'Hollister
@IstiaqueAhmed, thanks for catching that. I added a correlation name so it's SUM(prev.cash) now. That should fix the error.Finalist
And now it shows id to be 2 instead of 3 and cash_sum 501.00.Hollister
@IstiaqueAhmed, I changed >= to > in the join and now it identifies id 3 as the id of the row where previous rows totaled more than 500.Finalist
Can you explain what GROUP BY t.id does here ? As t.id is always distinct , what does grouping do here ?Hollister
@IstiaqueAhmed, t.id is always distinct in the table, but it is joined to multiple rows based on the inequality in the join condition. So there are multiple rows per grouping.Finalist
'there is no implicit order to an RDBMS table' - without ASC or DESC, the SELECT query will output rows in the order they appear in the DB, right ?Hollister
@IstiaqueAhmed, yes, the SQL standard says that the implementation is free to return rows in any order if the user's query does not specify the order. In practice, this tends to be whatever order is less work for the RDBMS, and that's the order in which the RDBMS read the rows. In the case of InnoDB for example, it's the order of the index from which MySQL read the rows.Finalist
J
1

When using aggregate functions to filter, you must use a HAVING statement.

SELECT *
FROM tblMoney
HAVING Sum(CASH) > 500
Jupon answered 19/7, 2010 at 19:42 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.