use mysql SUM() in a WHERE clause

Asked 19/7, 2010 at 19:40 Answered 19/7, 2010 at 19:54

Solved mysql aggregate-functions having-clause

suppose I have this table

and I want to return the first row in which the sum of all the previous cash is greater than a certain value:

So for instance, if I want to return the first row in which the sum of all the previous cash is greater than 500, is should return to row 3

How do I do this using mysql statement?

using WHERE SUM(cash) > 500 doesn't work

Cordle answered 19/7, 2010 at 19:40 Comment(3)

You want to select id=3 because 200 + 300 >= 500 or because 501 > 500? – Nunhood 19/7, 2010 at 19:42

Are you including an ORDER BY clause? You can't really say get the "next" record because the rows aren't stored in any particular order. – Jupon 19/7, 2010 at 19:45

it's because 200+301 >= 500... yea i tried the order by and having clauses as well but in this scenario they would return row 4 because row 4 is >= 500, not row 3 as intended – Cordle 19/7, 2010 at 19:49

131

You can only use aggregates for comparison in the HAVING clause:

GROUP BY ...
  HAVING SUM(cash) > 500

The HAVING clause requires you to define a GROUP BY clause.

To get the first row where the sum of all the previous cash is greater than a certain value, use:

SELECT y.id, y.cash
  FROM (SELECT t.id,
               t.cash,
               (SELECT SUM(x.cash)
                  FROM TABLE x
                 WHERE x.id <= t.id) AS running_total
         FROM TABLE t
     ORDER BY t.id) y
 WHERE y.running_total > 500
ORDER BY y.id
   LIMIT 1

Because the aggregate function occurs in a subquery, the column alias for it can be referenced in the WHERE clause.

Appealing answered 19/7, 2010 at 19:41 Comment(5)

I think his edit clarified what he meant and made this no longer true – Kaolin 19/7, 2010 at 19:43

@Michael Mrozek: Thx for the heads up, think I got it. – Appealing 19/7, 2010 at 19:50

hi there, thanks for the help I'm wondering on whether or not this code would perform well on a large table though Any comments on the performance costs of using this approach on a large database? – Cordle 19/7, 2010 at 19:59

@user380714: You could add indexes if there was a need, but being a calculated column there's no way to specify an index for the running_total column. And the more the data can be limited - say a particular account - the better. – Appealing 19/7, 2010 at 20:2

+1 for using a derived table. I had a similar problem using HAVING on a normal grouped select statement with a counter and it seemed to create a temp table before applying the having. On a table with millions of rows, without the HAVING condition (but a limit 10 for testing) the data would be back in half a sec, with the HAVING it was 2 mins. A derived table as shown above solved my problem, and the query over the millions of rows now takes 0.4sec. – Cup 3/12, 2015 at 14:42

Not tested, but I think this will be close?

SELECT m1.id
FROM mytable m1
INNER JOIN mytable m2 ON m1.id < m2.id
GROUP BY m1.id
HAVING SUM(m1.cash) > 500
ORDER BY m1.id
LIMIT 1,2

The idea is to SUM up all the previous rows, get only the ones where the sum of the previous rows is > 500, then skip one and return the next one.

Quandary answered 19/7, 2010 at 19:51 Comment(1)

This does not work as it returns 2. Using SELECT m2.id instead of SELECT m1.id returns 3 - the expected output. But what if the ids are not in order. Should using ORDER BY m2.id instead of ORDER BY m1.id be the solution ? But the ORDER BY clause makes the ordering while showing the output, right ? So can you explain how the ids can be sorted in ASC order when the JOIN relation works instead of when the result is being presented ? – Hollister 25/10, 2017 at 15:16

In general, a condition in the WHERE clause of an SQL query can reference only a single row. The context of a WHERE clause is evaluated before any order has been defined by an ORDER BY clause, and there is no implicit order to an RDBMS table.

You can use a derived table to join each row to the group of rows with a lesser id value, and produce the sum of each sum group. Then test where the sum meets your criterion.

CREATE TABLE MyTable ( id INT PRIMARY KEY, cash INT );

INSERT INTO MyTable (id, cash) VALUES
  (1, 200), (2, 301), (3, 101), (4, 700);

SELECT s.*
FROM (
  SELECT t.id, SUM(prev.cash) AS cash_sum
  FROM MyTable t JOIN MyTable prev ON (t.id > prev.id)
  GROUP BY t.id) AS s
WHERE s.cash_sum >= 500
ORDER BY s.id
LIMIT 1;

Output:

+----+----------+
| id | cash_sum |
+----+----------+
|  3 |      501 |
+----+----------+

Finalist answered 19/7, 2010 at 19:54 Comment(8)

shows error : '#1052 - Column 'cash' in field list is ambiguous' – Hollister 25/10, 2017 at 15:34

@IstiaqueAhmed, thanks for catching that. I added a correlation name so it's SUM(prev.cash) now. That should fix the error. – Finalist 25/10, 2017 at 15:39

And now it shows id to be 2 instead of 3 and cash_sum 501.00. – Hollister 25/10, 2017 at 15:45

@IstiaqueAhmed, I changed >= to > in the join and now it identifies id 3 as the id of the row where previous rows totaled more than 500. – Finalist 25/10, 2017 at 16:10

Can you explain what GROUP BY t.id does here ? As t.id is always distinct , what does grouping do here ? – Hollister 25/10, 2017 at 17:12

@IstiaqueAhmed, t.id is always distinct in the table, but it is joined to multiple rows based on the inequality in the join condition. So there are multiple rows per grouping. – Finalist 25/10, 2017 at 17:40

'there is no implicit order to an RDBMS table' - without ASC or DESC, the SELECT query will output rows in the order they appear in the DB, right ? – Hollister 26/10, 2017 at 11:2

@IstiaqueAhmed, yes, the SQL standard says that the implementation is free to return rows in any order if the user's query does not specify the order. In practice, this tends to be whatever order is less work for the RDBMS, and that's the order in which the RDBMS read the rows. In the case of InnoDB for example, it's the order of the index from which MySQL read the rows. – Finalist 26/10, 2017 at 14:42

When using aggregate functions to filter, you must use a HAVING statement.

SELECT *
FROM tblMoney
HAVING Sum(CASH) > 500

Jupon answered 19/7, 2010 at 19:42 Comment(0)

Hot tags

Godot Unity Godot Help Programming Godot 4.X GUI GDScript 3D 2D Physics CSharp Godot 3.X VR XR Projects C++

Recommended topics

Hot tags