Is it possible to speed up a sum() in MySQL?

I'm doing a "select sum(foo) from bar" query on a MySQL database that's summing up 7.3 million records, and it takes about 22 seconds per run. Is there a trick to speeding up sums in MySQL?

Commensal answered 1/2, 2010 at 16:55 Comment(1)
It is possible in other databases though; e.g. Oracle, MS SQL. – Retail

No, you can't speed up the function itself. The problem here is really that you're selecting 7.3 million records. MySQL has to scan the entire table, and 7.3 million is a pretty big number. I'm impressed that it finishes that fast, actually.

A strategy you could employ would be to break your data into smaller subsets (perhaps by date or month?) and maintain a stored total for old data that's not going to change. You could update that stored sum periodically, and the overall value can then be calculated by adding the stored sum to a sum over only the rows added since then, which is a much smaller number of rows.
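
A minimal sketch of that approach, assuming bar has a date column (called created_at here) and using a hypothetical bar_monthly_sums summary table; the names and the cutoff date are illustrative:

-- summary table holding one pre-computed sum per closed month
CREATE TABLE bar_monthly_sums (
    month_start DATE PRIMARY KEY,
    foo_sum BIGINT NOT NULL
);

-- fill it periodically for data that will no longer change
INSERT INTO bar_monthly_sums (month_start, foo_sum)
SELECT DATE_FORMAT(created_at, '%Y-%m-01'), SUM(foo)
FROM bar
WHERE created_at < '2010-02-01'
GROUP BY DATE_FORMAT(created_at, '%Y-%m-01');

-- at query time: stored totals plus a sum over only the new rows
SELECT
    (SELECT SUM(foo_sum) FROM bar_monthly_sums)
  + COALESCE((SELECT SUM(foo) FROM bar WHERE created_at >= '2010-02-01'), 0) AS total_foo;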

Cerebritis answered 1/2, 2010 at 16:59 Comment(0)

Turn on the query cache in MySQL. Caching is OFF by default; you need to enable it in the MySQL configuration file (my.cnf / my.ini), e.g. via query_cache_type and query_cache_size.

-- hint to the MySQL server that this result may be cached
SELECT SQL_CACHE SUM(foo) FROM bar;

If no changes have been made to the table since the last run, MySQL can return the cached result instead of re-scanning the table.
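
A sketch of checking and sizing the cache, assuming MySQL 5.x (the query cache was removed in 8.0); the 64M figure is purely illustrative:

-- see whether the query cache is compiled in and enabled
SHOW VARIABLES LIKE 'query_cache%';

-- if the server starts with the cache disabled, query_cache_type has to be
-- enabled in my.cnf / my.ini (restart required); the size is adjustable at runtime
SET GLOBAL query_cache_size = 64 * 1024 * 1024;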

Read more here: http://www.mysqlperformanceblog.com/2006/07/27/mysql-query-cache/

Maceio answered 1/2, 2010 at 17:10 Comment(3)
Will the cached value get updated when a value of foo gets updated? Or will MySQL just re-sum the entire table again next time this query is run? – Retail
@BlueRaja-DannyPflughoeft the cache will be cleared and it will re-sum the whole table. – Rubbish
The query cache is deprecated as of MySQL 5.7.20 and removed in MySQL 8.0. dev.mysql.com/doc/refman/5.7/en/query-cache-configuration.html – Caulis

Two things here:

1) You should not sum 7.3m records on a regular basis. Introduce staging/summary tables that serve the business need (by day, month, year, department, etc.) and fill them on a schedule; then query those tables instead of the original 'raw' table (e.g. select the summarized value for each day when you need an interval of a few days).

2) Check your transaction isolation settings (see the sketch after the link below):

http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html#isolevel_repeatable-read
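
For point 2, a quick way to inspect and relax the isolation level for a reporting session (a sketch; the variable is tx_isolation on MySQL 5.x and transaction_isolation on later versions):

-- see the current isolation level
SELECT @@tx_isolation;

-- use a weaker level for this session so the big scan holds a cheaper snapshot
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT SUM(foo) FROM bar;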

Tractile answered 1/2, 2010 at 19:2 Comment(0)

No, not really. It will always have to enumerate all the rows in the table.

You could create an additional table and maintain the running sum there on every insert, update, and delete.
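
A sketch of that idea, using a hypothetical one-row bar_total table kept in sync by triggers (all names are illustrative; each trigger body is a single statement, so no DELIMITER change is needed):

-- one-row table holding the running total
CREATE TABLE bar_total (total BIGINT NOT NULL);
INSERT INTO bar_total SELECT COALESCE(SUM(foo), 0) FROM bar;

-- keep the total in sync on every change to bar
CREATE TRIGGER bar_sum_ins AFTER INSERT ON bar
FOR EACH ROW UPDATE bar_total SET total = total + NEW.foo;

CREATE TRIGGER bar_sum_del AFTER DELETE ON bar
FOR EACH ROW UPDATE bar_total SET total = total - OLD.foo;

CREATE TRIGGER bar_sum_upd AFTER UPDATE ON bar
FOR EACH ROW UPDATE bar_total SET total = total - OLD.foo + NEW.foo;

-- the sum is now a one-row read
SELECT total FROM bar_total;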

Thill answered 1/2, 2010 at 16:58 Comment(0)

You can try adding an index on the bar.foo column. The index contains every value of the foo column but is smaller, and thus quicker to scan, than the original bar table, especially if bar has a lot of other columns.
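
A sketch of that (the index name is illustrative): with a covering index on foo, EXPLAIN can report "Using index", meaning the sum is served from the smaller index rather than the full table.

CREATE INDEX idx_bar_foo ON bar (foo);

EXPLAIN SELECT SUM(foo) FROM bar;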

Amador answered 1/2, 2010 at 17:14 Comment(2)
Fixed a mistake in referring to the column: it should be bar.foo, not foo.bar. Thanks to @harry-b – Amador
Indexes on 'foo' don't help with aggregations on 'foo'. An index helps you quickly find mammals, for example, in a table of animals. If you're trying to sum the count of each type of animal, e.g. 7 giraffes, 2 donkeys, 4 rhinos, you still have to count every row of animals. You can't simply say SUM('mammals') and have it instantly and magically know the count of all animals in that subset of data; it still has to loop through each animal row unless you change your architecture to pre-aggregate the sum of all mammals in another table or something. – Polyandrist

If your query is really that simple, no... but if you are using a more complex query (and abbreviated it here), you probably could, e.g. by using better joins...

Eugine answered 1/2, 2010 at 17:0 Comment(1)
Maybe this answer is unrelated to the exact question asked, but a really common culprit for performance problems is JOINs. You should always keep in mind that MySQL doesn't have any magic mechanism to optimize joins! – Fenella

I had the same problem, and thought I would run these in parallel to solve it (either using async queries or multithreading):

SELECT SUM(foo) FROM BAR WHERE id <= 999999;
SELECT SUM(foo) FROM BAR WHERE id >= 1000000 AND id <= 1999999;
SELECT SUM(foo) FROM BAR WHERE id >= 2000000 AND id <= 2999999;
SELECT SUM(foo) FROM BAR WHERE id >= 3000000 AND id <= 3999999;
SELECT SUM(foo) FROM BAR WHERE id >= 4000000 AND id <= 4999999;
SELECT SUM(foo) FROM BAR WHERE id >= 5000000 AND id <= 5999999;
SELECT SUM(foo) FROM BAR WHERE id >= 6000000 AND id <= 6999999;
SELECT SUM(foo) FROM BAR WHERE id >= 7000000;

and then just sum the results. But to my surprise, I didn't have to, because MySQL did one million at a time much faster than the entire table: it was much faster to run all of these 1-million-row queries one after another than to run a single query summing everything (no idea why).

fwiw that code was generated with

for i in range(0, 7000001, 1000000):
    query = f"SELECT SUM(foo) FROM BAR WHERE id >= {i} AND id <= {i + 1000000 - 1};"
    print(query)

iirc that was MySQL 5.6.

But yeah, see if doing 1 million at a time gives acceptable performance, and if not, do them simultaneously with multithreading/async.

Altimeter answered 20/11, 2023 at 23:48 Comment(0)
