I'm doing a "select sum(foo) from bar" query on a MySQL database that's summing up 7.3mm records and taking about 22 seconds per run. Is there a trick to speeding up sums in MySQL?
No, you can't speed up the function itself. The problem here is really that you're selecting 7.3 million records. MySQL has to scan the entire table, and 7.3 million is a pretty big number. I'm impressed that it finishes that fast, actually.
One strategy you could employ is to break your data into smaller subsets (perhaps by date or month) and maintain a stored total for old data that is not going to change. You could refresh that stored sum periodically, and compute the overall value by adding the stored sum to the sum of whatever rows have been added since, which will be a much smaller number of rows.
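For instance, here is a rough sketch of that idea, assuming a hypothetical monthly summary table bar_monthly_sum, a bar.created_at date column, and an arbitrary cutoff date; adjust all of these to your actual schema:

-- precomputed totals for months that will no longer change (refreshed on a schedule)
CREATE TABLE bar_monthly_sum (
  month_start DATE PRIMARY KEY,
  total       BIGINT NOT NULL
);

INSERT INTO bar_monthly_sum (month_start, total)
SELECT DATE_FORMAT(created_at, '%Y-%m-01'), SUM(foo)
FROM bar
WHERE created_at < '2024-01-01'
GROUP BY DATE_FORMAT(created_at, '%Y-%m-01');

-- overall total = precomputed history + the much smaller amount of new data
SELECT (SELECT SUM(total) FROM bar_monthly_sum)
     + (SELECT COALESCE(SUM(foo), 0) FROM bar WHERE created_at >= '2024-01-01') AS grand_total;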
Turn on the query cache in MySQL. Caching is OFF by default; you need to enable it in the MySQL configuration (ini) file.
-- hint mysql server about caching
SELECT SQL_CACHE sum(foo) FROM bar;
MySQL may then be able to return a cached result if no changes were made to the table since the last run.
Read more here: http://www.mysqlperformanceblog.com/2006/07/27/mysql-query-cache/
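For reference, this is roughly what the relevant ini settings look like in MySQL 5.x (the query cache was removed entirely in MySQL 8.0, so this only applies to older versions; the size below is just an example):

[mysqld]
# 0 = off, 1 = on for all cacheable queries, 2 = only queries using the SQL_CACHE hint
query_cache_type = 1
query_cache_size = 64M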
Will the cached result still be valid after foo gets updated? Or will MySQL just re-sum the entire table again next time this query is run? – Retail
Two things here:
1) You should not sum 7.3 million records on a regular basis. Introduce staging (summary) tables that serve your business needs (by day, month, year, department, etc.) and fill them on a scheduled basis; then query those tables instead of the original 'raw' table (for example, select the summarized value per day when you need an interval of a few days).
2) Check your transaction settings (the isolation level in particular); see the link below and the sketch that follows it:
http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html#isolevel_repeatable-read
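For illustration, a hedged sketch of relaxing the isolation level just for the session that runs the big read (whether this actually helps depends on your storage engine and your consistency requirements):

-- relax isolation for this session only, then run the expensive scan
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT SUM(foo) FROM bar;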
No, not really. It will always have to enumerate all the rows in the table.
You could create an additional table and update the sum in it on every insert, update, and delete.
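A minimal sketch of that approach using triggers; the bar_sum table and its total column are made-up names, and this assumes foo fits in a BIGINT:

CREATE TABLE bar_sum (total BIGINT NOT NULL);
-- seed the running total with the current sum once
INSERT INTO bar_sum SELECT COALESCE(SUM(foo), 0) FROM bar;

CREATE TRIGGER bar_sum_ins AFTER INSERT ON bar
  FOR EACH ROW UPDATE bar_sum SET total = total + NEW.foo;

CREATE TRIGGER bar_sum_del AFTER DELETE ON bar
  FOR EACH ROW UPDATE bar_sum SET total = total - OLD.foo;

CREATE TRIGGER bar_sum_upd AFTER UPDATE ON bar
  FOR EACH ROW UPDATE bar_sum SET total = total - OLD.foo + NEW.foo;

-- reading the total is now a single-row lookup
SELECT total FROM bar_sum;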
You can probably try adding an index on the bar.foo field. The index contains all the values of the foo column but is smaller, and thus quicker to scan, than the original bar table, especially if bar has a lot of other columns.
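Something like this (the index name is arbitrary); with the index in place, MySQL can usually answer the sum with an index-only scan, which EXPLAIN reports as "Using index":

ALTER TABLE bar ADD INDEX idx_bar_foo (foo);
EXPLAIN SELECT SUM(foo) FROM bar;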
If your query is really that simple, no... but if you are using a more complex query (and abbreviated it here), you probably could, for example by using better joins...
I had the same problem, and thought I would run these in parallel to solve it (either using async queries or multithreading):
SELECT SUM(foo) FROM BAR WHERE id <= 999999;
SELECT SUM(foo) FROM BAR WHERE id >= 1000000 AND id <= 1999999;
SELECT SUM(foo) FROM BAR WHERE id >= 2000000 AND id <= 2999999;
SELECT SUM(foo) FROM BAR WHERE id >= 3000000 AND id <= 3999999;
SELECT SUM(foo) FROM BAR WHERE id >= 4000000 AND id <= 4999999;
SELECT SUM(foo) FROM BAR WHERE id >= 5000000 AND id <= 5999999;
SELECT SUM(foo) FROM BAR WHERE id >= 6000000 AND id <= 6999999;
SELECT SUM(foo) FROM BAR WHERE id >= 7000000;
and then just add the partial sums together. But to my surprise, I didn't have to run them in parallel at all, because MySQL handled one million rows at a time much faster than the whole table: running these 1-million-row queries back to back was still much faster than running one query that summed everything (no idea why).
fwiw that code was generated with
for i in range(0, 7000001, 1000000):
    query = f"SELECT SUM(foo) FROM BAR WHERE id >= {i} AND id <= {i + 1000000 - 1};"
    print(query)
iirc that was MySQL 5.6.
But yeah, see if doing 1 million at a time gives acceptable performance, and if not, do them simultaneously with multithreading/async.
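Here's a rough multithreading sketch in Python, assuming the mysql-connector-python package and placeholder connection details; the chunk size and the MAX_ID bound are assumptions you'd adjust to your data:

from concurrent.futures import ThreadPoolExecutor

import mysql.connector  # assumption: mysql-connector-python is installed

CHUNK = 1_000_000
MAX_ID = 7_300_000  # assumption: roughly the highest id in bar

def sum_chunk(lo, hi):
    # each worker opens its own connection; a single connection is not thread-safe
    conn = mysql.connector.connect(host="localhost", user="user",
                                   password="password", database="mydb")
    try:
        cur = conn.cursor()
        cur.execute("SELECT COALESCE(SUM(foo), 0) FROM bar WHERE id BETWEEN %s AND %s",
                    (lo, hi))
        return cur.fetchone()[0]
    finally:
        conn.close()

ranges = [(lo, lo + CHUNK - 1) for lo in range(0, MAX_ID + 1, CHUNK)]
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(lambda r: sum_chunk(*r), ranges))
print(total)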