GROUP BY query optimization
G

6

8

Database is MySQL with MyISAM engine.

Table definition:

CREATE TABLE IF NOT EXISTS matches (
  id int(11) NOT NULL AUTO_INCREMENT,
  game int(11) NOT NULL,
  user int(11) NOT NULL,
  opponent int(11) NOT NULL,
  tournament int(11) NOT NULL,
  score int(11) NOT NULL,
  finish tinyint(4) NOT NULL,
  PRIMARY KEY (id),
  KEY game (game),
  KEY user (user),
  KEY i_gfu (game, finish, user)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3149047;

I have set an index on (game, finish, user) but this GROUP BY query still needs 0.4 - 0.6 seconds to run:

SELECT user AS player
     , COUNT( id ) AS times
FROM matches
WHERE finish = 1
  AND game = 19
GROUP BY user
ORDER BY times DESC

The EXPLAIN output:

| id | select_type | table   | type | possible_keys | key   | key_len | ref         | rows   | Extra                                        |
| 1  | SIMPLE      | matches | ref  | game,i_gfu    | i_gfu | 5       | const,const | 155855 | Using where; Using temporary; Using filesort |

Is there any way I can make it faster? The table has about 800K records.


EDIT: I changed COUNT(id) into COUNT(*) and the time dropped to 0.08 - 0.12 seconds. I think I had tried that before creating the index and forgot to change it back afterwards.

In the EXPLAIN output, Using index accounts for the speedup:

|   rows |   Extra                                                   |
| 168029 | Using where; Using index; Using temporary; Using filesort |
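
A minimal way to check this, assuming the table and the i_gfu index shown above, is to run EXPLAIN on both variants and compare the Extra column. The likely explanation is that MyISAM secondary indexes do not store the primary key column, so COUNT(id) has to read the data rows, while COUNT(*) can be answered from i_gfu alone:

```sql
-- Variant 1: COUNT(id) references a column that is not part of i_gfu,
-- so the row data is expected to be read (no "Using index" in Extra).
EXPLAIN
SELECT user AS player, COUNT(id) AS times
FROM matches
WHERE finish = 1
  AND game = 19
GROUP BY user
ORDER BY times DESC;

-- Variant 2: COUNT(*) touches only game, finish and user,
-- all of which are in i_gfu, so "Using index" should appear in Extra.
EXPLAIN
SELECT user AS player, COUNT(*) AS times
FROM matches
WHERE finish = 1
  AND game = 19
GROUP BY user
ORDER BY times DESC;
```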

(Side question: is a drop by a factor of 5 normal?)

There are about 2000 users, so the final sort, even though it uses filesort, doesn't hurt performance. I tried without ORDER BY and it still takes almost the same time.

Golub answered 20/5, 2011 at 12:26 Comment(1)
The reason the count(*) has a much faster performance than count(id) is MySQL has a specific optimization for the count(*) case. The count(id) case does a second pass through the data to retrieve the results, where the count(*) uses existing internal row counters. Use count(*) whenever possible.Osugi
J
1

The EXPLAIN verifies the (game, finish, user) index was used in the query. That seems like the best possible index to me. Could it be a hardware issue? What is your system RAM and CPU?

Josephus answered 20/5, 2011 at 12:55 Comment(6)
Memory is 1GB. CPU is (I think) AMD Opteron Quad-core 3.5GHz.Quixotic
I would guess your bottleneck is the RAM. I would suggest bumping that to 4GB.Josephus
4Gb to process table with 900k rows ~30 bytes each? ;) That's not even 30 mbytes;)Haslett
@lucek Your math is correct but OS overhead eats up a lot of RAM these days. Also any other running applications will be consuming RAM. 4GB is pretty much standard these days.Josephus
@lucek and @ic3b3rg: For the record, the table has other fields too. Total size is about 80MB. But the machine is used as a MySQL server only.Quixotic
@ypercube there may be a software-based suggestion here that will speed things up for you. Your table, index and SQL structure seem to be fine to me, so I doubt any tweaks there will help. The suggestion about server variables by @Thomas Jones-Low might help. If nothing seems to help, a few extra GBs of RAM is pretty cheap.Josephus
H
8

Drop the 'game' key: it is redundant because 'i_gfu' already starts with the same column. As 'id' is unique and NOT NULL, COUNT(id) just returns the number of rows in each group, so you can replace it with COUNT(*). Try it that way and post the output of EXPLAIN:

SELECT user AS player, COUNT(*) AS times
FROM matches
WHERE finish = 1
AND game = 19
GROUP BY user
ORDER BY times DESC
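
A short sketch of the index cleanup mentioned above, assuming nothing else (an index hint in another query, for example) refers to the single-column key by name:

```sql
-- List the existing indexes first to confirm that 'game' and 'i_gfu'
-- both start with the game column.
SHOW INDEX FROM matches;

-- Drop the single-column key; i_gfu (game, finish, user) can serve
-- any lookup that filters on game alone.
ALTER TABLE matches DROP INDEX game;
```

Fewer indexes also means less work on every INSERT and UPDATE, which is a side benefit rather than a fix for the SELECT itself.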
Haslett answered 20/5, 2011 at 12:51 Comment(0)
C
2

Eh, tough. Try reordering your index: put the user column first (so make the index (user, finish, game)), as that increases the chance the GROUP BY can use the index. However, MySQL's loose index scan for GROUP BY only applies when the aggregate functions used are limited to MIN and MAX (see http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html and http://dev.mysql.com/doc/refman/5.5/en/loose-index-scan.html). Your ORDER BY isn't really helping either.
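
Trying the reordered index could look like the sketch below. The index name i_ufg is hypothetical, and whether this beats the existing index depends on the data, so it is worth timing both before keeping it:

```sql
-- Hypothetical index name; column order puts the GROUP BY column first.
ALTER TABLE matches ADD INDEX i_ufg (user, finish, game);

-- Force the new index so the plan can be compared against i_gfu.
EXPLAIN
SELECT user AS player, COUNT(*) AS times
FROM matches FORCE INDEX (i_ufg)
WHERE finish = 1
  AND game = 19
GROUP BY user
ORDER BY times DESC;

-- If it turns out slower, remove it again:
-- ALTER TABLE matches DROP INDEX i_ufg;
```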

Carlo answered 20/5, 2011 at 12:53 Comment(4)
I've tried that index and also (user, game, finish) and forcing the use of it but it's even slower.Quixotic
Odd. I get the sense you're not going to be able to do better with the combination of GROUP BY and ORDER BY: you might want to create an explicit aggregate table if that query speed is too slow. The fact that Using filesort shows up indicates that the ORDER BY couldn't be done from any index: maybe try adding the id to the index?Carlo
You mean a (game, finish, user, id) index?Quixotic
Well, I'd have said try that on for size, but if using COUNT(*) helped then that probably won't do much good.Carlo
P
2

One of the shortcomings of this query is that you order by an aggregate. That means that you can't return any rows until the full result set has been generated; no index can exist (for MySQL MyISAM, anyway) to fix that.

You can denormalize your data fairly easily to overcome this, though: you could, for instance, add insert/update triggers that maintain a count value in an indexed summary table, so that you can start returning rows immediately.
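
A minimal sketch of that idea follows. The table name match_counts, the index name i_game_times and the trigger names are assumptions made for illustration, not part of the original schema. Only INSERT and UPDATE are handled here; a DELETE trigger would be needed as well if rows are ever removed. DELIMITER is a mysql command-line client directive, so other clients may need a different way to send the trigger bodies.

```sql
-- Hypothetical summary table: one row per (game, user) with the
-- number of finished matches.
CREATE TABLE IF NOT EXISTS match_counts (
  game  INT NOT NULL,
  user  INT NOT NULL,
  times INT NOT NULL DEFAULT 0,
  PRIMARY KEY (game, user),
  KEY i_game_times (game, times)
) ENGINE=MyISAM;

-- Seed it once from the existing data.
INSERT INTO match_counts (game, user, times)
SELECT game, user, COUNT(*)
FROM matches
WHERE finish = 1
GROUP BY game, user;

DELIMITER //

-- Count a match that is inserted as already finished.
CREATE TRIGGER matches_ai AFTER INSERT ON matches
FOR EACH ROW
BEGIN
  IF NEW.finish = 1 THEN
    INSERT INTO match_counts (game, user, times)
    VALUES (NEW.game, NEW.user, 1)
    ON DUPLICATE KEY UPDATE times = times + 1;
  END IF;
END//

-- Keep the count correct when finish, game or user changes.
CREATE TRIGGER matches_au AFTER UPDATE ON matches
FOR EACH ROW
BEGIN
  IF OLD.finish = 1 THEN
    UPDATE match_counts
    SET times = times - 1
    WHERE game = OLD.game AND user = OLD.user;
  END IF;
  IF NEW.finish = 1 THEN
    INSERT INTO match_counts (game, user, times)
    VALUES (NEW.game, NEW.user, 1)
    ON DUPLICATE KEY UPDATE times = times + 1;
  END IF;
END//

DELIMITER ;

-- The report then reads precomputed counts instead of grouping.
SELECT user AS player, times
FROM match_counts
WHERE game = 19
ORDER BY times DESC;
```

The trade-off is extra write cost on every insert and update of matches in exchange for a read that touches at most one row per user.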

Protozoan answered 1/8, 2011 at 2:0 Comment(0)
S
1

I take it that the bulk of the time is spent extracting and, more importantly, sorting 150k rows out of 800k (two sorts are needed in principle, one of which is avoided by reading the index). I doubt you can optimize it much more than it already is.

Stomatic answered 20/5, 2011 at 12:56 Comment(5)
Extracting, yes. Sorting no, it doesn't spend time sorting.Quixotic
That's not what your query plan is suggesting. Nor your query, for that matter. They both say at least one sort is needed. :-)Stomatic
I mean, the time it spends in sorting is very short compared to the time spent on grouping.Quixotic
I can't blame it for doing so, either... it's grouping many many rows (half of your table?) into 150k rows according to your query plan. :-)Stomatic
In point of fact, I'm 99% sure you're wasting your time trying to optimize it: your current three-column index lets the engine go straight for the jugular, that is, fetch the relevant rows and group them as is. They then need to be sorted, which also takes time. I honestly see nothing else that you can do. If anything, I'm actually surprised that the planner decides to use an index at all, since you're retrieving 20% of your table.Stomatic
O
1

As others have noted, you may have reached the limit of your ability to tune the query itself. Next, check the settings of the max_heap_table_size and tmp_table_size variables on your server. The default is 16MB, which may be too small for your table.
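
Checking these settings could look like the sketch below. The 64MB value is only an illustrative choice; the effective limit for an in-memory temporary table is the smaller of the two variables.

```sql
-- Current limits for in-memory temporary tables.
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

-- Counters for temporary tables created by this session; a growing
-- Created_tmp_disk_tables suggests they are spilling to disk.
SHOW SESSION STATUS LIKE 'Created_tmp%';

-- Raise both limits for the current session only (illustrative value).
SET SESSION tmp_table_size = 64 * 1024 * 1024;
SET SESSION max_heap_table_size = 64 * 1024 * 1024;
```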

Osugi answered 20/5, 2011 at 13:28 Comment(1)
thnx for the advice, both settings are at 64M.Quixotic
