I'm archiving this web forum, which normally gets purged about once a week. So I'm screen scraping it, and storing it into my database (PostgreSQL).
I also do a little analysis on the data, with some graphs for users to enjoy, like what time of day is the forum most active, and so forth.
So I have a posts table, like so:
Column | Type
------------+------------------------------
id | integer
body | text
created_at | timestamp without time zone
topic_id | integer
user_name | text
user_id | integer
And I now want to have a post count for each user, for my little top 10 posters table.
I came up with this:
SELECT user_id, user_name, count(*)
FROM posts
GROUP BY user_id, user_name
ORDER BY count DESC LIMIT 10
Which turns out to be very slow. 9 seconds, with just about 300 000 rows in the posts table at the moment.
It takes only half a second, if I group on just one column, but I need both.
I'm rather new to relational databases, and SQL, so I'm not quite sure if this is right, or just how am I doing it wrong?
SELECT
list that do not also appear in theGROUP BY
list, as long as those columns are functionally dependent upon theGROUP BY
list. Shameless plug: rpbouman.blogspot.com/2007/05/debunking-group-by-myths.html – Cotto