MySQL solution for 1 million clicks / day [closed]
We are running a custom OpenX ad server on a MySQL database which gets approx. 1 million clicks / day. We need to store all this click information and show statistics based on it.

Right now, all the click information is aggregated every 2 days and the specific click info is deleted. But we want to provide our affiliates with a new feature that will allow them to set a dynamic tracking ID (TID) and, basically, track their clicks and conversions based on it.

So, the problem is that our click table will grow by a minimum of 1 million entries a day, and we need to be able to search this table and show all the clicks for one affiliate over a specific period of time, grouped by the TID mentioned above, or to search by TID directly.
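For concreteness, the query we need to serve would look roughly like this (a simplified sketch; the clicks table and its columns are described further below):

    SELECT tid, COUNT(*) AS clicks
    FROM clicks
    WHERE affiliate_id = 42              -- one affiliate
      AND date_time >= '2012-09-01'      -- a specific period of time
      AND date_time <  '2012-11-01'
    GROUP BY tid;                        -- grouped by the dynamic TID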

I had a look at MySQL partitioning and it seems like a good solution, but I'm not sure it will still perform well on a HUGE table (potentially billions of entries).

What do you think would be the correct approach for this issue?

EDIT:

Based on your answers, I'm now thinking of a mixed solution.

We already have a "LIVE" table from which the entries are deleted when the clicks are aggregated at maintenance time, which looks something like this:

Table: clicks

viewer_id | ... | date_time | affiliate_id | ... | tid

(I skipped the columns that are unimportant at this point)

At maintenance time, I can move everything to another monthly table which looks almost the same, say Table: clicks_2012_11, which has indexes on date_time, affiliate_id and tid and is partitioned by affiliate_id.

So now, when an affiliate wants to see his statistics for the past 2 months, I know I have to look inside Table: clicks_2012_10 and Table: clicks_2012_11 (the time range will be limited to a maximum of 2 months). Because the tables are partitioned by affiliate_id, only the needed partitions of the 2 tables will be searched, and I can then list all the TIDs that had any activity in the past 2 months.
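For illustration, a rough sketch of one such monthly table (the column types and the partition count are assumptions, not our real schema):

    CREATE TABLE clicks_2012_11 (
        viewer_id    BIGINT UNSIGNED NOT NULL,
        date_time    DATETIME        NOT NULL,
        affiliate_id INT UNSIGNED    NOT NULL,
        tid          VARCHAR(64)     NOT NULL,
        KEY idx_datetime  (date_time),
        KEY idx_affiliate (affiliate_id, date_time),
        KEY idx_tid       (tid)
    ) ENGINE=InnoDB
    -- no unique keys, so MySQL places no restriction on the partitioning column
    PARTITION BY HASH (affiliate_id) PARTITIONS 16;

    -- the equality on affiliate_id lets MySQL prune to a single partition;
    -- EXPLAIN PARTITIONS shows which partitions a query actually touches
    SELECT tid, COUNT(*) AS clicks
    FROM clicks_2012_11
    WHERE affiliate_id = 42
    GROUP BY tid;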

What do you think about this approach? Are there any obvious issues? Am I overcomplicating things without a solid reason?

Manolete answered 29/10, 2012 at 11:40
There is nothing inherent in big (even "huge") tables that makes MySQL fail. Big tables are mostly a problem in terms of:

  • disk space
  • cache usage (the working set will likely no longer fit in memory)
  • maintenance (schema changes, rebuilds, ...)

You need to address all of these.

Partitioning is mainly useful for bulk data maintenance, such as dropping entire partitions. It is certainly not best practice to partition big tables by default on some arbitrary column; partitioning should always be introduced for a specific reason.
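For example, the bulk-maintenance case could look like this (a minimal sketch, assuming monthly ranges; the column types are made up):

    CREATE TABLE clicks (
        viewer_id    BIGINT UNSIGNED NOT NULL,
        date_time    DATETIME        NOT NULL,
        affiliate_id INT UNSIGNED    NOT NULL,
        tid          VARCHAR(64)     NOT NULL
    ) ENGINE=InnoDB
    PARTITION BY RANGE (TO_DAYS(date_time)) (
        PARTITION p2012_10 VALUES LESS THAN (TO_DAYS('2012-11-01')),
        PARTITION p2012_11 VALUES LESS THAN (TO_DAYS('2012-12-01')),
        PARTITION pmax     VALUES LESS THAN MAXVALUE
    );

    -- dropping a month of old data is a near-instant metadata operation
    -- instead of a DELETE that touches millions of rows
    ALTER TABLE clicks DROP PARTITION p2012_10;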

Primaveria answered 29/10, 2012 at 14:11
Thanks for your input. I'm thinking about partitioning the table by affiliate_id, since affiliate_id will be present in the WHERE clause of every query. When I try to get all the stats for the last 2 months for a particular affiliate_id, wouldn't that help speed up the queries? What would be the downside of this approach? – Manolete
You don't need partitioning for that. Cluster the table on (affiliate_id, date_time DESC). – Primaveria
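A minimal sketch of that clustering suggestion on InnoDB, which physically orders rows by the primary key (the click_id surrogate column is an assumption; note that before MySQL 8.0 a DESC index is parsed but stored ascending, and InnoDB simply scans the index backwards for ORDER BY ... DESC):

    CREATE TABLE clicks (
        affiliate_id INT UNSIGNED    NOT NULL,
        date_time    DATETIME        NOT NULL,
        click_id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- assumed surrogate id
        tid          VARCHAR(64)     NOT NULL,
        PRIMARY KEY (affiliate_id, date_time, click_id),  -- InnoDB clusters on the PK
        KEY idx_click_id (click_id)  -- an AUTO_INCREMENT column must lead some index
    ) ENGINE=InnoDB;

    -- one affiliate's two-month window becomes a single contiguous range scan:
    SELECT tid, COUNT(*) AS clicks
    FROM clicks
    WHERE affiliate_id = 42
      AND date_time >= '2012-10-01'
      AND date_time <  '2012-12-01'
    GROUP BY tid;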
Optimizing for insertion and optimizing for retrieval are usually mutually exclusive. You might be better off with two tables:

  • live data: no (or minimal) keys, MyISAM to remove transaction overhead, etc.
  • historical data: indexed up the wazoo, with data moved over from the live table on a periodic basis (a sketch of the move follows below).
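A sketch of the periodic move, assuming hypothetical clicks_live and clicks_archive tables with identical columns (in practice the job would run in bounded chunks to keep locks short):

    -- copy everything older than the cutoff into the heavily indexed archive...
    INSERT INTO clicks_archive
    SELECT * FROM clicks_live
    WHERE date_time < '2012-11-01 00:00:00';

    -- ...then remove it from the lean live table
    DELETE FROM clicks_live
    WHERE date_time < '2012-11-01 00:00:00';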
Simaroubaceous answered 29/10, 2012 at 14:16
