MySQL solution for 1 million clicks / day [closed]
We are running a custom OpenX ad server on a MySQL database which gets approx. 1 million clicks / day. We need to store all this click information and show statistics based on it.

Right now, all the click information is aggregated every 2 days and the specific click info is deleted. But we want to provide our affiliates with a new feature that will allow them to set a dynamic tracking ID (TID) and, basically, track their clicks and conversions based on it.

So, the problem is that our click table will grow by a minimum of 1 million entries a day, and we need to be able to search this table and show all the clicks for one affiliate over a specific period of time, grouped by the TID mentioned above, or to search by TID directly.
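For concreteness, the query we need to serve would look roughly like this (a simplified sketch; the clicks table and its columns are described further below):

    SELECT tid, COUNT(*) AS clicks
    FROM clicks
    WHERE affiliate_id = 42              -- one affiliate
      AND date_time >= '2012-09-01'      -- a specific period of time
      AND date_time <  '2012-11-01'
    GROUP BY tid;                        -- grouped by the dynamic TID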

I had a look at MySQL partitioning and it seems like a good solution, but I'm not sure it will still perform well on a HUGE table (potentially billions of entries).

What do you think would be the correct approach for this issue?

EDIT:

Based on your answers, I'm now thinking of a mixed solution.

We already have a "LIVE" table from which the entries are deleted when the clicks are aggregated at maintenance time, which looks something like this:

Table: clicks

viewer_id | ... | date_time | affiliate_id | ... | tid

(I skipped the columns that are unimportant at this point)

At maintenance time, I can move everything to another monthly table which looks almost the same, say Table: clicks_2012_11, which has indexes on date_time, affiliate_id and tid and is partitioned by affiliate_id.

So now, when an affiliate wants to see his statistics for the past 2 months, I know I have to look inside Table: clicks_2012_10 and Table: clicks_2012_11 (the time range will be limited to a maximum of 2 months). Because the tables are partitioned by affiliate_id, only the needed partitions of the 2 tables will be searched, and I can then list all the TIDs that had any activity in the past 2 months.
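For illustration, a rough sketch of one such monthly table (the column types and the partition count are assumptions, not our real schema):

    CREATE TABLE clicks_2012_11 (
        viewer_id    BIGINT UNSIGNED NOT NULL,
        date_time    DATETIME        NOT NULL,
        affiliate_id INT UNSIGNED    NOT NULL,
        tid          VARCHAR(64)     NOT NULL,
        KEY idx_datetime  (date_time),
        KEY idx_affiliate (affiliate_id, date_time),
        KEY idx_tid       (tid)
    ) ENGINE=InnoDB
    -- no unique keys, so MySQL places no restriction on the partitioning column
    PARTITION BY HASH (affiliate_id) PARTITIONS 16;

    -- the equality on affiliate_id lets MySQL prune to a single partition;
    -- EXPLAIN PARTITIONS shows which partitions a query actually touches
    SELECT tid, COUNT(*) AS clicks
    FROM clicks_2012_11
    WHERE affiliate_id = 42
    GROUP BY tid;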

What do you think about this approach? Are there any obvious issues? Am I overcomplicating things without a solid reason?

Manolete answered 29/10, 2012 at 11:40
There is nothing inherent in big (even "huge") tables that makes MySQL fail. Big tables are mostly a problem in terms of:

  • disk space
  • cache usage (the working set will likely no longer fit in memory)
  • maintenance (schema changes, rebuilds, ...)

You need to address all of these.

Partitioning is mainly useful for bulk data maintenance, such as dropping entire partitions. It is certainly not best practice to partition big tables by default on some arbitrary column; partitioning should always be introduced for a specific reason.
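For example, the bulk-maintenance case could look like this (a minimal sketch, assuming monthly ranges; the column types are made up):

    CREATE TABLE clicks (
        viewer_id    BIGINT UNSIGNED NOT NULL,
        date_time    DATETIME        NOT NULL,
        affiliate_id INT UNSIGNED    NOT NULL,
        tid          VARCHAR(64)     NOT NULL
    ) ENGINE=InnoDB
    PARTITION BY RANGE (TO_DAYS(date_time)) (
        PARTITION p2012_10 VALUES LESS THAN (TO_DAYS('2012-11-01')),
        PARTITION p2012_11 VALUES LESS THAN (TO_DAYS('2012-12-01')),
        PARTITION pmax     VALUES LESS THAN MAXVALUE
    );

    -- dropping a month of old data is a near-instant metadata operation
    -- instead of a DELETE that touches millions of rows
    ALTER TABLE clicks DROP PARTITION p2012_10;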

Primaveria answered 29/10, 2012 at 14:11
Thanks for your input. I'm thinking about partitioning the table by affiliate_id, since affiliate_id will be present in the WHERE clause of every query. When I try to get all the stats for the last 2 months for a particular affiliate_id, wouldn't that help speed up the queries? What would be the downside of this approach? – Manolete
You don't need partitioning for that. Cluster the table on (affiliate_id, date_time DESC). – Primaveria
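A minimal sketch of that clustering suggestion on InnoDB, which physically orders rows by the primary key (the click_id surrogate column is an assumption; note that before MySQL 8.0 a DESC index is parsed but stored ascending, and InnoDB simply scans the index backwards for ORDER BY ... DESC):

    CREATE TABLE clicks (
        affiliate_id INT UNSIGNED    NOT NULL,
        date_time    DATETIME        NOT NULL,
        click_id     BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- assumed surrogate id
        tid          VARCHAR(64)     NOT NULL,
        PRIMARY KEY (affiliate_id, date_time, click_id),  -- InnoDB clusters on the PK
        KEY idx_click_id (click_id)  -- an AUTO_INCREMENT column must lead some index
    ) ENGINE=InnoDB;

    -- one affiliate's two-month window becomes a single contiguous range scan:
    SELECT tid, COUNT(*) AS clicks
    FROM clicks
    WHERE affiliate_id = 42
      AND date_time >= '2012-10-01'
      AND date_time <  '2012-12-01'
    GROUP BY tid;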
Optimizing for insertion and optimizing for retrieval are usually mutually exclusive. You might be better off with two tables:

  • live data: no (or minimal) keys, MyISAM to remove transaction overhead, etc.
  • historical data: indexed up the wazoo, with data moved over from the live table on a periodic basis (a sketch of the move follows below).
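A sketch of the periodic move, assuming hypothetical clicks_live and clicks_archive tables with identical columns (in practice the job would run in bounded chunks to keep locks short):

    -- copy everything older than the cutoff into the heavily indexed archive...
    INSERT INTO clicks_archive
    SELECT * FROM clicks_live
    WHERE date_time < '2012-11-01 00:00:00';

    -- ...then remove it from the lean live table
    DELETE FROM clicks_live
    WHERE date_time < '2012-11-01 00:00:00';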
Simaroubaceous answered 29/10, 2012 at 14:16
