I have a very large 500 million rows table with the following columns:
id
-Bigint
- Autoincrementing primary index.date
-Datetime
- Approximately 1.5 million rows per date, data older 1 year is deleted.uid
-VARCHAR(60)
- A user IDsessionNumber
-INT
start
-INT
- epoch of start time.end
-INT
- epoch of end time.- More columns not relevant for this query.
The combination of uid
and sessionNumber
forms a uinque index. I also have an index on date.
Due to the sheer size, I'd like to partition the table.
Most of my accesses would be by date, so partitioning by date ranges seems intuitive, but as the date is not part of the unique index, this is not an option.
Option 1: RANGE PARTITION
on Date and BEFORE INSERT TRIGGER
I don't really have a regular issue with the uid
and sessionNumber
uniqueness being violated. The source data is consistent, but sessions that span two days may be inserted on two consecutive days with midnight being the end time of the first and start time of the second.
I'm trying to understand if I could remove the unique key and instead use a trigger that would
- Check if there is a session with the same identifiers the previous day and if so,
- Updates the end date.
- cancels the actual insert.
However, I am not sure if I can 1) trigger an update on the same table. or 2) prevent the actual insert.
Option 2: LINEAR HASH PARTITION
on UID
My second option is to use a linear hash partition on the UID. However I cannot see any example that utilizes a VARCHAR and converts it to an INTEGER which is used for the HASH
partitioning.
However I cannot finde a permitted way to convert from VARCHAR to INTEGER. For example
ALTER TABLE mytable
PARTITION BY HASH (CAST(md5(uid) AS UNSIGNED integer))
PARTITIONS 20
returns that the partition function is not allowed.
SELECT ... FROM v WHERE uid = '123' order by session_number asc
– Trangtranquada