Rebalancing a table shard, with MySQL/InnoDB

I have a huge InnoDB table (>1TB, >1B rows) that I would like to shard: I would like to make multiple smaller independent tables from that big table.

How can I do that?

What I've already tried:

  • Moving rows to partitions by SELECTing them from the original table and INSERTing them into the partitions. This takes ages, and keeping rows in sync during the operation is hard (though it seems doable with triggers, as long as the partition is on the same server). I haven't found a ready-to-use tool for this.
  • Copying the entire table and then deleting the rows that do not belong to the partition. Still very slow, especially given the size of the table. This is what MySQL Fabric does, apparently.
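A trigger-based sync for the first approach could look roughly like this (a minimal sketch; `Huge_shard_42` is a hypothetical per-account partition table, and account 42 is illustrative):

```sql
-- Hypothetical: mirror new rows for one account into its partition table
-- while the bulk copy runs. Similar triggers would be needed for UPDATE
-- and DELETE to keep the copy consistent.
DELIMITER //
CREATE TRIGGER huge_ai AFTER INSERT ON Huge
FOR EACH ROW
BEGIN
  IF NEW.account_id = 42 THEN
    INSERT INTO Huge_shard_42 (account_id, id, foo, bar, baz)
    VALUES (NEW.account_id, NEW.id, NEW.foo, NEW.bar, NEW.baz);
  END IF;
END//
DELIMITER ;
```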

Random crazy ideas:

  • Splitting the .ibd file offline and importing it into the server, but I don't know whether a tool capable of doing this exists.
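For context: transportable tablespaces (MySQL 5.6+) can move a whole .ibd file between servers, but only a whole table at a time; they cannot split one. A sketch:

```sql
-- On the source server: quiesce the table and produce a .cfg metadata file
FLUSH TABLES Huge FOR EXPORT;
-- ...copy Huge.ibd and Huge.cfg to the destination datadir, then:
UNLOCK TABLES;

-- On the destination server (table created with an identical definition):
ALTER TABLE Huge DISCARD TABLESPACE;
ALTER TABLE Huge IMPORT TABLESPACE;
```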

SHOW CREATE TABLE:

CREATE TABLE `Huge` (
  `account_id` int(11) NOT NULL,
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `foo` varchar(255) NOT NULL,
  `bar` int(11) NOT NULL,
  `baz` char(2) NOT NULL,
  PRIMARY KEY (`account_id`,`id`),
  UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED;

The sharding key would be account_id. The primary key is currently (account_id, id), so that rows are clustered by account_id.

Is there a tool that automates this task? Is there a better way of doing this?

Octameter answered 26/6, 2015 at 15:1 Comment(5)
Offtopic. Not a programming question. This is database administration, which should be on the DBA site.Pidgin
That table definition does not make sense. Are there other columns? You seem to have many AI ids pointing to each account_id; to what end?Danidania
@RickJames: this table also has lots of columns that are of no interest for this question. About account_id: having it in the primary key is not necessary at all, but adding it at the beginning of the PK gave us a huge performance improvement when selecting many rows for the same account: InnoDB stores rows ordered by primary key on disk, so all rows for a given account_id are grouped on a few pages instead of being spread all over the table. This reduced the number of head moves by approx the number of rows returned by our queries.Octameter
Yes. Arranging the PRIMARY KEY that way is a good idea.Danidania
Fabric has been abandoned by Oracle. Group Replication / InnoDB Cluster replaced it.Danidania

"Sharding" is splitting data (usually one table) across multiple servers. "Partitioning" is splitting a table into multiple subtables on the same server. Which are you doing? Fabric does sharding. Your comment about .ibd implied you are thinking about partitioning.

Assuming that you want to PARTITION one huge table, I need to first ask why. This is a serious question because most (I do mean most) people think that partitioning will magically produce some benefit, when in reality it won't. I believe there are only 4 use cases for partitioning. Does your situation fall into any of them?

If, on the other hand, you want Sharding, then please provide SHOW CREATE TABLE and discuss which column you would shard on.

Edit (after getting clarity on goals)

I hope you do not have explicit FOREIGN KEYs; they will not work with either partitioning or sharding.

`id` bigint(20) NOT NULL AUTO_INCREMENT,
UNIQUE KEY `id` (`id`)

is a problem in two ways.

  • There is no need to make id UNIQUE; the only requirement for an AUTO_INCREMENT column is that it be the first column in some index. So this would be less of a burden on the system:

    INDEX(id)

  • An AUTO_INCREMENT works for a PARTITIONed table, but it does not work for a sharded table. You will need to assess the purpose of id. If it only needs to be unique within one shard, there is no real problem. If id needs to be unique across all shards, that is more difficult. Probably the only solution is to hand out id(s) from some central server; however, that leads to a single point of failure and a bottleneck. If you need to go that direction, I can advise on how to avoid those drawbacks.
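The index change above could be applied as follows (a sketch; on a table this size the rebuild is itself expensive, so a tool such as pt-online-schema-change may be preferable):

```sql
-- Replace the unnecessary unique constraint with a plain secondary index
ALTER TABLE Huge
  DROP KEY `id`,
  ADD INDEX (`id`);
```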

How to migrate to the ultimate, sharded, system?

I do not recommend any digression into PARTITIONing, it won't really help in the long run. And the cost of REORGANIZE PARTITION is high -- copying all the rows over, both the extracted rows and the remaining rows. Edit: If you do use partitions, use pt-online-schema-change to do the splitting with minimal downtime.
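If partitioning is attempted despite the caveat, the underlying change might look like this sketch (16 partitions is arbitrary); pt-online-schema-change would apply the equivalent via its copy-and-swap mechanism:

```sql
-- The UNIQUE KEY on `id` must become a plain index first, since every
-- unique key must include the partitioning column (account_id):
ALTER TABLE Huge DROP KEY `id`, ADD INDEX (`id`);

-- Then the repartition itself:
ALTER TABLE Huge PARTITION BY HASH(account_id) PARTITIONS 16;
```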

Instead, I recommend perfecting a tool that will migrate one account_id from one shard to another. This has to be custom code, because there are probably other tables (and references between tables) that will be affected by moving an account to a different server. In the long run, this will be useful for load balancing, hardware upgrades, software upgrades, and even schema changes. When you need to change something, create a new shard(s) with the new OS/version/schema/whatever and migrate users to it.

The simple approach to this tool is

  1. "Block" writes for that one account
  2. Copy the records to the new shard
  3. Change the gatekeeper so it knows that the account is now on the new shard
  4. Unblock writes
  5. Eventually (and gradually) DELETE rows on the old shard

This is not much of a problem if an account is "small". But if you need minimal downtime (writes blocked), then we can discuss a more complex approach.
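The five steps might be sketched in SQL like this (the `account_status` and `shard_map` gatekeeper tables are hypothetical, and account 42 is illustrative):

```sql
-- 1. Block writes for the one account (an application-level flag the
--    gatekeeper consults before routing writes)
UPDATE account_status SET writable = 0 WHERE account_id = 42;

-- 2. Copy the rows to the new shard (e.g. via mysqldump --where, or a
--    chunked INSERT ... SELECT if the shards can see each other)
INSERT INTO new_shard.Huge
  SELECT * FROM old_shard.Huge WHERE account_id = 42;

-- 3. Tell the gatekeeper the account now lives on the new shard
UPDATE shard_map SET shard = 'shard-2' WHERE account_id = 42;

-- 4. Unblock writes
UPDATE account_status SET writable = 1 WHERE account_id = 42;

-- 5. Eventually, delete the old copy in small chunks to avoid long locks
DELETE FROM old_shard.Huge WHERE account_id = 42 LIMIT 10000;  -- repeat until 0 rows
```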

(In case you have not guessed, I have 'been there, done that'.)

Danidania answered 27/6, 2015 at 5:49 Comment(28)
Ultimate goal is sharding. I see partitioning as an intermediate step to achieve sharding. Then I guess my use case for partitioning would be the 4th one - transportable tablespaces. I've added requested information on the question :)Octameter
I added a long discussion on sharding.Danidania
Thanks! For the minimal downtime, do you think I could use triggers to replicate changes from the original table to the shard? (like pt-online-schema-change does) If the shard is remote, could it work with a federated table? For the ids, I think I can use a combination of auto_increment_offset and auto_increment_increment to make sure that no two servers generate the same ids.Octameter
Triggers + Federated (for transferring writes during migration) -- seems feasible, but benchmark it. Auto_inc* - but you will have to carefully change the settings whenever you add another shard. Keep in mind that the ids will not be 'ordered' since one shard will be incrementing faster than the others.Danidania
Federated is likely to be slow. An alternative is to save the PK in an extra table on the source shard. Copy those rows after the entire table(s) has been copied in chunks. There would be a small downtime to do this final copy.Danidania
Federation is a horrible idea. If your plan is to split this to other servers you will have to have downtime. I believe the appropriate action is to use pt-online-schema-change tool to partition the accounts by hash with the number of servers you plan on having and then exporting the partitions to the new system. The actual downtime can be mitigated somewhat by creating a maintenance/outage page for customers that can be dynamically controlled from some type of login table. Then, you only have to create a maintenance for the specific customers that are on the partition you are transporting.Baudelaire
You can have up to 1024 partitions, you can transport multiple partitions to a new server, one at a time, lessening the outage time even more for specific customers. Once the partitions are moved and assuming you have 1024 or less customers per server you can then re-partition on the new servers and create new databases and transport said customer data to those customer specific databases.Baudelaire
@Baudelaire 5.6.7 increased the limit from 1K to 8K partitions. But... Even 100 partitions has performance problems since all partitions are unnecessarily opened for many operations.Danidania
Even if Federated took day(s), would it matter? (But be sure to benchmark it before embarking on it -- it may take months.) Also, an INSERT will wait for its Trigger to finish; the Trigger will wait for the Federated INSERT to finish. That is, writes could be noticeably slower than before.Danidania
@RickJames, the partitions I present are not for performance but for migration to the OP's end result.Baudelaire
@Baudelaire - Creating the partition(s) in the first place could take a lot of time. The OP does not want downtime. I think that initially adding partitioning would involve copying the entire table over (too slow). Hence the need for pt-online-schema-change or something.Danidania
@RickJames - Please reread my comment. The best way to go about this is to partition with ptosc, then export, copy, and import the partitions. If he has space on the new servers, one option would be to copy the db in whole (xtrabackup etc) after it was partitioned, set up replication, migrate the customer in the app, cut replication, and then drop the unnecessary partitions.Baudelaire
@Baudelaire -- If partitioning is involved, we agree that ptosc is required to minimize downtime. At that point, your method(s) are quite reasonable. (Provide your suggestions as an Answer?) I do not have enough experience with Federated to say whether the OP's Triggers+Federated is viable.Danidania
@RickJames Is there a way to identify insertion order for tables (while moving data from one shard to another)? I have so many tables and most of them have associations; I'm looking for a way to find the order automatically instead of manually.Apodictic
@Apodictic - You have a lot of Foreign Keys?Danidania
@RickJames Yes. About 90% of my tables are associated.Apodictic
@Apodictic - One approach is to disable FK checks during the migration of one user. Then re-enable checks.Danidania
@RickJames you mean something like setting session_replication_role? If so, it won't make sure we don't get any data inconsistency issues when the destination shard is also being used, right? Any other way? And how are you handling it?Apodictic
@Apodictic - session_replication_role seems to be a Postgres setting; this Q is tagged [mysql].Danidania
@RickJames Yes, it is a Postgres setting. Can you share a link or reference? Another approach I have tried is removing the foreign key constraint for each table and adding it back. The problem with that is that it will lock the table, right?Apodictic
@Apodictic - There is a setting to turn off (and back on) FOREIGN KEY tests, but not TRIGGERs. Also EVENT processing can be turned off/on. These apply in one server, not related to replication.Danidania
@RickJames Is there any way to proceed with the migration without disabling them?Apodictic
@RickJames Can you add me in the discussion of sharding?Apodictic
@Apodictic - THIS seems to be the discussion. Would you like to continue somewhere else? We could open a "chat". We could talk via email (etc). You could start your own Question about something specific on sharding.Danidania
@Apodictic - Embarking on Sharding is a big step. What metrics (or projections) do you have that justify it?Danidania
@RickJames Can we connect in chat/ mail?Apodictic
@Apodictic - mysql at rjweb dot org.Danidania
@RickJames Can you please check your inbox? I have sent you a mail.Apodictic

You could modify your table structure. This table is not in 2NF because id is unique (a candidate key) while also appearing in the primary key: the other attributes of the table (like foo) depend on a proper subset of the primary key, namely id. The following could do the same job with fewer constraints:

 id bigint(20) not null auto_increment primary key

Now, by creating an index on account_id, you could get all the benefits of the current primary key (account_id, id).

As a second suggestion, you could split the table into two parts: one containing foo, and the other containing the rest of the columns. This way you get a fixed-row-length table (and hence faster) that stores most of the columns, and a variable-row-length table that is smaller than the current table and will be queried less frequently.

In summary, before partitioning the table, I suggest splitting it into:

CREATE TABLE `fixed_length` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT primary key,
  `account_id` int(11) NOT NULL,
  `bar` int(11) NOT NULL,
  `baz` char(2) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED;

and

create table `variable_length`(
  `id` bigint(20) NOT NULL primary key,
  `foo` varchar(255) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED;

and an index for account_id:

 create index ix_account_ix on fixed_length(account_id);

Now, if you want to partition data by account_id, you could keep fixed_length intact and do the partitioning only on the variable_length table (by whatever method you choose).
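One way to partition the variable_length table (a sketch; since it no longer contains account_id, the hash is on its primary key, and 16 partitions is arbitrary):

```sql
ALTER TABLE variable_length
  PARTITION BY HASH(id) PARTITIONS 16;
```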

Inconveniency answered 4/7, 2015 at 20:19 Comment(0)
