MySQL: take a dump of some portion, like 10-20%, of the whole database

I know that when taking a database dump I can ignore some tables, but I want to take a dump of all tables with only some percentage of the data, like 20% or 40%, because the whole dump is too big. Here is my normal dump command.

mysqldump -h dburl -u user -ppassword --databases dbname > dbname.sql

I am not looking for anything OS-specific; I am using Ubuntu Linux.

Gimcrackery answered 22/5, 2018 at 10:51 Comment(5)
You can use a combination of mysqldump -h dburl -u user -ppassword --no-data --databases dbname > dbname.schema.sql to get the DB schema, then a sequence of SELECT ... INTO OUTFILE statements to dump table data with filtering, and LOAD DATA INFILE statements to load the data back. See dev.mysql.com/doc/refman/8.0/en/select-into.html for details (a sketch follows these comments). - Margaretemargaretha
The problem you are likely to have is making your subset respect constraints, without which it will surely be fairly unusable in practice. I would suggest googling for open-source tools aimed specifically at this purpose; these clearly exist. - Jacquejacquelin
You should probably tag the OS you are working on, which helps scripting people. - Selaginella
@Jacquejacquelin Yes, that is the bigger problem with the current options. It would be awesome if you could give some good names or links for tools. Thanks. - Gimcrackery
I haven't tried it myself but this looks like the sort of thing: github.com/18F/rdbms-subsetter - Jacquejacquelin
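
A minimal sketch of the schema-plus-OUTFILE idea from the first comment, assuming a hypothetical orders table, the FILE privilege, and a writable secure_file_priv directory on the server:

mysqldump -h dburl -u user -ppassword --no-data --databases dbname > dbname.schema.sql

-- export roughly 20% of one table's rows (orders is a placeholder)
SELECT * FROM orders WHERE RAND() < 0.20
INTO OUTFILE '/var/lib/mysql-files/orders.tsv';

-- after recreating the schema from dbname.schema.sql, load the sample back
LOAD DATA INFILE '/var/lib/mysql-files/orders.tsv' INTO TABLE orders;

As the second comment warns, RAND() sampling will not keep foreign-key references between tables consistent.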

The 80-20 rule says that the smallest 80% of the tables will probably consume only 20% of the space. So have one mysqldump for them.

Then run additional mysqldump(s), one for each remaining table smaller than 20% of the space.
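
To see where the space actually goes before splitting the dumps, you can check table sizes in information_schema (substitute your schema name for 'dbname'):

SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.tables
WHERE table_schema = 'dbname'
ORDER BY size_mb DESC;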

Finally, any big tables need the --where option mentioned by Nambu14. Or you could try the kludge of saying --where="true LIMIT 20000,10000" to sneak an OFFSET and LIMIT in (see one of the comments on https://dev.mysql.com/doc/refman/8.0/en/mysqldump.html). But do not allow writes to the table while doing that; it could lead to extra/missing records.
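
Spelled out as a full command, the kludge might look like this (host, user, and bigtable are placeholders); bump the offset by the chunk size on each run, and use --no-create-info on every chunk after the first so the table is not re-created:

mysqldump -h dburl -u user -ppassword --no-create-info --where="true LIMIT 20000,10000" dbname bigtable >> bigtable.part.sql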

Or you could adapt chunking techniques as discussed here. This avoids the extra/missing problem and avoids the LIMIT kludge. With luck, you can hard-code the range values needed for ranges like this: --where="my_pk >= 'def' AND my_pk < 'mno'"
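
For example, two consecutive primary-key chunks of a hypothetical bigtable, the first creating the table in the output and the second only appending rows:

mysqldump -h dburl -u user -ppassword --where="my_pk >= 'abc' AND my_pk < 'def'" dbname bigtable > bigtable.sql
mysqldump -h dburl -u user -ppassword --no-create-info --where="my_pk >= 'def' AND my_pk < 'mno'" dbname bigtable >> bigtable.sql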

Don't forget to deal with triggers, stored routines, views, etc.
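
mysqldump has flags for those objects; a schema-only pass that also captures stored routines, triggers, and scheduled events might look like this:

mysqldump -h dburl -u user -ppassword --no-data --routines --triggers --events dbname > dbname.objects.sql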

Chuppah answered 24/5, 2018 at 21:37 Comment(0)

There's a similar question open. With the --where option you can limit the amount of records included in the mysqldump (official documentation here), but this option applies to every table in the database.
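
Because the same condition is applied to every table, this only works cleanly when all tables share a suitable column. For instance, if every table had a created_at column (a hypothetical assumption here), a date cutoff could approximate a percentage of the data:

mysqldump -h dburl -u user -ppassword --where="created_at >= '2018-01-01'" dbname > dbname.partial.sql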

Another way is to prepare the data first with a SQL script and then dump the result; this works as a pseudo-ETL pipeline.
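
A rough sketch of that approach, assuming a staging schema dbname_sample and a hypothetical orders table (note that CREATE TABLE ... SELECT copies rows but not indexes or constraints):

CREATE DATABASE IF NOT EXISTS dbname_sample;
CREATE TABLE dbname_sample.orders AS
  SELECT * FROM dbname.orders WHERE RAND() < 0.20;

mysqldump -h dburl -u user -ppassword --databases dbname_sample > dbname_sample.sql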

Unaffected answered 24/5, 2018 at 14:58 Comment(2)
Thanks, that kinda solves the problem, but making a script doesn't sound good; I will wait some more time to see if something better comes up. - Gimcrackery
Glad to be helpful! :) - Unaffected

It sounds like you want to avoid making a script; one quick solution is to use the --where option of mysqldump.

mysqldump --opt --where="1 limit 1000" myschema

This will limit each table's dump to 1000 rows - obviously adjust to your size restrictions.

You can follow this up with an offset dump to get the next 1000 - a small adjustment is needed so the table is not recreated.

mysqldump --opt --where="1 limit 1000 offset 1000" --no-create-info myschema

You can mix this up further; say you want only 40% of all data, from randomly selected rows:

mysqldump --opt --where="1 having rand() < 0.40" myschema
Longlived answered 30/5, 2018 at 23:32 Comment(0)
