On-the-fly anonymisation of a MySQL dump

I am using mysqldump to create DB dumps of the live application for developers to use.

These dumps contain customer data. I want to anonymize it, i.e. remove customer names / credit card data.

An option would be:

  • create a copy of the database (create a dump and import it)
  • run SQL queries that anonymize the data
  • dump the new database
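
Spelled out with hypothetical database and column names, that would look something like:

mysqldump live_db > live.sql
mysql anon_db < live.sql
mysql anon_db -e "UPDATE customers SET name = CONCAT('Customer ', id), cc_number = NULL"
mysqldump anon_db > anon.sql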

But this has too much overhead. A better solution would be to do the anonymization during dump creation.

I guess I would end up parsing all of the mysqldump output? Are there any smarter solutions?

Shagreen answered 7/1, 2013 at 15:55 Comment(0)

You can try Myanon: https://myanon.io

Anonymization is done on the fly during the dump:

mysqldump | myanon -f db.conf | gzip > anon.sql.gz
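
The -f flag points at a filter file that maps columns to anonymization functions. As a rough sketch only (the directive and function names below are assumptions from memory; check the myanon README for the real format), db.conf might look like:

secret = 'some-random-salt'          # assumed directive name; verify against the README
tables = {
  `customers` = {
    `name`  = texthash 10            # hypothetical column; assumed function name
    `email` = emailhash 'example.com' 20
  }
}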
Hildegaard answered 23/11, 2021 at 20:6 Comment(0)

Why are you selecting from your tables if you want to randomize the data?

Do a mysqldump of the tables that are safe to dump (configuration tables, etc.) with data, and a mysqldump of your sensitive tables with structure only.

Then, in your application, you can construct the INSERT statements for the sensitive tables based on your randomly created data.
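
For example, with hypothetical table names (--no-data is a standard mysqldump option that skips row data):

mysqldump mydb settings countries > safe_data.sql
mysqldump --no-data mydb customers payments > sensitive_schema.sql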

Actuary answered 7/1, 2013 at 16:4 Comment(3)
But then I still have to re-create the mysqldump logic (for creating those INSERT statements). I would like to avoid that. – Shagreen
You're not recreating mysqldump logic; you're constructing SQL queries. The job of mysqldump is to, well, dump the data from your tables and reverse-engineer it back into queries. But you don't want the data from your tables... you want your own data. – Actuary
Good point. But even in the affected tables, 8 out of 10 fields would stay as they are. So I want the mysqldump logic but also my own data. – Shagreen

I had to develop something similar a few days ago. I couldn't use INTO OUTFILE because the DB is on AWS RDS. I ended up with this approach:

Dump data in tabular text form from some table:

mysql -B -e 'SELECT `address`.`id`, "address1", "address2", "address3", "town", "00000000000" AS `contact_number`, "[email protected]" AS `email` FROM `address`' some_db > addresses.txt

And then to import it:

mysql --local-infile=1 -e "LOAD DATA LOCAL INFILE 'addresses.txt' INTO TABLE \`address\` FIELDS TERMINATED BY '\t' ENCLOSED BY '\"' IGNORE 1 LINES" some_db

Only the mysql command is required to do this.

While the export is pretty quick (a couple of seconds for ~30,000 rows), the import is a bit slower, but still fine. I had to join a few tables along the way, and there were some foreign keys, so it will surely be faster if you don't need that. Disabling foreign key checks while importing will also speed things up.
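
For instance, the import can be wrapped in SET FOREIGN_KEY_CHECKS (a standard MySQL session variable) to skip those checks during the load:

mysql --local-infile=1 -e "SET FOREIGN_KEY_CHECKS=0; LOAD DATA LOCAL INFILE 'addresses.txt' INTO TABLE \`address\` FIELDS TERMINATED BY '\t' ENCLOSED BY '\"' IGNORE 1 LINES; SET FOREIGN_KEY_CHECKS=1" some_db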

Dayak answered 6/5, 2019 at 11:5 Comment(0)

You could do a SELECT on each table (rather than a SELECT *), specify the columns you want to keep, omit or blank out those you don't, and then use the export option of phpMyAdmin for each query.
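
For example, with hypothetical column names, the per-table query could look like:

SELECT id, created_at, 'REDACTED' AS name, NULL AS credit_card FROM customers;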

Nammu answered 7/1, 2013 at 15:59 Comment(0)

You can also use SELECT ... INTO OUTFILE to create a dump with a column filter.
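
A sketch with hypothetical names (note that INTO OUTFILE writes delimited text rather than INSERT statements, so it pairs with LOAD DATA INFILE for the re-import):

SELECT id, name, 'anon@example.com' AS email
INTO OUTFILE '/tmp/users_anon.txt'
FIELDS TERMINATED BY '\t' ENCLOSED BY '"'
FROM users;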

Schroder answered 7/1, 2013 at 16:9 Comment(2)
Can it export MySQL statements? I think it can only do CSV and similar formats. – Shagreen
+1 SELECT uid, name, '[email protected]' AS email INTO OUTFILE '/tmp/anonymised.sql' FROM users and LOAD DATA INFILE '/tmp/anonymised.sql' INTO TABLE users seem to work nicely together. I'd start there. – Striped

phpMyAdmin provides an export option to the SQL format based on SQL queries. It might be an option to extract this code from phpMyAdmin (which is probably well tested) and use it in this application.

Refer to the exportData method of the phpMyAdmin export plugin for the code.

Shagreen answered 7/1, 2013 at 16:46 Comment(0)

I found two similar questions, but it looks like there is no easy solution for what you want. You will have to write a custom export yourself.

Pryer answered 7/1, 2013 at 16:48 Comment(0)
