MySQL LOAD DATA INFILE with ON DUPLICATE KEY UPDATE
Asked Answered
P

3

55

For loading huge amounts of data into MySQL, LOAD DATA INFILE is by far the fastest option. Unfortunately, while this can be used in a way INSERT IGNORE or REPLACE works, ON DUPLICATE KEY UPDATE is not currently supported.

However, ON DUPLICATE KEY UPDATE has advantages over REPLACE. The latter does a delete and an insert when a duplicate exists. This brings overhead for key management. Also, autoincrement ids will not stay the same on a replace.

How can ON DUPLICATE KEY UPDATE be emulated when using LOAD DATA INFILE?

Pachton answered 7/3, 2013 at 12:25 Comment(1)
Variations on this are discussed in #73694237Im
P
107

These steps can be used to emulate this functionality:

  1. Create a new temporary table.

    CREATE TEMPORARY TABLE temporary_table LIKE target_table;
    
  2. Optionally, drop all indices from the temporary table to speed things up.

    SHOW INDEX FROM temporary_table;
    DROP INDEX `PRIMARY` ON temporary_table;
    DROP INDEX `some_other_index` ON temporary_table;
    
  3. Load the CSV into the temporary table

    LOAD DATA INFILE 'your_file.csv'
    INTO TABLE temporary_table
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    (field1, field2);
    
  4. Copy the data using ON DUPLICATE KEY UPDATE

    SHOW COLUMNS FROM target_table;
    INSERT INTO target_table
    SELECT * FROM temporary_table
    ON DUPLICATE KEY UPDATE field1 = VALUES(field1), field2 = VALUES(field2);
    
  5. Remove the temporary table

    DROP TEMPORARY TABLE temporary_table;
    

Using SHOW INDEX FROM and SHOW COLUMNS FROM this process can be automated for any given table.

Pachton answered 7/3, 2013 at 12:25 Comment(13)
I suggest rather using INSERT INTO target_table SELECT coloumn_name1, coloumn_name1 FROM temporary_table because * will include the primary key and cause items with the same primary key (in the case of an auto_increment primary key) in the main table to be updated, otherwise this worked for me!Idell
Thanks for the answer, worked for me after a little of SQL juggling. My SQL-fu is very rusty, nice to learn neat techniques like this.Pulsatile
Thanks from here too - works brilliantly - didn't think it was possible to do a ON DUPLICATE KEY with LOAD DATA INFILE, so never tried before.Peradventure
Nice. I don't think so there's a need to DROP the indexes on the temporary table. You are dropping it once you are done with its use :)Simonesimoneau
If you have any problem with UNIQUE KEY in step 4, you can use UPDATE target_table, temporary_table SET target_table.field1 = temporary_table.field1 WHERE target_table.field2 = temporary_table.field2Volumed
@Pachton is this advisable if the TEMPORARY TABLE contains a few million entries?Capo
@Simonesimoneau - Dropping indexes will improve performance for loading data from file!Miasma
@Pachton Quite interested in knowing the answer to Sammy Lee 's questionHortensiahorter
Steps 1) and 2) can be done in a single statement: CREATE TEMPORARY TABLE temporary_table SELECT * FROM target_table LIMIT 0;Czechoslovakia
Note: If you use statement based replication, you should use a regular table instead of TEMPORARY. Otherwise you might get into issues.Alanalana
Only one load data query with insert and update. Is it possible?Mucronate
Seeing that OP is dealing with duplicate keys, I would recommend removing the primary key constraint after creating the temporary table, as the new data may contain duplicates as well: ALTER TABLE temporary_table DROP PRIMARY KEY.Leidaleiden
Be aware of what may happens to the AUTO_INCREMENT column if the table has such.Im
Q
5

We can replace first (two steps) with below single query in the answer shared by Jan.

For steps 1 and 2 we can create new table with same reference structure and without any indexes.

CREATE TEMPORARY TABLE temporary_table SELECT * FROM target_table WHERE 1=0;

Instead of.

  1. Create a new temporary table.

    CREATE TEMPORARY TABLE temporary_table LIKE target_table;
    
  2. Optionally, drop all indices from the temporary table to speed things up.

    SHOW INDEX FROM temporary_table;
    DROP INDEX `PRIMARY` ON temporary_table;
    DROP INDEX `some_other_index` ON temporary_table;
    
Quietism answered 6/9, 2017 at 10:33 Comment(0)
A
0

Non-LOCAL Versus LOCAL Operation

The LOCAL modifier affects these aspects of LOAD DATA, compared to non-LOCAL operation:

  • It changes the expected location of the input file; see Input File Location.
  • It changes the statement security requirements; see Security Requirements.
  • It has the same effect as the IGNORE modifier on the interpretation of input file contents and error handling; see Duplicate-Key and Error Handling, and Column Value Assignment.

LOCAL works only if the server and your client both have been configured to permit it. For example, if mysqld was started with the local_infile system variable disabled, LOCAL produces an error. See Section 6.1.6, “Security Considerations for LOAD DATA LOCAL”.

Autacoid answered 16/3, 2022 at 10:49 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.