Uploading CSV into MySQL table with simultaneous JOIN
Asked Answered
D

3

6

What I'm trying to do is upload a CSV into a table, while appending information from a third table to the target table using JOIN.

The CSV import.csv (with 1M rows) looks like this:

firstname | lastname

The target table "names" looks like this:

firstname | lastname | gender

And the table "gender" (with 700k rows) looks like this:

firstname | gender

So, my ideal query would look something like this:

LOAD DATA LOCAL INFILE "import.csv"
INTO TABLE names n
LEFT JOIN gender g ON(g.firstname=n.firstname)

Something along those lines, to combine the import with the join so the end result in names has the data from gender and the CSV.

However, I know that LOAD DATA LOCAL INFILE can't be combined with JOIN, and attempts to use INSERT plus JOIN for each line are too CPU intensive.

Any ideas?

Duodenal answered 31/8, 2013 at 19:31 Comment(0)
K
12

You can use SET clause of LOAD DATA INFILE to achieve your goal

LOAD DATA LOCAL INFILE '/path/to/your/file.csv'
INTO TABLE names
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' 
LINES TERMINATED BY '\n' -- or '\r\n' if file has been prepared in Windows
IGNORE 1 LINES -- use this if your first line contains column headers 
(@first, @last)
SET firstname = @first,
    lastname = @last,
    gender = 
    (
      SELECT gender 
        FROM gender
       WHERE firstname = @first
       LIMIT 1
    )

Make sure that:

  • you have an index on firstname column in gender table
  • you don't have any indices on names table before you load data. Add them (indices) after you complete the load.
Karlotta answered 1/9, 2013 at 1:58 Comment(0)
C
2

MySql LOAD DATA INFILE syntax doesn't define JOIN.

  1. CREATA TABLE temporary_table...
  2. LOAD DATA INFILE "import.csv" INTO TABLE temporary_table FIELDS TERMINATED BY '|' ENCLOSED BY '"' LINES TERMINATED BY '\n';
  3. INSERT INTO names(t.firstname, t.lastname, g.gender) SELECT FROM temporary_table t LEFT JOIN gender g ON(g.firstname=n.firstname);
Carlow answered 31/8, 2013 at 20:24 Comment(0)
P
1

In my experience, the best way to load data into a database is to place it in a staging table first where all the columns are characters. Then, transform the data in the database to your final output.

Applying this to your code:

LOAD DATA LOCAL INFILE "import.csv"
INTO TABLE names_staging;

CREATE TABLE names as
    select n.firstname, n.lastname, g.gender
    from names_staging n LEFT JOIN
         gender g
         ON g.firstname = n.firstname;

This makes it possible to identify and fix problems from the data load. You can also easily add additional columns such as primary keys and insert dates into the final table.

Poikilothermic answered 31/8, 2013 at 19:38 Comment(2)
This is a good solution for creating new tables, but the added challenge is I want one master "names" table where the CSV data is loaded into. So it wouldn't be "Create Table" as much as "Select Into" (is that even possible?)Duodenal
@BrianMayer . . . It would be insert instead of create table as. That is all the more reason you should use a staging table.Poikilothermic

© 2022 - 2024 — McMap. All rights reserved.