R bulk upload data to MYSQL database
Asked Answered
G

3

5

there is the package: RMySQL

How can I bulk upload lots of data to mysql from R? I have a csv with around 1 million lines and 80 columns.

Would something like this work?

dbWriteTable(con, "test2", "~/data/test2.csv") ## table from a file

I fear this inserts line by line...

Graniela answered 15/7, 2013 at 2:41 Comment(3)
I'd also consider the command-line tool for mysql as an alternative.Poop
There are also MySQL GUIs such as Sequel Pro (for Mac) and Heidi SQL (for windows) that should have a csv import option...Carmelo
If your data is outside R, and you're sending it to a destination that's also outside R, you shouldn't get R involved. That will just slow things down and introduce a lot of intermediate steps that could go wrong. One thing you can do in R is to load just the first few lines to inspect the headers and column types. That can be helpful to ensure you have the table structure right.Variorum
F
4

If you are inserting many rows from the same client at the same time, You can use INSERT statements with multiple VALUES

INSERT INTO test2(col1, col2, col3, col4) 
VALUES 
('val1', 'val2', val3, val4),
('val1', 'val2', val3, val4),
('val1', 'val2', val3, val4)

Here an example on how to create your query. I am using data.table here:

dat <- matrix(seq(4*3), 3, 4)
library(data.table)
DT <- data.table(t(dat))
query <- paste('INSERT INTO test2(col1, col2, col3, col4)\nVALUES\n',
                gsub('c','',(DT[,paste(.SD,collapse='\n')])))

   cat(query)
INSERT INTO test2(col1, col2, col3, col4)
VALUES
 (1, 4, 7, 10)
(2, 5, 8, 11)
(3, 6, 9, 12)

then you can execute it using dbGetQuery:

  dbGetQuery(con, query)
Figureground answered 15/7, 2013 at 3:40 Comment(2)
thats an interesting approach as well! I will try it and compare speedsGraniela
This answer doesn't work if your data is in a data.frame b/c t(myData) doesn't make sense for data.frames in general.Downward
D
6

Since you have lots of data consider using LOAD DATA. It's the fastest method of importing data from a file according to mysql docs.

LOAD DATA INFILE
The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed.

Speed of INSERT Statements
When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using INSERT statements. See Section 13.2.6, “LOAD DATA INFILE Syntax”.
...
INSERT is still much slower for loading data than LOAD DATA INFILE, even when using the strategies just outlined.

LOAD DATA INFILE '/path/to/your/file.csv' 
INTO TABLE contacts 
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n' -- or '\r\n'
IGNORE 1 LINES; -- use IGNORE if you have a header line in your file
Dissimilitude answered 15/7, 2013 at 3:45 Comment(0)
F
4

If you are inserting many rows from the same client at the same time, You can use INSERT statements with multiple VALUES

INSERT INTO test2(col1, col2, col3, col4) 
VALUES 
('val1', 'val2', val3, val4),
('val1', 'val2', val3, val4),
('val1', 'val2', val3, val4)

Here an example on how to create your query. I am using data.table here:

dat <- matrix(seq(4*3), 3, 4)
library(data.table)
DT <- data.table(t(dat))
query <- paste('INSERT INTO test2(col1, col2, col3, col4)\nVALUES\n',
                gsub('c','',(DT[,paste(.SD,collapse='\n')])))

   cat(query)
INSERT INTO test2(col1, col2, col3, col4)
VALUES
 (1, 4, 7, 10)
(2, 5, 8, 11)
(3, 6, 9, 12)

then you can execute it using dbGetQuery:

  dbGetQuery(con, query)
Figureground answered 15/7, 2013 at 3:40 Comment(2)
thats an interesting approach as well! I will try it and compare speedsGraniela
This answer doesn't work if your data is in a data.frame b/c t(myData) doesn't make sense for data.frames in general.Downward
D
3

Just to flush out @peterm's answer, here's a function that loads a data.frame into MySQL via the LOAD DATA INFILE:

saveData <- function(data, # a data frame
                     tableName, # table name, possibly qualified (e.g. "my_db.customers")
                     ...) # arguments to DBI::dbConnect
                    {

    query  <-  sprintf("LOAD DATA INFILE '%s' 
INTO TABLE %s 
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n' -- or '\r\n'
IGNORE 1 LINES;" , TEMPFILE,tableName)

    # WRITE THE DATA TO A LOCAL FILE
    TEMPFILE  <-  tempfile(fileext='.csv')
    write.csv(data,TEMPFILE)
    on.exit(file.remove(TEMPFILE))

    # CONNECT TO THE DATABASE
    db <- dbConnect(MySQL(), ... )

    # SUBMIT THE UPDATE QUERY AND DISCONNECT
    dbGetQuery(db, query)
    dbDisconnect(db)
}
Downward answered 17/11, 2015 at 23:47 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.