Batch insert with table that has many columns using Anorm

I'm trying to do a batch insert into a MySQL database table using Anorm (in the play framework 2.3.1). The application I'm building has a standard web front end in addition to requiring batch data inserts and I would like to try and keep the logic on the same software stack.

The inserts only go into the same few tables.

The number of rows to be inserted at once will reach hundreds and may get into thousands; I expect I may need to cap the batch size at some point due to Anorm, MySQL, or other limits.

The MySQL driver I'm using is mysql-connector-java 5.1.31.

Below is a cut-down use case.

Using table:

CREATE TABLE table1
(
  col1    INTEGER   NOT NULL,
  col2    BIGINT,
  col3    VARCHAR(255)
); 

And scala code:

import play.api.Play.current
import play.api.db.DB
import anorm._ 

object TestInserts {

  DB.withConnection("spo") { implicit conn => 

    val theInserts = Seq(
       Seq[NamedParameter]('val1 -> 1, 'val2 -> Some(1L), 'val3 -> Some("One"))
      ,Seq[NamedParameter]('val1 -> 2, 'val2 -> Some(2L), 'val3 -> Some("Two"))
      ,Seq[NamedParameter]('val1 -> 3, 'val2 -> Some(3L), 'val3 -> Some("Three"))
    )

    val insertBatchSQL = BatchSql( SQL("insert into table1 (col1, col2, col3) values ({val1}, {val2}, {val3})"), theInserts)  

    insertBatchSQL.execute

  } 

}

I am getting the following error:

java.sql.SQLException: Parameter index out of range (1 > number of parameters, which is 0).
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1094)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:997)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:983)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:928)
at com.mysql.jdbc.PreparedStatement.checkBounds(PreparedStatement.java:3688)
at com.mysql.jdbc.PreparedStatement.setInternal(PreparedStatement.java:3670)
at com.mysql.jdbc.PreparedStatement.setInternal(PreparedStatement.java:3715)
at com.mysql.jdbc.PreparedStatement.setInt(PreparedStatement.java:3659)
at com.jolbox.bonecp.PreparedStatementHandle.setInt(PreparedStatementHandle.java:828)
at anorm.ToStatement$intToStatement$.set(ToStatement.scala:164)
at anorm.ToStatement$intToStatement$.set(ToStatement.scala:163)
...

I looked at the test classes in the Play framework that test batch inserts (https://github.com/playframework/playframework/blob/master/framework/src/anorm/src/test/scala/anorm/BatchSqlSpec.scala) and, as far as I can tell, my code should work the same way.

Any pointers on how to fix this, or on whether I should be tackling it in a different way, would be great.

Glyptodont answered 4/7, 2014 at 11:25 Comment(0)

I'd skip BatchSql here. I'm not very familiar with it since, last I checked, it just executes a boatload of queries in sequence, which is awfully slow. I'd recommend aggregating everything into a single query. It's a little more tedious, but much faster to execute a single query with one thousand rows than one thousand single-row inserts.

For convenience, let's say you have a Seq of

case class Test(val1: Int, val2: Option[Long], val3: Option[String])

Then you could build your query like this:

val values: Seq[Test] = Seq(....)

/* Index your sequence for later, to map to inserts and parameters alike */
val indexedValues = values.zipWithIndex

/* Create the portion of the insert statement with placeholders, each with a unique index */
val rows = indexedValues.map { case (value, i) =>
    s"({val1_${i}}, {val2_${i}}, {val3_${i}})"
}.mkString(",")

/* Create the NamedParameters for each `value` in the sequence, each with their unique index in the token, and flatten them together */
val parameters = indexedValues.flatMap { case (value, i) =>
    Seq(
        NamedParameter(s"val1_${i}" -> value.val1),
        NamedParameter(s"val2_${i}" -> value.val2),
        NamedParameter(s"val3_${i}" -> value.val3)
    ) 
}

/* Execute the insert statement, applying the aggregated parameters */
SQL("INSERT INTO table1 (col1, col2, col3) VALUES " + rows)
    .on(parameters: _*)
    .executeInsert()

Notes:

You will have to check that values is non-empty before proceeding, as an empty sequence would generate an invalid SQL statement (an INSERT with no rows).

Depending on how many rows and columns you're inserting, the token parser that builds the prepared statement will eventually slow down from the sheer number of tokens to parse (and the string size). I've noticed this after a few hundred rows with several columns. It can be mitigated somewhat: because Scala is strongly typed, Int and Long values pose no SQL-injection threat, so you could inline just those columns into the SQL string via interpolation/concatenation and bind only the unsafe columns with NamedParameter as usual. That cuts down the number of tokens that need to be parsed; a sketch follows.
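
A rough sketch of that mitigation, reusing values and the Test case class from above (my own illustration, untested against any particular Anorm version):

/* Numeric fields are inlined into the SQL text, so only the String column produces a token for Anorm to parse */
val indexedValues = values.zipWithIndex

val rows = indexedValues.map { case (value, i) =>
    /* Int and Option[Long] are interpolated directly; NULL stands in for None */
    val col2 = value.val2.map(_.toString).getOrElse("NULL")
    s"(${value.val1}, ${col2}, {val3_${i}})"
}.mkString(",")

/* Only the potentially unsafe String column is bound as a named parameter */
val parameters = indexedValues.map { case (value, i) =>
    NamedParameter(s"val3_${i}" -> value.val3)
}

SQL("INSERT INTO table1 (col1, col2, col3) VALUES " + rows)
    .on(parameters: _*)
    .execute()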

Moua answered 4/7, 2014 at 13:33 Comment(6)
If Anorm's BatchSql only executes the statements one at a time, it defeats the reason I chose it in the first place, which was improved performance from things like fewer network trips. So this creative solution may be the way to go...Glyptodont
I last tried it on 2.2. I think it's just supposed to be a convenient way to re-run the same query without rewriting anything.Moua
Just implemented this approach and all is working well on a few hundred rows.Glyptodont
Thank you. This worked well for me. I had to tweak the syntax a bit to get it working in Play! 2.4.x... Changed .executeInsert() to .execute() and NamedParameter(s"val1_${i}" -> value.val1) to NamedParameter(s"val1_${i}", value.val1)Closer
Thanks for this. Made my import go much faster. Just like @Closer, I had to make 2 edits for this to work in 2.4.x. I also changed -> to a comma, but I changed .executeInsert() to .executeInsert(SqlParser.scalar[Long] *) because it returns multiple ids. But beware: if you use this with ON DUPLICATE KEY UPDATE, you get incorrect ids returned for rows that are updated instead of inserted. Not sure if this is a MySQL issue or an Anorm one.Persimmon
See github.com/playframework/anorm/issues/574Hungnam

Try this:

def save(types: List[RoomType]): Unit = {
  DB.withTransaction { implicit c =>
    val update = SQL("insert into TABLE(col1, col2, col3) values ({col1}, {col2}, {col3})")
    val batch = (update.asBatch /: types)(
      (sql, _type) => sql.addBatchParams(_type.id, _type.name, _type.lang))
    batch.execute
  }
}
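
For illustration, a possible call site, assuming a hypothetical RoomType case class with id, name and lang fields (not defined in the answer):

// Hypothetical model matching the save method above
case class RoomType(id: Long, name: String, lang: String)

// Folds all three rows into one batch and executes it inside a transaction
save(List(
  RoomType(1L, "single", "en"),
  RoomType(2L, "double", "en"),
  RoomType(3L, "suite", "fr")
))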
Unwitting answered 4/3, 2015 at 17:2 Comment(0)

Some issues with BatchSql were fixed 12 days ago (backported into 2.3.1): https://github.com/playframework/playframework/pull/3087. It should work with that fix.
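
For reference, a sketch of what the question's batch could look like once the fix is in, assuming the later BatchSql factory that takes the statement as a plain string plus at least one group of parameters (exact signatures vary by Anorm version):

import java.sql.Connection
import anorm._

// Sketch only: pass the SQL as a plain string instead of wrapping it in SQL(...)
def insertAll()(implicit conn: Connection): Array[Int] =
  BatchSql(
    "insert into table1 (col1, col2, col3) values ({val1}, {val2}, {val3})",
    Seq[NamedParameter]("val1" -> 1, "val2" -> Some(1L), "val3" -> Some("One")),
    Seq[NamedParameter]("val1" -> 2, "val2" -> Some(2L), "val3" -> Some("Two")),
    Seq[NamedParameter]("val1" -> 3, "val2" -> Some(3L), "val3" -> Some("Three"))
  ).execute() // one update count per parameter group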

Pattypatulous answered 7/7, 2014 at 18:53 Comment(2)
Thank you for the link; the above code is running against Play 2.3.1 and still getting the error. It looks like BatchSql is in flux at the moment, and version 2.3.2 may solve it.Glyptodont
If you already have 2.3.1, you may need to clean your local repo to ensure it gets the updated artifacts.Pattypatulous
