How can I do a batch insert into an Oracle database using Python?
Asked Answered
B

5

16

I have some monthly weather data that I want to insert into an Oracle database table but I want to insert the corresponding records in a batch in order to be more efficient. Can anyone advise as to how I'd go about doing this in Python?

For example let's say my table has four fields: a station ID, a date, and two value fields. The records are uniquely identified by the station ID and date fields (composite key). The values I'll have to insert for each station will be kept in a list with X number of full years worth of data, so for example if there are two years of values then the value lists will contain 24 values.

I assume that below is the way I'd do this if I wanted to insert the records one at a time:

connection_string = "scott/tiger@testdb"
connection = cx_Oracle.Connection(connection_string)
cursor = cx_Oracle.Cursor(connection)
station_id = 'STATION_1'
start_year = 2000

temps = [ 1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3 ]
precips = [ 2, 4, 6, 8, 2, 4, 6, 8, 2, 4, 6, 8 ]
number_of_years = len(temps) / 12
for i in range(number_of_years):
    for j in range(12):
        # make a date for the first day of the month
        date_value = datetime.date(start_year + i, j + 1, 1)
        index = (i * 12) + j
        sql_insert = 'insert into my_table (id, date_column, temp, precip) values (%s, %s, %s, %s)', (station_id, date_value, temps[index], precips[index]))
        cursor.execute(sql_insert)
connection.commit()

Is there a way to do what I'm doing above but in a way that performs a batch insert in order to increase efficiency? BTW my experience is with Java/JDBC/Hibernate so if someone can give an explanation/example which compares to the Java approach then it'd be especially helpful.

EDIT: Perhaps I need to use cursor.executemany() as described here?

Thanks in advance for any suggestions, comments, etc.

Busboy answered 15/2, 2013 at 21:58 Comment(10)
Use the .executemany() call instead?Flagler
Yes, Martijn, I just found that (and updated/edited the question to include it) and it does look to be the solution to this problem. Thanks for your input!Busboy
Use one query with INSERT ALL to insert multiple rows at once.Tahitian
.executemany() is not a batch update; you will do as many DML statements as you have cursor you execute... It makes you Python code cleaner but it is not more forgiving to the database. What does your original data look like; is it entered by a user and so stored in variables in your code or is it in a text file etc.?Tila
Thanks, Ben. The data is computed and stored in list variables. So in the simple example above the temps and precips lists would actually be computed in my Python code before we reach this section instead of assigned as in the example code.Busboy
@Ben, according to whom? The C code for .executemany() only makes a single call to OCIStmtExecute()Ostensory
Yes @Ostensory but OCIstmtExecute does not do batch processing: "For non-SELECT statements, the number of times this statement is executed is equal to iters - rowoff." In the c source, Cursor_ExecuteMany passes the number of rows to Cursor_InternalExecute, which then passes the same to OCIStmtExecute.Tila
This page contradicts that statement: "Large insert operations don't require many separate inserts because Python fully supports inserting many rows at once with the cx_Oracle.Cursor.executemany method. Limiting the number of execute operations improves program performance a lot and should be the first thing to think about when writing applications heavy on INSERTs." So which is correct? I can't find any cx_Oracle benchmarks online anywhere.Ostensory
I'm going to trust the source code and the documentation over a blog post @alldayremix. I am, of course, assuming thqt I've interpreted it correctly. Please note that I'm not claiming that executemany() is not quicker than multiple execute()'s just that it's not a batch operation against the database. Benchmarks, therefore, can't help you.Tila
Sounds reasonable to me. It seems .executemany() just eliminates the round-trip times that repeated calls to .execute() would require, and the blog post is worded awkwardly. So, the only way to do a batch operation would be calling .execute() with a statement like BULK INSERT... or INSERT ALL...Ostensory
B
20

Here's what I've come up with which appears to work well (but please comment if there's a way to improve this):

# build rows for each date and add to a list of rows we'll use to insert as a batch 
rows = [] 
numberOfYears = endYear - startYear + 1
for i in range(numberOfYears):
    for j in range(12):
        # make a date for the first day of the month
        dateValue = datetime.date(startYear + i, j + 1, 1)
        index = (i * 12) + j
        row = (stationId, dateValue, temps[index], precips[index])
        rows.append(row)

# insert all of the rows as a batch and commit
ip = '192.1.2.3' 
port = 1521
SID = 'my_sid'
dsn = cx_Oracle.makedsn(ip, port, SID)
connection = cx_Oracle.connect('username', 'password', dsn)
cursor = cx_Oracle.Cursor(connection)
cursor.prepare('insert into ' + database_table_name + ' (id, record_date, temp, precip) values (:1, :2, :3, :4)')
cursor.executemany(None, rows)
connection.commit()
cursor.close()
connection.close()
Busboy answered 16/2, 2013 at 1:7 Comment(1)
For anyone finding this via search, there is a new Oracle example (that also shows how to deal with 'noisy' data that can't be inserted) at github.com/oracle/python-oracledb/blob/main/samples/load_csv.pyMeshed
O
10

Use Cursor.prepare() and Cursor.executemany().

From the cx_Oracle documentation:

Cursor.prepare(statement[, tag])

This can be used before a call to execute() to define the statement that will be executed. When this is done, the prepare phase will not be performed when the call to execute() is made with None or the same string object as the statement. [...]

Cursor.executemany(statement, parameters)

Prepare a statement for execution against a database and then execute it against all parameter mappings or sequences found in the sequence parameters. The statement is managed in the same way as the execute() method manages it.

Thus, using the above two functions, your code becomes:

connection_string = "scott/tiger@testdb"
connection = cx_Oracle.Connection(connection_string)
cursor = cx_Oracle.Cursor(connection)
station_id = 'STATION_1'
start_year = 2000

temps = [ 1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3 ]
precips = [ 2, 4, 6, 8, 2, 4, 6, 8, 2, 4, 6, 8 ]
number_of_years = len(temps) / 12

# list comprehension of dates for the first day of the month
date_values = [datetime.date(start_year + i, j + 1, 1) for i in range(number_of_years) for j in range(12)]

# second argument to executemany() should be of the form:
# [{'1': value_a1, '2': value_a2}, {'1': value_b1, '2': value_b2}]
dict_sequence = [{'1': date_values[i], '2': temps[i], '3': precips[i]} for i in range(1, len(temps))]

sql_insert = 'insert into my_table (id, date_column, temp, precip) values (%s, :1, :2, :3)', station_id)
cursor.prepare(sql_insert)
cursor.executemany(None, dict_sequence)
connection.commit()

Also see Oracle's Mastering Oracle+Python series of articles.

Ostensory answered 15/2, 2013 at 22:31 Comment(3)
Thanks for this. I've tried the above but it doesn't work due to the fact that executemany() only takes two arguments. And it seems that you still have to create a list of rows to use as the sequence parameters argument, as described in the section titled "Many at Once" here: oracle.com/technetwork/articles/dsl/… and the sections describing the use of executemany() here: oracle.com/technetwork/articles/dsl/…Busboy
Also it's not clear how using a prepare() gains you anything since according to the cx_Oracle.Cursor.execute() documentation: "A reference to the statement will be retained by the cursor. If None or the same string object is passed in again, the cursor will execute that statement again without performing a prepare or rebinding and redefining." But I may be misunderstanding something here...Busboy
Using prepare() doesn't gain you anything, it just makes your code clearer; the underlying C code for either method is the same. As for your first comment, I updated my answer to address that: executemany() takes a sequence of dictionaries as the second argument.Ostensory
S
6

As one of the comments says, consider using INSERT ALL. Supposedly it'll be significantly faster than using executemany().

For example:

INSERT ALL
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
  INTO mytable (column1, column2, column_n) VALUES (expr1, expr2, expr_n)
SELECT * FROM dual;

http://www.techonthenet.com/oracle/questions/insert_rows.php

Scolopendrid answered 17/3, 2016 at 18:44 Comment(1)
This is a correct answer. Multiple inserts are handled with one roundtrip to the database.Holocaust
I
6

fyi my test result:

I insert into 5000 rows. 3 columns per row.

  1. run insert 5000 times, it costs 1.24 minutes.
  2. run with executemany, it costs 0.125 seconds.
  3. run with a insert all code: it costs 4.08 minutes.

python code, which setup the sql like insert all into t(a,b,c) select :1, :2, :3 from dual union all select :4, :5: :6 from daul...

The python code to setup this long sql, it cost 0.145329 seconds.

I test my code on a very old sun machine. cpu: 1415 MH.

in the third case, I checked the database side, the wait event is "SQL*Net more data from client". which means the server is waiting for more data from client.

The result of the third method is unbelievable for me without the test.

so the short suggestion from me is just to use executemany.

Inconsiderable answered 26/5, 2016 at 5:8 Comment(0)
R
3

I would create a large SQL insert statement using union:

insert into mytable(col1, col2, col3)
select a, b, c from dual union
select d, e, f from dual union
select g, h, i from dual

You can build the string in python and give it to oracle as one statement to execute.

Rightness answered 8/2, 2016 at 17:55 Comment(1)
This is a correct answer. Multiple inserts are handled with one roundtrip to the database. One issue - You probably want to do a UNION ALL instead of a regular UNION.Holocaust

© 2022 - 2024 — McMap. All rights reserved.