I have some monthly weather data that I want to insert into an Oracle database table but I want to insert the corresponding records in a batch in order to be more efficient. Can anyone advise as to how I'd go about doing this in Python?
For example let's say my table has four fields: a station ID, a date, and two value fields. The records are uniquely identified by the station ID and date fields (composite key). The values I'll have to insert for each station will be kept in a list with X number of full years worth of data, so for example if there are two years of values then the value lists will contain 24 values.
I assume that below is the way I'd do this if I wanted to insert the records one at a time:
connection_string = "scott/tiger@testdb"
connection = cx_Oracle.Connection(connection_string)
cursor = cx_Oracle.Cursor(connection)
station_id = 'STATION_1'
start_year = 2000
temps = [ 1, 3, 5, 7, 9, 1, 3, 5, 7, 9, 1, 3 ]
precips = [ 2, 4, 6, 8, 2, 4, 6, 8, 2, 4, 6, 8 ]
number_of_years = len(temps) / 12
for i in range(number_of_years):
for j in range(12):
# make a date for the first day of the month
date_value = datetime.date(start_year + i, j + 1, 1)
index = (i * 12) + j
sql_insert = 'insert into my_table (id, date_column, temp, precip) values (%s, %s, %s, %s)', (station_id, date_value, temps[index], precips[index]))
cursor.execute(sql_insert)
connection.commit()
Is there a way to do what I'm doing above but in a way that performs a batch insert in order to increase efficiency? BTW my experience is with Java/JDBC/Hibernate so if someone can give an explanation/example which compares to the Java approach then it'd be especially helpful.
EDIT: Perhaps I need to use cursor.executemany() as described here?
Thanks in advance for any suggestions, comments, etc.
.executemany()
call instead? – Flagler.executemany()
is not a batch update; you will do as many DML statements as you have cursor you execute... It makes you Python code cleaner but it is not more forgiving to the database. What does your original data look like; is it entered by a user and so stored in variables in your code or is it in a text file etc.? – Tila.executemany()
only makes a single call toOCIStmtExecute()
– OstensoryOCIstmtExecute
does not do batch processing: "For non-SELECT statements, the number of times this statement is executed is equal to iters - rowoff." In the c source,Cursor_ExecuteMany
passes the number of rows toCursor_InternalExecute
, which then passes the same toOCIStmtExecute
. – Tila.executemany()
just eliminates the round-trip times that repeated calls to.execute()
would require, and the blog post is worded awkwardly. So, the only way to do a batch operation would be calling.execute()
with a statement likeBULK INSERT...
orINSERT ALL...
– Ostensory