Importing data from an excel file using python into SQL Server
Asked Answered
B

2

10

I have found some other questions that have a similar error to what I am getting, but have not been able to figure out how to resolve this based on the answers. I am trying to import an excel file into SQL Server with the help of python. This is the code I wrote:

import pandas as pd
import numpy as np
import pandas.io.sql
import pyodbc
import xlrd

server = "won't disclose private info"
db = 'private info'
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + Server + ';DATABASE=' + 
db + ';Trusted_Connection=yes')

cursor = conn.cursor()
book = xlrd.open_workbook("Daily Flash.xlsx")
sheet = book.sheet_by_name("Sheet1")

query1 = """CREATE TABLE [LEAF].[MK] ([LEAF][Lease_Number] varchar(255), 
[LEAF][Start_Date] varchar(255), [LEAF][Report_Status] varchar(255), [LEAF] 
[Status_Date] varchar(255), [LEAF][Current_Status] varchar(255), [LEAF] 
[Sales_Rep] varchar(255), [LEAF][Customer_Name] varchar(255),[LEAF] 
[Total_Finance] varchar(255),
[LEAF][Rate_Class] varchar(255) ,[LEAF][Supplier_Name] varchar(255) ,[LEAF] 
[DecisionStatus] varchar(255))"""


query = """INSERT INTO [LEAF].[MK] (Lease_Number, Start_Date, Report_Status, 
Status_Date, Current_Status, Sales_Rep, Customer_Name,Total_Finance,
Rate_Class,Supplier_Name,DecisionStatus) VALUES (%s, %s, %s, %s, %s, %s, %s, 
%s, %s, %s, %s)"""

for r in range(1, sheet.nrows):
    Lease_Number  = sheet.cell(r,0).value
    Start_Date    = sheet.cell(r,1).value
    Report_Status = sheet.cell(r,2).value
    Status_Date   = sheet.cell(r,3).value
    Current_Status= sheet.cell(r,4).value
    Sales_Rep     = sheet.cell(r,5).value
    Customer_Name = sheet.cell(r,6).value
    Total_Financed= sheet.cell(r,7).value
    Rate_Class    = sheet.cell(r,8).value
    Supplier_Name = sheet.cell(r,9).value
    DecisionStatus= sheet.cell(r,10).value


    values = (Lease_Number, Start_Date, Report_Status, Status_Date, 
    Current_Status, Sales_Rep, Customer_Name, Total_Financed, Rate_Class, 
    Supplier_Name, DecisionStatus)

    cursor.execute(query1)

    cursor.execute(query, values)


database.commit()


database.close()


database.commit()

The error message I get is:

ProgrammingError                          Traceback (most recent call last)
<ipython-input-24-c525ebf0af73> in <module>()
 16 
 17     # Execute sql Query
 ---> 18     cursor.execute(query, values)
 19 
 20 # Commit the transaction

 ProgrammingError: ('The SQL contains 0 parameter markers, but 11 parameters 
 were supplied', 'HY000')

Can someone please explain the problem to me and how I can fix it? Thank you!

Update:

I have gotten that error message to go away based on the comments below. I modified my query also because the table into which I am trying to insert values into was not previously created, so I updated my code in an attempt to create it.

However, now I am getting the error message:

ProgrammingError: ('42000', '[42000] [Microsoft][ODBC SQL Server Driver][SQL 
Server]The specified schema name "dbo" either does not exist or you do not 
have permission to use it. (2760) (SQLExecDirectW)')

I tried changing that slightly by writing CREATE [HELLO][MK] instead of just CREATE MK but that tells me that MK is already in the database... What steps should I take next?

Banded answered 10/7, 2018 at 15:23 Comment(20)
Instead of using %s as the placeholders, use ? instead.Ore
Thank you that got rid of that error message and created a new one: ProgrammingError: ('42S02', "[42S02] [Microsoft][ODBC SQL Server Driver][SQL Server]Invalid object name 'orders'. (208) (SQLExecDirectW)"). What can I do about that?Banded
I just realized 'Orders' that is listed as invalid object is the name I created for the table I want to create in SQL. Why is that invalid, what should it be renamed to?Banded
ORDER is an SQL keyword (as in "ORDER BY"), but I wouldn't normally expect orders to be a problem. In any case, try using [orders] in your SQL command text.Unhallowed
just gave that a shot, that didn't change anything, thank you regardless. Any other ideas?Banded
question, does the code I have create a table as well or does it only insert in a previously existing table?Banded
That form of INSERT statement will only insert into an existing table; it won't automatically create a new table.Unhallowed
so I added to my code an extra query: query1 = """CREATE TABLE orders (Lease_Number varchar(255), Start_Date varchar(255), Report_Status varchar(255), Status_Date varchar(255), Current_Status varchar(255), Sales_Rep varchar(255), Customer_Name varchar(255),Total_Finance varchar(255), Rate_Class varchar(255) ,Supplier_Name varchar(255) ,DecisionStatus varchar(255))""" and also added cursor.execute(query1) should that resolve my problem, I am getting a new error now:Banded
do you know if this error is because I do not have administrative privilges or because I am doing something wrong?Banded
It looks like you created a table called MK in schema LEAF, but you want to insert into table orders which is in schema dbo by default. Therefore, you are referencing two different tables. You may also have to commit the query after your CREATE TABLE statement or your subsequent INSERT queries won't find the table.Ore
@Scratch'N'Purr sorry, I forgot to update the other name, I just updated them both now. Orders was just a copy and paste from a tutorial I tried using, I actually decided to use "MK" for the table name and it is telling me that there is an object named "MK" already in the database despite my CREATE TABLE and INSERT INTO both being into [LEAF].[MK]Banded
@Parfait, I think its relevant because that table did not previously exist in the databaseBanded
@mkheifetz Ok, if your table already exists, then you don't need to create it again. Just make sure your column names match up with the column definitions when you created your table.Ore
@Scratch'N'Purr see the thing is it doesn't exist, I don't know why it is showing that it exists because it doesn't... Guess I should have specified that in my previous comment. I don't get why it is saying that it existsBanded
it seems like regardless of what I put in there it says the same thing, whether it's 'MK" or any other letters, it gives the same error messageBanded
any thoughts on this?Banded
You are insert into [LEAF].[MK] right? And you have already committed the CREATE TABLE transaction with conn.commit() right?Ore
for the later, I have database.commit()Banded
Use conn.commit() since that's the connector object that you created earlier. Also, don't put cursor.execute(query1) in your for loop because that's asking Python to create the table at each row. Put it before your for loop and then commit the table creation with conn.commit(). You might have to flush your journal first since there's probably a pending transaction for creating your table.Ore
Let us continue this discussion in chat.Banded
H
11

Based on the conversation we had in our chat, here are a few takeaways:

  1. After executing your CREATE TABLE query, make sure to commit it immediately before running any subsequent INSERT queries.
  2. Use error catching for cases when the table already exists in the database. You asked that if you wanted to import more data to the table, would the script still run. The answer is no, since Python will throw an exception at cursor.execute(query1).
  3. If you want to validate whether your insert operations were successful, you can do a simple record count check.

EDIT Yesterday, when I had @mkheifetz test my code out, he caught a minor bug where the validation check would return False, and the reason was because the database already had existing records, so when comparing against only the current data being imported, the validation would fail. Therefore, as a solution to address the bug, I have modified the code again.

Below is how I would modify your code:

import pandas as pd
import numpy as np
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt

import pandas.io.sql
import pyodbc

import xlrd
server = 'XXXXX'
db = 'XXXXXdb'

# create Connection and Cursor objects
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + db + ';Trusted_Connection=yes')
cursor = conn.cursor()

# read data
data = pd.read_excel('Flash Daily Apps through 070918.xls')

# rename columns
data = data.rename(columns={'Lease Number': 'Lease_Number',
                            'Start Date': 'Start_Date',
                            'Report Status': 'Report_Status',
                            'Status Date': 'Status_Date',
                            'Current Status': 'Current_Status',
                            'Sales Rep': 'Sales_Rep',
                            'Customer Name': 'Customer_Name',
                            'Total Financed': 'Total_Financed',
                            'Rate Class': 'Rate_Class',
                            'Supplier Name': 'Supplier_Name'})

# export
data.to_excel('Daily Flash.xlsx', index=False)

# Open the workbook and define the worksheet
book = xlrd.open_workbook("Daily Flash.xlsx")
sheet = book.sheet_by_name("Sheet1")

query1 = """
CREATE TABLE [LEAF].[ZZZ] (
    Lease_Number varchar(255),
    Start_Date varchar(255),
    Report_Status varchar(255),
    Status_Date varchar(255),
    Current_Status varchar(255),
    Sales_Rep varchar(255),
    Customer_Name varchar(255),
    Total_Finance varchar(255),
    Rate_Class varchar(255),
    Supplier_Name varchar(255),
    DecisionStatus varchar(255)
)"""

query = """
INSERT INTO [LEAF].[ZZZ] (
    Lease_Number,
    Start_Date,
    Report_Status,
    Status_Date,
    Current_Status,
    Sales_Rep,
    Customer_Name,
    Total_Finance,
    Rate_Class,
    Supplier_Name,
    DecisionStatus
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"""

# execute create table
try:
    cursor.execute(query1)
    conn.commit()
except pyodbc.ProgrammingError:
    pass

# grab existing row count in the database for validation later
cursor.execute("SELECT count(*) FROM LEAF.ZZZ")
before_import = cursor.fetchone()

for r in range(1, sheet.nrows):
    Lease_Number = sheet.cell(r,0).value
    Start_Date = sheet.cell(r,1).value
    Report_Status = sheet.cell(r,2).value
    Status_Date = sheet.cell(r,3).value
    Current_Status= sheet.cell(r,4).value
    Sales_Rep = sheet.cell(r,5).value
    Customer_Name = sheet.cell(r,6).value
    Total_Financed= sheet.cell(r,7).value
    Rate_Class = sheet.cell(r,8).value
    Supplier_Name = sheet.cell(r,9).value
    DecisionStatus= sheet.cell(r,10).value

    # Assign values from each row
    values = (Lease_Number, Start_Date, Report_Status, Status_Date, Current_Status,
              Sales_Rep, Customer_Name, Total_Financed, Rate_Class, Supplier_Name,
              DecisionStatus)

    # Execute sql Query
    cursor.execute(query, values)

# Commit the transaction
conn.commit()

# If you want to check if all rows are imported
cursor.execute("SELECT count(*) FROM LEAF.ZZZ")
result = cursor.fetchone()

print((result[0] - before_import[0]) == len(data.index))  # should be True

# Close the database connection
conn.close()
Homeopathic answered 10/7, 2018 at 20:16 Comment(15)
thank you very much for your help! You are an amazing asset on stackoverflow!Banded
@mkheifetz I just made some changes to my response to address the validation bug that you caught last night! It should return True now.Ore
thank you! Do you by any chance know, I have been trying to fix my 2 date columns where I changed them from just saying Start_Date to showing Start_Date datetime DEFAULT(getdate()) in the insert. The bizzare thing is when I check my SQL table, there are dates but they are off for some reason, and it seems like usually or always its by exactly by 2 days. For example, in my excel file it is 2018-07-09 00:00:00 in SQL it's 07-11 instead of 07-09, very strangeBanded
@mkheifetz That's interesting. It might be a difference in how Excel serializes numbers to dates and how MSSQL serializes numbers to dates. Here's a post on this. What you could try (and it will involve some major code changes) is to use SQLAlchemy and then use panda's to_sql method to dump the data to the db. The method will do the date abstractions for you.Ore
interesting, but the to_sql method is only applicable to MySQL, not SQL Server, do I understand that correctly? Does this code from this other stack overflow question: #48690182Banded
It will work with MSSQL, but you'll need the SQLAlchemy package to create your MSSQL engine. There's some info in their docs for creating one.Ore
question, is it sufficient to just do this with my dataframe and it will be in SQL: #43453920 engine = create_engine('sqlite:///cdb.db') Base.metadata.create_all(engine) file_name = 'client_db.csv' df = pandas.read_csv(file_name) df.to_sql(con=engine, index_label='id', name=cdb1.__tablename__, if_exists='replace')Banded
No, the example you provided uses a SQLite database as the backend. Since your database is MSSQL, your connection string should look like engine = create_engine("mssql+pyodbc://username:password@myhost:port/databasename?driver=SQL+Server+Native+Client+10.0"). Once you have that, then your export code would look like data.to_sql(con=engine, index=False, if_exists='append'), where data is the FIRST excel file that you read. You don't have to create an additional excel file. Make sure to change SQL+Server+Native+Client+10.0 to whichever SQL Server ODBC driver you have installed.Ore
sorry for the stupid question, how do I determine what goes inside the create_engine() arguments? How is it different from what goes into pyodbc.connect()?Banded
Well, if you look at the connection string, just replace the username, password, myhost, port, and databasename with your credentials, and the server info. myhost would be the server variable, and databasename would be your db variable. If the server doesn't need credentials, you can leave it out.Ore
Let us continue this discussion in chat.Banded
what would I put for port and is it possible not to have a password? When I login to the SQL database, there is no passwordBanded
If you dont have a password, you can leave it out. You should also leave out the user, e.g. mssql+pyodbc://@myhost:port/database?driver=SQL+Server. With the port, your MSSQL might be running on the default port of 1433, but if it's not the case, you'd have to ask the system admin who installed MSSQL.Ore
This is the code I have in there now: engine = create_engine("mssql+pyodbc://LEAFNOW\mkheifetz:@LCCPHL-PDSQL003/LEAFCore?driver=SQL+Server+Native+Client+10.0") When I run it I get a: NameError: name 'create_engine' is not definedBanded
Let's continue discussion in chatOre
B
0

I hope you can use the Pd.tosql to the extent of your request

import pandas as pd
import numpy as np
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt
from sqlalchemy import create_engine #-------Added new

import pandas.io.sql
import pyodbc

import xlrd
server = 'XXXXX'
db = 'XXXXXdb'
# read data
data = pd.read_excel('Flash Daily Apps through 070918.xls')

You can replace the column names with str.replace

data.columns = data.columns.str.replace(" ","_")

use the appropriate connectionString

conn_str = r'mssql+pymssql://{}/{}'.format(server,db)
engine = create_engine(conn_str)
data.to_sql('ZZZ', schema='dbo', con = engine, chunksize=50, method='multi', index=False, if_exists='replace')

Check the usage of .to_sql n Pandas documentation https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html

Berkman answered 15/3 at 19:41 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.