How can I open an Excel file in Python?
Asked Answered
P

8

114

How do I open a file that is an Excel file for reading in Python?

I've opened text files, for example, sometextfile.txt with the reading command. How do I do that for an Excel file?

Preen answered 13/7, 2010 at 16:26 Comment(1)
Which version of Excel? If you can limit yourself to opening Excel files created by Ecel 2007 or 2010, you should be able to parse much or all of the file as XML.Geognosy
W
120

Edit:
In the newer version of pandas, you can pass the sheet name as a parameter.

file_name =  # path to file + file name
sheet =  # sheet name or sheet number or list of sheet numbers and names

import pandas as pd
df = pd.read_excel(io=file_name, sheet_name=sheet)
print(df.head(5))  # print first 5 rows of the dataframe

Check the docs for examples on how to pass sheet_name:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_excel.html

Old version:
you can use pandas package as well....

When you are working with an excel file with multiple sheets, you can use:

import pandas as pd
xl = pd.ExcelFile(path + filename)
xl.sheet_names

>>> [u'Sheet1', u'Sheet2', u'Sheet3']

df = xl.parse("Sheet1")
df.head()

df.head() will print first 5 rows of your Excel file

If you're working with an Excel file with a single sheet, you can simply use:

import pandas as pd
df = pd.read_excel(path + filename)
print df.head()
Wroth answered 25/6, 2013 at 7:16 Comment(2)
this solution gets my upvote. with openpyxl, I'm running into the following problem "InvalidFileException: openpyxl does not support the old .xls file format, please use xlrd to read this file, or convert it to the more recent .xlsx file format". On the other hand pandas handles both .xls and .xlsx files... also, reading an entire table takes just one line of code.Croton
You will need to install optional dependencies xlrd for reading Excel files, and xlwt for writing Excel files.Chunk
A
36

Try the xlrd library.

[Edit] - from what I can see from your comment, something like the snippet below might do the trick. I'm assuming here that you're just searching one column for the word 'john', but you could add more or make this into a more generic function.

from xlrd import open_workbook

book = open_workbook('simple.xls',on_demand=True)
for name in book.sheet_names():
    if name.endswith('2'):
        sheet = book.sheet_by_name(name)

        # Attempt to find a matching row (search the first column for 'john')
        rowIndex = -1
        for cell in sheet.col(0): # 
            if 'john' in cell.value:
                break

        # If we found the row, print it
        if row != -1:
            cells = sheet.row(row)
            for cell in cells:
                print cell.value

        book.unload_sheet(name) 
Aulos answered 13/7, 2010 at 16:29 Comment(3)
I think this might be what I want it to do : from xlrd import open_workbook book = open_workbook('simple.xls',on_demand=True) for name in book.sheet_names(): if name.endswith('2'): sheet = book.sheet_by_name(name) print sheet.cell_value(0,0) book.unload_sheet(name) large_files.py but I dont want it to use endwith i want it to find and print lines that contain a particlar name...like i want it to print the line of the huge excel sheet that contains john's data and not bob's. help?Preen
I'd suggest you post this as a seperate question and put the code in a code block.Aulos
This is the second question of a series of related questions; in the 3rd question it is revealed that the real excel file is allegedly 1.5 GB and the computer's memory is described as "not enough" ... see #3241539Bottom
P
18

This isn't as straightforward as opening a plain text file and will require some sort of external module since nothing is built-in to do this. Here are some options:

http://www.python-excel.org/

If possible, you may want to consider exporting the excel spreadsheet as a CSV file and then using the built-in python csv module to read it:

http://docs.python.org/library/csv.html

Pagandom answered 13/7, 2010 at 16:29 Comment(4)
Ok I don't really understand the CSV stuff how do I have python open up my excel file as a csv module? I have a program that does what I want for txt files and I want it to do the same thing for this excel file...which is the best way to go? Can you elaborate on this please?Preen
Either you can use a 3rd party python module like xlrd, or save your excel file a CSV file, instead of a normal Excel file. I think the point you are missing is that an excel file has no resemblance to a plain text file. Open the Excel document in notepad and you will see what I mean. You either need to save the file in a plain-text format such as CSV (comma-separated values), which is easier to read with python, or install and use a 3rd party module that can parse an Excel file for you.Pagandom
The problem I'm having is the file is really really large. How can I save the file as a CSV format if I cannot completely open the file?Preen
@novak: Your problem is that your file is 1.5GB and your computer's memory is "not enough" ...Bottom
D
10

There's the openpxyl package:

>>> from openpyxl import load_workbook
>>> wb2 = load_workbook('test.xlsx')
>>> print wb2.get_sheet_names()
['Sheet2', 'New Title', 'Sheet1']

>>> worksheet1 = wb2['Sheet1'] # one way to load a worksheet
>>> worksheet2 = wb2.get_sheet_by_name('Sheet2') # another way to load a worksheet
>>> print(worksheet1['D18'].value)
3
>>> for row in worksheet1.iter_rows():
>>>     print row[0].value()
Drawee answered 22/6, 2015 at 18:57 Comment(0)
C
2

You can use xlpython package that requires xlrd only. Find it here https://pypi.python.org/pypi/xlpython and its documentation here https://github.com/morfat/xlpython

Carolynecarolynn answered 13/5, 2015 at 12:3 Comment(0)
M
1

This may help:

This creates a node that takes a 2D List (list of list items) and pushes them into the excel spreadsheet. make sure the IN[]s are present or will throw and exception.

this is a re-write of the Revit excel dynamo node for excel 2013 as the default prepackaged node kept breaking. I also have a similar read node. The excel syntax in Python is touchy.

thnx @CodingNinja - updated : )

###Export Excel - intended to replace malfunctioning excel node

import clr

clr.AddReferenceByName('Microsoft.Office.Interop.Excel, Version=15.0.0.0, Culture=neutral, PublicKeyToken=71e9bce111e9429c')
##AddReferenceGUID("{00020813-0000-0000-C000-000000000046}") ''Excel                            C:\Program Files\Microsoft Office\Office15\EXCEL.EXE 
##Need to Verify interop for version 2015 is 15 and node attachemnt for it.
from Microsoft.Office.Interop import  * ##Excel
################################Initialize FP and Sheet ID
##Same functionality as the excel node
strFileName = IN[0]             ##Filename
sheetName = IN[1]               ##Sheet
RowOffset= IN[2]                ##RowOffset
ColOffset= IN[3]                ##COL OFfset
Data=IN[4]                      ##Data
Overwrite=IN[5]                 ##Check for auto-overwtite
XLVisible = False   #IN[6]      ##XL Visible for operation or not?

RowOffset=0
if IN[2]>0:
    RowOffset=IN[2]             ##RowOffset

ColOffset=0
if IN[3]>0:
    ColOffset=IN[3]             ##COL OFfset

if IN[6]<>False:
    XLVisible = True #IN[6]     ##XL Visible for operation or not?

################################Initialize FP and Sheet ID
xlCellTypeLastCell = 11                 #####define special sells value constant
################################
xls = Excel.ApplicationClass()          ####Connect with application
xls.Visible = XLVisible                 ##VISIBLE YES/NO
xls.DisplayAlerts = False               ### ALerts

import os.path

if os.path.isfile(strFileName):
    wb = xls.Workbooks.Open(strFileName, False)     ####Open the file 
else:
    wb = xls.Workbooks.add#         ####Open the file 
    wb.SaveAs(strFileName)
wb.application.visible = XLVisible      ####Show Excel
try:
    ws = wb.Worksheets(sheetName)       ####Get the sheet in the WB base

except:
    ws = wb.sheets.add()                ####If it doesn't exist- add it. use () for object method
    ws.Name = sheetName



#################################
#lastRow for iterating rows
lastRow=ws.UsedRange.SpecialCells(xlCellTypeLastCell).Row
#lastCol for iterating columns
lastCol=ws.UsedRange.SpecialCells(xlCellTypeLastCell).Column
#######################################################################
out=[]                                  ###MESSAGE GATHERING

c=0
r=0
val=""
if Overwrite == False :                 ####Look ahead for non-empty cells to throw error
    for r, row in enumerate(Data):   ####BASE 0## EACH ROW OF DATA ENUMERATED in the 2D array #range( RowOffset, lastRow + RowOffset):
        for c, col in enumerate (row): ####BASE 0## Each colmn in each row is a cell with data ### in range(ColOffset, lastCol + ColOffset):
            if col.Value2 >"" :
                OUT= "ERROR- Cannot overwrite"
                raise ValueError("ERROR- Cannot overwrite")
##out.append(Data[0]) ##append mesage for error
############################################################################

for r, row in enumerate(Data):   ####BASE 0## EACH ROW OF DATA ENUMERATED in the 2D array #range( RowOffset, lastRow + RowOffset):
    for c, col in enumerate (row): ####BASE 0## Each colmn in each row is a cell with data ### in range(ColOffset, lastCol + ColOffset):
        ws.Cells[r+1+RowOffset,c+1+ColOffset].Value2 = col.__str__()

##run macro disbled for debugging excel macro
##xls.Application.Run("Align_data_and_Highlight_Issues")
Minter answered 29/4, 2018 at 20:3 Comment(2)
@CodingNinja That sufficient? : )Minter
Yes much betterBrew
J
-1
import pandas as pd 
import os 
files = os.listdir('path/to/files/directory/')
desiredFile = files[i]
filePath = 'path/to/files/directory/%s'
Ofile = filePath % desiredFile
xls_import = pd.read_csv(Ofile)

Now you can use the power of pandas DataFrames!

Jd answered 2/12, 2015 at 18:31 Comment(1)
The question is about reading an Excel file, not a comma-separated text file. Pandas does seem to have a function for that (pandas.read_excel).Cerebral
R
-2

This code worked for me with Python 3.5.2. It opens and saves and excel. I am currently working on how to save data into the file but this is the code:

import csv
excel = csv.writer(open("file1.csv", "wb"))

 

Roadwork answered 3/11, 2017 at 21:33 Comment(1)
You are opening a comma seperated file and not an Excel file, which the question is about.Carousel

© 2022 - 2024 — McMap. All rights reserved.