How to read only visible sheets from Excel using Pandas?
Asked Answered
E

7

10

I have to get some random Excel sheets where I want to read only visible sheets from those files.

Consider one file at a time, let's say I have Mapping_Doc.xls which contains 2-visible sheets and 2-hidden sheets.

As the sheets are less here, I can parse them with names like this:

Code :

xls = pd.ExcelFile('D:\\ExcelRead\\Mapping_Doc.xls')
print xls.sheet_names
df1 = xls.parse('Sheet1') #visible sheet
df2 = xls.parse('Sheet2') #visible sheet

Output:

[u'sheet1',u'sheet2',u'sheet3',u'sheet4']

How can I get only the visible sheets?

Esemplastic answered 16/6, 2017 at 5:56 Comment(3)
I don't think that this is possible with only Pandas, you might need other libraries.Hexachlorophene
@Hexachlorophene Any Specific libraries?Esemplastic
I think OpenPyXL does this.Hexachlorophene
A
8

Pandas uses the xlrd library internally (have a look at the excel.py source code if you're interested).

You can determine the visibility status by accessing each sheet's visibility attribute. According to the comments in the xlrd source code, these are the possible values:

  • 0 = visible
  • 1 = hidden (can be unhidden by user -- Format -> Sheet -> Unhide)
  • 2 = "very hidden" (can be unhidden only by VBA macro).

Here's an example that reads an Excel file with 2 worksheets, the first one visible and the second one hidden:

import pandas as pd

xls = pd.ExcelFile('test.xlsx')

sheets = xls.book.sheets()

for sheet in sheets:
    print(sheet.name, sheet.visibility)

Output:

Sheet1 0
Sheet2 1
Apposition answered 16/6, 2017 at 7:17 Comment(2)
It doesn't support '.xlsb' file . Is there any other options uing pandas?Gulley
error message: 'Workbook' object has no attribute 'sheets'Gaffer
E
5

Update for pandas 1.2.3

import pandas as pd

xls = pd.ExcelFile(filename)

sheets = xls.book.worksheets

for sheet in sheets:
    print(sheet.title, sheet.sheet_state)

Effulgence answered 2/12, 2021 at 13:46 Comment(1)
error message: 'ReadOnlyWorksheet' object has no attribute 'sheet_state'Gaffer
S
1

With Pandas 1.3.5, the response is different for the different engines:

  • Excel ≤2003 files are parsed with the xlrd engine.
  • Excel > 2003 files are parsed with the openpyxl engine.

This code works with both:

import pandas as pd
file = r'D:\path\to\File_to_Parse.xls'
excel = pd.ExcelFile(file)
if excel.engine == 'openpyxl':
    sheets = excel.book.worksheets
    for sheet in sheets:
        print(sheet.title, sheet.sheet_state) # .sheet_state returns a string
elif excel.engine == 'xlrd':
    sheets = excel.book.sheets()
    for sheet in sheets:
        print(sheet.name, sheet.visibility) # .visibility returns an integer: visible == 0, hidden > 0
else:
    print (f'{file} encoded with {excel.engine} engine. Unknown parsing.')
Sirocco answered 28/2, 2022 at 18:25 Comment(1)
sheet_state does not exist on the new 'ReadOnlyWorksheet' objects, and I can't find a way to force openpyxl to not lazy-load the spreadsheets through pandas' ExcelFile constructorFlour
P
1

I found reading the excel file with pandas returned different results depending on the excel file. One clear characteristic between the files was .xlsx versus .xls, but other times the Attributes were different and I couldn't find a pattern as to why.

So here's the try:except series that I came up with:

excel = pd.ExcelFile(file_path)
visible_sheet_names = []
if not visible_sheet_names:
    try:
        visible_sheet_names = [
            sheet.title
            for sheet in excel.book.worksheets
            if sheet.sheet_state == 'visible'
        ]
    except AttributeError:  # Some excel are missing the `worksheets` object. I think it's xlsx vs xls.
        pass
if not visible_sheet_names:
    try:
        visible_sheet_names = [
            sheet.name
            for sheet in excel.book.sheets()
            if sheet.visibility == 0
        ]
    except AttributeError:
        pass
if not visible_sheet_names:
    try:
        visible_sheet_names = [
            sheet
            for sheet, visible in zip(
                excel._sheet_names,
                excel._sheet_visibility,
            )
            if visible == 0
        ]
    except AttributeError:
        pass
assert visible_sheet_names, "All attempts to read sheet names failed"
Pompom answered 13/8 at 14:23 Comment(0)
W
0

Answer by @ƘɌỈSƬƠƑ, helped me too. I want to add few points.

For macro files (.xlsm), the visibility is unpredictable. May be because VBA code is behind the visibility setting. Even if a sheet is visible when the file is opened in Excel application, the visibility is not always 0 when read by xlrd.

Check the screenshots below:

This is what I see in Excel.

enter image description here

Visibility values fetched using xlrd

enter image description here

Waylonwayman answered 14/3, 2018 at 18:56 Comment(0)
M
0

I did a lot of R&D on Pandas, but couldn't find any solution. An alternate way is to use the xlrd library.

You need to install version 1.2.0 to get the support for xlsx excel format.

pip install xlrd==1.2.0

Read the excel and loop through the sheets. Inside the loop, you will get the sheet name and using the sheet name use the sheet_by_name method to get the sheet and on top of it use the visibility to find out if the sheet is visible or not.

import xlrd as xl
workbook = xl.open_workbook('File.xlsx')        
for sheet in workbook.sheets():
    isVisible = True if workbook.sheet_by_name(sheet.name).visibility == 0 else False
    if(isVisible == True):
        print(str(isVisible) + " : " + sheet.name)
    else:
        print(str(isVisible) + " : " + sheet.name)
Match answered 8/7, 2021 at 15:35 Comment(0)
P
0
import pandas as pd
import re

xls = pd.ExcelFile('Test.xlsx')
sheets =xls.book.sheets()

for sheet in sheets:
    sheetnames = (sheet.name) + '-'+ str(sheet.visibility)
    if re.search(r'.*\-[0]',sheetnames):
        targetsheet = sheetnames
        targetsheet = re.search(r'(.*)(\-[0])',targetsheet)
        targetsheet = targetsheet.group(1)
        print(targetsheet)
Pelson answered 16/12, 2022 at 10:41 Comment(1)
Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.Inhesion

© 2022 - 2024 — McMap. All rights reserved.