How to check if .xls and .csv files are empty
Asked Answered
S

7

14

Question 1: How can I check if an entire .xls or .csv file is empty.This is the code I am using:

try:
    if os.stat(fullpath).st_size > 0:
       readfile(fullpath)
    else:
       print "empty file"
except OSError:
    print "No file"

An empty .xls file has size greater than 5.6kb so it is not obvious whether it has any contents. How can I check if an xls or csv file is empty?

Question 2: I need to check the header of the file. How can I tell python that files which are just a single row of headers are empty?

import xlrd
def readfile(fullpath)
    xls=xlrd.open_workbook(fullpath)  
    for sheet in xls.sheets():
        number_of_rows = sheet.nrows 
        number_of_columns = sheet.ncols
        sheetname = sheet.name
        header = sheet.row_values(0) #Then if it contains only headers, treat it as empty.

This is my attempt. How do I continue with this code?

Please provide a solution for both questions. Thanks in advance.

Sybille answered 1/3, 2017 at 16:37 Comment(0)
P
19

This is simple in pandas with the .empty method. Do this

import pandas as pd

df = pd.read_csv(filename) # or pd.read_excel(filename) for xls file
df.empty # will return True if the dataframe is empty or False if not.

This will also return True for a file with only headers as in

>> df = pd.DataFrame(columns = ['A','B'])
>> df.empty
   True
Preheat answered 1/3, 2017 at 16:42 Comment(2)
thanks for your answer.and I am using xlrd and I dont want install any other packages like pandaSybille
For my empty csv, pd.read_csv(filename) just throw an error: pandas.errors.EmptyDataError: No columns to parse from file ...Confident
S
5

Question 1: How I check the entire .xls file are empty.

def readfile(fullpath):

    xls = xlrd.open_workbook(fullpath)

    is_empty = None

    for sheet in xls.sheets():
        number_of_rows = sheet.nrows

        if number_of_rows == 1:
            header = sheet.row_values(0)  
            # then If it contains only headers I want to treat as empty
            if header:
                is_empty = False
                break

        if number_of_rows > 1:
            is_empty = False
            break

        number_of_columns = sheet.ncols
        sheetname = sheet.name

    if is_empty:
        print('xlsx ist empty')

Question 2: How I check header of the file .If the file has only a header(I mean only a single row) I need to treat the file is empty .How can I do that.

import csv
with open('test/empty.csv', 'r') as csvfile:
    csv_dict = [row for row in csv.DictReader(csvfile)]
    if len(csv_dict) == 0:
        print('csv file is empty')

Tested with Python:3.4.2

Stoma answered 13/3, 2017 at 21:58 Comment(2)
may be your answer is right but i need to check csv ans xls alsoSybille
For csv, you don't need iterate through all row nor parse with DictReader. You can just check if the second line in a file is empty. f.readline() == b''. Check my answer for full example.Flotilla
H
1

I don't think Stackoverflow allows 2 question at the time but let me give you my answer for the Excel part

import xlrd
from pprint import pprint

wb = xlrd.open_workbook("temp.xlsx")

empty_sheets = [sheet for sheet in wb.sheets() if sheet.ncols == 0]
non_empty_sheets = [sheet for sheet in wb.sheets() if sheet.ncols > 0]

# printing names of empty sheets
pprint([sheet.name for sheet in empty_sheets])

# writing non empty sheets to database 
pass # write code yourself or ask another question 

About the header: let me give you a little hint, test for sheet.nrows == 1.

Hutson answered 1/3, 2017 at 16:54 Comment(8)
u said for checking all the sheets at same time .but if sheet 1 has data and sheet2 is empty .then what can i do.Sybille
@bobmarti what do you mean? We do not know what you want to do! Do you only want to have the sheets which are not empty?Hutson
I want check all the sheets and empty sheets treat as empty and sheets value stored into dbSybille
@bobmarti I really do not understand what you mean.Hutson
I want to check all the sheets in excel file.then i need which sheets are empty then print it is empty.if any of the sheet is not empty then save to the database\Sybille
@bobmarti I updated the code according to your updated question. Tell me if it works or if you have more questions.Hutson
just look up my second question.give me a solution based on thatSybille
because your solution gives me an idea.how can implement into my question.here you use sheet for sheet in wb.sheets() two times.but I used for sheet in xls.sheets(): only once.then how to continue with my codeSybille
B
1

For your excel code, I like the pandas solution someone came up with but if you are at work and can't install it, then I think you were almost there with the code approach you were taking. You have a loop traversing each sheet. So you can test the rows in each sheet and then take appropriate action if empty like so:

import xlrd

xlFile = "MostlyEmptyBook.xlsx"

def readfile(xlFile):
    xls=xlrd.open_workbook(xlFile)  
    for sheet in xls.sheets():
        number_of_rows = sheet.nrows 
        number_of_columns = sheet.ncols
        sheetname = sheet.name
        header = sheet.row_values(0) #then If it contains only headers I want to treat as empty
        if number_of_rows <= 1:
            # sheet is empty or has just a header
            # do what you want here
            print(xlFile + "is empty.")

Note: I added a variable for the filename to make it easier to change in one place throughout the code when used. I also added : to your function declaration which was missing it. If you want the test to be has header only (mine includes completely blank page), then change <= to ==.

Regarding the related csv issue. csv is just a text file. We can be reasonably certain a file is empty except for the header using a coding approach like the one that follows. I would try this code on a sample of files, and you may want to adjust my math logic. For example, it may be sufficient to use + 1 on the if comparison instead of *1.5 as I have it. My thinking is with white space or if a few characters were mistakenly included this would be a good file size cushion + the chars on second line test given in the coding logic.

This was written on the assumption you want to know if the file is empty before you load some giant file into your computer. If that assumption is wrong, you can use my test logic and then keep the file open, or even read in more code to make sure there isn't a blank line followed by additional content after the header (in a badly formatted input file):

import os

def convert_bytes(num):
    """
    this function will convert bytes to MB.... GB... etc
    """
    for x in ['bytes', 'KB', 'MB', 'GB', 'TB']:
        if num < 1024.0:
            return "%3.1f %s" % (num, x)
        num /= 1024.0


def file_size(file_path):
    """
    this function will return the file size
    """
    if os.path.isfile(file_path):
        file_info = os.stat(file_path)
        return convert_bytes(file_info.st_size)


# testing if a csv file is empty in Python (header has bytes so not zero)

fileToTest = "almostEmptyCSV.csv"

def hasContentBeyondHeader(fileToTest):
    answer = [ True, 0, 0, 0]
    with open(fileToTest) as f:
        lis = [ f.readline(), f.readline() ] 
        answer[1] = len(lis[0])                # length header row
        answer[2] = len(lis[1])                # length of next row
        answer[3] = file_size(fileToTest)      # size of file

        # these conditions should be high confidence file is empty or nearly so
        sizeMult = 1.5   # test w/ your files and adjust as appropriate (but should work)
        charLimit = 5

        if answer[1] * sizeMult > answer[2] and answer[2] == 0:
            answer[0] = False
        elif answer[1] * sizeMult > answer[2] and answer[2] < charLimit:
            # separate condition in case you want to remove it
            # returns False if only a small number of chars (charLimit) on 2nd row
            answer[0] = False
        else:
            answer[0] = True   # added for readability (or delete else and keep default)         

        f.close()
    return answer

hasContentBeyondHeader(fileToTest)  # False if believed to be empty except for header

During testing, the readline commands extracted this content from the file:

['year,sex,births\n', '']

sample output:

[True, 16, 0, '17.0 bytes']

This approach means you can access the results of the test which are True/False in [0] element of the list it returns. The additional elements allow you to get info on the inputs to the program's decision-making in case you want to tweak it later.

This code starts with a custom file size function. You can probably replace that with this depending on your preferences if you are looking for shorter code. This would replace the first two tiny functions:

import os    
os.path.getsize(fullpathhere)
Birch answered 15/3, 2017 at 21:22 Comment(0)
F
1

what about something like this :

file = open(path, "r")
file_content = file.read()
file.close()
if file_content == "":
    print("File '{}' is empty".format(path))
else:
    rows = file_content.split("\n", 1)
    if rows[1] == "":
        print("File '{}' contains headers only.".format(path))

where path is the path of your xls or csv file.

Forester answered 16/3, 2017 at 18:41 Comment(1)
Aftermind, I am not sure if this code would work for xls files, because of the special encoding of this file format...Forester
F
1

For your question:

Question 2: I need to check the header of the file. How can I tell python that files which are just a single row of headers are empty?

You can just check the line in files.

with open('empty_csv_with_header.csv') as f:
    f.readline()  # skip header
    line = f.readline()
    if line == b'':
        print('Empty csv')
Flotilla answered 6/12, 2017 at 12:28 Comment(0)
N
0

Try this solves the problem. Not all empty CSV files are 0 bytes. This works in that condition also.

import os
from os.path import isfile
import pandas

if isfile(PASSWORD_MANAGER_FILE):
    try:
        pandas.read_csv(PASSWORD_MANAGER_FILE)
    except pandas.errors.EmptyDataError:
        os.remove(PASSWORD_MANAGER_FILE)
Nineteenth answered 28/4, 2021 at 6:22 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.