For your excel code, I like the pandas
solution someone came up with but if you are at work and can't install it, then I think you were almost there with the code approach you were taking. You have a loop traversing each sheet. So you can test the rows in each sheet and then take appropriate action if empty like so:
import xlrd
xlFile = "MostlyEmptyBook.xlsx"
def readfile(xlFile):
xls=xlrd.open_workbook(xlFile)
for sheet in xls.sheets():
number_of_rows = sheet.nrows
number_of_columns = sheet.ncols
sheetname = sheet.name
header = sheet.row_values(0) #then If it contains only headers I want to treat as empty
if number_of_rows <= 1:
# sheet is empty or has just a header
# do what you want here
print(xlFile + "is empty.")
Note: I added a variable for the filename to make it easier to change in one place throughout the code when used. I also added :
to your function declaration which was missing it. If you want the test to be has header only (mine includes completely blank page), then change <=
to ==
.
Regarding the related csv issue. csv is just a text file. We can be reasonably certain a file is empty except for the header using a coding approach like the one that follows. I would try this code on a sample of files, and you may want to adjust my math logic. For example, it may be sufficient to use + 1
on the if comparison instead of *1.5
as I have it. My thinking is with white space or if a few characters were mistakenly included this would be a good file size cushion + the chars on second line test given in the coding logic.
This was written on the assumption you want to know if the file is empty before you load some giant file into your computer. If that assumption is wrong, you can use my test logic and then keep the file open, or even read in more code to make sure there isn't a blank line followed by additional content after the header (in a badly formatted input file):
import os
def convert_bytes(num):
"""
this function will convert bytes to MB.... GB... etc
"""
for x in ['bytes', 'KB', 'MB', 'GB', 'TB']:
if num < 1024.0:
return "%3.1f %s" % (num, x)
num /= 1024.0
def file_size(file_path):
"""
this function will return the file size
"""
if os.path.isfile(file_path):
file_info = os.stat(file_path)
return convert_bytes(file_info.st_size)
# testing if a csv file is empty in Python (header has bytes so not zero)
fileToTest = "almostEmptyCSV.csv"
def hasContentBeyondHeader(fileToTest):
answer = [ True, 0, 0, 0]
with open(fileToTest) as f:
lis = [ f.readline(), f.readline() ]
answer[1] = len(lis[0]) # length header row
answer[2] = len(lis[1]) # length of next row
answer[3] = file_size(fileToTest) # size of file
# these conditions should be high confidence file is empty or nearly so
sizeMult = 1.5 # test w/ your files and adjust as appropriate (but should work)
charLimit = 5
if answer[1] * sizeMult > answer[2] and answer[2] == 0:
answer[0] = False
elif answer[1] * sizeMult > answer[2] and answer[2] < charLimit:
# separate condition in case you want to remove it
# returns False if only a small number of chars (charLimit) on 2nd row
answer[0] = False
else:
answer[0] = True # added for readability (or delete else and keep default)
f.close()
return answer
hasContentBeyondHeader(fileToTest) # False if believed to be empty except for header
During testing, the readline commands extracted this content from the file:
['year,sex,births\n', '']
sample output:
[True, 16, 0, '17.0 bytes']
This approach means you can access the results of the test which are True/False in [0]
element of the list it returns. The additional elements allow you to get info on the inputs to the program's decision-making in case you want to tweak it later.
This code starts with a custom file size function. You can probably replace that with this depending on your preferences if you are looking for shorter code. This would replace the first two tiny functions:
import os
os.path.getsize(fullpathhere)