Using Spreadsheet::ParseExcel in Perl, but need help
Asked Answered
C

3

6

I have a Perl program using Spreadsheet::ParseExcel. However, there are two difficulties that have arisen that I have been unable to figure out how to solve. The script for the program is as follows:

#!/usr/bin/perl
use strict;
use warnings;
use Spreadsheet::ParseExcel;
use WordNet::Similarity::lesk;
use WordNet::QueryData;

my $wn = WordNet::QueryData->new();
my $lesk = WordNet::Similarity::lesk->new($wn);
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse ( 'input.xls' );

if ( !defined $workbook ) {
   die $parser->error(), ".\n";
}

WORKSHEET:
for my $worksheet ( $workbook->worksheets() ) {

    my $sheetname = $worksheet->get_name();
    my ( $row_min, $row_max ) = $worksheet->row_range();
    my ( $col_min, $col_max ) = $worksheet->col_range();
    my $target_col;
    my $response_col;

# Skip worksheet if it doesn't contain data
    if ( $row_min > $row_max ) {
       warn "\tWorksheet $sheetname doesn't contain data. \n";
       next WORKSHEET;
    }

# Check for column headers
    COLUMN:
    for my $col ( $col_min .. $col_max ) {

        my $cell = $worksheet->get_cell( $row_min, $col );
        next COLUMN unless $cell;

        $target_col   = $col if $cell->value() eq 'Target';
        $response_col = $col if $cell->value() eq 'Response';
    }

    if ( defined $target_col && defined $response_col ) {

        ROW:
        for my $row ( $row_min + 1 .. $row_max ) {
            my $target_cell   = $worksheet->get_cell( $row, $target_col);
            my $response_cell = $worksheet->get_cell( $row, $response_col);
            if ( defined $target_cell && defined $response_cell ) {
                my $target   = $target_cell->value();
                my $response = $response_cell->value();

                my $value    = $lesk->getRelatedness( $target, $response );

                print "Worksheet   = $sheetname\n";
                print "Row         = $row\n";
                print "Target      = $target\n";
                print "Response    = $response\n";
                print "Relatedness = $value\n";                

            }
            else {

                warn "\tWroksheet $sheetname, Row = $row doesn't contain target and response data.\n";
                next ROW;
            }
        }    
    }
    else {

        warn "\tWorksheet $sheetname: Didn't find Target and Response headings.\n";
        next WORKSHEET;
    }  
}

So, my two problems:

First of all, sometimes the program returns the error "No Excel data found in file," even though the data is there. Each Excel file is formatted the same way. There is only one sheet, with the A and B columns labelled 'Target' and 'Response,' respectively, with a list of words beneath them. However, it does not ALWAYS return this error. It works for one Excel file, but it does not work for a different one, even though both are formatted the exact same way (and yes, they are both the same file type, as well). I cannot find any reason for it to not read the second file, because it is identical to the first. The only difference is that the second file was created using an Excel macro; however, why would that matter? The file types and format are exactly the same.

Second, the variables '$target' and '$response' need to be formatted as strings in order for the 'my $value' expression to work. How do I convert them into string format? The value assigned to each variable is a word from the appropriate cell of the Excel spreadsheet. I don't know what format that is (and there is no apparent way in Perl for me to check).

Any suggestions?

Chkalov answered 27/7, 2011 at 15:1 Comment(1)
Nice to see some well-formatted code for a change!Tag
M
7

In relation to your first question, the "no data found" error indicates some problem with the file format. I've seen this error with pseudo-Excel files such as Html or CSV files that have an xls extension. I've also seen this error with mal-formed files generated by third party apps.

You could do an initial verification of the files by doing a hexdump/xxd dump of a working and non working file and seeing if the overall structure is approximately the same (for example if it has similar magic numbers at the start and isn't Html).

It could also be an issue with Spreadsheet::ParseExcel. I am the maintainer of that module. If you like you could send me on a "good" and "bad" file, at the email address in the docs, and I will have a look at them.

Mannerly answered 27/7, 2011 at 17:5 Comment(2)
I don't know enough about hexdumps to get anything out of that. I'll probably end up switching to using .txt files or something unless I can figure out why it doesn't like the file. In any case, thanks a lot, I just sent you an e-mail.Chkalov
Just to give an update on this. The problematic files where actually Excel 2007 xlsx files. They need to be read with Spreadsheet::XLSX or Spreadsheet::Read. Spreadsheet::ParseExcel can only handle the pre 2007 file format.Mannerly
K
1

First of all, if you are getting "no data found" you can thank proprietary Excel data file formats and the inability of even a good Perl library to extract information from them.

I strongly suggest that you export the Excel data in something easily parsed like CSV especially given the simple nature of the data layout you described. There may be a way to get Excel to process a batch but I have no idea. A quick search yielded a tool to use OpenOffice to do batch conversion.

The rest of your question is rather moot once you accept that Excel data files will not play nicely.

Kenya answered 27/7, 2011 at 15:14 Comment(0)
M
0

I wrote this code after a client couldn't decide whether the XLS he was sending every week was really in XLS format or just CSV.... HTH!

sub testForXLS ()
{
my ( $FileName )    = @_;
my $signature       = '';
my $XLSsignature    = 'D0CF11E0A1B11AE10000';

open(FILE, "<$FileName")||die;
read(FILE, $buffer, 10, 0);
close(FILE);

foreach (split(//, $buffer))
    { $signature .= sprintf("%02x", ord($_)); }

$signature =~ tr/a-z/A-Z/;

if ( $signature eq $XLSsignature )
{ return 1; } else { return 0; }

}
Michiko answered 29/7, 2011 at 15:10 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.