Read MS Word table data row wise using win32:ole perl
Asked Answered
N

3

6

I am new to win32:ole module in perl. I am trying to print MS word table data row wise on command prompt. But I am able to print only last row of the table. Can you please help me to solve this problem? Thanks in advance.

Below is my code:

#!/usr/bin/perl 
use strict;
use warnings; 
use File::Spec::Functions qw( catfile );
use Win32::OLE qw(in); 
use Win32::OLE::Const 'Microsoft Word'; 
$Win32::OLE::Warn = 3; 

my $word = get_word(); 
$word->{DisplayAlerts} = wdAlertsNone; 
$word->{Visible} = 1;
my $doc = $word->{Documents}->Open('C:\\PerlScripts\\myTest.docx');
my $tables = $word->ActiveDocument->{'Tables'};

for my $table (in $tables)
{
   my $tableText = $table->ConvertToText({ Separator => wdSeparateByTabs });
   print "Table: ". $tableText->Text(). "\n";
}

$doc->Close(0); 

sub get_word 
{ 
    my $word; 
    eval { $word = Win32::OLE->GetActiveObject('Word.Application');}; 
    die "$@\n" if $@;
    unless(defined $word) 
    { 
       $word = Win32::OLE->new('Word.Application', sub { $_[0]->Quit }) 
       or die "Oops, cannot start Word: ", Win32::OLE->LastError, "\n"; 
    } 
  return $word; 
} 
Neckar answered 1/11, 2012 at 21:34 Comment(6)
Probably a copy and paste error and not necessarily the issue but the use strict should be on the second line.Search
This isn't your problem, but -- I don't think there's any point in using eval { ... } when you're just going to die "$@\n" if $@ afterward anyway. Is there?Cauldron
I tried to write the content to a file name "mytest.txt", there all the table data is getting printed but not row.. its getting printed as if all the contect of object is dumped in "mytest.txt" fileNeckar
@RobKielty: Y'know, that actually could be the issue. Since the use strict isn't in effect, it's possible that the wdSeparateByTabs is being treated as the string 'wdSeparateByTabs', and similarly for other constants. (I mean, obviously the real issue in that case would be that use Win32::OLE::Const 'Microsoft Word' is not loading a constant that the OP expects it to; but the lack of use strict could conceal that issue.)Cauldron
I wrote use strict; on separate line, but I am still facing that issue!Neckar
I have reproduced the problem but am struggling to figure out OLE via Perl. Can you use VBA for this task?Search
S
2

Not a perfect solution by any means but here's an advancement on the problem.

I used a string separator "\n\n" which produces the following output ...

Further hacking required :(

C:\StackOverflow>perl  word.pl meTest.docx
Table: Header1
Header2
Header3
Header4
Row1-Cell1
Row1-Cell2
Row1-Cell3
Row1-Cell4
Row2-Cell1
Row2-Cell2
Row2-Cell3
Row2-Cell4
Row2-Cell5

Here's the code. I have commented out some other code in the tables loop that I used to hack on the data returned by $tableRange->{Text} Uncomment to experiment further.

#!/usr/bin/perl 
use strict;
use warnings; 
use File::Spec::Functions qw( catfile );
use Win32::OLE qw(in); 
use Win32::OLE::Const 'Microsoft Word'; 
$Win32::OLE::Warn = 3; 

my $word = get_word(); 
$word->{DisplayAlerts} = wdAlertsNone; 
$word->{Visible} = 1;
my $doc = $word->{Documents}->Open('meTest.docx');
my $tables = $word->ActiveDocument->{'Tables'};

for my $table (in $tables)
{
   my $tableRange = $table->ConvertToText({ Separator => "\n\n" });
   print "Table: \n" . $tableRange->{Text}. "\n";
   # foreach $word (split/\n/, $tableRange->{Text}) {
   #  print $word . "\n" ;
   #  # $userinput =  <STDIN>;
   # }
}

$doc->Close(0); 

sub get_word 
{ 
    my $word; 
    eval { $word = Win32::OLE->GetActiveObject('Word.Application');}; 
    die "$@\n" if $@;
    unless(defined $word) 
    { 
       $word = Win32::OLE->new('Word.Application', sub { $_[0]->Quit }) 
       or die "Oops, cannot start Word: ", Win32::OLE->LastError, "\n"; 
    } 
  return $word; 
} 

Sorry I couldn't be of more help.

Search answered 2/11, 2012 at 0:36 Comment(0)
A
2

extract all the doc tables into a single xls file

     sub doParseDoc {

           my $msg     = '' ; 
           my $ret     = 1 ; # assume failure at the beginning ...

           $msg        = 'START --- doParseDoc' ; 
           $objLogger->LogDebugMsg( $msg );
           $msg        = 'using the following DocFile: "' . $DocFile . '"' ; 
           $objLogger->LogInfoMsg( $msg );
           #-----------------------------------------------------------------------
           #Using OLE + OLE constants for Variants and OLE enumeration for Enumerations


           # Create a new Excel workbook
           my $objWorkBook = Spreadsheet::WriteExcel->new("$DocFile" . '.xls');

           # Add a worksheet
           my $objWorkSheet = $objWorkBook->add_worksheet();


           my $var1 = Win32::OLE::Variant->new(VT_BOOL, 'true');

           Win32::OLE->Option(Warn => \&Carp::croak);
           use constant true => 0;

           # at this point you should have the Word application opened in UI with t
           # the DocFile
           # build the MS Word object during run-time 
           my $objMSWord = Win32::OLE->GetActiveObject('Word.Application')
                             or Win32::OLE->new('Word.Application', 'Quit');  

           # build the doc object during run-time 
           my $objDoc   = $objMSWord->Documents->Open($DocFile)
                 or die "Could not open ", $DocFile, " Error:", Win32::OLE->LastError();

           #Set the screen to Visible, so that you can see what is going on
           $objMSWord->{'Visible'} = 1;
           # try NOT printing directly to the file


            #$objMSWord->ActiveDocument->SaveAs({Filename => 'AlteredTest.docx', 
                                        #FileFormat => wdFormatDocument});

           my $tables        = $objMSWord->ActiveDocument->Tables();
           my $tableText     = '' ;   
           my $xlsRow        = 1 ; 

           for my $table (in $tables){
              # extract the table text as a single string
              #$tableText = $table->ConvertToText({ Separator => 'wdSeparateByTabs' });
              # cheated those properties from here: 
              # https://msdn.microsoft.com/en-us/library/aa537149(v=office.11).aspx#officewordautomatingtablesdata_populateatablewithdata
              my $RowsCount = $table->{'Rows'}->{'Count'} ; 
              my $ColsCount = $table->{'Columns'}->{'Count'} ; 

              # disgard the tables having different than 5 columns count
              next unless ( $ColsCount == 5 ) ;

              $msg           = "Rows Count: $RowsCount " ; 
              $msg           .= "Cols Count: $ColsCount " ; 
              $objLogger->LogDebugMsg ( $msg ) ; 

              #my $tableRange = $table->ConvertToText({ Separator => '##' });
              # OBS !!! simple print WILL print to your doc file use Select ?!
              #$objLogger->LogDebugMsg ( $tableRange . "\n" );
              # skip the header row
              foreach my $row ( 0..$RowsCount ) {
                 foreach my $col (0..$ColsCount) {

                    # nope ... $table->cell($row,$col)->->{'WrapText'} = 1 ; 
                    # nope $table->cell($row,$col)->{'WordWrap'} = 1  ;
                    # so so $table->cell($row,$col)->WordWrap() ; 

                    my $txt = ''; 
                    # well some 1% of the values are so nasty that we really give up on them ... 
                    eval {
                       $txt = $table->cell($row,$col)->range->{'Text'}; 
                       #replace all the ctrl chars by space
                       $txt =~ s/\r/ /g   ; 
                       $txt =~ s/[^\040-\176]/ /g  ; 
                       # perform some cleansing - ColName<primary key>=> ColName
                       #$txt =~ s#^(.[a-zA-Z_0-9]*)(\<.*)#$1#g ; 

                       # this will most probably brake your cmd ... 
                       # $objLogger->LogDebugMsg ( "row: $row , col: $col with txt: $txt \n" ) ; 
                    } or $txt = 'N/A' ; 

                    # Write a formatted and unformatted string, row and column notation.
                    $objWorkSheet->write($xlsRow, $col, $txt);

                 } #eof foreach col

                 # we just want to dump all the tables into the one sheet
                 $xlsRow++ ; 
               } #eof foreach row
               sleep 1 ; 
           }  #eof foreach table

           # close the opened in the UI document
           $objMSWord->ActiveDocument->Close;

           # OBS !!! now we are able to print 
           $objLogger->LogDebugMsg ( $tableText . "\n" );

           # exit the whole Word application
           $objMSWord->Quit;

           return ( $ret , $msg ) ; 
     }
     #eof sub doParseDoc
Aeolic answered 11/2, 2015 at 14:7 Comment(0)
H
0

Use below lines of code

my $doc = $word->Documents->Open('C:\\PerlScripts\\myTest.docx');
my $tables = $word->{'Tables'};

instead of below code

my $doc = $word->{Documents}->Open('C:\\PerlScripts\\myTest.docx');
my $tables = $word->ActiveDocument->{'Tables'};

your problem get solved.

Hobson answered 2/5, 2013 at 7:24 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.