Read MS Word .doc file with ruby and win32ole

I'm trying to read a .doc file with Ruby, using the win32ole library.

Here is my code:


require 'win32ole'

class DocParser

  def initialize
    @content = ''
  end

  def read_file file_path
    begin
      word = WIN32OLE.connect( 'Word.Application' )
      doc  = word.activedocument
    rescue
      word = WIN32OLE.new( 'Word.Application' )
      doc  = word.documents.open( file_path )
    end
    word.visible = false
    doc.sentences.each{ |x| @content = @content + x.text }

    word.quit
    @content
  end
end

I start reading the document with DocParser.new.read_file('path/file.doc').

When I run this from the Rails console (rails c), I don't have any problems and it works fine. But when I run it in the Rails server (e.g. after a button click), this code crashes once in a while (every third or fourth time) with this error:


WIN32OLERuntimeError (failed to create WIN32OLE object from `Word.Application'
    HRESULT error code:0x800401f0
      CoInitialize has not been called.):
  lib/file_parsers/doc_parser.rb:14:in `initialize'
  lib/file_parsers/doc_parser.rb:14:in `new'
  lib/file_parsers/doc_parser.rb:14:in `rescue in read_file'
  lib/file_parsers/doc_parser.rb:10:in `read_file'
  lib/search_engine.rb:10:in `block in search'
  lib/search_engine.rb:43:in `block in each_file_in'
  lib/search_engine.rb:42:in `each_file_in'
  lib/search_engine.rb:8:in `search'
  app/controllers/home_controller.rb:9:in `search'


  Rendered c:/Ruby193/lib/ruby/gems/1.9.1/gems/actionpack-4.1.1/lib/action_dispatch/middleware/templates/rescues/_source.erb (0.0ms)
  Rendered c:/Ruby193/lib/ruby/gems/1.9.1/gems/actionpack-4.1.1/lib/action_dispatch/middleware/templates/rescues/_trace.text.erb (2.0ms)
  Rendered c:/Ruby193/lib/ruby/gems/1.9.1/gems/actionpack-4.1.1/lib/action_dispatch/middleware/templates/rescues/_request_and_response.text.erb (2.0ms)
  Rendered c:/Ruby193/lib/ruby/gems/1.9.1/gems/actionpack-4.1.1/lib/action_dispatch/middleware/templates/rescues/diagnostics.erb (56.0ms)

Additionally, when this code does read the .doc file successfully, Rails crashes a few seconds later: look at this gist

What is my problem? How can I fix it? Please, help!

Jeavons answered 4/6, 2014 at 9:22 Comment(2)
I'm not a RoR dev, but is that code being called from a single-threaded apartment thread? https://mcmap.net/q/724082/-coinitialize-has-not-been-called-exceptions-in-c - Leschen
You have to ensure that CoInitialize is called before any call to WIN32OLE.new. You can use an 'initializer' script in Rails. - Merline
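A minimal sketch of the idea in the comment above, assuming the win32ole extension's WIN32OLE.ole_initialize class method (it exists but is barely documented) is an acceptable way to initialize COM on the thread that handles the request. The helper name new_word_application is hypothetical, and the OLE class is passed in as a parameter only so the helper can be exercised without Windows:

```ruby
# Hypothetical helper: make sure COM is initialized on the *current* thread
# before creating the Word automation object. `ole` is expected to be the
# WIN32OLE class (assumption: it responds to ole_initialize and new).
def new_word_application(ole)
  ole.ole_initialize            # initialize COM/OLE for the calling thread
  ole.new('Word.Application')   # same ProgID as in the question
end

# Usage on Windows with Word installed:
#   require 'win32ole'
#   word = new_word_application(WIN32OLE)
```

Because a Rails server may serve each request on a different thread, the call is made right before the object is created rather than once at boot. Whether this removes the intermittent error in a given setup is untested here.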

I don't know the difference between rails c and rails, so I'll give some random advice.

First, it is a bad idea to run this in a web server: Word is started on the server for every request, so what happens if multiple users start using this at the same time?

You'd be better off converting your .doc files to another format first, such as .rtf or .docx (as a batch conversion, perhaps), and then using other gems that don't require Word itself.

If you keep it like this, consider not closing Word (remove the word.quit) and only closing the document itself; the running instance will then be picked up the next time by WIN32OLE.connect.
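As a rough sketch of that suggestion: the hypothetical helper below (read_document_text is not from the question) takes an already obtained Word application object, opens the file, collects the sentence text the same way the question's code does, and closes only the document. It assumes Word's Document.Close accepts false to mean "do not save changes".

```ruby
# Hypothetical helper: read the text of one document and close only that
# document, leaving the Word application itself running.
# `word` is assumed to be a Word.Application automation object.
def read_document_text(word, file_path)
  doc  = word.documents.open(file_path)
  text = String.new
  doc.sentences.each { |sentence| text << sentence.text }
  text
ensure
  # Runs even if reading raised; `doc` is nil when open itself failed.
  doc.close(false) if doc   # false: discard changes, do not prompt to save
end
```

The ensure clause is there so a failed read does not leave the document open in the long-running Word instance.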

While testing, you'd better keep Word visible so that you can see what is happening (errors?). I notice your path uses forward slashes, while in this case backslashes are needed, but since your code runs a few times before the error, I suppose that is not the problem.
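If the path does turn out to matter, one way to normalize it before handing it to Word might look like this. The helper name windows_path is hypothetical; the sketch assumes an absolute path is wanted, since Word resolves relative paths against its own working directory rather than the Rails one.

```ruby
# Hypothetical helper: turn a Ruby-style path into an absolute path that
# uses backslashes, as Windows applications such as Word expect.
def windows_path(path)
  File.expand_path(path).tr('/', '\\')
end

# Usage:
#   doc = word.documents.open(windows_path('path/file.doc'))
```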

Hope this helps.

Marplot answered 4/6, 2014 at 9:48 Comment(0)

I upgraded Ruby from 1.9.3 to 2.0.0.

Now Rails doesn't crash, and I have no problems with win32ole or with reading old-format MS Word documents.

I guess the problem was memory usage, because newer Ruby (2.0.0 and later) uses a new garbage collector.

Jeavons answered 4/6, 2014 at 11:54 Comment(0)
