How to get Indexing Service and MODI to produce Full-text over OCR?

I have configured Indexing Service to index my files, which also include scanned images saved as hi-res TIFF files. I also installed MS Office 2003+ and configured MS Office Document Imaging (MODI) correctly, so I can perform OCR on my images and even embed the OCR'd text into TIFFs.

Indexing Service is able to index and find those TIFFs that were manually OCR'd and re-saved with text data (using the MS Document Imaging tool).

Turns out, Data Execution Prevention (DEP), which ships with Windows XP SP2, thinks MODI is malicious and refuses to let it do its magic. I was able to get it working by turning DEP off completely, but I find that solution inelegant.

Is there a better solution to make this work, without disabling DEP?

Loving answered 5/8, 2008 at 23:16 Comment(4)
I tried the same thing and hit some of the same limitations. I also found MODI too slow for indexing large numbers of images. - Rove
There's a hotfix that appears to address this problem. - Atlantic
I don't know your environment, but instead of relying on mixed magic that may break at many joints, why not go for something like a small app using Tesseract OCR + Lucene? - Jannette
@TuncayGöncüoğlu: Yeah, I've long since moved on from MODI and Indexing Service. I'm keeping this very old question just for historical purposes. - Loving

Disable DEP for specific applications.

How to Disable DEP for Specific Applications

  1. Click the Start button on your Windows computer and choose Computer > System Properties > Advanced System Settings.
  2. From the System Properties dialog, select Settings.
  3. Select the Data Execution Prevention tab.
  4. Select Turn on DEP for all programs and services except those I select.
  5. Click Add and browse to the program executable you want to exclude, for example excel.exe or word.exe.

Depending on your version of Windows, you may need to access the System Properties dialog box by right-clicking This PC or Computer from Windows Explorer.

  1. In Windows Explorer, right-click This PC (or Computer) and choose Properties > Advanced System Settings to open System Properties.
  2. Select Advanced > Performance > Data Execution Prevention.
  3. Select Turn on DEP for all programs and services except those I select.
  4. Click Add and use the browse feature to browse to the program executable you want to exclude.

Exclude:

C:\Program Files\Common Files\Microsoft Shared\MODI\11.0\MSPOCRDC.EXE  
C:\Program Files\Common Files\Microsoft Shared\MODI\11.0\MSPSCAN.EXE  
C:\Program Files\Common Files\Microsoft Shared\MODI\11.0\MSPVIEW.EXE
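
If you would rather script the exclusions than click through the dialog, here is a minimal sketch in Python that only generates the text of a .reg file for the three executables listed above. It rests on an assumption that is not part of the original answer: that the DEP exception list is stored as string values named after the full executable path, with the data `DisableNXShowUI`, under the `AppCompatFlags\Layers` key in HKLM. This is commonly reported but should be verified on your Windows version before importing anything. The function and constant names are made up for illustration.

```python
# Sketch only: builds .reg file text; it does not touch the registry itself.
# ASSUMPTION: per-program DEP opt-outs live under this key, with the value
# name being the full path of the executable and the data "DisableNXShowUI".
LAYERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT"
    r"\CurrentVersion\AppCompatFlags\Layers"
)
DEP_OPT_OUT = "DisableNXShowUI"

# Paths taken from the exclusion list above (MODI from Office 2003, version 11.0).
MODI_DIR = r"C:\Program Files\Common Files\Microsoft Shared\MODI\11.0"
MODI_EXECUTABLES = ["MSPOCRDC.EXE", "MSPSCAN.EXE", "MSPVIEW.EXE"]


def build_reg_file(directory, executables):
    """Return the text of a .reg file that adds one DEP opt-out per executable."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{LAYERS_KEY}]"]
    for exe in executables:
        path = f"{directory}\\{exe}"
        # Backslashes inside quoted names must be doubled in .reg files.
        escaped = path.replace("\\", "\\\\")
        lines.append(f'"{escaped}"="{DEP_OPT_OUT}"')
    # .reg files conventionally use Windows line endings.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_reg_file(MODI_DIR, MODI_EXECUTABLES), end="")
```

Redirect the output to a file such as `modi-dep.reg`, review it, and import it from an administrator account. A reboot is probably needed before the change takes effect, as with the dialog.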

Additional information not part of the answer:

To obtain and install MODI on newer versions of Windows, see:
"Microsoft Office Document Imaging – Office 2010 to Office 2016"

References:

"Exclude Programs From DEP (Data Execution Prevention)"

"Microsoft Office Document Scanning error"

MODI is part of (free) "Microsoft SharePoint Designer 2007".

Wesleyanism answered 7/9, 2018 at 9:38 Comment(2)
Thanks for taking the time to compile these instructions. I remember being unable to determine what exactly to exclude from DEP, back then. I'm reluctant to accept the answer since I no longer have a way to verify the solution, but I have upvoted it. - Loving
Thanks. Perhaps if it gets a dozen upvotes, it's correct. The links say it worked for multiple people. My purpose was to clean the unanswered question queue. - Wesleyanism
