Tesseract 3 (OCR) - .NET Wrapper

http://code.google.com/p/tesseractdotnet/

I am having a problem getting Tesseract to work in my Visual Studio 2010 projects. I have tried console and WinForms projects, and both have the same outcome. I have come across a DLL by someone else who claims to have it working in VS2010:

http://code.google.com/p/tesseractdotnet/issues/detail?id=1

I am adding a reference to the DLL, which can be found in the attachment to post 64 on the page above. Every time I run my project I get an AccessViolationException saying that an attempt was made to read or write protected memory.

public void StartOCR()
{
    const string language = "eng";
    const string TessractData = @"C:\Users\Joe\Desktop\tessdata\";

    using (TesseractProcessor processor = new TesseractProcessor())
    {
        using (Bitmap bmp = Bitmap.FromFile(fileName) as Bitmap)
        {
            if (processor.Init(TessractData, language, (int)eOcrEngineMode.OEM_DEFAULT))
            {
                string text = processor.Recognize(bmp);
            }
        }
    }
}

The access violation exception always points to if (processor.Init(TessractData, language, (int)eOcrEngineMode.OEM_DEFAULT)). I've seen a few suggestions to make sure the solution platform is set to x86 in the Configuration Manager and that the tessdata folder path ends with a trailing slash, but to no avail. Any ideas?

Flattery answered 8/4, 2012 at 22:15 Comment(2)
Can you please share your full implementation? I think I am reading conflicting instructions on how to set this up. - Cirilla
I cannot take the credit, but this worked for me: replace 'eng.traineddata' in the tessdata folder with this one: code.google.com/p/tesseract-ocr/downloads/… - Gallinaceous
7

The contents of the tessdata folder turned out to be causing the problem. I obtained the tessdata folder from the first link and everything now works.
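A minimal sketch of a pre-flight check for the folder contents, assuming the language data is a file named <language>.traineddata inside the tessdata folder (as with the eng.traineddata mentioned in the comments). TessdataCheck and Describe are hypothetical names, not part of the wrapper, and the check only confirms that the file exists and is non-empty, not that it is valid for the engine version.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of tesseractdotnet.
static class TessdataCheck
{
    // Returns null when the folder looks usable, otherwise a description of the problem.
    public static string Describe(string tessdataDir, string language)
    {
        if (!Directory.Exists(tessdataDir))
            return "tessdata folder not found: " + tessdataDir;

        // Assumption: the language data is stored as "<language>.traineddata".
        string trained = Path.Combine(tessdataDir, language + ".traineddata");
        if (!File.Exists(trained))
            return "missing language file: " + trained;

        if (new FileInfo(trained).Length == 0)
            return "language file is empty: " + trained;

        return null;
    }
}
```

Calling TessdataCheck.Describe(TessractData, language) before processor.Init and logging any non-null result would at least separate a missing or empty file from a problem inside the engine itself.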

Flattery answered 31/7, 2012 at 15:6 Comment(0)
2

I have just completed a project with Tesseract engine 3. I think there is a bug in the engine that needs to be fixed. What I did to get rid of the AccessViolationException was to append "\tessdata" to the real tessdata directory string. I don't know why, but the engine seems to truncate the innermost directory in the tessdata path.

I have just put together a full OCR package (DLLs + English tessdata) that works with .NET Framework 4.
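The workaround can be sketched as a small helper that takes the real folder path and appends one extra "tessdata" segment. TessdataPath and WithExtraSegment are hypothetical names used only for illustration; whether the extra segment is needed depends on the wrapper build in use.

```csharp
// Hypothetical helper, not part of tesseractdotnet.
static class TessdataPath
{
    // Appends an extra "tessdata" segment to the real tessdata folder path,
    // on the assumption that the engine drops the innermost directory.
    public static string WithExtraSegment(string realTessdataDir)
    {
        // Remove any trailing separators so the result has exactly one between segments.
        string trimmed = realTessdataDir.TrimEnd('\\', '/');
        return trimmed + @"\tessdata";
    }
}
```

For a folder at e:\tessdata, TessdataPath.WithExtraSegment(@"e:\tessdata\") yields e:\tessdata\tessdata, which is the value that would then be passed to processor.Init.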

Homeless answered 15/7, 2012 at 17:57 Comment(1)
True! The folder is located in "e:\tessdata", and the variable definition is const string tessractData = @"e:\tessdata\tessdata"; - Gilead
0

If somebody has the same problem and the trailing-slash advice doesn't work, try... TWO trailing slashes! Seriously. It works for me.

if (processor.Init(@".\tessdata\\", "eng", (int)eOcrEngineMode.OEM_DEFAULT))
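Note that the literal above is a verbatim string (prefixed with @), so the two backslashes are taken literally. In a regular string literal, "\\" is an escape for a single backslash. A small sketch to make the difference visible (SlashDemo and its members are hypothetical names):

```csharp
// Hypothetical demo class, not part of tesseractdotnet.
static class SlashDemo
{
    // Verbatim literal: backslashes are not escapes, so this ends with two backslash characters.
    public const string Verbatim = @".\tessdata\\";

    // Regular literal: each "\\" is one backslash, so this ends with a single backslash character.
    public const string Regular = ".\\tessdata\\";

    // Counts the backslash characters at the end of a string.
    public static int TrailingBackslashes(string s)
    {
        int count = 0;
        for (int i = s.Length - 1; i >= 0 && s[i] == '\\'; i--)
            count++;
        return count;
    }
}
```

SlashDemo.TrailingBackslashes(SlashDemo.Verbatim) returns 2 and SlashDemo.TrailingBackslashes(SlashDemo.Regular) returns 1, so dropping the @ prefix without adjusting the literal would silently change the path.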
Felty answered 31/10, 2014 at 15:5 Comment(0)
0

It seems your problem relates to the stability issue noted on the official site, which recommends using the previous stable release, 2.4.1. You can install it from nuget.org via the Package Manager command: Install-Package Tesseract -Version 2.4.1

Tabling answered 12/2, 2016 at 20:36 Comment(0)