OCR with the Tesseract interface
V

5

34

How do you OCR a TIFF file using Tesseract's interface in C#?
Currently I only know how to do it by running the executable.

Vyborg answered 27/8, 2008 at 14:46 Comment(1)
Can you please guide me on how you managed to use Tesseract in C#? - Liability
T
11

The source code seems to be geared toward building an executable, so you may need to rework the build a bit so that it produces a DLL instead. I don't have much experience with Visual C++, but I think it shouldn't be too hard with some research. My guess is that someone has already made a library version, so try searching for one first.

Once you have the tesseract-ocr code in a DLL file, you can try importing it into your C# project via Visual Studio and have it create wrapper classes and do all the marshaling for you. If the import fails (Visual Studio can only reference managed assemblies and COM libraries, not a plain native DLL), DllImport will let you call the functions in the DLL from C# code.

Then you can look at the source of the original executable for clues about which functions to call to OCR a TIFF image.
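As a rough illustration of the DllImport route, here is a minimal sketch. It assumes a Tesseract build that exports the C API from capi.h (TessBaseAPICreate, TessBaseAPIInit3, and so on) together with Leptonica's pixRead and pixDestroy. The library names "tesseract" and "leptonica", the class name TesseractNative, and the helper names are placeholders, not something the answer specifies; the actual DLL file names depend on how the libraries were built.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class TesseractNative
{
    // Assumed library names; the real file names depend on the build you use.
    private const string TessLib = "tesseract";
    private const string LeptLib = "leptonica";

    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    private static extern IntPtr TessBaseAPICreate();

    // Strings are marshaled as ANSI by default, so this assumes ASCII-only paths.
    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    private static extern int TessBaseAPIInit3(IntPtr handle, string dataPath, string language);

    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    private static extern void TessBaseAPISetImage2(IntPtr handle, IntPtr pix);

    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    private static extern IntPtr TessBaseAPIGetUTF8Text(IntPtr handle);

    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    private static extern void TessDeleteText(IntPtr text);

    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    private static extern void TessBaseAPIEnd(IntPtr handle);

    [DllImport(TessLib, CallingConvention = CallingConvention.Cdecl)]
    private static extern void TessBaseAPIDelete(IntPtr handle);

    [DllImport(LeptLib, CallingConvention = CallingConvention.Cdecl)]
    private static extern IntPtr pixRead(string fileName);

    [DllImport(LeptLib, CallingConvention = CallingConvention.Cdecl)]
    private static extern void pixDestroy(ref IntPtr pix);

    // Copies a NUL-terminated UTF-8 buffer from native memory into a managed string.
    public static string Utf8PtrToString(IntPtr ptr)
    {
        if (ptr == IntPtr.Zero)
        {
            return string.Empty;
        }

        int length = 0;
        while (Marshal.ReadByte(ptr, length) != 0)
        {
            length++;
        }

        byte[] bytes = new byte[length];
        Marshal.Copy(ptr, bytes, 0, length);
        return Encoding.UTF8.GetString(bytes);
    }

    // Hypothetical convenience wrapper: loads an image file and returns the recognized text.
    public static string RecognizeFile(string imagePath, string tessdataDir, string language)
    {
        IntPtr api = TessBaseAPICreate();
        IntPtr pix = IntPtr.Zero;
        try
        {
            if (TessBaseAPIInit3(api, tessdataDir, language) != 0)
            {
                throw new InvalidOperationException("Tesseract initialization failed.");
            }

            pix = pixRead(imagePath);
            if (pix == IntPtr.Zero)
            {
                throw new InvalidOperationException("Could not read the image file.");
            }

            TessBaseAPISetImage2(api, pix);
            IntPtr text = TessBaseAPIGetUTF8Text(api);
            try
            {
                return Utf8PtrToString(text);
            }
            finally
            {
                // The text buffer is owned by Tesseract and must be released by it.
                TessDeleteText(text);
            }
        }
        finally
        {
            if (pix != IntPtr.Zero)
            {
                pixDestroy(ref pix);
            }
            TessBaseAPIEnd(api);
            TessBaseAPIDelete(api);
        }
    }
}
```

A call such as TesseractNative.RecognizeFile("scan.tif", "tessdata", "eng") would then return the text, where the file and folder names are again only examples. The native libraries must be loadable at run time and match the bitness of the process.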

Torsk answered 27/8, 2008 at 17:26 Comment(0)
G
40

Take a look at tessnet, which is available as NuGet packages: https://www.nuget.org/packages/TesserNet/ and https://www.nuget.org/packages/NuGet.Tessnet2

Gustavogustavus answered 24/9, 2008 at 14:14 Comment(2)
This is better than P/Invoking it yourself. - Oxfordshire
The link is not valid anymore. - Regen
K
7

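A C# program can launch tesseract.exe and then read the output file that tesseract.exe produces.

using System.Diagnostics;
using System.IO;

// tesseract.exe expects the input image followed by an output base name.
// "input.tif" is a placeholder for your own image path.
Process process = Process.Start("tesseract.exe", "input.tif out");
process.WaitForExit();
if (process.ExitCode == 0)
{
    // Tesseract appends ".txt" to the output base name.
    string content = File.ReadAllText("out.txt");
}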
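
The same approach can be wrapped in a small helper. This is only a sketch: the class name TesseractCli and its method names are made up for illustration, and it assumes tesseract.exe can be found on the PATH and is invoked as "tesseract <image> <outputBase>". Quoting the arguments keeps paths with spaces intact.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class TesseractCli
{
    // Builds the "<image> <outputBase>" argument string, quoting both paths
    // so that spaces in file names are preserved.
    public static string BuildArguments(string imagePath, string outputBase)
    {
        return "\"" + imagePath + "\" \"" + outputBase + "\"";
    }

    // Runs tesseract.exe (assumed to be on the PATH) and returns the recognized
    // text, or null if the process exits with a non-zero exit code.
    public static string Recognize(string imagePath, string outputBase)
    {
        var startInfo = new ProcessStartInfo("tesseract.exe", BuildArguments(imagePath, outputBase))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                return null;
            }
        }

        // Tesseract writes its result to "<outputBase>.txt".
        return File.ReadAllText(outputBase + ".txt");
    }
}
```

For example, TesseractCli.Recognize("scan page.tif", "out") would run the executable and read out.txt, with both names being placeholders.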
Klatt answered 10/6, 2013 at 5:36 Comment(0)
H
6

I discovered today that Emgu CV now includes a Tesseract wrapper. While the number of unmanaged DLLs in the OpenCV library might seem a little daunting, it's nothing that a quick copy to your output directory won't cure. From there, the actual OCR process is as simple as three lines:

// "clip" is the image to recognize and "optOCR" is the control that shows the result.
Tesseract ocr = new Tesseract(Path.Combine(Environment.CurrentDirectory, "tessdata"), "eng", Tesseract.OcrEngineMode.OEM_TESSERACT_ONLY);
ocr.Recognize(clip);
optOCR.Text = ocr.GetText();

"robomatics" put together a very nice YouTube video that demonstrates a simple but effective solution.

Hallett answered 6/8, 2013 at 0:58 Comment(0)
E
0

Disclaimer: I work for Atalasoft

Our OCR module supports Tesseract, and if that proves not to be good enough, you can upgrade to a better engine by changing just one line of code (we provide a common interface to multiple OCR engines).

Ecclesiastes answered 29/5, 2009 at 12:22 Comment(0)
