How do I choose between Tesseract and OpenCV? [closed]

I recently came across Tesseract and OpenCV. It looks like Tesseract is a full-fledged OCR engine and OpenCV can be used as a framework to create an OCR application/service.

I tried using Tesseract on some of my images and its accuracy seems decent. Later, I came across a very simple tutorial on performing OCR with OpenCV in Python and was impressed. In a few minutes, I finished training the system, and its accuracy was good. But of course, taking this approach means I need to train my system extensively using a large training set.

My specific questions are the following:

  • How does one choose between Tesseract and using OpenCV to build a custom OCR app?
  • There are training datasets available for Tesseract for different languages. Does OpenCV have something similar, so that I don't have to start from the ground up to achieve OCR?
  • Which one is better for a would-be commercial application?

Any suggestions?

Stodgy asked 15/7, 2012 at 6:07 Comment(4)
The answers below are really great, but as someone who has worked with OCR, I can tell you that the recognition quality in Tesseract is below what a user of a commercial app expects. Tesseract is great, but OCR is difficult: things like online training, or improvements on the fly, are hmmm... still research. Google, the big sponsor behind Tesseract lately, has decided to build its own engine, OCRopus. And while it promised to open-source it, the core recognition engine is not yet available; they only published a framework, which is an API to Tesseract. – Transpose
@vasile: That is very informative. I wasn't aware of OCRopus. Thank you. Would you have any suggestions for alternatives if my end objective is, say, to write a business card OCR (or one that reads gas station receipts like the one I linked: upload.wikimedia.org/wikipedia/en/3/34/…)? I'm asking because I'm curious what the numerous mobile apps use to achieve this. I don't mind doing the OCR on the server side. I was tempted to use OpenCV after seeing this cool demo: youtube.com/watch?v=OkcOfS1lTxs – Stodgy
There are a number of commercial OCR engines; just google "OCR accuracy tests" and you'll find some charts. As for mobile apps, most of them use Tesseract. But if you bother to download some of them, you'll see that the results differ a bit from the promises. They usually make a video demo in a carefully controlled environment and post it on YouTube, but in the wild, if you scan a page/recipe/card/whatever, you'll get some funny results. – Transpose
@vasile: Thank you. Something to keep me busy tonight. Really appreciate your time. – Stodgy
  • Tesseract is an OCR engine. It's used, worked on and funded by Google specifically to read text from images, perform basic document segmentation and operate on specific image inputs (a single word, line, paragraph, page, limited dictionaries, etc.).

  • OpenCV, on the other hand, is a computer vision library that includes features that let you perform some feature extraction and data classification. You can create a simple letter segmenter and classifier that performs basic OCR, but it is not a very good OCR engine (I've made one in Python before from scratch. It's really inaccurate for input that deviates from your training data).
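To make the "simple letter segmenter" idea concrete, here is a minimal sketch, assuming a text-line image that is already binarized (a list of rows of 0/1, where 1 is ink) and whose characters do not touch. It cuts the line wherever a vertical projection profile has empty columns. This is not the answerer's code, and `segment_glyphs` and `min_gap` are hypothetical names rather than an OpenCV API.

```python
def column_has_ink(image, x):
    """Return True if any pixel in column x is ink (1)."""
    return any(row[x] for row in image)


def segment_glyphs(image, min_gap=1):
    """Split a binary text-line image into glyph column spans.

    A cut is made wherever at least `min_gap` consecutive ink-free
    columns separate two inked regions. Returns (start, end) column
    pairs with `end` exclusive, ordered left to right.
    """
    width = len(image[0]) if image else 0
    spans = []
    start = None  # first column of the glyph being scanned, if any
    gap = 0       # empty columns seen since the last inked column
    for x in range(width):
        if column_has_ink(image, x):
            if start is None:
                start = x
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                # the last inked column was x - gap, so end (exclusive) is x - gap + 1
                spans.append((start, x - gap + 1))
                start = None
                gap = 0
    if start is not None:
        spans.append((start, width - gap))
    return spans
```

Touching or broken characters defeat this immediately, which is one concrete reason a home-made pipeline like this falls apart on input that deviates from its training data.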

If you want to get a basic understanding of how hard OCR is, try OpenCV. Tesseract is for real OCR.

Cayenne answered 15/7, 2012 at 6:12 Comment(15)
+1 Thank you. I'm wondering how customizable Tesseract is. For instance, do I have to first use OpenCV (or something similar) to remove the skew in the image? In other words, if Tesseract's accuracy is not so good for some of my cases, what can I do to improve it? – Stodgy
That depends on your input images. Tesseract works best when the letters are crisp, in a horizontal line, spaced out, not connected and perfectly black-and-white. I tinkered in the DIY book scanning/preservation community for about a year and worked on software in my free time to ease the process. The best software out there (commercial or not) for post-processing any images with text is Scan Tailor. It has some CLI options, but if you take some time to see how it works, it's quite amazing. – Cayenne
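On the skew question, one common way to estimate skew is a projection-profile search, sketched here in pure Python. It assumes a binarized image (1 = ink) with a single dominant text direction: each candidate angle rotates the ink coordinates, and the angle whose row histogram is most sharply peaked wins. The function names, the ±10° range and the 0.5° step are illustrative assumptions. The rotation itself could then be applied with OpenCV (for example `cv2.getRotationMatrix2D` followed by `cv2.warpAffine`), minding that library's sign convention.

```python
import math


def _ink_points(image):
    """Collect (x, y) coordinates of ink pixels (value 1)."""
    return [(x, y) for y, row in enumerate(image) for x, v in enumerate(row) if v]


def _projection_score(points, angle_deg):
    """Sum of squared row counts after rotating the points by angle_deg.

    Straight, horizontal text lines pile ink into a few rows, which
    maximises this score.
    """
    a = math.radians(angle_deg)
    sin_a, cos_a = math.sin(a), math.cos(a)
    counts = {}
    for x, y in points:
        row = round(x * sin_a + y * cos_a)  # y coordinate after rotation
        counts[row] = counts.get(row, 0) + 1
    return sum(c * c for c in counts.values())


def estimate_skew(image, max_angle=10.0, step=0.5):
    """Estimate text skew in degrees.

    Positive means the baseline descends to the right in image
    coordinates (y growing downwards). Returns 0.0 for an empty image.
    """
    points = _ink_points(image)
    if not points:
        return 0.0
    n_steps = int(round(2 * max_angle / step))
    best_angle, best_score = 0.0, -1
    for i in range(n_steps + 1):
        angle = -max_angle + i * step
        score = _projection_score(points, angle)
        if score > best_score:
            best_angle, best_score = angle, score
    # rotating by best_angle straightens the text, so the skew is its negation
    return -best_angle
```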
I worked on Scan Tailor's source code for a little bit, and it doesn't use OpenCV internally, but many of its algorithms could easily be rewritten with OpenCV's functions. If your images are not warped and are not degraded, you really just need to implement adaptive binarization and some simple despeckling before feeding your image into Tesseract. – Cayenne
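A minimal pure-Python sketch of those two steps, assuming a grayscale image given as a list of rows of 0..255 values. Binarization here is local-mean thresholding (a pixel is ink if it is darker than its window's mean minus a constant), and despeckling simply drops ink pixels with no ink neighbours. These are illustrative stand-ins, not Scan Tailor's algorithms; in OpenCV the rough equivalent of the first step is `cv2.adaptiveThreshold` with `cv2.ADAPTIVE_THRESH_MEAN_C`, and `cv2.medianBlur` is a different but common despeckling choice. The `block` and `c` defaults are guesses to tune per scan.

```python
def adaptive_binarize(gray, block=15, c=10):
    """Return a 0/1 image where 1 marks ink.

    A pixel counts as ink when it is darker than the mean of the
    block x block window around it (clipped at the borders) minus c.
    An integral image makes each window sum an O(1) lookup.
    """
    h, w = len(gray), len(gray[0])
    # integral[y][x] holds the sum of gray[0..y-1][0..x-1]
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        running = 0
        for x in range(w):
            running += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + running
    r = block // 2
    out = []
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        row = []
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            mean = total / ((y1 - y0) * (x1 - x0))
            row.append(1 if gray[y][x] < mean - c else 0)
        out.append(row)
    return out


def despeckle(binary):
    """Clear ink pixels that have no ink among their 8 neighbours."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            has_neighbour = any(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                out[y][x] = 0
    return out
```

Because the threshold follows the local mean, a stroke stays detectable even on an unevenly lit background where a single global threshold would fail.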
Regarding your question, I was just testing out some random input images yesterday. I tried a receipt from a gas station: upload.wikimedia.org/wikipedia/en/3/34/… It recognized the 0 as an 8 (in the total of $20.00). I admit that digit was hard even for me to decipher, but I wasn't sure what else could be done to adapt Tesseract to these situations, or perhaps to introduce a learning component if I end up with an active user base. – Stodgy
Tesseract is trained for reading specific font sets. Those blocky letters aren't one of them. You'll have to present Google an animal sacrifice and try training Tesseract yourself: code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3 – Cayenne
Also, before you embark on writing a custom OCR engine for reading those letters, don't expect it to be accurate. I wrote one for automating Wheel of Fortune, and the sample images (screenshots of an online game) were JPEGs. The artifacts from the JPEG compression were enough to screw up the image classifier unless I provided about 10-20 sample images of every single character. – Cayenne
Understood. Thank you so much for your time! :) It would have been nice to have a way to incorporate incremental learning into Tesseract (assuming that the user base will grow over time, I don't see why this couldn't be incorporated). As far as I can see, the training process in its current form seems pretty complicated, but I could be wrong; I'll give it another read. – Stodgy
I've never trained Tesseract because it worked so well for my sample inputs, but there seem to be a few simple community-owned programs that simplify the process. You might be able to write a Python script that dynamically generates training data from a set of mapped inputs, but I really have no idea how the training process works. – Cayenne
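As a rough illustration of generating training data from mapped inputs: Tesseract 3 training starts from box files, where each line reads `char left bottom right top page` with coordinates measured from the bottom-left corner of the image. The sketch below assumes you already know each character's column span (from a known transcription plus a segmenter) and the line's row range; `glyphs_from_spans` and `to_box_lines` are hypothetical helpers, and the exact edge conventions (inclusive vs exclusive) should be checked against the TrainingTesseract3 page before relying on them.

```python
def glyphs_from_spans(text, spans, top, bottom):
    """Pair each non-space character of a known transcription with a column span.

    spans: (start, end) column ranges, one per character, left to right.
    top/bottom: the text line's row range in top-left-origin coordinates.
    """
    chars = [ch for ch in text if not ch.isspace()]
    if len(chars) != len(spans):
        raise ValueError("transcription and segmentation disagree")
    return [(ch, start, top, end, bottom) for ch, (start, end) in zip(chars, spans)]


def to_box_lines(glyphs, image_height, page=0):
    """Format (char, left, top, right, bottom) boxes as box-file lines.

    The box format is `char left bottom right top page` with the origin at
    the bottom-left of the image, so the y values are flipped here.
    """
    return [
        f"{ch} {left} {image_height - bottom} {right} {image_height - top} {page}"
        for ch, left, top, right, bottom in glyphs
    ]
```

Writing `"\n".join(lines) + "\n"` next to the matching image would then give one box file per training image.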
Not a problem. I'm planning on digging into it in a day. I'm sure interfacing Tesseract with Mechanical Turk would be useful :) – Stodgy
Huh, that might actually work! – Cayenne
@Cayenne I have used Scan Tailor and it seems to be pretty good. Can you point me to the algorithms that it uses? – Krasnoyarsk
They're implemented from a bunch of research papers. You can read the source code to see how they work. I know the names of a few of them, so if you tell me which algorithm you're looking for, I could try to point you to the appropriate paper. – Cayenne
Can someone point me to a tutorial on how to use the Tesseract API in my Android application? I tried searching, but the tutorials I found were too hard to understand for a beginner in Android development. – Ectosarc
Nice, I see that Scan Tailor can also be used from the command line. Sweet! – Mammon
@valentt: Development on Scan Tailor stopped almost two years ago, so it's effectively a dead project. It's too bad that there's no other alternative. – Cayenne

I am the author of that digit recognition tutorial you mentioned, and I would say it is in no way a substitute for Tesseract.

Tesseract is a really good OCR engine, maybe the best open-source OCR engine.

The tutorial you mentioned is just an attempt to understand the simplest workings of OCR.
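For a feel of what such a bare-bones recognizer involves, here is an assumed illustration (the thread does not spell out the tutorial's exact method): a k-nearest-neighbour classifier over flattened, same-sized 0/1 glyph bitmaps, using Hamming distance. OpenCV offers a vectorised version of the same idea through `cv2.ml.KNearest_create()`; the class and method names below are hypothetical.

```python
from collections import Counter


def flatten(glyph):
    """Turn a 2-D 0/1 glyph bitmap into a flat feature vector."""
    return [v for row in glyph for v in row]


def hamming(a, b):
    """Count positions where two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


class KNNGlyphClassifier:
    """k-nearest-neighbour classifier over same-sized glyph bitmaps."""

    def __init__(self, k=3):
        self.k = k
        self.samples = []  # (feature vector, label) pairs

    def train(self, glyph, label):
        self.samples.append((flatten(glyph), label))

    def predict(self, glyph):
        features = flatten(glyph)
        # sorted() is stable, so equally distant samples keep training order
        nearest = sorted(self.samples, key=lambda s: hamming(features, s[0]))[: self.k]
        # Counter keeps first-seen order, so a tied vote goes to the
        # label of the closest neighbour
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]
```

"Training" is just storing labelled samples, which is why it takes minutes; the flip side is that accuracy depends entirely on how closely new glyphs resemble the stored ones.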

So, if you are looking to build an OCR app, I would recommend using OpenCV to preprocess the image and then applying the Tesseract engine.

Idomeneus answered 15/7, 2012 at 6:21 Comment(4)
+1 Thank you. First of all, thank you for the tutorial :) It was a really interesting read. Are you aware of any references/tutorials on how to use OpenCV along with Tesseract? I'm not talking about interfacing, but about the type of image transformations or preprocessing that need to be done to improve Tesseract's accuracy. – Stodgy
I would just like to say that while Tesseract is a good OCR engine in comparison to others, it is still fairly inaccurate; I've had about a 40% success rate in getting the correct text recognized. Hopefully, it'll be better in a couple of years. – Creamcolored
@Creamcolored You just need to train Tesseract, and you can get better results in a few hours or days, not years: opensource.newmediaist.com/tesseract-training.html – Mammon
I use PyTesseract for real-time text extraction. It works fine on a Linux PC, but it is very slow on the Raspberry Pi. Is there any way to install a lightweight version, for instance one that processes only digits and capital letters of the English alphabet? – Kira
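Rather than a separate build, Tesseract can be told what shape of input to expect and which characters to allow; the input shapes listed in the first answer (single word, line, page) map onto its `--psm` page segmentation modes, and `tessedit_char_whitelist` restricts the character set. The sketch below only builds the config string that pytesseract passes through; `build_config` and the `PSM` table are hypothetical helpers. Whether this makes a Raspberry Pi fast enough is untested here: a narrow `--psm` skips layout analysis, but the whitelist mainly narrows the output and, depending on the Tesseract version and engine mode, may be ignored. The smaller "fast" trained models published by the Tesseract project are another thing worth trying.

```python
import string

# Tesseract --psm values for the input shapes mentioned in the thread
PSM = {
    "page": 3,   # fully automatic page segmentation (the default)
    "block": 6,  # a single uniform block of text
    "line": 7,   # a single text line
    "word": 8,   # a single word
    "char": 10,  # a single character
}


def build_config(layout="line", whitelist=string.digits + string.ascii_uppercase):
    """Build a Tesseract config string, e.g. for pytesseract's `config=` argument."""
    parts = [f"--psm {PSM[layout]}"]
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


# Usage (needs pytesseract, Pillow and a Tesseract install; not run here):
#   import pytesseract
#   from PIL import Image
#   text = pytesseract.image_to_string(Image.open("plate.png"), config=build_config("line"))
```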

The two can be complementary. Consider the paper on Tesseract (https://github.com/tesseract-ocr/docs/blob/master/tesseracticdar2007.pdf).

It highlights that "Since HP had independently-developed page layout analysis technology that was used in products, (and therefore not released for open-source) Tesseract never needed its own page layout analysis. Tesseract therefore assumes that its input is a binary image with optional polygonal text regions defined."

This type of task can be performed by OpenCV and the resulting image handed off to Tesseract. You can find a sample of this type of code in the Git repo: https://github.com/Itseez/opencv_contrib/tree/master/modules/text/samples The samples use Tesseract APIs to do image to text conversion.
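As a much cruder stand-in for that layout step than the OpenCV text-module samples, here is a sketch that finds text-line bands in a binarized page (1 = ink) using a horizontal projection profile. It assumes a clean, single-column page with roughly horizontal lines; `find_text_lines`, `crop_rows`, `min_ink` and `min_height` are hypothetical names. Each cropped band could then be handed to Tesseract as a single text line (`--psm 7`).

```python
def find_text_lines(binary, min_ink=1, min_height=2):
    """Return (top, bottom) row ranges, bottom exclusive, that contain text.

    A row belongs to a text band when it holds at least `min_ink` ink
    pixels; bands shorter than `min_height` rows are discarded as noise.
    """
    bands = []
    start = None  # first row of the band being scanned, if any
    for y, row in enumerate(binary):
        inked = sum(row) >= min_ink
        if inked and start is None:
            start = y
        elif not inked and start is not None:
            if y - start >= min_height:
                bands.append((start, y))
            start = None
    if start is not None and len(binary) - start >= min_height:
        bands.append((start, len(binary)))
    return bands


def crop_rows(binary, band):
    """Copy the rows of one (top, bottom) band out of the page."""
    top, bottom = band
    return [row[:] for row in binary[top:bottom]]
```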

Tonetic answered 13/11, 2014 at 1:50 Comment(0)

OpenCV is a library for CV, used to analyze and process images in general. Tesseract is a library for OCR, which is a specialized subset of CV that's dedicated to extracting text from images.

From OpenCV.org:

.....used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high resolution image of an entire scene, find similar images from an image database, remove red eyes from images taken using flash, follow eye movements, recognize scenery and establish markers to overlay it with augmented reality, etc

From Tesseract Github:

.....can be used directly, or (for programmers) using an API to extract typed, handwritten or printed text from images. It supports a wide variety of languages.

Disqualification answered 26/9, 2017 at 3:25 Comment(0)
