How to improve tesseract.js accuracy?


Asked 1/12, 2019 at 13:51 Answered 3/12, 2019 at 2:15

Solved javascript node.js typescript ocr tesseract.js

I'm using this piece of code from the tesseract.js website, but the results are not accurate enough:

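  const { createWorker, createScheduler } = require("tesseract.js");

  // The scheduler distributes jobs across the workers added to it.
  const scheduler = createScheduler();
  const worker1 = createWorker();
  const worker2 = createWorker();

  await worker1.load();
  await worker2.load();
  await worker1.loadLanguage("eng");
  await worker2.loadLanguage("eng");
  await worker1.initialize("eng");
  await worker2.initialize("eng");

  scheduler.addWorker(worker1);
  scheduler.addWorker(worker2);

  /** Add one recognition job; the scheduler hands it to an idle worker */
  const {
    data: { text }
  } = await scheduler.addJob("recognize", image);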

This is the type of image whose text I'm trying to read:

Though it seems simple and easy, tesseract.js sometimes fails to read it. Are there any better alternatives to tesseract.js, or any way to improve the accuracy?

Octillion answered 1/12, 2019 at 13:51 Comment(8)

Have you tried applying some filtering to the input images, for example to enhance the contrast, or enlarging them? I think one way to get better accuracy is to modify the input images. – Amarelle 1/12, 2019 at 13:53

Actually, I have applied some filters and removed some of the noise to make the image clearer, and the results improved, but it still fails to read the text sometimes and I don't know why. – Octillion 1/12, 2019 at 13:57

Do you suggest any specific modifications? – Octillion 1/12, 2019 at 13:58

You can start with this post: docparser.com/blog/improve-ocr-accuracy Increasing contrast, image sharpening, removing noise are some basic image enhancements that might help you get better accuracy results. – Amarelle 1/12, 2019 at 14:12

Additionally, you might want to check threshold filtering. See this code for example: github.com/laurenzcodes/Canvas-Threshold-Effect – Amarelle 1/12, 2019 at 14:14

You can also dive deeper into edge detection algorithms, like the Sobel Algorithm or Canny Algorithm. – Amarelle 1/12, 2019 at 14:20

I use a negative version of your image and it works fine. Also additional gamma correction looks promising. – Jiggle 1/12, 2019 at 18:11
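For anyone wanting to try the negative and gamma ideas from the comment above in JavaScript, here is a minimal sketch. It assumes you already have the image as a flat array of 8-bit grayscale pixels (how you get that from your image library is up to you). The helper names `invert` and `gammaCorrect` and the gamma value of 2.0 are made up for illustration and would need tuning on real images.

```javascript
// Hypothetical helpers: the names and the gamma value are illustrative, not from the thread.
// `gray` is assumed to hold 8-bit grayscale pixels (0 = black, 255 = white).

// Build the negative: dark pixels become light and vice versa.
function invert(gray) {
  return Uint8ClampedArray.from(gray, (v) => 255 - v);
}

// Gamma correction through a 256-entry lookup table:
// out = 255 * (in / 255) ^ (1 / gamma).
// gamma > 1 brightens midtones, gamma < 1 darkens them.
function gammaCorrect(gray, gamma) {
  const lut = new Uint8ClampedArray(256);
  for (let v = 0; v < 256; v++) {
    lut[v] = Math.round(255 * Math.pow(v / 255, 1 / gamma));
  }
  return Uint8ClampedArray.from(gray, (v) => lut[v]);
}

const sample = Uint8Array.from([0, 64, 128, 255]);
const negative = invert(sample); // [255, 191, 127, 0]
const adjusted = gammaCorrect(negative, 2.0); // pure black and white stay fixed, midtones get lighter
```

The processed pixels would then have to be written back into an image before passing it to tesseract.js.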

I am facing accuracy issues as well piping in an HTML canvas with very basic black strokes on a white background. I am getting wildly inconsistent results with even just attempting to detect numbers :/ – Roe 6/12, 2021 at 4:53

When applying OCR with Tesseract, it is important to preprocess the image so that the text you want to detect is black on a white background. To do this, you can apply a simple threshold to obtain a binary image. Here's the image after preprocessing:

Result from Tesseract

I implemented this approach in Python with OpenCV, but you can adapt a similar strategy to JavaScript!

import cv2
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

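# Load the image as grayscale (flag 0) and apply Otsu's threshold to get a binary image.
# THRESH_BINARY_INV flips the result, so light text on a dark background becomes black on white.
image = cv2.imread('1.png', 0)
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]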

# Perform OCR
data = pytesseract.image_to_string(thresh, lang='eng', config='--psm 6')
print(data)

cv2.imshow('thresh', thresh)
cv2.waitKey()
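
As a rough idea of what the thresholding step could look like in JavaScript, here is a sketch of Otsu's method plus an inverted binarization, written by hand rather than taken from any library. It assumes the image is already available as a flat array of 8-bit grayscale pixels; `otsuThreshold` and `binarizeInverted` are hypothetical helper names, not part of tesseract.js or OpenCV.

```javascript
// Hypothetical helpers; `gray` is assumed to hold 8-bit grayscale pixels.

// Otsu's method: choose the threshold that maximises the between-class variance
// of the two pixel groups it creates (values <= t and values > t).
function otsuThreshold(gray) {
  const hist = new Array(256).fill(0);
  for (const v of gray) hist[v]++;

  const total = gray.length;
  let sumAll = 0;
  for (let i = 0; i < 256; i++) sumAll += i * hist[i];

  let sumLow = 0;
  let weightLow = 0;
  let bestVariance = -1;
  let best = 0;
  for (let t = 0; t < 256; t++) {
    weightLow += hist[t];
    sumLow += t * hist[t];
    if (weightLow === 0) continue; // no pixels at or below t yet
    const weightHigh = total - weightLow;
    if (weightHigh === 0) break; // every pixel is at or below t
    const meanLow = sumLow / weightLow;
    const meanHigh = (sumAll - sumLow) / weightHigh;
    const between = weightLow * weightHigh * (meanLow - meanHigh) ** 2;
    if (between > bestVariance) {
      bestVariance = between;
      best = t;
    }
  }
  return best;
}

// Inverted binarization, mirroring THRESH_BINARY_INV:
// pixels above the threshold become 0 (black), the rest 255 (white).
function binarizeInverted(gray, threshold) {
  return Uint8ClampedArray.from(gray, (v) => (v > threshold ? 0 : 255));
}

// Tiny example: three dark pixels followed by three light ones.
const pixels = Uint8Array.from([10, 12, 11, 200, 210, 205]);
const binary = binarizeInverted(pixels, otsuThreshold(pixels));
```

The binary pixels would still need to be turned back into an image (for example a PNG buffer) before being handed to tesseract.js; that part depends on the image library you pick.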

Linton answered 3/12, 2019 at 2:15 Comment(5)

Thanks for the answer. Do you know of any Node.js library that can do that? – Octillion 3/12, 2019 at 9:9

Using Jimp, I inverted the colors and the accuracy improved a lot. I think it's enough for my current project, but I still need a good library to do the thresholding in Node.js. Anyway, thanks for your answer. – Octillion 3/12, 2019 at 10:35

Unfortunately, I'm not too familiar with node.js but once you find one you can follow the same approach. Good luck! – Linton 3/12, 2019 at 20:38

Thanks for the hint regarding Jimp; I'm not sure why it shouldn't be possible to port it but I found something that looks similar and runs on Node.js: Nimp – Vidovic 14/3, 2021 at 17:4

I can recommend the sharp npm library; it has all these features built in. – Zygosis 24/9, 2022 at 11:54
