Image Preprocessing for OCR - Tesseract

This is the image I'm trying to detect

Obviously this image is pretty tough as it is low clarity and is not a real word. However, with this code, I'm detecting nothing close:

import pytesseract
from PIL import Image, ImageEnhance, ImageFilter
image_name = 'NedNoodleArms.jpg'
im = Image.open(image_name) 
im = im.filter(ImageFilter.MedianFilter())
enhancer = ImageEnhance.Contrast(im)
im = enhancer.enhance(2)
im = im.convert('1')
im.save(image_name)
text = pytesseract.image_to_string(Image.open(image_name))
print(text)

outputs

, Mdfiaodfiamms

Any ideas here? The image my contrasting function produces is:

[image: the contrast-enhanced, 1-bit output of the code above]

Which looks decent? I don't have a ton of OCR experience. What preprocessing would you recommend here? I've tried resizing the image larger, which helps a little bit but not enough, along with a bunch of different filters from PIL. Nothing has gotten particularly close, though.

Lithopone answered 4/8, 2018 at 19:33 Comment(3)
Do not convert to 1-bit B/W; use grayscale ('L', IINM). - Gourd
Thanks Paulo! That helped a lot. It's outputting 'NedNnodleArrns', which is super reasonable. - Lithopone
Glad to help, check my answer. - Gourd

You are right: Tesseract works better at higher resolutions, so resizing the image sometimes helps. But don't convert to 1-bit.

I got good results converting to grayscale, making it 3 times as large and making the letters a bit brighter:

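>>> import pytesseract
>>> from PIL import Image
>>> im = Image.open('j78TY.png').convert('L')  # grayscale, not 1-bit
>>> # upscale 3x, then set pixels <= 75 to 0 and add 100 to the rest
>>> im = im.resize(tuple(3 * d for d in im.size), Image.BICUBIC)\
...        .point(lambda p: p > 75 and p + 100)
>>> pytesseract.image_to_string(im)
'NedNoodleArms'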

Check this jupyter notebook:

[image: screenshot of the Jupyter notebook]
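The same steps can be wrapped in a small reusable function. This is only a sketch: the name `preprocess_for_ocr` and its parameters `scale`, `threshold` and `boost` are made up for illustration, with defaults taken from the values above. Unlike the one-liner, it clamps the brightened value to 255 explicitly instead of leaving values above the 8-bit range to the library.

```python
from PIL import Image


def preprocess_for_ocr(im, scale=3, threshold=75, boost=100):
    """Grayscale, upscale, then brighten (hypothetical helper, illustrative defaults)."""
    gray = im.convert('L')  # 8-bit grayscale rather than 1-bit B/W
    new_size = (gray.width * scale, gray.height * scale)
    big = gray.resize(new_size, Image.BICUBIC)
    # Pixels at or below the threshold become 0; the rest are brightened
    # and clamped to the 8-bit maximum.
    return big.point(lambda p: min(255, p + boost) if p > threshold else 0)


# Usage (needs pytesseract and the Tesseract binary installed):
#   import pytesseract
#   text = pytesseract.image_to_string(preprocess_for_ocr(Image.open('j78TY.png')))
```

The best values for `scale`, `threshold` and `boost` will likely differ from image to image, so treat them as a starting point.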

Gourd answered 5/8, 2018 at 5:48 Comment(4)
Can you explain what this lambda function is doing? Is it like thresholding? - Swoosh
@Swoosh Yeah, it is a lame hack for making the letters brighter: if a pixel value is over 75 (of 256), then add 100 to its value. - Gourd
You mean (of 255), right? What if a pixel value is already over 200? - Swoosh
Awesome answer for an awesomely effective simple trick! Thanks! - Bertram
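On the question of what the lambda does: for an 8-bit grayscale image, `point` evaluates the function once for each possible pixel value and uses the results as a lookup table. The pure-Python sketch below builds that table by hand. The helper `clamp_to_uint8` is hypothetical, and the explicit clamp is an assumption about how out-of-range results should be treated, so the sketch does not rely on how the library handles them.

```python
# The mapping from the answer, evaluated once per possible 8-bit value.
brighten = lambda p: p > 75 and p + 100


def clamp_to_uint8(value):
    """Convert the lambda's result (bool or int) to an int in 0..255."""
    return max(0, min(255, int(value)))


lut = [clamp_to_uint8(brighten(p)) for p in range(256)]

print(lut[75])   # 0: the lambda returns False for values <= 75
print(lut[76])   # 176
print(lut[155])  # 255
print(lut[200])  # 255: 300 is clamped to the 8-bit maximum
```

So the lambda is part threshold (everything at or below 75 goes to black) and part brightening, and under this clamp any input of 155 or more ends up pure white.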
