Is it possible to check orientation of an image before passing it through pytesseract ocr module

For my current OCR project I am using Tesseract, through the Python wrapper pytesseract, to convert images into text files. Until now I was only passing properly oriented images to my module, and it was able to recognize the text in them. But now that I am passing rotated images, it cannot recognize even a single word. So to get good results I need to pass only images with the proper orientation. Is there any method to figure out the orientation of an image before passing it to the OCR module? Please let me know what methods I can use to do that orientation check.

This is the method which I am using to do conversion:

import os
import re

import cv2
import pytesseract

def images_to_text(testImg):
    print('Reading images from the directory..........')
    dataFile = []
    # Define config parameters.
    # '-l eng'  for using the English language
    # '--oem 1' for using LSTM OCR Engine
    config = '-l eng --oem 1 --psm 3'
    for filename in os.listdir(testImg):
        # Read image from disk
        im = cv2.imread(os.path.join(testImg, filename), cv2.IMREAD_COLOR)
        # Run tesseract OCR on image
        text = pytesseract.image_to_string(im, config=config)
        # basic preprocessing of the text
        # (re.sub is needed here: str.replace treats '+' literally)
        text = text.replace('\t', ' ').strip()
        text = re.sub(r' +', ' ', text)
        text = re.sub(r'\n+', '\n', text)
        text = re.sub(r'\n+ +', ' ', text)

        # writing data to file (imgTxt is the output directory, defined elsewhere)
        name = os.path.splitext(filename)[0] + '.txt'
        with open(os.path.join(imgTxt, name), 'w') as writeFile:
            writeFile.write("%s\n" % text)
        text = text.replace('\n', ' ')
        dataFile.append(text)
    print('writing data to file done')
    return dataFile
Cumber answered 12/3, 2019 at 10:41 Comment(1)
@Noremac Could you please look into this issue? - Cumber
C
23

I found a solution for checking the orientation of an image. pytesseract already has a method that does this work.

import re
import cv2
import pytesseract

imPath='path_to_image'
im = cv2.imread(str(imPath), cv2.IMREAD_COLOR)
newdata=pytesseract.image_to_osd(im)
re.search(r'(?<=Rotate: )\d+', newdata).group(0)

Output of method pytesseract.image_to_osd(im) is:

Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 4.21
Script: Latin
Script confidence: 1.90

We only need the Rotate value to change the orientation, so a regular expression does the remaining work.

re.search(r'(?<=Rotate: )\d+', newdata).group(0)
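If you also want the confidence value, the whole report can be parsed instead of just the Rotate field. The following is only a sketch: `parse_osd`, `rotation_needed` and the `min_conf` threshold of 2.0 are made-up names and values for illustration, not part of pytesseract, and it assumes the report always has the "Key: value" layout shown above.

```python
import re

def parse_osd(osd_text):
    """Turn Tesseract's 'Key: value' OSD report into a dict."""
    fields = dict(re.findall(r'^\s*([^:\n]+):\s*(.+?)\s*$', osd_text, flags=re.MULTILINE))
    return {
        'orientation': int(fields['Orientation in degrees']),
        'rotate': int(fields['Rotate']),
        'orientation_conf': float(fields['Orientation confidence']),
        'script': fields['Script'],
    }

def rotation_needed(osd_text, min_conf=2.0):
    """Return the Rotate value, or 0 when the confidence is below min_conf.

    min_conf is a hypothetical threshold; tune it on your own images.
    """
    osd = parse_osd(osd_text)
    return osd['rotate'] if osd['orientation_conf'] >= min_conf else 0

# The sample report from above
sample = """Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 4.21
Script: Latin
Script confidence: 1.90"""

print(rotation_needed(sample))  # prints 90
```

Ignoring low-confidence results this way avoids rotating pages where OSD was essentially guessing.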

This would be the final method to rotate an image to bring it to 0° orientation.

def rotate(image, center = None, scale = 1.0):
    # 'Rotate' is the correction reported by OSD; convert it to the
    # counter-clockwise angle that cv2.getRotationMatrix2D expects
    osd = pytesseract.image_to_osd(image)
    angle = 360 - int(re.search(r'(?<=Rotate: )\d+', osd).group(0))
    (h, w) = image.shape[:2]

    if center is None:
        center = (w / 2, h / 2)

    # Perform the rotation (the output canvas keeps the original w x h size)
    M = cv2.getRotationMatrix2D(center, angle, scale)
    rotated = cv2.warpAffine(image, M, (w, h))

    return rotated
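Because the canvas stays at w x h, a 90° or 270° correction of a non-square image cuts off part of the page, as one of the comments points out. Since OSD only reports multiples of 90, one possible alternative is to turn the pixel array itself. This is a sketch with a hypothetical helper name, `rotate_right_angle`, and it assumes (as the 360 - Rotate conversion above does) that Rotate is a clockwise correction.

```python
import numpy as np

def rotate_right_angle(image, rotate_deg):
    """Rotate an image array clockwise by 0/90/180/270 degrees without cropping."""
    if rotate_deg % 90 != 0:
        raise ValueError("rotate_deg must be a multiple of 90")
    k = (rotate_deg // 90) % 4
    # np.rot90 turns counter-clockwise for positive k, so negate k for clockwise
    return np.ascontiguousarray(np.rot90(image, k=-k))

# Tiny 2x3 "image" to show that width and height are swapped
img = np.arange(6).reshape(2, 3)
print(rotate_right_angle(img, 90).shape)  # prints (3, 2)
```

The image returned by cv2.imread is a NumPy array, so it can be passed in directly together with the parsed Rotate value.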
Cumber answered 12/3, 2019 at 13:41 Comment(10)
Ok. Good to know. But will this orientation be applied to the whole image or just the text? - Compellation
@KarshSoni This would give the rotation of the whole image. - Cumber
It does rotate the image, but it keeps the same resolution, so you have to swap width with height. - Dodd
To rotate: from scipy import ndimage and then ndimage.rotate(img, float(angle) * -1) - Marzi
We can also use PIL.Image.rotate to rotate the original image once we have found the angle of rotation with the above approach. In my case, this is working like a charm. - Breathless
Is it possible to grab the image orientation using pytesseract.image_to_data() instead of pytesseract.image_to_osd()? I need to run pytesseract.image_to_data() anyway to extract text from the image. I'd love to not have to call pytesseract twice. - Cat
This does not work for me (see error below). The image does not have 0 dpi and Tesseract can read it with image_to_data(). Apparently, the page does not contain enough characters? pytesseract.pytesseract.TesseractError: (1, 'Tesseract Open Source OCR Engine v5.0.0-alpha.20210506 with Leptonica UZN file C:\\Users\\cbh037\\AppData\\Local\\Temp\\tess_x1lr6ymt loaded. Estimating resolution as 242 UZN file C:\\Users\\cbh037\\AppData\\Local\\Temp\\tess_x1lr6ymt loaded. Warning. Invalid resolution 0 dpi. Using 70 instead. Too few characters. Skipping this page Error during processing.') - Ganoid
@CasperHansen When an image has so little data/text that the model cannot predict anything, it throws this error. Try with a better image to see if it helps. - Cumber
@MousamSingh that is wrong - it is a PyTesseract (which cascades to a PIL Image) bug. The equivalent Tesseract command works. Use tesserocr instead - see my answer. - Ganoid
Okay, thanks for the correction, if you have tried it. - Cumber
G
4

EDIT: A better approach might be to install the tesserocr package instead, since it works with the most recent Tesseract version.

Conda: conda install -c conda-forge tesserocr

from tesserocr import PyTessBaseAPI, OEM, PSM

def get_angles2(img):
    # img is expected to be a PIL image, which is what SetImage accepts
    with PyTessBaseAPI( psm=PSM.OSD_ONLY, lang="osd", oem=OEM.TESSERACT_LSTM_COMBINED ) as api:
        api.SetImage(img)
        # named osd rather than os, to avoid shadowing the os module
        osd = api.DetectOrientationScript()

    if osd['orient_deg'] == 0:
        return 0
    elif osd['orient_deg'] > 90:
        return 360-osd['orient_deg']
    else:
        return -osd['orient_deg']

ORIGINAL

My answer is based on computing the angle between the lines generated by a Hough Transform because nothing else worked for my dataset. This is a fast approach that turned out to work well in practice.

The prerequisite for this function is grayscaling, binarizing, and color inversion.

import cv2

img = cv2.imread('test0.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
img = cv2.bitwise_not(img)

After this, you can run the function below and get the angles of all the Hough lines detected. Please do tune the threshold parameter (currently at 300) as specified in the OpenCV documentation: "Accumulator threshold parameter. Only those lines are returned that get enough votes ( >threshold )." For more information on calculating the angle from the (x,y) coordinates, refer to this Stack Overflow answer.

import cv2
import numpy as np

def get_angles(img):
    edges = cv2.Canny(img, 50, 150, apertureSize = 3)
    lines = cv2.HoughLines(edges, 1, np.pi/180, threshold=300)

    angles = []

    # HoughLines returns None when no line gets enough votes
    if lines is None:
        return angles

    for line in lines:
        rho, theta = line[0]
        a = np.cos(theta)
        b = np.sin(theta)
        x0 = a*rho
        y0 = b*rho
        x1 = int(x0 + 1000*(-b))
        y1 = int(y0 + 1000*(a))
        x2 = int(x0 - 1000*(-b))
        y2 = int(y0 - 1000*(a))
        
        radians = np.arctan2(y2-y1, x2-x1)
        degrees = np.degrees(radians)

        angles.append(degrees)

    return angles
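As a side note on the geometry: theta is the angle of the line's normal, so the line itself runs at theta - 90°, and the two endpoints are only a detour to the same number. The sketch below uses plain `math` and hypothetical helper names (`line_angle_from_theta`, `line_angle_from_endpoints`) to compare both routes. It also suggests that the small non-zero values such as 0.0287 in the output below may come from the int() truncation of the endpoints rather than from real skew.

```python
import math

def line_angle_from_theta(theta):
    """Direction of a Hough line in degrees, given its normal angle theta (radians)."""
    # The line is perpendicular to its normal, hence theta - 90 degrees
    return math.degrees(theta) - 90.0

def line_angle_from_endpoints(rho, theta, truncate=True):
    """Same angle, computed via two far-apart points as get_angles does."""
    a, b = math.cos(theta), math.sin(theta)
    x0, y0 = a * rho, b * rho
    pts = [x0 - 1000 * b, y0 + 1000 * a, x0 + 1000 * b, y0 - 1000 * a]
    if truncate:
        pts = [int(p) for p in pts]  # int() truncation adds a small error
    x1, y1, x2, y2 = pts
    return math.degrees(math.atan2(y2 - y1, x2 - x1))

theta = math.radians(30)
print(line_angle_from_theta(theta))                            # close to -60
print(line_angle_from_endpoints(100.0, theta, truncate=False)) # close to -60
print(line_angle_from_endpoints(100.0, theta))                 # slightly off -60
```

If the exact angle matters, using theta directly avoids the truncation error altogether.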

After running this function, you will get a long list of angles from the Hough Transformation. From an image that SHOULD NOT be rotated:

[-90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -0.974421553508672, -0.974421553508672, -0.974421553508672, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.9749091578796124, 0.9749091578796124, 0.9749091578796124, 0.9749091578796124, 1.0030752389838637, 1.0030752389838637, 3.9855957480807316, 3.9875880958503185]

An image that SHOULD be rotated:

[-90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -90.0, -88.99692476101613, -88.99692476101613, -88.99692476101613, -88.99692476101613, -88.99692476101613, -88.99692476101613, -88.99692476101613, -88.99692476101613, -88.99692476101613, -88.99642282400909, -88.99642282400909, -88.02210297626898, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99346106671473, -87.99245711203707, -87.99245711203707, -87.99245711203707, -87.99245711203707, -86.99022425882445, -86.99022425882445, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.98871912968818, -86.01440425191927, -86.01440425191927, -86.01440425191927, -86.01241190414969, -86.01241190414969, -86.01241190414969, -86.01241190414969, -86.01241190414969, -86.01241190414969, -86.01241190414969, -86.01241190414969, -86.01241190414969, -86.01241190414969, -86.01241190414969, -85.00791883390836, -85.00791883390836, -85.00791883390836, -85.00791883390836, -85.00542418989113, -85.00542418989113, -0.974421553508672, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 
0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.02866221847606629, 0.9749091578796124, 85.9838177634312, 86.98871912968818, 86.98871912968818, 86.98871912968818, 86.99022425882445, 87.99346106671473, 87.99346106671473, 87.99346106671473, 87.99346106671473, 87.99346106671473, 87.99346106671473, 87.99346106671473, 87.99346106671473, 88.99692476101613, 88.99692476101613, 88.99692476101613, 88.99692476101613, 88.99692476101613, 88.99692476101613, 88.99692476101613, 88.99692476101613]

This is where I will leave it up to you with a few options on which angle to choose for rotation. Option 3 should work great for the arrays I presented above, but please do tune it to your case:

  1. Use the median angle to rotate the image
  2. Average of the first, middle, and last angle
  3. Find the average of the first 10 and last 10 values. If the difference is too large, the image does not need rotation. However, if they are close, one could find the average of the 20 values (from the first 10 and last 10) and use that as the value for rotation.
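
Option 3 could be sketched roughly as follows. The function name `choose_rotation` and the defaults `n=10` and `tolerance=10.0` are placeholders for illustration. One assumption is made loudly here: the comparison is done on absolute values, because -90° and +90° describe the same line direction. Without that, the two ends of the second array above would look far apart.

```python
from statistics import mean

def choose_rotation(angles, n=10, tolerance=10.0):
    """Option 3: compare the first n and last n sorted angles.

    Returns the angle magnitude (degrees) to rotate by, or None for "no rotation".
    n and tolerance are illustrative defaults, not tuned values.
    """
    ordered = sorted(angles)
    if len(ordered) < 2 * n:
        return None  # too few lines to decide
    # Assumption: compare magnitudes, since -90 and +90 are the same direction
    head = [abs(a) for a in ordered[:n]]
    tail = [abs(a) for a in ordered[-n:]]
    if abs(mean(head) - mean(tail)) > tolerance:
        return None  # the two ends disagree: treat the image as upright
    return mean(head + tail)

# Synthetic lists shaped like the two arrays above
upright = [-90.0] * 25 + [0.0] * 60 + [1.0] * 5
rotated = [-90.0] * 70 + [0.0] * 20 + [88.0] * 20
print(choose_rotation(upright))  # prints None
print(choose_rotation(rotated))  # prints 89.0
```

Note that this only yields the magnitude of the rotation; deciding between clockwise and counter-clockwise still needs another signal.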

Here is the list of guides that I tested and which did not work (for me). I believe most of these packages do not work well if financial data (like equations or tables) are included. However, if you only have text in an image, these guides could work for you:

  1. This first one gave me -90 degrees for both an image that needed to be rotated and an image that did not need to be rotated. https://becominghuman.ai/how-to-automatically-deskew-straighten-a-text-image-using-opencv-a0c30aed83df
  2. This gave a lot of errors in Python 3. After fixing the code, it turned out not to work at all. https://mzucker.github.io/2016/08/15/page-dewarping.html
  3. You can add Mousam Singh's example above. This did not work because Tesseract throws an error. Furthermore, I am not sure it is too wise to run Tesseract twice.
  4. This package did not work for me. It was too simple of an approach. https://github.com/sbrunner/deskew
  5. A noteworthy mention that I did not get to try out was Leptonica, which is used by Tesseract, OpenCV and other major packages. It requires managing a dependency that I did not want to deal with, however, it could work for you if you already have some experience with C. https://tpgit.github.io/Leptonica/skew_8c.html
Ganoid answered 10/9, 2021 at 11:55 Comment(0)
C
1

@MousamSingh, you can't check the orientation of an image directly: whenever you pass an image through Tesseract, it detects text and gives you back a string, which may contain noise or unnecessary text.

Answer -> Instead of passing the image directly to Tesseract, first detect the text regions in the image and draw a bounding rectangle around each one. Then crop those regions and pass them to Tesseract, which should give you a much better result. As for the orientation of the image: take the coordinates of those boxes, use them to find the angle, and rotate the image by that angle if needed.
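
To illustrate the last step, here is a small sketch of how an angle could be derived from the four corner points of one detected text box. It is not taken from any particular text-detection library: `box_angle` is a hypothetical helper, the corners are assumed to be given in order around the box, and the text is assumed to run along the box's longest edge.

```python
import math

def box_angle(corners):
    """Angle in degrees of the longest edge of a 4-point box, folded into (-90, 90]."""
    best_len, best_angle = -1.0, 0.0
    for i in range(4):
        (x1, y1), (x2, y2) = corners[i], corners[(i + 1) % 4]
        length = math.hypot(x2 - x1, y2 - y1)
        if length > best_len:
            best_len = length
            best_angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    # A line at 135 degrees is the same line as -45 degrees
    if best_angle > 90:
        best_angle -= 180
    elif best_angle <= -90:
        best_angle += 180
    return best_angle

# A thin box tilted by 45 degrees (image coordinates, y pointing down)
print(box_angle([(0, 0), (10, 10), (9, 11), (-1, 1)]))
```

Such an angle cannot tell upside-down text from upright text, so it helps with skew more than with 180° flips.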

I think it might help you. Give it a vote if you find your answer. Thanks

And yes, I forgot to suggest a way to detect text...

This is a Python repository which will be useful for detecting text:

github link to python code for text detection

Let me know if you need anything else. Thanks

Compellation answered 12/3, 2019 at 12:15 Comment(1)
Thanks, I will look into it and let you know if I need any further help in doing so. - Cumber
R
0

I solved a similar use case with a fine-tuned image classification model (a Swin Transformer, so a vision model rather than a text LLM). First, I fine-tuned it on a custom dataset for an image classification task with four classes: horizontal-down, horizontal-left, vertical-left, and vertical-right. Then, I used a computer vision library to rotate the image accordingly.

This is specifically tailored for Aadhaar cards, but you can create your own custom dataset based on the specific use case at hand.

Take a look at the model I fine-tuned, which is available on Hugging Face: https://huggingface.co/MANMEET75/swin-tiny-patch4-window7-224-AadhaarCard-Orientation-Classification

Recluse answered 10/5, 2024 at 18:4 Comment(0)
