Detect if an OCR text image is upside down

I have a few hundred images (scanned documents), most of which are skewed. I want to deskew them using Python.
Here is the code I used:

import numpy as np
import cv2

from skimage.transform import radon


filename = 'path_to_filename'
# Load file, converting to grayscale
img = cv2.imread(filename)
I = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
h, w = I.shape
# If the resolution is high, resize the image to reduce processing time.
if (w > 640):
    I = cv2.resize(I, (640, int((h / w) * 640)))
I = I - np.mean(I)  # Demean; make the brightness extend above and below zero
# Do the radon transform
sinogram = radon(I)
# Find the RMS value of each row and find "busiest" rotation,
# where the transform is lined up perfectly with the alternating dark
# text and white lines
r = np.array([np.sqrt(np.mean(np.abs(line) ** 2)) for line in sinogram.transpose()])
rotation = np.argmax(r)
print('Rotation: {:.2f} degrees'.format(90 - rotation))

# Rotate and save with the original resolution
M = cv2.getRotationMatrix2D((w/2,h/2),90 - rotation,1)
dst = cv2.warpAffine(img,M,(w,h))
cv2.imwrite('rotated.jpg', dst)

This code works well with most of the documents, except for some angles: 0 and 180 degrees are often detected as the same angle, and so are 90 and 270 degrees (i.e. it does not distinguish between the two angles in each pair). So I get a lot of upside-down documents.
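The ambiguity seems to be inherent to projection-based scoring: a projection taken after a 180 degree rotation is the mirror image of the original projection, so its RMS is unchanged (and `radon` only evaluates angles from 0 to 179 by default). Here is a minimal NumPy sketch of that effect on a synthetic page; `row_profile_rms` is a hypothetical helper standing in for one column of the sinogram, not part of the code above.

```python
import numpy as np

def row_profile_rms(image):
    """RMS of the demeaned row-sum profile (a stand-in for one sinogram column)."""
    profile = image.sum(axis=1).astype(float)
    profile -= profile.mean()
    return np.sqrt(np.mean(profile ** 2))

# Synthetic "page": three dark text lines of different lengths on white.
page = np.full((60, 80), 255, dtype=np.uint8)
page[10:14, 5:70] = 0
page[25:29, 5:60] = 0
page[40:44, 5:30] = 0

# Reversing both axes is a rotation by 180 degrees.
flipped = page[::-1, ::-1]

rms_upright = row_profile_rms(page)
rms_flipped = row_profile_rms(flipped)
print(rms_upright, rms_flipped)
```

Both values come out equal, so a second, asymmetric cue is needed to pick between the two orientations.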

Here is an example:
[image]

The resulting image is the same as the input image.

Is there any way to detect whether an image is upside down using OpenCV and Python?
PS: I tried to check the orientation using EXIF data, but it didn't lead to a solution.


EDIT:
It is possible to detect the orientation using Tesseract (pytesseract for Python), but it is only possible when the image contains a lot of characters.
For anyone who may need this:

import cv2
import pytesseract


print(pytesseract.image_to_osd(cv2.imread(file_name)))

If the document contains enough characters, Tesseract can detect the orientation. However, when the image has only a few lines, the orientation angle it suggests is usually wrong, so this cannot be a 100% solution.
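Since the OSD result is only trustworthy with enough text, one option is to read the confidence that Tesseract reports and ignore low-confidence answers. Below is a rough sketch that parses the text returned by `image_to_osd`; the helper names, the `sample` report, and the threshold of 2.0 are assumptions for illustration and would need tuning on real scans.

```python
def parse_osd(osd_text):
    """Turn Tesseract's 'key: value' OSD report into a dict of strings."""
    fields = {}
    for line in osd_text.splitlines():
        key, sep, value = line.partition(':')
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def rotation_if_confident(osd_text, min_confidence=2.0):
    """Return the suggested rotation in degrees, or None if confidence is too low."""
    fields = parse_osd(osd_text)
    confidence = float(fields.get('Orientation confidence', 0))
    if confidence < min_confidence:
        return None
    return int(fields['Rotate'])

# Illustrative report in the shape image_to_osd prints (values are made up).
sample = """Page number: 0
Orientation in degrees: 180
Rotate: 180
Orientation confidence: 4.52
Script: Latin
Script confidence: 1.33"""

print(rotation_if_confident(sample))
```

When the function returns None, the page can be passed on to one of the other heuristics instead.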

Plumy answered 12/4, 2019 at 14:41 Comment(15)
Not a solution, but another heuristic you could use (assuming you are reading Latin script) is comparing the amount of black in the left and right or top and bottom halves. If a page has significantly more black on the right (line breaks) and/or at the bottom I guess it's likely to be upside down. - Granitite
@jdehesa Good idea! However, some documents do not have a totally white background, so it may lead to some confusion. - Plumy
@jdehesa, However, if it is possible to detect some characters (like R for example), we could decide the rotation angle based on their orientation. My problem is that I have no idea how to detect a specific character in an image. Do you have any idea how this can be done? - Plumy
Is there always a title in the paper? Can you say if there are patterns to follow? I'd leave OCR as the last option... it would be easier to detect white spots, creating a rect and measuring its size. Like the white spots between the title and the rest. - Miguelmiguela
@Plumy Mmm not sure, if they are about constant size you could use some convolution filters and see if they work better (you get more "matches") upright or upside down... Otherwise I'm not sure (I don't know all that much about CV tbh), I mean surely you can create a neural net or something that classifies that but that's some more work. - Granitite
@GDias, Well, yes, there are some patterns to follow for some documents (personal documents). Here is an example and this is another one of a part of the docs that I have: there is always a blue zone on the top of the document. - Plumy
@jdehesa, thanks for the clarification. I'll leave this as a last option. - Plumy
Well, for those documents with the blue line, you can read the blue channel of the image and create a threshold for blue. If it detects the presence of blue, and it is below the middle of the document, you can say the document is upside down. - Miguelmiguela
You may preprocess a page to be completely gray-scale with high contrast then apply the black-white test as jdehesa suggested. You always need normalization before OCR or any detection though. - Fiji
Not a solution, but you're more likely to get a complete line of text at the top of a page than at the bottom. - Benildis
Try different rotations of the input image and put them into tesseract, and select the "best" one. "Best" one might be the one which contains the most sensible words. - Eachern
@JanChristophTerasa, actually, I am doing this till I find another solution :D - Plumy
Do you want this only for text? - Justino
@Xilpex Yeah, the main reason to check if a document is upside down is to correctly extract the text later in a following step. - Plumy
Surely there's an algorithm that can make use of the fact that text 'sits' on an imaginary straight line. So if you scan down the page, the % of black across this line will jump more sharply if the page is upside down than otherwise. - Kongo
C
38

Python3/OpenCV4 script to align scanned documents.

Rotate the document and sum the rows. When the document is at 0 or 180 degrees of rotation, many rows of the image are entirely black (the gaps between the text lines), so the row sums form a zebra pattern:

[image: rotate to find maximum zebra]

Use a score-keeping method. Score each image for its likeness to a zebra pattern. The image with the best score has the correct rotation. The image you linked to was off by 0.5 degrees. I omitted some functions for readability; the full code can be found here.

# Rotate the image around in a circle, keeping a score for each angle
scores = []
angle = 0
while angle <= 360:
    # Rotate the source image
    img = rotate(src, angle)    
    # Crop the center 1/3rd of the image (roi is filled with text)
    h,w = img.shape
    buffer = min(h, w) - int(min(h,w)/1.15)
    roi = img[int(h/2-buffer):int(h/2+buffer), int(w/2-buffer):int(w/2+buffer)]
    # Create background to draw transform on
    bg = np.zeros((buffer*2, buffer*2), np.uint8)
    # Compute the sums of the rows
    row_sums = sum_rows(roi)
    # Low score (few non-zero rows) --> zebra stripes
    score = np.count_nonzero(row_sums)
    scores.append(score)
    # Image has the best rotation so far
    if score <= min(scores):
        # Save the rotated image
        print('found optimal rotation')
        best_rotation = img.copy()
    k = display_data(roi, row_sums, buffer)
    if k == 27: break
    # Increment angle and try again
    angle += .75
cv2.destroyAllWindows()
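The helpers used in the loop are omitted above, so here is a minimal sketch of how the scoring step might look. `sum_rows` and `zebra_score` are hypothetical stand-ins (not the answer's actual code), the page is synthetic, and it is assumed that the image is inverted so that text is non-zero and the background is 0.

```python
import numpy as np

def sum_rows(binary):
    """Hypothetical stand-in for the omitted helper: one sum per image row."""
    return binary.sum(axis=1)

def zebra_score(binary):
    """Number of rows containing any text pixel; lower means more empty stripes."""
    return int(np.count_nonzero(sum_rows(binary)))

# Synthetic inverted page (text = 255, background = 0) with three text lines.
page = np.zeros((60, 60), dtype=np.uint8)
for top in (10, 25, 40):
    page[top:top + 4, 5:55] = 255

# Aligned page: only the rows covered by the three text lines are non-zero.
aligned_score = zebra_score(page)
# Page turned by 90 degrees: every row that crosses the text lines is non-zero.
sideways_score = zebra_score(np.rot90(page))
print(aligned_score, sideways_score)
```

The aligned page gets the lower score, which is why the loop keeps the rotation with the minimum.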

[image: best rotation]

How to tell if the document is upside down? Fill in the area from the top of the document to the first non-black pixel in the image. Measure the area in yellow. The image that has the smallest area will be the one that is right-side-up:

[images: right side up, upside down]

# Find the area from the top of page to top of image
_, bg = area_to_top_of_text(best_rotation.copy())
right_side_up = sum(sum(bg))
# Flip image and try again
best_rotation_flipped = rotate(best_rotation, 180)
_, bg = area_to_top_of_text(best_rotation_flipped.copy())
upside_down = sum(sum(bg))
# Check which area is larger
if right_side_up < upside_down: aligned_image = best_rotation
else: aligned_image = best_rotation_flipped
# Save aligned image
cv2.imwrite('/home/stephen/Desktop/best_rotation.png', 255-aligned_image)
cv2.destroyAllWindows()
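`area_to_top_of_text` is one of the omitted functions. A rough NumPy sketch of the same idea follows; the name `area_above_text`, the synthetic page, and the choice to count empty columns as fully filled are all assumptions, and the image is again taken to be inverted (text non-zero).

```python
import numpy as np

def area_above_text(binary):
    """For each column, count the background rows above the first text pixel; sum them."""
    has_text = binary > 0
    first_text_row = np.argmax(has_text, axis=0)      # 0 for columns without text
    empty_columns = ~has_text.any(axis=0)
    first_text_row[empty_columns] = binary.shape[0]   # empty column counts in full
    return int(first_text_row.sum())

# Synthetic inverted page: two full lines and a short last line.
page = np.zeros((60, 60), dtype=np.uint8)
page[10:14, 5:55] = 255
page[25:29, 5:55] = 255
page[40:44, 5:20] = 255   # short last line, as at the end of a paragraph

upright_area = area_above_text(page)
flipped_area = area_above_text(page[::-1, ::-1])      # rotated by 180 degrees
print(upright_area, flipped_area)
```

The upright page has the smaller area because its first line spans the full width, while the flipped page starts with the short line.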
Chum answered 17/4, 2019 at 22:9 Comment(3)
This is a great answer. But the upside down detection will probably fail on the last page of each chapter etc. I guess you could additionally do a similar analysis on left and right margins, as paragraph ends are on average deeper inset than paragraph beginnings. - Vyner
I would suggest summing up the non-blacks from the top as well as from the left, since text will start from the top left for English text. - Bushtit
For upside down detection, can you make use of the fact that there is a halo effect due to capital letters and the frequency of letters like t, h, k? In your still image above the halo is below the white bands. I.e. cut the areas between the white bands in half, sum each half, and the brighter of the two sums needs to be on top. - Kongo
P
8

Assuming you did run the angle-correction already on the image, you can try the following to find out if it is flipped:

  1. Project the corrected image to the y-axis, so that you get a 'peak' for each line. Important: There are actually almost always two sub-peaks!
  2. Smooth this projection by convolving with a gaussian in order to get rid of fine structure, noise, etc.
  3. For each peak, check if the stronger sub-peak is on top or at the bottom.
  4. Calculate the fraction of peaks that have sub-peaks on the bottom side. This is your scalar value that gives you the confidence that the image is oriented correctly.

The peak finding in step 3 is done by finding sections with above average values. The sub-peaks are then found via argmax.

Here's a figure to illustrate the approach, showing a few lines of your example image:

  • Blue: Original projection
  • Orange: smoothed projection
  • Horizontal line: average of the smoothed projection for the whole image.

[image]

Here's some code that does this:

import cv2
import numpy as np

# load image, convert to grayscale, threshold it at 127 and invert.
page = cv2.imread('Page.jpg')
page = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
page = cv2.threshold(page, 127, 255, cv2.THRESH_BINARY_INV)[1]

# project the page to the side and smooth it with a gaussian
projection = np.sum(page, 1)
gaussian_filter = np.exp(-(np.arange(-3, 3, 0.1)**2))
gaussian_filter /= np.sum(gaussian_filter)
smooth = np.convolve(projection, gaussian_filter)

# find the pixel values where we expect lines to start and end
mask = smooth > np.average(smooth)
edges = np.convolve(mask, [1, -1])
line_starts = np.where(edges == 1)[0]
line_endings = np.where(edges == -1)[0]

# count lines with peaks on the lower side
lower_peaks = 0
for start, end in zip(line_starts, line_endings):
    line = smooth[start:end]
    if np.argmax(line) < len(line)/2:
        lower_peaks += 1

print(lower_peaks / len(line_starts))

This prints 0.125 for the given image, so it is not oriented correctly and must be flipped.

Note that this approach might break badly if there are images or anything not organized in lines in the image (maybe math or pictures). Another problem would be too few lines, resulting in bad statistics.

Also different fonts might result in different distributions. You can try this on a few images and see if the approach works. I don't have enough data.

Punctual answered 17/4, 2019 at 21:42 Comment(3)
This answer needs a reason why the approach was taken and why it somewhat works. Two major peaks are due to the 'o-ness' of letters like o, b, q, e and other letters. By smoothing you've lost reliability here. Ignore the two major peaks and concentrate on the two sub-peaks above and below these due to capital letters and the frequency of letters like t, h, l, d. In your Gaussian image the sub-peaks make it obvious the image is upside down. - Kongo
What you say is correct in an ideal world. However, detecting the small peaks requires much more sensitive detection and is more prone to irregularities in the scan (e.g. the vertical black lines at the edge of the example scan). Therefore I smoothed the projection. - Punctual
The major peaks contain a lot of noise, the sub-peaks contain the signal. I respectfully disagree that smoothing, i.e. averaging noise and signal, is better even in the real world. - Kongo
J
1

You can use the Alyn module. To install it:

pip install alyn

Then, to use it to deskew images (taken from the homepage):

from alyn import Deskew
d = Deskew(
    input_file='path_to_file',
    display_image='preview the image on screen',
    output_file='path_for_deskewed image',
    r_angle='offest_angle_in_degrees_to_control_orientation')
d.run()

Note that Alyn is only for deskewing text.

Justino answered 17/4, 2019 at 15:6 Comment(3)
Did you try the code that you posted? When I run it, I get this error: ImportError: cannot import name 'Deskew' - Plumy
It works if you change deskew to lowercase, but then there's another error. Seems it's not for Python 3.7 (?) - Stipel
@Stipel -- No, it's not for Python 3, but only minor changes are needed. - Justino
J
1

Here are two solutions, using pytesseract.

First, we can use pytesseract's ability to detect image orientation:

import cv2
import pytesseract
from pytesseract import Output

image = cv2.imread(path_to_image)
results = pytesseract.image_to_osd(image, output_type=Output.DICT)

print(results["orientation"])
print(results["rotate"])

It works most of the time, but it sometimes fails because the rotated page looks like text in a foreign language (e.g. Cyrillic). I could not find a way to specify the language when using the image_to_osd() function. However, it is possible to do so during a full OCR. If we compare both texts (the one without rotation and the one after a 180 degree rotation), we can try to guess the page orientation:

import cv2
import langdetect
import pytesseract

def ocr(img):
    return pytesseract.image_to_string(img, lang='eng').replace('\n', ' ')

def score(text):  # probability that the text is English, between 0.0 and 1.0
    if not text.strip():  # langdetect raises on empty or whitespace-only text
        return 0
    result = langdetect.detect_langs(text.lower())
    result = {lang.lang: lang.prob for lang in result}
    return result.get('en', 0)

img = cv2.imread(path_to_image)
score0 = score(ocr(img))
# Only OCR the flipped page when the upright text is not clearly English
score180 = score(ocr(cv2.rotate(img, cv2.ROTATE_180))) if score0 < 0.99 else 0
if score180 > score0:
    print("180 rotate")
else:
    print("normal")
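The same comparison could be extended to all four quarter turns, which would also cover the 90 versus 270 degree case from the question. A small sketch under that assumption: `best_quarter_turn` and `toy_score` are hypothetical names, and the toy scorer only exists to make the example runnable without Tesseract; in practice the scoring function would be something like `lambda im: score(ocr(im))`.

```python
import numpy as np

def best_quarter_turn(image, score_fn):
    """Return (k, score) for the number of 90-degree turns that scores highest."""
    candidates = [(k, score_fn(np.rot90(image, k))) for k in range(4)]
    return max(candidates, key=lambda item: item[1])

def toy_score(image):
    """Placeholder scorer: brightness of the top-left pixel."""
    return float(image[0, 0])

# Tiny test image whose only bright pixel sits in the bottom-right corner.
marker = np.zeros((4, 4), dtype=np.uint8)
marker[3, 3] = 255

turns, best = best_quarter_turn(marker, toy_score)
print(turns, best)
```

Note that this runs the scorer four times per page, so with a full OCR pass as the scorer it is considerably slower than the two-way check.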

Additionally, increasing the contrast of the image before OCR might help:

img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) #grayscale
img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1] #b&w
Jaunty answered 7/7, 2024 at 18:52 Comment(0)
