Get correct image orientation by Google Cloud Vision api (TEXT_DETECTION)
I

9

22

I tried the Google Cloud Vision API (TEXT_DETECTION) on an image rotated by 90 degrees, and it still returned the recognized text correctly (see image below).

That means the engine can recognize text even when the image is rotated by 90, 180, or 270 degrees.

However, the response doesn't include the correct image orientation (document: EntityAnnotation).

Is there any way to get not only the recognized text but also the orientation?
Could Google support this, similar to FaceAnnotation's getRollAngle?

enter image description here

Illyria answered 22/12, 2016 at 14:36 Comment(2)
If you'd like this feature to exist, consider posting a feature request to the google-cloud-platform issue tracker at code.google.com/p/google-cloud-platform/issues/list.Gettogether
Thanks. I didn't know there was such an issue list. I just posted the request. code.google.com/p/google-cloud-platform/issues/detail?id=194Illyria
T
5

As described in the Public Issue Tracker, our engineering team is now aware of this feature request, and there is currently no ETA for its implementation.

Note, orientation information may already be available in your image's metadata. An example of how to extract the metadata can be seen in this Third-party library.

A broad workaround would be to check the returned "boundingPoly" "vertices" for the returned "textAnnotations". By calculating the width and height of each detected word's rectangle, you can figure out if an image is not right-side-up if the rectangle 'height' > 'width' (aka the image is sideways).
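
The width/height check described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the input is the JSON form of the response (a list of textAnnotations whose vertices are dicts), and a missing "x" or "y" key is treated as 0. Short words and vertical scripts can give a tall box even in an upright image, so the sketch takes a vote over all words rather than trusting a single one.

```python
def is_sideways(vertices):
    """Return True if a box is taller than it is wide in image coordinates."""
    # Assumption: a coordinate that is absent from the JSON is treated as 0.
    xs = [v.get("x", 0) for v in vertices]
    ys = [v.get("y", 0) for v in vertices]
    return (max(ys) - min(ys)) > (max(xs) - min(xs))


def image_is_sideways(text_annotations):
    """Vote over the word boxes; element 0 (the full text) is skipped."""
    words = text_annotations[1:]
    if not words:
        return False
    votes = [is_sideways(w["boundingPoly"]["vertices"]) for w in words]
    # Sideways only if more than half of the word boxes are taller than wide.
    return sum(votes) > len(votes) / 2
```

Note that this check only separates "upright or upside down" from "sideways"; it cannot tell 90 from 270 degrees, or 0 from 180.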

Talya answered 9/1, 2017 at 21:5 Comment(2)
curious to know how the google cloud vision can get the correct text out of an image even when the image is not horizontally aligned and needs to be rotated. How does the API know how much to rotate the image? If this information isn't in the metadata of the image, how does the cloud API find this out?Wallinga
The issue that was resolved on your side, as documented at cloud.google.com/vision/docs/reference/rest/v1p4beta1/… allows one to distinguish only between 0, 90, 180, 270 degrees rotated (and only after some math). You already have all the info one needs to deskew the image, why not return it?Maccabean
O
8

You can leverage the fact that we know the sequence of characters in a word to infer the orientation of a word as follows (obviously slightly different logic for non-LTR languages):

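import numpy as np

# Assumption: the original answer does not give this value; 3 is only illustrative.
MIN_WORD_LENGTH_FOR_ROTATION_INFERENCE = 3

def center(box):
    # Mean of the vertices of a bounding box, as an (x, y) pair
    return (np.mean([v.x for v in box.vertices]), np.mean([v.y for v in box.vertices]))

# annotation: the pages of the response, e.g. response.full_text_annotation.pages
for page in annotation:
    for block in page.blocks:
        for paragraph in block.paragraphs:
            for word in paragraph.words:
                if len(word.symbols) < MIN_WORD_LENGTH_FOR_ROTATION_INFERENCE:
                    continue
                first_char = word.symbols[0]
                last_char = word.symbols[-1]
                first_char_center = center(first_char.bounding_box)
                last_char_center = center(last_char.bounding_box)
                # Assumption: top_right and bottom_right are not defined in the original
                # answer; here they are vertices 1 and 2 of the first character's box.
                top_right = first_char.bounding_box.vertices[1]
                bottom_right = first_char.bounding_box.vertices[2]

                # upright or upside down
                if np.abs(first_char_center[1] - last_char_center[1]) < np.abs(top_right.y - bottom_right.y):
                    if first_char_center[0] <= last_char_center[0]:  # upright
                        print(0)
                    else:  # upside down
                        print(180)
                else:  # sideways
                    if first_char_center[1] <= last_char_center[1]:
                        print(90)
                    else:
                        print(270)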

Then you can use the orientation of individual words to infer the orientation of the document overall.
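
One simple way to combine the per-word results is a majority vote. The sketch below assumes the per-word orientations (0, 90, 180 or 270) have been collected into a list instead of printed; the function name and the default value are hypothetical and not part of the original answer.

```python
from collections import Counter


def document_orientation(word_orientations, default=0):
    """Return the most common per-word orientation, or default if there are none."""
    if not word_orientations:
        return default
    counts = Counter(word_orientations)
    # most_common(1) returns a list holding the single (value, count) pair
    # with the highest count.
    return counts.most_common(1)[0][0]
```

A vote like this tolerates a few misdetected words, for example a short word whose first and last characters sit almost on top of each other.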

Oshea answered 13/6, 2018 at 0:17 Comment(2)
what value are you using for MIN_WORD_LENGTH_FOR_ROTATION_INFERENCE ?Carnify
what is top_right and bottom_right defined as? is it the first/last character or the whole word?Widener
N
7

Jack Fan's answer worked for me. This is my vanilla JS version.

/**
 *
 * @param gOCR  The Google Vision response
 * @return orientation (0, 90, 180 or 270)
 */
function getOrientation(gOCR) {
    var vertexList = gOCR.responses[0].textAnnotations[1].boundingPoly.vertices;

    const ORIENTATION_NORMAL = 0;
    const ORIENTATION_270_DEGREE = 270;
    const ORIENTATION_90_DEGREE = 90;
    const ORIENTATION_180_DEGREE = 180;

    var centerX = 0, centerY = 0;
    for (var i = 0; i < 4; i++) {
        centerX += vertexList[i].x;
        centerY += vertexList[i].y;
    }
    centerX /= 4;
    centerY /= 4;

    var x0 = vertexList[0].x;
    var y0 = vertexList[0].y;

    if (x0 < centerX) {
        if (y0 < centerY) {

            return ORIENTATION_NORMAL;
        } else {
            return ORIENTATION_270_DEGREE;
        }
    } else {
        if (y0 < centerY) {
            return ORIENTATION_90_DEGREE;
        } else {
            return ORIENTATION_180_DEGREE;
        }
    }
}
Nephrolith answered 5/7, 2019 at 17:44 Comment(0)
I
5

Here is my workaround, which works for images rotated by 90, 180, or 270 degrees. Please see the code below.

GetExifOrientation(annotateImageResponse.getTextAnnotations().get(1));
/**
 *
 * @param ea  An EntityAnnotation other than the first element of
 *            annotateImageResponse.getTextAnnotations(), because the first element
 *            (the full text) is not affected by image orientation.
 * @return Exif orientation (1 or 3 or 6 or 8)
 */
public static int GetExifOrientation(EntityAnnotation ea) {
    List<Vertex> vertexList = ea.getBoundingPoly().getVertices();
    // Calculate the center
    float centerX = 0, centerY = 0;
    for (int i = 0; i < 4; i++) {
        centerX += vertexList.get(i).getX();
        centerY += vertexList.get(i).getY();
    }
    centerX /= 4;
    centerY /= 4;

    int x0 = vertexList.get(0).getX();
    int y0 = vertexList.get(0).getY();

    if (x0 < centerX) {
        if (y0 < centerY) {
            //       0 -------- 1
            //       |          |
            //       3 -------- 2
            return EXIF_ORIENTATION_NORMAL; // 1
        } else {
            //       1 -------- 2
            //       |          |
            //       0 -------- 3
            return EXIF_ORIENTATION_270_DEGREE; // 6
        }
    } else {
        if (y0 < centerY) {
            //       3 -------- 0
            //       |          |
            //       2 -------- 1
            return EXIF_ORIENTATION_90_DEGREE; // 8
        } else {
            //       2 -------- 3
            //       |          |
            //       1 -------- 0
            return EXIF_ORIENTATION_180_DEGREE; // 3
        }
    }
}

More info
I found that I have to add a language hint to make annotateImageResponse.getTextAnnotations().get(1) always follow this rule.

Sample code to add language hint

ImageContext imageContext = new ImageContext();
String [] languages = { "zh-TW" };
imageContext.setLanguageHints(Arrays.asList(languages));
annotateImageRequest.setImageContext(imageContext);
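
For readers calling the REST endpoint directly rather than using the Java client, the same hint goes into the imageContext.languageHints field of the request body. The following is a minimal sketch of building that body in Python; the function name is hypothetical, and sending the request (authentication, HTTP client) is deliberately left out.

```python
import base64


def build_text_detection_request(image_bytes, language_hints):
    """Build a JSON-serializable body for an images:annotate request."""
    return {
        "requests": [
            {
                # The image is sent inline as base64-encoded bytes.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "TEXT_DETECTION"}],
                # Language hints, e.g. ["zh-TW"] as in the Java sample.
                "imageContext": {"languageHints": list(language_hints)},
            }
        ]
    }
```

The resulting dict can be serialized with json.dumps and posted to the images:annotate method of the API.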
Illyria answered 13/1, 2017 at 6:32 Comment(0)
W
3

Sometimes it is not possible to get the orientation from metadata, for example if the user took the photo with a mobile device's camera in the wrong orientation. My solution is based on Jack Fan's answer and is written for google-api-services-vision (available via Maven).

my TextUnit class

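  public class TextUnit {
        private String text;

        //    X of lower-left point
        private float llx;

        //    Y of lower-left point
        private float lly;

        //    X of upper-right point
        private float urx;

        //    Y of upper-right point
        private float ury;

        //    Constructor used by extractData
        public TextUnit(String text, float llx, float lly, float urx, float ury) {
            this.text = text;
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
        }
    }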

base method:

 List<TextUnit> extractData(BatchAnnotateImagesResponse response) throws AnnotateImageResponseException {
            List<TextUnit> data = new ArrayList<>();

            for (AnnotateImageResponse res : response.getResponses()) {
                if (null != res.getError()) {
                    String errorMessage = res.getError().getMessage();
                    logger.log(Level.WARNING, "AnnotateImageResponse ERROR: " + errorMessage);
                    throw new AnnotateImageResponseException("AnnotateImageResponse ERROR: " + errorMessage);
                } else {
                    List<EntityAnnotation> texts = res.getTextAnnotations();
                    if (texts != null && texts.size() > 1) {

                        //get orientation
                        EntityAnnotation first_word = texts.get(1);
                        int orientation;
                        try {
                            orientation = getExifOrientation(first_word);
                        } catch (NullPointerException e) {
                            try {
                                orientation = getExifOrientation(texts.get(2));
                            } catch (NullPointerException e1) {
                                orientation = EXIF_ORIENTATION_NORMAL;
                            }
                        }
                        logger.log(Level.INFO, "orientation: " + orientation);

                        // Calculate the center
                        float centerX = 0, centerY = 0;
                        for (Vertex vertex : first_word.getBoundingPoly().getVertices()) {
                            if (vertex.getX() != null) {
                                centerX += vertex.getX();
                            }
                            if (vertex.getY() != null) {
                                centerY += vertex.getY();
                            }
                        }
                        centerX /= 4;
                        centerY /= 4;


                        for (int i = 1; i < texts.size(); i++) {//exclude first text - it contains all text of the page

                            String blockText = texts.get(i).getDescription();
                            BoundingPoly poly = texts.get(i).getBoundingPoly();

                            try {
                                float llx = 0;
                                float lly = 0;
                                float urx = 0;
                                float ury = 0;
                                if (orientation == EXIF_ORIENTATION_NORMAL) {
                                    poly = invertSymmetricallyBy0X(centerY, poly);
                                    llx = getLlx(poly);
                                    lly = getLly(poly);
                                    urx = getUrx(poly);
                                    ury = getUry(poly);
                                } else if (orientation == EXIF_ORIENTATION_90_DEGREE) {
                                    //invert by x
                                    poly = rotate(centerX, centerY, poly, Math.toRadians(-90));
                                    poly = invertSymmetricallyBy0Y(centerX, poly);
                                    llx = getLlx(poly);
                                    lly = getLly(poly);
                                    urx = getUrx(poly);
                                    ury = getUry(poly);
                                } else if (orientation == EXIF_ORIENTATION_180_DEGREE) {
                                    poly = rotate(centerX, centerY, poly, Math.toRadians(-180));
                                    poly = invertSymmetricallyBy0Y(centerX, poly);
                                    llx = getLlx(poly);
                                    lly = getLly(poly);
                                    urx = getUrx(poly);
                                    ury = getUry(poly);
                                }else if (orientation == EXIF_ORIENTATION_270_DEGREE){
                                    //invert by x
                                    poly = rotate(centerX, centerY, poly, Math.toRadians(-270));
                                    poly = invertSymmetricallyBy0Y(centerX, poly);
                                    llx = getLlx(poly);
                                    lly = getLly(poly);
                                    urx = getUrx(poly);
                                    ury = getUry(poly);
                                }


                                data.add(new TextUnit(blockText, llx, lly, urx, ury));
                            } catch (NullPointerException e) {
                                //ignore - some polys have no X or Y coordinate if the text is located close to the image bounds.
                            }
                        }
                    }
                }
            }
            return data;
        }

helper methods:

private float getLlx(BoundingPoly poly) {
        try {
            List<Vertex> vertices = poly.getVertices();

            ArrayList<Float> xs = new ArrayList<>();
            for (Vertex v : vertices) {
                float x = 0;
                if (v.getX() != null) {
                    x = v.getX();
                }
                xs.add(x);
            }

            Collections.sort(xs);
            float llx = (xs.get(0) + xs.get(1)) / 2;
            return llx;
        } catch (Exception e) {
            return 0;
        }
    }

    private float getLly(BoundingPoly poly) {
        try {
            List<Vertex> vertices = poly.getVertices();

            ArrayList<Float> ys = new ArrayList<>();
            for (Vertex v : vertices) {
                float y = 0;
                if (v.getY() != null) {
                    y = v.getY();
                }
                ys.add(y);
            }

            Collections.sort(ys);
            float lly = (ys.get(0) + ys.get(1)) / 2;
            return lly;
        } catch (Exception e) {
            return 0;
        }
    }

    private float getUrx(BoundingPoly poly) {
        try {
            List<Vertex> vertices = poly.getVertices();

            ArrayList<Float> xs = new ArrayList<>();
            for (Vertex v : vertices) {
                float x = 0;
                if (v.getX() != null) {
                    x = v.getX();
                }
                xs.add(x);
            }

            Collections.sort(xs);
            float urx = (xs.get(xs.size()-1) + xs.get(xs.size()-2)) / 2;
            return urx;
        } catch (Exception e) {
            return 0;
        }
    }

    private float getUry(BoundingPoly poly) {
        try {
            List<Vertex> vertices = poly.getVertices();

            ArrayList<Float> ys = new ArrayList<>();
            for (Vertex v : vertices) {
                float y = 0;
                if (v.getY() != null) {
                    y = v.getY();
                }
                ys.add(y);
            }

            Collections.sort(ys);
            float ury = (ys.get(ys.size()-1) +ys.get(ys.size()-2)) / 2;
            return ury;
        } catch (Exception e) {
            return 0;
        }
    }

    /**
     * rotate rectangular clockwise
     *
     * @param poly
     * @param theta the angle of rotation in radians
     * @return
     */
    public BoundingPoly rotate(float centerX, float centerY, BoundingPoly poly, double theta) {

        List<Vertex> vertexList = poly.getVertices();

        //rotate all vertices in poly
        for (Vertex vertex : vertexList) {
            float tempX = vertex.getX() - centerX;
            float tempY = vertex.getY() - centerY;

            // now apply rotation
            float rotatedX = (float) (centerX - tempX * cos(theta) + tempY * sin(theta));
            float rotatedY = (float) (centerY - tempX * sin(theta) - tempY * cos(theta));

            vertex.setX((int) rotatedX);
            vertex.setY((int) rotatedY);
        }
        return poly;
    }

    /**
     * The Google Vision API returns boundingPolys in a coordinate system whose origin is the
     * top-left corner, but iText uses a coordinate system whose origin is the bottom-left corner,
     * so the result must be inverted before it can be used with iText.
     *
     * @return text units inverted symmetrically by 0X coordinates.
     */
    private BoundingPoly invertSymmetricallyBy0X(float centerY, BoundingPoly poly) {

        List<Vertex> vertices = poly.getVertices();
        for (Vertex v : vertices) {
            if (v.getY() != null) {
                v.setY((int) (centerY + (centerY - v.getY())));
            }
        }
        return poly;
    }

    /**
     *
     * @param centerX
     * @param poly
     * @return  text units inverted symmetrically by 0Y coordinates.
     */
    private BoundingPoly invertSymmetricallyBy0Y(float centerX, BoundingPoly poly) {
        List<Vertex> vertices = poly.getVertices();
        for (Vertex v : vertices) {
            if (v.getX() != null) {
                v.setX((int) (centerX + (centerX - v.getX())));
            }
        }
        return poly;
    }
Weiweibel answered 18/10, 2017 at 15:33 Comment(0)
S
0

Usually we need to know the actual rotation angle of the text in the photo. The coordinate information provided in the API is complete enough. You only need to calculate the angle between xy1 and xy0 to get the rotation angle.

// reset
self.transform = CGAffineTransformIdentity;

CGFloat x_0 = viewData.bounds[0].x;
CGFloat y_0 = viewData.bounds[0].y;

CGFloat x_1 = viewData.bounds[1].x;
CGFloat y_1 = viewData.bounds[1].y;

CGFloat x_3 = viewData.bounds[3].x;
CGFloat y_3 = viewData.bounds[3].y;

// distance
CGFloat width = sqrt(pow(x_0 - x_1, 2) + pow(y_0 - y_1, 2));
CGFloat height = sqrt(pow(x_0 - x_3, 2) + pow(y_0 - y_3, 2));
self.size = CGSizeMake(width, height);

// angle
CGFloat angle = atan2((y_1 - y_0), (x_1 - x_0));
// rotation
self.transform = CGAffineTransformRotate(CGAffineTransformIdentity, angle);
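
The same angle calculation can be sketched in Python for the JSON form of the response. This is only an illustration: the function name is hypothetical, the vertices are assumed to be dicts in the order returned by the API, and a missing "x" or "y" key is treated as 0. Because image coordinates have y pointing down, a positive angle here means a clockwise rotation as seen on screen.

```python
import math


def text_angle_degrees(vertices):
    """Angle of the edge from vertex 0 to vertex 1, in degrees in [0, 360)."""
    # Assumption: a coordinate that is absent from the JSON is treated as 0.
    x0, y0 = vertices[0].get("x", 0), vertices[0].get("y", 0)
    x1, y1 = vertices[1].get("x", 0), vertices[1].get("y", 0)
    # atan2 returns radians in (-pi, pi]; convert and wrap into [0, 360).
    return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 360
```

Unlike the quadrant checks in the other answers, this returns a continuous angle, so it can also be used to deskew slightly tilted text.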
Socratic answered 23/2, 2021 at 9:11 Comment(1)
Hi, answers are expected to be in English. I translated it for you via an online translator - please confirm that it still makes sense.Ayurveda
Z
0

The v1 REST endpoint already has orientationDegrees in its response:

https://cloud.google.com/vision/docs/reference/rest/v1/AnnotateImageResponse#Page

Unfortunately, google-cloud-vision 3.2.0 hasn't got this one yet https://github.com/googleapis/python-vision/issues/156

Zephyrus answered 8/6, 2021 at 15:47 Comment(0)
S
0

There is another OCR product by Google called Document AI, which I believe is better suited for OCR on documents. It returns the orientations.

However, when I checked the JSON, it appears that the overall orientation might be incorrect, but the block orientations are correct. Here is the response I got for a page that was rotated 90 deg. clockwise (so the top of the page is to the right):

enter image description here

I would think one can take a majority vote of the block orientations to get the page orientation.

Slicer answered 7/8, 2023 at 21:23 Comment(0)
L
0

Python implementation of the logic above:

def getOrientation(responses):
    vertexList = responses['textAnnotations'][1]['boundingPoly']['vertices']

    ORIENTATION_NORMAL = 0
    ORIENTATION_270_DEGREE = 270
    ORIENTATION_90_DEGREE = 90
    ORIENTATION_180_DEGREE = 180

    centerX = 0
    centerY = 0
    for i in range(4):
        centerX += vertexList[i]['x']
        centerY += vertexList[i]['y']

    centerX /= 4
    centerY /= 4

    x0 = vertexList[0]['x']
    y0 = vertexList[0]['y']

    if x0 < centerX:
        if y0 < centerY:
            #0 -------- 1
            #|          |
            #3 -------- 2
            return ORIENTATION_NORMAL
        else:
            #1 -------- 2
            #|          |
            #0 -------- 3
            return ORIENTATION_270_DEGREE
    else:
        if y0 < centerY:
            #3 -------- 0
            #|          |
            #2 -------- 1
            return ORIENTATION_90_DEGREE
        else:
            #2 -------- 3
            #|          |
            #1 -------- 0
            return ORIENTATION_180_DEGREE
Lewandowski answered 1/4, 2024 at 11:50 Comment(0)
