Limiting the detection area in Google Vision, text recognition
I have been searching the whole day for a solution and have checked several threads on this problem, but none of them helped much. Basically, I want the camera preview to be fullscreen, but text should only be recognized in the center of the screen, where a rectangle is drawn.

Technologies I am using:

  • Google Mobile Vision API for optical character recognition (OCR)
  • Dependency: play-services-vision

My current state: I created a BoxDetector class:

public class BoxDetector extends Detector {
    private Detector mDelegate;
    private int mBoxWidth, mBoxHeight;

    public BoxDetector(Detector delegate, int boxWidth, int boxHeight) {
        mDelegate = delegate;
        mBoxWidth = boxWidth;
        mBoxHeight = boxHeight;
    }

    public SparseArray detect(Frame frame) {
        int width = frame.getMetadata().getWidth();
        int height = frame.getMetadata().getHeight();
        int right = (width / 2) + (mBoxHeight / 2);
        int left = (width / 2) - (mBoxHeight / 2);
        int bottom = (height / 2) + (mBoxWidth / 2);
        int top = (height / 2) - (mBoxWidth / 2);

        YuvImage yuvImage = new YuvImage(frame.getGrayscaleImageData().array(), ImageFormat.NV21, width, height, null);
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        yuvImage.compressToJpeg(new Rect(left, top, right, bottom), 100, byteArrayOutputStream);
        byte[] jpegArray = byteArrayOutputStream.toByteArray();
        Bitmap bitmap = BitmapFactory.decodeByteArray(jpegArray, 0, jpegArray.length);

        Frame croppedFrame =
                new Frame.Builder()
                        .setBitmap(bitmap)
                        .setRotation(frame.getMetadata().getRotation())
                        .build();

        return mDelegate.detect(croppedFrame);
    }

    public boolean isOperational() {
        return mDelegate.isOperational();
    }

    public boolean setFocus(int id) {
        return mDelegate.setFocus(id);
    }

    @Override
    public void receiveFrame(Frame frame) {
        mDelegate.receiveFrame(frame);
    }
}

And implemented an instance of this class here:

final TextRecognizer textRecognizer = new TextRecognizer.Builder(App.getContext()).build();

// Instantiate the created box detector in order to limit the Text Detector scan area
BoxDetector boxDetector = new BoxDetector(textRecognizer, width, height);

// Set the processor on the BoxDetector rather than on the TextRecognizer

boxDetector.setProcessor(new Detector.Processor<TextBlock>() {
    @Override
    public void release() {
    }

    /*
        Detect all text from the camera as TextBlocks, append the
        values to a StringBuilder, and set the result on the TextView.
    */
    @Override
    public void receiveDetections(Detector.Detections<TextBlock> detections) {
        final SparseArray<TextBlock> items = detections.getDetectedItems();
        if (items.size() != 0) {

            mTextView.post(new Runnable() {
                @Override
                public void run() {
                    StringBuilder stringBuilder = new StringBuilder();
                    for (int i = 0; i < items.size(); i++) {
                        TextBlock item = items.valueAt(i);
                        stringBuilder.append(item.getValue());
                        stringBuilder.append("\n");
                    }
                    mTextView.setText(stringBuilder.toString());
                }
            });
        }
    }
});


    mCameraSource = new CameraSource.Builder(App.getContext(), boxDetector)
            .setFacing(CameraSource.CAMERA_FACING_BACK)
            .setRequestedPreviewSize(height, width)
            .setAutoFocusEnabled(true)
            .setRequestedFps(15.0f)
            .build();

On execution this Exception is thrown:

Exception thrown from receiver.
java.lang.IllegalStateException: Detector processor must first be set with setProcessor in order to receive detection results.
    at com.google.android.gms.vision.Detector.receiveFrame(com.google.android.gms:play-services-vision-common@@19.0.0:17)
    at com.spectures.shopendings.Helpers.BoxDetector.receiveFrame(BoxDetector.java:62)
    at com.google.android.gms.vision.CameraSource$zzb.run(com.google.android.gms:play-services-vision-common@@19.0.0:47)
    at java.lang.Thread.run(Thread.java:919)

If anyone has a clue what my mistake is, or has any alternative, I would really appreciate it. Thank you!

This is what I want to achieve: a rectangular text-area scanner:

[Image: camera preview with a rectangular scan-area overlay]

Redstart answered 7/3, 2020 at 14:48 Comment(3)
How did you fix this issue? – Nonbelligerent
@ShwetaChauhan Sadly, I wasn't able to. – Redstart
How can I restrict it if someone is using an overlay on our app? https://mcmap.net/q/139097/-restrict-overlay-in-android/4134215 – Alexio
Google Vision detectors take a Frame as input. A Frame holds the image data together with its width and height. You can process this frame (crop it to a smaller, centered frame) before passing it on to the Detector. This processing must be fast and run alongside the camera's image processing. Check out my GitHub below and search for FrameProcessingRunnable; you can see the frame input there and do the processing yourself.

CameraSource
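The geometry of the crop this answer describes (cutting the frame down to a smaller, centered box) can be sketched independently of the Android camera classes. Below is a minimal plain-Java helper; the class and method names are made up for illustration, and the clamping to the frame edges is my own assumption about sensible behavior:

```java
// Computes the centered crop rectangle used to limit the detection area.
// Pure geometry, so it can be verified off-device.
// Returns {left, top, right, bottom} in frame coordinates.
public class CropBounds {
    public static int[] centeredCrop(int frameWidth, int frameHeight,
                                     int boxWidth, int boxHeight) {
        // Center the box, clamping so it never leaves the frame
        int left = Math.max(0, (frameWidth - boxWidth) / 2);
        int top = Math.max(0, (frameHeight - boxHeight) / 2);
        int right = Math.min(frameWidth, left + boxWidth);
        int bottom = Math.min(frameHeight, top + boxHeight);
        return new int[]{left, top, right, bottom};
    }

    public static void main(String[] args) {
        int[] r = centeredCrop(1280, 720, 600, 200);
        System.out.println(r[0] + "," + r[1] + "," + r[2] + "," + r[3]);
    }
}
```

The resulting rectangle would then be used, for example, as the `Rect` passed to `YuvImage.compressToJpeg()` in the question's `BoxDetector`, or inside `FrameProcessingRunnable` as this answer suggests.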

Rumpf answered 13/3, 2020 at 3:37 Comment(3)
Hello, first of all thanks for answering! I saw your code and wondered what I have to change in mine. Is the frame-processing part (the two private classes) the only thing I have to add? – Redstart
Yes, you have to modify your frame before you pass it to the last operation of the Detector: mDetector.receiveFrame(outputFrame); – Ical
Can you edit your answer with the code I need to add, so that I can code it out and award you the bounty? – Redstart
You can try to pre-parse the CameraSource feed as @'Thành Hà Văn' mentioned (which I tried first myself, but discarded after trying to adjust for the old and new camera APIs), but I found it easier to limit the search area and filter the detections returned by the default Vision detector and CameraSource. You can do this in several ways, for example:

(1) limiting the area of the screen by setting bounds based on the screen/preview size
(2) creating a custom class that can be used to dynamically set the detection area

I chose option 2 (I can post my custom class if needed), and then in the detection callback I filtered for detections only within the specified area:

    for (j in 0 until detections.size()) {
        val textBlock = detections.valueAt(j) as TextBlock
        for (line in textBlock.components) {
            // Keep only lines whose scaled bounding box lies inside the scan view
            if ((line.boundingBox.top.toFloat() * hScale) >= scanView.top.toFloat() &&
                (line.boundingBox.bottom.toFloat() * hScale) <= scanView.bottom.toFloat()) {
                canvas.drawRect(line.boundingBox, linePainter)

                if (scanning)
                    if (((line.boundingBox.top.toFloat() * hScale) <= yTouch && (line.boundingBox.bottom.toFloat() * hScale) >= yTouch) &&
                        ((line.boundingBox.left.toFloat() * wScale) <= xTouch && (line.boundingBox.right.toFloat() * wScale) >= xTouch)) {
                        acceptDetection(line, scanCount)
                    }
            }
        }
    }

The scanning section is just some custom code I used to let the user select which detections to keep. You would replace everything inside the if (line...) block with your own code so that it only acts on the cropped detection area. Note that this example only crops vertically, but you could crop horizontally as well, or in both directions.
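The vertical filter used above can be boiled down to a framework-free check, which makes the scaling logic easy to verify. The sketch below is in plain Java; the names (hScale for the frame-to-view height ratio, scanTop/scanBottom for the scan view's bounds) mirror the Kotlin snippet and are my assumptions about its context:

```java
// A line's bounding box (in frame coordinates) is kept only if, after
// scaling into view coordinates, it lies between the scan area's top
// and bottom edges.
public class ScanAreaFilter {
    public static boolean isInsideVertically(int boxTop, int boxBottom,
                                             float hScale,
                                             float scanTop, float scanBottom) {
        return boxTop * hScale >= scanTop && boxBottom * hScale <= scanBottom;
    }

    public static void main(String[] args) {
        // Frame is 480 px tall, view is 960 px tall -> hScale = 2.0
        System.out.println(isInsideVertically(150, 300, 2.0f, 200f, 700f)); // inside
        System.out.println(isInsideVertically(50, 300, 2.0f, 200f, 700f));  // top edge above scan area
    }
}
```

The same comparison with left/right and a width scale (wScale) would add the horizontal crop mentioned above.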

Redwine answered 22/1, 2021 at 17:42 Comment(0)
In google-vision you can get the coordinates of detected text as described in How to get position of text in an image using Mobile Vision API?

You get the TextBlocks from the TextRecognizer, then filter each TextBlock by its coordinates, which can be determined by the getBoundingBox() or getCornerPoints() methods of the TextBlock class:

TextRecognizer

Recognition results are returned by detect(Frame). The OCR algorithm tries to infer the text layout and organizes each paragraph into TextBlock instances. If any text is detected, at least one TextBlock instance will be returned.

[..]

Public Methods

public SparseArray<TextBlock> detect (Frame frame) Detects and recognizes text in an image. Only supports bitmap and NV21 for now. Returns a mapping of int to TextBlock, where the int domain represents an opaque ID for the text block.

source : https://developers.google.com/android/reference/com/google/android/gms/vision/text/TextRecognizer

TextBlock

public class TextBlock extends Object implements Text

A block of text (think of it as a paragraph) as deemed by the OCR engine.

Public Method Summary

Rect getBoundingBox() Returns the TextBlock's axis-aligned bounding box.

List<? extends Text> getComponents() Smaller components that comprise this entity, if any.

Point[] getCornerPoints() 4 corner points in clockwise direction starting with top-left.

String getLanguage() Prevailing language in the TextBlock.

String getValue() Retrieve the recognized text as a string.

source : https://developers.google.com/android/reference/com/google/android/gms/vision/text/TextBlock

So you basically proceed as in How to get position of text in an image using Mobile Vision API?, except that you do not split each block into lines and each line into words, as in

// Loop through each `Block`
foreach (TextBlock textBlock in blocks)
{
    IList<IText> textLines = textBlock.Components;

    // Loop through each `Line`
    foreach (IText currentLine in textLines)
    {
        IList<IText> words = currentLine.Components;

        // Loop through each `Word`
        foreach (IText currentword in words)
        {
            // Get the Rectangle/boundingBox of the word
            RectF rect = new RectF(currentword.BoundingBox);
            rectPaint.Color = Color.Black;

            // Finally draw the Rectangle/boundingBox around the word
            canvas.DrawRect(rect, rectPaint);

            // Set the image on the `View`
            imgView.SetImageDrawable(new BitmapDrawable(Resources, tempBitmap));
        }
    }
}

instead, you get the bounding box of every text block and then select the one whose coordinates are closest to the center of the screen/frame, or to the rectangle that you specify (i.e. How can i get center x,y of my view in android?). For this you use the getBoundingBox() or getCornerPoints() methods of TextBlock.
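The "closest to the center" selection this answer describes can be sketched without the Vision classes. In the plain-Java sketch below, each `int[] {left, top, right, bottom}` stands in for the `Rect` returned by `TextBlock.getBoundingBox()`; the class and method names are made up for illustration:

```java
// Given each detected block's bounding box, return the index of the
// block whose center point is nearest to the frame center.
public class CenterPicker {
    public static int closestToCenter(int[][] boxes, int frameWidth, int frameHeight) {
        double cx = frameWidth / 2.0, cy = frameHeight / 2.0;
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < boxes.length; i++) {
            // Center of this bounding box: {left, top, right, bottom}
            double bx = (boxes[i][0] + boxes[i][2]) / 2.0;
            double by = (boxes[i][1] + boxes[i][3]) / 2.0;
            double d = Math.hypot(bx - cx, by - cy);
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] boxes = { {0, 0, 100, 40}, {500, 300, 800, 420}, {0, 600, 200, 700} };
        System.out.println(closestToCenter(boxes, 1280, 720)); // middle box wins
    }
}
```

In the real receiveDetections() callback you would build the boxes array from detections.valueAt(i).getBoundingBox() and then act only on the winning TextBlock.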

Desantis answered 23/3, 2020 at 13:33 Comment(1)
I tried it but I didn't know how to implement it correctly. – Redstart

© 2022 - 2024 — McMap. All rights reserved.