Scene Text Image Super-Resolution for OCR

I am working on an OCR system. One challenge I'm facing in recognizing the text within the ROI is camera shake or motion blur in the shot, or text that is out of focus because of the viewing angle. Please consider the following demo sample:

[demo sample image: angled shot with problem text regions marked in red]

If you look at the marked text (for example, the regions marked in red), the OCR system cannot recognize it properly in such cases. The same problem also occurs with straight-on shots where the image is simply too blurry for the OCR system to recognize, or only partially recognize, the text. Sometimes the images are blurry, sometimes very low-resolution or pixelated. For example:

[example images: blurry and low-resolution/pixelated text]

Methods we've tried

First, we tried various methods suggested on SO, but sadly had no luck.

Next, we tried the following three most promising methods.

1. TSRN

A recent research work (TSRN) focuses mainly on such cases. Its main idea is to introduce super-resolution (SR) techniques as a pre-processing step. This implementation looks by far the most promising, but it fails to work magic on our custom dataset (for example the second image above, the blue text). Here are some examples from their demonstration:

[TSRN demonstration examples]
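
For anyone who wants to test the same SR-as-preprocessing idea without setting up TSRN, below is a minimal sketch using OpenCV's dnn_superres module with a generic pretrained EDSR model and pytesseract standing in for the OCR stage. The model file and image path are assumptions, and this is not the TSRN model itself, only an illustration of the pre-processing idea.

    # Minimal sketch of SR as a pre-processing step before OCR.
    # Assumes opencv-contrib-python and pytesseract are installed and that a
    # pretrained EDSR x4 model file (EDSR_x4.pb) has been downloaded separately.
    # A generic SR model stands in for TSRN here, only to illustrate the idea.
    import cv2
    import pytesseract

    sr = cv2.dnn_superres.DnnSuperResImpl_create()
    sr.readModel("EDSR_x4.pb")          # path to the pretrained model (assumption)
    sr.setModel("edsr", 4)              # model name and scale must match the file

    roi = cv2.imread("text_roi.png")    # cropped low-resolution text region (assumption)
    upscaled = sr.upsample(roi)         # 4x super-resolved ROI

    print(pytesseract.image_to_string(upscaled))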

2. Neural Enhance

After looking at the illustrations on its page, we believed it might work, but sadly it couldn't address the problem either. I was also a bit confused by the examples they showed, because I couldn't reproduce them myself. I've raised an issue on GitHub where I demonstrate this in more detail. Here are some examples from their demonstration:

[Neural Enhance demonstration examples]

3. ISR

This was our last choice, tried with minimal hope. No luck with this implementation either.

Update 1

  • [Method]: Apart from the above, we also tried some traditional approaches such as the Out-of-focus Deblur Filter (Wiener filter and also the unsupervised Wiener filter). We also checked the Richardson-Lucy deconvolution method, but saw no improvement with these approaches either (a minimal sketch of these attempts follows this list).

  • [Method]: We've also checked out a GAN-based deblurring solution, DeblurGAN, and tried this network. What attracted me was its blind motion deblurring approach.
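
As referenced above, here is a minimal sketch of those classical deconvolution attempts, assuming a disk-shaped (out-of-focus) PSF with a guessed radius and an RGB input image; the image path is also an assumption. In practice the true blur kernel is unknown and non-uniform across the frame, which is largely why these filters fell short.

    # Classical deconvolution attempts with scikit-image; the PSF radius and the
    # input image path are assumptions -- the real blur kernel is unknown.
    import numpy as np
    from skimage import color, img_as_float, io
    from skimage.restoration import richardson_lucy, unsupervised_wiener, wiener

    def disk_psf(radius):
        """Circular kernel approximating a uniform out-of-focus blur."""
        y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
        psf = (x**2 + y**2 <= radius**2).astype(float)
        return psf / psf.sum()

    img = img_as_float(color.rgb2gray(io.imread("blurry_text.png")))
    psf = disk_psf(radius=3)

    deconv_wiener = wiener(img, psf, balance=0.1)        # regularized Wiener filter
    deconv_unsup, _ = unsupervised_wiener(img, psf)      # balance estimated automatically
    deconv_rl = richardson_lucy(img, psf, num_iter=30)   # 'num_iter' in recent scikit-image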

Lastly, from this discussion we came across this research work, which looks quite promising. We haven't tried it yet.

[example result from the linked research work]

Update 2

  1. [Method]: Real-World Super-Resolution via Kernel Estimation and Noise Injection. We tried this method; it looks promising, but it didn't work in our case. Code.

  2. [Method]: Photo Restoration. Compared to all the methods above, it surprisingly performs the best at text super-resolution for OCR. It greatly reduces noise and blurriness, makes the image much clearer, and thus helps the model generalize better. Code.

My Query

Is there any effective workaround to tackle such cases? Are there methods that can improve such blurry or low-resolution text, whether the text is close to the camera or far away and distorted by the camera angle?

Borehole answered 12/11, 2020 at 17:41 Comment(6)
This seems like an out-of-focus problem. You may check this and this. - Geelong
I've tried the Out-of-focus Deblur Filter, but sadly it didn't perform well enough. I used skimage.restoration.wiener and also skimage.restoration.unsupervised_wiener. :( - Borehole
The problem you are facing is non-uniform defocus caused by the changing distance across the oblique perspective in the image, so simple defocus processing will not work, as that requires a uniform defocus. - Appointee
Good point. But what if I crop a specific blurry portion (the red-underlined mark, for example)? - Borehole
Out-of-scope question, but what OCR architecture is your work based on? Maybe there is some inherent network structure we could improve upon. - Tiannatiara
I'm not continuing the project. Last time I used MaskTextSpotterV3 and ABCNet. - Borehole

Currently, there is one solution: Real-World Super-Resolution via Kernel Estimation and Noise Injection. The authors propose a degradation framework, RealSR, which provides realistic images for super-resolution learning. It is a promising method for super-resolving images affected by shakiness or motion blur.

The method is divided into two stages.

The first stage, Realistic Degradation for Super-Resolution, estimates the degradation from real data and generates realistic LR images.

The second stage, Super-Resolution Model, trains the SR model on the constructed data.
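
To make the first stage a bit more concrete, here is a conceptual sketch of the degradation idea: blur the HR image with a kernel estimated from real photos, downsample it, and inject a noise patch collected from real sensor data. This is only an illustration under those assumptions, not the repository's actual code; the kernel and noise patches are placeholders for what the repo's estimation scripts would produce.

    # Conceptual sketch of the RealSR-style degradation stage (stage 1), not the repo's API.
    # `kernel` is assumed to come from a kernel-estimation step and `noise_patches` from
    # smooth regions of real photos; both are placeholders here.
    import numpy as np
    import cv2

    def degrade(hr_img, kernel, noise_patches, scale=4):
        """Turn a clean HR image into a realistically degraded LR training input."""
        blurred = cv2.filter2D(hr_img, -1, kernel)                    # blur with a real-world kernel
        h, w = blurred.shape[:2]
        lr = cv2.resize(blurred, (w // scale, h // scale),
                        interpolation=cv2.INTER_NEAREST)              # downsample
        noise = noise_patches[np.random.randint(len(noise_patches))]
        noise = cv2.resize(noise, (lr.shape[1], lr.shape[0])).astype(np.float32)
        noise -= noise.mean()                                         # zero-mean noise injection
        return np.clip(lr.astype(np.float32) + noise, 0, 255).astype(np.uint8)

    # Stage 2 then trains the SR network on (degrade(hr), hr) pairs built this way.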

You can look at the GitHub repository: https://github.com/jixiaozhong/RealSR

Expanse answered 17/11, 2020 at 8:12 Comment(2)
Thanks for the info, I will check it out. Meanwhile, would you please explain what makes you think this approach is strongly applicable to such cases? I'm a bit confused, actually, as it seems to me to be just another solution like the ones I've already mentioned. - Borehole
OK, I've quickly tried this on Windows on both a blurry image and a pixelated one, but sadly there was no improvement at all. :( - Borehole

I've also been working in this super-resolution field and have found some promising results, though I haven't tried them yet. In the first paper (license-plate-based text), they apply image enhancement first and then do super-resolution in a later stage. In the second paper (and its GitHub repo), they use a text prior to guide the super-resolution network.
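
To give a rough idea of what "text prior guided" means (as I understand the second paper), the sketch below feeds a recognizer's character-probability map into the SR network alongside the image features. The recognizer, class count, and layer sizes are hypothetical placeholders, not the paper's actual architecture; see the linked repo for the real model.

    # Hypothetical sketch of a text-prior-guided SR block (PyTorch); the recognizer and
    # all layer sizes are placeholders -- refer to the linked paper/repo for the real model.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TextPriorSR(nn.Module):
        def __init__(self, recognizer, num_classes=37, scale=2):
            super().__init__()
            self.recognizer = recognizer                      # frozen, pretrained (assumption)
            self.prior_proj = nn.Conv2d(num_classes, 32, kernel_size=1)
            self.img_feat = nn.Conv2d(3, 32, kernel_size=3, padding=1)
            self.body = nn.Sequential(
                nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(64, 3 * scale * scale, 3, padding=1),
            )
            self.upsample = nn.PixelShuffle(scale)

        def forward(self, lr_img):
            with torch.no_grad():
                prior = self.recognizer(lr_img)               # (B, num_classes, h, w) probability map
            prior = F.interpolate(prior, size=lr_img.shape[-2:])
            feats = torch.cat([self.img_feat(lr_img), self.prior_proj(prior)], dim=1)
            return self.upsample(self.body(feats))            # SR output guided by the text prior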

Circumscissile answered 18/4, 2022 at 3:54 Comment(2)
And I want to ask you something: is there any reference for opening up the LMDB dataset from the TSRN paper? I've had no luck opening it so far. - Circumscissile
While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. - From Review - Bedside
