Improve template matching with many templates for one image / find characters in an image

The idea: I have one screenshot and want to find all characters and numbers, along with their positions, in this image. The easiest way is to use OpenCV's matchTemplate and compare all the characters I have as ".png" files (around 800) against the screenshot.

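import os
import cv2
import numpy as np

myTemplatesPath = "C:/MyPath/Templates/"
# Collect every file below the template folder and load it as grayscale
allTemplateFiles = [os.path.join(root, name) for root, dirs, files in os.walk(myTemplatesPath) for name in files]
Templates_all = [cv2.imread(f, cv2.IMREAD_GRAYSCALE) for f in allTemplateFiles]
# screenshot is the BGR image to search (loaded elsewhere)
imgrey = cv2.cvtColor(screenshot, cv2.COLOR_BGR2GRAY)
for template in Templates_all:
    results = cv2.matchTemplate(imgrey, template, cv2.TM_CCOEFF_NORMED)
    # (rows, cols) of every position that scores above 0.99
    results = np.where(results > 0.99)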

Image:

Templates with different font sizes (just some examples):

This works 100% fine. The only problem is speed: it takes about 6 s to find all positions in the image, because all 800 templates have to be compared with this one image. I would like to reduce this time.

I had several ideas to improve this speed:

1. Use OCR: too unreliable, it did not recognize every character.
2. Feature detection: it did not detect all characters; the large "L", for example, had no features.
3. Divide the image into ROIs using findContours, extract the width and height of each contour, and then compare only the templates that fit that width and height. This would drastically reduce the number of comparisons and improve speed, but findContours produced contours that divided single characters into 3 or more parts, which gives an incorrect height and width.

So I'm still searching for a way to find the character locations that is 100% reliable but faster. (I prefer idea number 3, but I'm open to any proposal.)

I stumbled upon this question and I see there is no answer, so I will try to answer. Hopefully it will be useful to you or someone.

I had a similar problem in the past and I used option 3. I had the problem you described of multiple letters being detected as one, and I fixed it by first checking whether the size of the region was in an acceptable range (all my letters/numbers had a similar size); if not, I tried again to separate the letters using cv2.connectedComponents. This should work as long as no two letters are 'touching' each other.

However, this required a lot of fine-tuning to make it work 100% for my use case. My problem was not only performance, though, but also failure to recognize some letters even with the PNGs of all letters. Since you mentioned that you can already recognize the letters, maybe you can detect words first and then run your code on each word. I think you can easily detect words using dilation (a morphological operation) and then run your code for each detected word. This should reduce the time to an acceptable range. If all images are like the one you provided, maybe you can just divide the image into 9 sub-regions and run your code on each.

Other optimizations I had to use that might be useful are:

If there is a limited number of words, cache each word once it has been detected and then match whole words instead of individual characters.
Another simple one is to scale the image and your templates down to 1/2 or even 1/10 of the original size. This drastically reduces the time, and in most cases there is enough information left in the reduced image that the template matching accuracy is not affected.
If you can make the image and templates binary (0 or 1), some nice optimizations for detecting the template become possible.
