word segmentation using opencv [closed]

1

9

I am working on some scanned text images and I need to highlight all the words in each image. I know the problem is equivalent to finding subimages with extra whitespace around them.

OCR cannot be used, and I just need to outline each word with a border. Can someone suggest how this might be done using OpenCV?

I have tried reading about thresholding and segmenting. I am just looking for someone to point me to some relevant material.

Forgive answered 6/10, 2012 at 23:0 Comment(1)
I have tried reading about thresholding and segmenting. I was just looking for someone to point me to relevant material rather than some code. - Forgive
21

I think your image has multiline text. In that case, the first thing you have to do is detect these lines.

For that, first binarize the image using Otsu's method or adaptive thresholding.

Then you can use what is called a "horizontal histogram". It is like an ordinary histogram, but it shows where there are text lines and where there are blank spaces. So divide the image at the blank rows, and you get each line. Below is the image of a horizontal histogram.

Horizontal histogram
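As a rough sketch of the idea, the horizontal histogram can be computed by summing the pixel values in each row of the binarized image, and the text lines are then the runs of rows whose sum is non-zero. The function names `split_runs` and `find_text_lines` are made up for this example, and it assumes text pixels are 255 on a 0 background. Real scans usually contain noise, so a small threshold instead of a strict `> 0` test may be needed.

```python
import numpy as np

def split_runs(profile):
    """Return (start, end) index pairs of consecutive non-zero entries; end is exclusive."""
    filled = (np.asarray(profile) > 0).astype(np.int8)
    # Pad with zeros so runs touching either border still produce both edges.
    edges = np.diff(np.concatenate(([0], filled, [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))

def find_text_lines(binary):
    """Sum each row (the horizontal histogram) and split at the blank rows."""
    row_sums = binary.sum(axis=1, dtype=np.int64)
    return split_runs(row_sums)

# Synthetic binarized page with two "text lines" (text = 255, background = 0).
binary = np.zeros((60, 200), dtype=np.uint8)
binary[10:20, 20:80] = 255
binary[35:45, 20:120] = 255

lines = find_text_lines(binary)
print(lines)  # [(10, 20), (35, 45)]
```

Each pair can be used to crop one line, for example `binary[start:end, :]`. Summing with `axis=0` instead gives the vertical histogram.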

Now, for each line, find the horizontal histogram. Before that, try some dilation and erosion so that all the letters of a word are grouped together. Then you can find connected components on each line to get each word. Then draw the boundaries.

Below image shows both horizontal and vertical histograms:

horizontal and vertical histograms

This Stack Overflow question might help: How to convert an image into character segments?

Hinterland answered 8/10, 2012 at 16:42 Comment(7)
Hi Abid. Thanks for the response. I was implementing what you suggested. For calculating the histogram, I was trying to use OpenCV's calcHist method, but the method returns a histogram where each intensity value is mapped to the number of pixels having that intensity. Can you suggest how I should get the horizontal histogram as shown in your images above? Is there something in OpenCV related to this, or should I implement something on my own? - Forgive
I tried getting the kind of histogram you've shown by summing up the pixel values in each row. Is that the right way to do it? - Forgive
Yeah, the second comment is right. Sum up the pixel values in each row/column to get the histogram. It is not the histogram as in the calcHist function. - Hinterland
Hi. I was able to segment the images. Can you suggest a good method to detect if two word images are similar? I am trying to use SIFT and BFMatcher, but that doesn't seem to work so well for this case. - Forgive
@AbidRahmanK Do you binarize the image first before you sum the pixels, or do you just sum the pixels of the gray image? - Cottrell
What if the text lines are not strictly horizontal? How can you determine the orientation of the text line? - Solution
@AbidRahmanK I got the horizontal and vertical projections. How do you "draw the boundaries"? - Oreste
