Stroke Width Transform (SWT) implementation (Python)

Asked 20/6, 2012 at 9:6 Answered 17/9, 2020 at 8:35

Solved python opencv computer-vision ocr simplecv

Can anyone describe how can i implement SWT in python using opencv or simplecv ?

Halliday answered 20/6, 2012 at 9:6 Comment(6)

if you are looking for a paper implementation, you could add a link to that paper, or atleast provide a link to what is SWT. – Sherly 20/6, 2012 at 9:19

may be useful : #4837624 – Sherly 20/6, 2012 at 9:24

i already saw them, but the link was broken,and no code or psudo code are available. – Halliday 20/6, 2012 at 9:42

that link definitely works: sites.google.com/site/roboticssaurav/strokewidthnokia – Vlissingen 21/6, 2012 at 6:44

@AruniRC,Thank you. please send your link as answer to i accept it. – Halliday 21/6, 2012 at 12:2

A matlab implementation (based on C++ mex file) can be found here. – Saccharin 14/11, 2013 at 7:9

Ok so here goes:

The link that has details on the implementation with the code download link at the bottom: SWT

For the sake of completeness, also mentioning that SWT or Stroke Width Transform was devised by Epshtein and others in 2010 and has turned out to be one of the most successful text detection methods til date. It does not use machine learning or elaborate tests. Basically after Canny edge detection on the input image, it calculates the thickness of each stroke that makes up objects in the image. As text has uniformly thick strokes, this can be a robust identifying feature.

The implementation given in the link is using C++, OpenCV and the Boost library they use for the connected graph traversals etc. after the SWT step is computed. Personally I've tested it on Ubuntu and it works quite well (and efficiently), though the accuracy is not exact.

Vlissingen answered 21/6, 2012 at 16:8 Comment(0)

I implemented something similar to the distance transform based SWT described in 'ROBUST TEXT DETECTION IN NATURAL IMAGES WITH EDGE-ENHANCED MAXIMALLY STABLE EXTREMAL REGIONS by Huizhong Chen, Sam S. Tsai, Georg Schroth, David M. Chen, Radek Grzeszczuk, Bernd Girod'.

It's not the same as described in the paper but a rough approximation that served my purpose. Thought I should share it so somebody might find it useful (and point out any errors/improvements). It is implemented in C++ and uses OpenCV.

    // bw8u : we want to calculate the SWT of this. NOTE: Its background pixels are 0 and forground pixels are 1 (not 255!)
    Mat bw32f, swt32f, kernel;
    double min, max;
    int strokeRadius;

    bw8u.convertTo(bw32f, CV_32F);  // format conversion for multiplication
    distanceTransform(bw8u, swt32f, CV_DIST_L2, 5); // distance transform
    minMaxLoc(swt32f, NULL, &max);  // find max
    strokeRadius = (int)ceil(max);  // half the max stroke width
    kernel = getStructuringElement(MORPH_RECT, Size(3, 3)); // 3x3 kernel used to select 8-connected neighbors

    for (int j = 0; j < strokeRadius; j++)
    {
        dilate(swt32f, swt32f, kernel); // assign the max in 3x3 neighborhood to each center pixel
        swt32f = swt32f.mul(bw32f); // apply mask to restore original shape and to avoid unnecessary max propogation
    }
    // swt32f : resulting SWT image

Omalley answered 3/8, 2014 at 15:3 Comment(6)

The local maxima of the Distance Transform will yield the half-stroke width. That's a nice observation, although some papers in 2011-2012 used this exact thing in combination with region detectors like MSERs. – Vlissingen 26/7, 2015 at 18:11

@Vlissingen The paper in the given link provides details about this method. Actually this half stroke-width thing is not my observation. Really sorry if my writing makes it look like that it's mine. All the credit of this should go to the authors of this paper. – Omalley 27/7, 2015 at 1:0

Oh, I did not mean it that way at all. Just an observation. And sorry, my fault for not seeing the ICIP paper link previously. In fact, using the distance transform to get the half-width is much more easy and elegant implementation-wise. Personally, I had used a Laplacian operator to get the local extrema of dist. trans. image, but your way of dilating is cleaner. – Vlissingen 28/7, 2015 at 20:18

The input image of your method must have a background value of zero and and a foreground value of one? How would I realize this? – Maddy 15/10, 2017 at 10:19

@BastianSchoettle We apply SWT to a binary image, so you can use threshold with maxval set to 1 to create this binary image. Just an example. – Omalley 15/10, 2017 at 11:58

Thank you...I was writing faster than thinking. I figured it out a couple minutes after posting the comment. – Maddy 15/10, 2017 at 12:5

There is a complete Library SWTloc here a Python 3 implementation of the algorithm

v2.0.0 onwards

Install the library

pip install swtloc

Transforming Image

import swtloc as swt
imgpath = 'images/path_to_image.jpeg'
swtl = swt.SWTLocalizer(image_paths=imgpath)
swtImgObj = swtl.swtimages[0]
swt_mat = swtImgObj.transformImage(text_mode='lb_df',
                                   auto_canny_sigma=1.0,
                                   maximum_stroke_width=20)

Localizing Letters

localized_letters = swtImgObj.localizeLetters(minimum_pixels_per_cc=100,
                                              maximum_pixels_per_cc=10_000,
                                              acceptable_aspect_ratio=0.2)

Localizing Words

localized_words = swtImgObj.localizeWords()

Full Disclosure : I am the author of this library

Fictionalize answered 17/9, 2020 at 8:35 Comment(0)

Install the library

Transforming Image

Localizing Letters

Localizing Words

Recommended topics

Hot tags