How to use OpenCV's connectedComponentsWithStats in Python?
I am looking for an example of how to use OpenCV's connectedComponentsWithStats() function in Python. Note this is only available with OpenCV 3 or newer. The official documentation only shows the API for C++, even though the function exists when compiled for Python. I could not find it anywhere online.

Agace answered 7/3, 2016 at 21:16 Comment(1)
For insights on using the labels to mask the image, etc., see "Python OpenCV - Connected Component Labeling and Analysis" on GeeksforGeeks. - Ethbinium
A
147

The function works as follows:

# Import the cv2 library
import cv2
# Read the image as single-channel grayscale
# (Otsu thresholding needs an 8-bit, single-channel input)
src = cv2.imread('/directorypath/image.bmp', cv2.IMREAD_GRAYSCALE)
# Threshold it so it becomes binary
ret, thresh = cv2.threshold(src, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
# You need to choose 4 or 8 for connectivity type
connectivity = 4
# Perform the operation. connectivity and ltype are passed by keyword because
# the Python signature lists labels, stats and centroids before them
output = cv2.connectedComponentsWithStats(thresh, connectivity=connectivity, ltype=cv2.CV_32S)
# Get the results
# The first element is the number of labels
num_labels = output[0]
# The second element is the label matrix
labels = output[1]
# The third element is the stat matrix
stats = output[2]
# The fourth element is the centroid matrix
centroids = output[3]

Labels is a matrix the size of the input image where each element has a value equal to its label.

Stats is a matrix of the statistics that the function calculates. It has one row per label and one column per statistic. The OpenCV documentation describes it as follows:

Statistics output for each label, including the background label, see below for available statistics. Statistics are accessed via stats[label, COLUMN] where available columns are defined below.

  • cv2.CC_STAT_LEFT The leftmost (x) coordinate which is the inclusive start of the bounding box in the horizontal direction.
  • cv2.CC_STAT_TOP The topmost (y) coordinate which is the inclusive start of the bounding box in the vertical direction.
  • cv2.CC_STAT_WIDTH The horizontal size of the bounding box
  • cv2.CC_STAT_HEIGHT The vertical size of the bounding box
  • cv2.CC_STAT_AREA The total area (in pixels) of the connected component

Centroids is a matrix with the x and y locations of each centroid. The row in this matrix corresponds to the label number.

Agace answered 7/3, 2016 at 21:16 Comment(12)
I must say that for some reason, I had to use cv2.THRESH_BINARY instead of cv2.THRESH_BINARY+cv2.THRESH_OTSU, then I had to cast src to integer and thresh to float in order for it to work. I don't know why, but it didn't work otherwise. - Afforest
I don't understand why you create the labels matrix when it is then part of the output anyway? - Cerebritis
@Cerebritis You don't need to for connected components with stats, but do for connected components without stats. I think that part was just left over from me doing it the other way. I fixed it now. Cheers! - Agace
Thanks so much for this! This is a much better description of how this works than the C++ docs have. - Lyricist
Can someone explain how to use the labels? How do I check which label a centroid belongs to? - Impetus
Each component in the image gets a number (label). The background is label 0, and the additional objects are numbered from 1 to num_labels-1. The centroids are indexed by the same numbers as the labels. centroids[0] isn't particularly useful: it's just the background. centroids[1:num_labels] is what you want. - Canescent
@ZackKnopp Do you also know how I can order the labels by area, width or height? - Grillo
@ZackKnopp That's incorrect, you can use the function without stats like this as well: _, labels = cv2.connectedComponents(segmentation) :) - Sarcophagus
@Grillo You could create an array with the component areas: areas=output[2][:,4] Then an array with the numbers of components: nr=np.arange(output[0]) Then sort them according to area size: ranked=sorted(zip(areas,nr)) With help from here: #6619015 - Sarcophagus
cv2.connectedComponentsWithStats does not take connectivity as an input argument in OpenCV 3 or 4, and I don't think the function was present in 2. Is this simply a mixup between connectedComponentsWithStats and connectedComponentsWithStatsWithAlgorithm? output = cv2.connectedComponentsWithStats(thresh) gives the exact same result for me. - Gerthagerti
Docs say ltype can be CV_32S or CV_16U. What do these do? I can't find any documentation on their impact. - Dislocation
Could someone please explain what I could use all of this for? I am trying to extract individual characters/text and landed here. I played around with the code above, it works, but how do I utilize it? I.e. how do I utilize centroids to find the centroids of the text? - Shit
B
21

I have come here a few times to remember how it works, and each time I have to reduce the above code to:

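_, thresh = cv2.threshold(src, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
connectivity = 4  # You need to choose 4 or 8 for connectivity type
# Pass connectivity and ltype by keyword: positionally, the next arguments would be labels and stats
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(thresh, connectivity=connectivity, ltype=cv2.CV_32S)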

Hopefully, it's useful for everyone :)

Branny answered 13/11, 2018 at 19:57 Comment(0)
B
11

Adding to Zack Knopp's answer: if you are using a grayscale image, you can simply use:

import cv2
import numpy as np

src = cv2.imread("path\\to\\image.png", cv2.IMREAD_GRAYSCALE)  # same as flag 0
binary_map = (src > 0).astype(np.uint8)  # every non-zero pixel becomes foreground
connectivity = 4 # or whatever you prefer

output = cv2.connectedComponentsWithStats(binary_map, connectivity=connectivity, ltype=cv2.CV_32S)

When I tried Zack Knopp's answer on a grayscale image, it didn't work, and this was my solution.

Bechance answered 5/3, 2018 at 14:48 Comment(0)
N
0

The input image needs to be single-channel; otherwise it causes an error in OpenCV 4.x. Convert the image to grayscale first, then follow Zack's answer.

# cv is the alias from "import cv2 as cv"
src = cv.cvtColor(src, cv.COLOR_BGR2GRAY)
Negotiate answered 5/10, 2022 at 7:42 Comment(0)
