Approximate a group of line segments as a single best fit straight line
Asked Answered
F

2

6

Assuming I have a group of lines segments like the red lines (or green lines) in this picture Sample picture

I want to know how can I replace them with just one line segment that approximates them best. Or maybe you can advise what to search for since this might be a common problem in statistics.

Problem background: This comes actually from using OpenCV's probabilistic Hough Transform. I want to detect the corners of a piece of paper. When I apply it to an image I get on the edges a group of lines that I want to convert to a single line continuous segment.

One way to do it that I was thinking of, is to take from the line a couple of points and then to use line regression to get the equation of the line. Once I have that I should just cut it to a segment.

Footfall answered 18/8, 2014 at 17:13 Comment(2)
You can find solution in book: Trucco, Alessandro Verri, "Introductory Techniques for 3-D Computer Vision" at pages 114-116.Grained
haven't tried, but maybe fitLine already does the trick.Harquebus
I
3

Here's a potential solution:

  1. Obtain binary image. Load image, convert to grayscale, and Otsu's threshold

  2. Perform morphological operations. We morph close to combine the contours into a single contour

  3. Find convex hull. We create a blank mask then find the convex hull on the binary image. We draw the lines onto the mask, morph close, then find contours and fill in the outline to get a solid image

  4. Perform linear regression. We find the line of best fit on the binary image then draw this resulting line onto a new mask

  5. Bitwise convex hull and line mask together. We bitwise-and both masks together and draw this resulting contour onto the original image.


Here's a visualization of each step:

Using this screenshotted input image

enter image description here

Binary image -> Morph close

enter image description here enter image description here

# Load image, grayscale, Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Morph close
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel, iterations=3)

Convex hull outline -> convex hull filled

enter image description here enter image description here

# Create line mask and convex hull mask
line_mask = np.zeros(image.shape, dtype=np.uint8)
convex_mask = np.zeros(image.shape, dtype=np.uint8)

# Find convex hull on the binary image
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnt = cnts[0]
hull = cv2.convexHull(cnt,returnPoints = False)
defects = cv2.convexityDefects(cnt,hull)
for i in range(defects.shape[0]):
    s,e,f,d = defects[i,0]
    start = tuple(cnt[s][0])
    end = tuple(cnt[e][0])
    far = tuple(cnt[f][0])
    cv2.line(convex_mask,start,end,[255,255,255],5)

# Morph close the convex hull mask, find contours, and fill in the outline
convex_mask = cv2.cvtColor(convex_mask, cv2.COLOR_BGR2GRAY)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
convex_mask = cv2.morphologyEx(convex_mask, cv2.MORPH_CLOSE, kernel, iterations=3)
cnts = cv2.findContours(convex_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cv2.fillPoly(convex_mask, cnts, (255,255,255))

Linear regression

enter image description here

# Perform linear regression on the binary image
[vx,vy,x,y] = cv2.fitLine(cnt,cv2.DIST_L2,0,0.01,0.01)
lefty = int((-x*vy/vx) + y)
righty = int(((image.shape[1]-x)*vy/vx)+y)
cv2.line(line_mask,(image.shape[1]-1,righty),(0,lefty),[255,255,255],2)

Bitwise-and filled convex hull and linear regression masks together

enter image description here

# Bitwise-and the line and convex hull masks together
result = cv2.bitwise_and(line_mask, line_mask, mask=convex_mask)
result = cv2.cvtColor(result, cv2.COLOR_BGR2GRAY)

Result

enter image description here

# Find resulting contour and draw onto original image
cnts = cv2.findContours(result, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cv2.drawContours(image, cnts, -1, (200,100,100), 3)

Here's the result with the other input image

enter image description here enter image description here

Full code

import cv2
import numpy as np

# Load image, grayscale, Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Morph close
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel, iterations=3)

# Create line mask and convex hull mask
line_mask = np.zeros(image.shape, dtype=np.uint8)
convex_mask = np.zeros(image.shape, dtype=np.uint8)

# Find convex hull on the binary image
cnts = cv2.findContours(close, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cnt = cnts[0]
hull = cv2.convexHull(cnt,returnPoints = False)
defects = cv2.convexityDefects(cnt,hull)
for i in range(defects.shape[0]):
    s,e,f,d = defects[i,0]
    start = tuple(cnt[s][0])
    end = tuple(cnt[e][0])
    far = tuple(cnt[f][0])
    cv2.line(convex_mask,start,end,[255,255,255],5)

# Morph close the convex hull mask, find contours, and fill in the outline
convex_mask = cv2.cvtColor(convex_mask, cv2.COLOR_BGR2GRAY)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5,5))
convex_mask = cv2.morphologyEx(convex_mask, cv2.MORPH_CLOSE, kernel, iterations=3)
cnts = cv2.findContours(convex_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cv2.fillPoly(convex_mask, cnts, (255,255,255))

# Perform linear regression on the binary image
[vx,vy,x,y] = cv2.fitLine(cnt,cv2.DIST_L2,0,0.01,0.01)
lefty = int((-x*vy/vx) + y)
righty = int(((image.shape[1]-x)*vy/vx)+y)
cv2.line(line_mask,(image.shape[1]-1,righty),(0,lefty),[255,255,255],2)

# Bitwise-and the line and convex hull masks together
result = cv2.bitwise_and(line_mask, line_mask, mask=convex_mask)
result = cv2.cvtColor(result, cv2.COLOR_BGR2GRAY)

# Find resulting contour and draw onto original image
cnts = cv2.findContours(result, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
cv2.drawContours(image, cnts, -1, (200,100,100), 3)

cv2.imshow('thresh', thresh)
cv2.imshow('close', close)
cv2.imshow('image', image)
cv2.imshow('line_mask', line_mask)
cv2.imshow('convex_mask', convex_mask)
cv2.imshow('result', result)
cv2.waitKey()
Isleen answered 25/1, 2020 at 2:58 Comment(0)
A
1

This is a great question. The right way to look at this is to integrate the error along each line segment. Instead of a simple term (y[i] - y_hat)^2 (where y_hat is the predicted value from the regression line) you should have integral((y[i](t) - y_hat)^2, t, 0, 1) where y[i](t) is parametrization of the line segment, y[i](t) = t * y[i, 1] + (1 - t)*y[i, 0] (denoting the endpoints of the i-th segment as y[i, 0] and y[i, 1]). I think you'll find that you can compute the integral exactly and therefore you'll get terms for the sum of squared errors which involve just the endpoints. I've left out some details but I think that's enough for you to work out the rest. EDIT: Error terms should be squared; I've adjusted formulas accordingly.

2ND EDIT: I worked out some formulas (using Maxima). For a regression line represented by y = alpha*x + beta, I get as the least squares estimates for alpha and beta:

alpha = (4*('sum(ll[k],k,1,n))*'sum(xx[k][2]*yy[k][2]*ll[k],k,1,n)
    +2*('sum(ll[k],k,1,n))*'sum(xx[k][1]*yy[k][2]*ll[k],k,1,n)
    +('sum(xx[k][2]*ll[k],k,1,n))*(-3*'sum(yy[k][2]*ll[k],k,1,n)
                                  -3*'sum(yy[k][1]*ll[k],k,1,n))
    +('sum(xx[k][1]*ll[k],k,1,n))*(-3*'sum(yy[k][2]*ll[k],k,1,n)
                                  -3*'sum(yy[k][1]*ll[k],k,1,n))
    +2*('sum(ll[k],k,1,n))*'sum(yy[k][1]*xx[k][2]*ll[k],k,1,n)
    +4*('sum(ll[k],k,1,n))*'sum(xx[k][1]*yy[k][1]*ll[k],k,1,n))
    /(4*('sum(ll[k],k,1,n))*'sum(xx[k][2]^2*ll[k],k,1,n)
     +4*('sum(ll[k],k,1,n))*'sum(xx[k][1]*xx[k][2]*ll[k],k,1,n)
     -3*('sum(xx[k][2]*ll[k],k,1,n))^2
     -6*('sum(xx[k][1]*ll[k],k,1,n))*'sum(xx[k][2]*ll[k],k,1,n)
     +4*('sum(ll[k],k,1,n))*'sum(xx[k][1]^2*ll[k],k,1,n)
     -3*('sum(xx[k][1]*ll[k],k,1,n))^2)

  beta = -((2*'sum(xx[k][2]*ll[k],k,1,n)+2*'sum(xx[k][1]*ll[k],k,1,n))
   *'sum(xx[k][2]*yy[k][2]*ll[k],k,1,n)
   +('sum(xx[k][2]*ll[k],k,1,n)+'sum(xx[k][1]*ll[k],k,1,n))
    *'sum(xx[k][1]*yy[k][2]*ll[k],k,1,n)
   +('sum(xx[k][2]^2*ll[k],k,1,n))*(-2*'sum(yy[k][2]*ll[k],k,1,n)
                                   -2*'sum(yy[k][1]*ll[k],k,1,n))
   +('sum(xx[k][1]*xx[k][2]*ll[k],k,1,n))
    *(-2*'sum(yy[k][2]*ll[k],k,1,n)-2*'sum(yy[k][1]*ll[k],k,1,n))
   +('sum(xx[k][1]^2*ll[k],k,1,n))*(-2*'sum(yy[k][2]*ll[k],k,1,n)
                                   -2*'sum(yy[k][1]*ll[k],k,1,n))
   +('sum(xx[k][2]*ll[k],k,1,n)+'sum(xx[k][1]*ll[k],k,1,n))
    *'sum(yy[k][1]*xx[k][2]*ll[k],k,1,n)
   +2*('sum(xx[k][1]*yy[k][1]*ll[k],k,1,n))*'sum(xx[k][2]*ll[k],k,1,n)
   +2*('sum(xx[k][1]*ll[k],k,1,n))*'sum(xx[k][1]*yy[k][1]*ll[k],k,1,n))
   /(4*('sum(ll[k],k,1,n))*'sum(xx[k][2]^2*ll[k],k,1,n)
    +4*('sum(ll[k],k,1,n))*'sum(xx[k][1]*xx[k][2]*ll[k],k,1,n)
    -3*('sum(xx[k][2]*ll[k],k,1,n))^2
    -6*('sum(xx[k][1]*ll[k],k,1,n))*'sum(xx[k][2]*ll[k],k,1,n)
    +4*('sum(ll[k],k,1,n))*'sum(xx[k][1]^2*ll[k],k,1,n)
    -3*('sum(xx[k][1]*ll[k],k,1,n))^2)

where xx and yy are lists of pairs of values, one pair for each line segment. That is, xx[k] are the x-coordinates for the endpoints of the k-th segment, yy[k] are the y-coordinates for the endpoints of the k-th segment, and ll[k] is the length sqrt((xx[k][2] - xx[k][1])^2 + (yy[k][2] - yy[k][1])^2) of the k-th segment.

Here is my program to derive those formulas. Probably there are other reasonable ways to set up this problem which yield similar but different formulas.

y_hat[k](l) := alpha * x[k](l) + beta;
x[k](l) := (1 - l/ll[k]) * xx[k][1] + l/ll[k] * xx[k][2];
y[k](l) := (1 - l/ll[k]) * yy[k][1] + l/ll[k] * yy[k][2];
e[k]:= y[k](l) - y_hat[k](l);
foo : sum (integrate (e[k]^2, l, 0, ll[k]), k, 1, n);
declare (nounify (sum), linear);
[da, db] : [diff (foo, alpha), diff (foo, beta)];
map (expand, %);
bar : solve (%, [alpha, beta]);

Here are some example data and the result I get. I postponed defining dx, dy, and ll because, since they are all constant terms, I didn't want them to be expanded in the solutions for alpha and beta.

dx[k] := xx[k][2] - xx[k][1];
dy[k] := yy[k][2] - yy[k][1];
ll[k] := sqrt (dx[k]^2 + dy[k]^2);

xx : [[1,2],[1.5,3.5],[5.5,10.5]]$
yy : [[1,2.2],[1.5,3.3],[5,12]]$

''bar, nouns, n=length(xx);
  => [[alpha = 1.133149837130799, beta = - 0.4809409869515073]]
Amygdala answered 18/8, 2014 at 17:29 Comment(2)
Thanks for the answer Robert, but to be honest I don't know how to use it. I'm interested in implementing it in java, and what I have are the A(x1, y1) and B(x2, y2) points for each line. As I understand your answer, I need to parametrize the line and then calculate the integral... I also tried with regression line and it works to some extent, but I think it can be done better (and maybe easier).Footfall
@Footfall Yes, you need an integral. Maybe it is easier to think about breaking up the line into many points, and adding each point to your data set, and calculating the limit in which the number of points increases without bound -- that limit is the integral. I can work out a formula, maybe tomorrow.Amygdala

© 2022 - 2024 — McMap. All rights reserved.