The point that minimizes the sum of euclidean distances to a set of n points
I have a set of points W={(x1, y1), (x2, y2),..., (xn, yn)} on the 2D plane. Can you find an algorithm that takes these points as the input and returns a point (x, y) on the 2D plane which has the minimum sum of distances from the points in W? In other words, if

di = Euclidean_distance((x, y), (xi, yi))

I want to minimize:

d1 + d2 + ... + dn

Spooner answered 30/7, 2019 at 17:40 Comment(4)
At the risk of embarrassing myself with some not-thought-through guess: Isn't that just the center of gravity?Zygoma
What search terms did you use that failed to find the known solutions?Ecumenism
You are looking for a geometric median.Diplomat
@Marco13: the center of gravity minimizes the sum of squared distances.Kindling
F
10

The Problem

You're looking for the geometric median.

An Easy Solution

There is no closed-form solution to this problem, so iterative or probabilistic methods are used. The easiest way to find this is probably with Weiszfeld's algorithm:

Weiszfeld's algorithm
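
For reference, the update rule usually given for Weiszfeld's algorithm can be written as follows. The notation is an assumption made here for illustration: p_i = (x_i, y_i) are the data points and c_k is the current estimate of the center.

```latex
c_{k+1} \;=\; \left( \sum_{i=1}^{n} \frac{p_i}{\lVert p_i - c_k \rVert} \right)
        \Bigg/ \left( \sum_{i=1}^{n} \frac{1}{\lVert p_i - c_k \rVert} \right)
```

Each step is a weighted average of the data points, with weights inversely proportional to their distance from the current estimate.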

We can implement this in Python as follows:

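import numpy as np
from numpy.linalg import norm as npnorm

# Random test data (same setup as in the SOCP example further down)
POINT_NUM = 100
pts       = np.random.rand(POINT_NUM,2)

c_pt_old = np.random.rand(2)       # Dummy previous estimate, only used by the first convergence test
c_pt_new = np.array([0.0,0.0])     # Initial guess for the center point

# Iterate until the estimate moves by less than the tolerance
while npnorm(c_pt_old-c_pt_new)>1e-6:
    num   = 0
    denom = 0
    for i in range(POINT_NUM):
        dist   = npnorm(c_pt_new-pts[i,:])   # Distance from the estimate to point i
        num   += pts[i,:]/dist               # Point weighted by inverse distance
        denom += 1/dist                      # Sum of the weights
    c_pt_old = c_pt_new
    c_pt_new = num/denom

print(c_pt_new)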

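Weiszfeld's algorithm can fail if an estimate lands exactly on one of the data points, because the corresponding distance is then zero and the update is undefined. To guard against this, it might be best to run it several times from different starting points.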
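
The iteration and the restart idea can be sketched as reusable functions. This is a minimal illustration using only the standard library, not a tuned implementation; the names `weiszfeld`, `total_distance` and `best_of_starts`, the tolerances, and the choice to stop when the estimate hits a data point are all assumptions made here.

```python
import math

def total_distance(points, c):
    """Sum of Euclidean distances from c to every point."""
    return sum(math.hypot(c[0] - px, c[1] - py) for px, py in points)

def weiszfeld(points, start=None, tol=1e-9, max_iter=1000):
    """Weiszfeld iteration for 2D points (illustrative sketch)."""
    n = len(points)
    if start is None:
        # Assumption: use the centroid as the default starting point
        start = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    x, y = start
    for _ in range(max_iter):
        num_x = num_y = denom = 0.0
        for px, py in points:
            dist = math.hypot(x - px, y - py)
            if dist < 1e-12:
                # The estimate sits on a data point, so the update is undefined.
                # Give up and return it; it may NOT be the true minimizer.
                return (x, y)
            num_x += px / dist
            num_y += py / dist
            denom += 1.0 / dist
        new_x, new_y = num_x / denom, num_y / denom
        if math.hypot(new_x - x, new_y - y) < tol:
            return (new_x, new_y)
        x, y = new_x, new_y
    return (x, y)

def best_of_starts(points, starts):
    """Run from several starting points and keep the lowest-cost result."""
    candidates = [weiszfeld(points, start=s) for s in starts]
    return min(candidates, key=lambda c: total_distance(points, c))

square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
center = best_of_starts(square, [(0.0, 0.0), (0.2, 0.7)])
```

Here the start at `(0.0, 0.0)` stalls on a data point, while the start at `(0.2, 0.7)` converges toward the center of the square, so `best_of_starts` keeps the latter.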

A General Solution

You can also find this using second-order cone programming (SOCP). In addition to solving your specific problem, this general formulation then allows you to easily add constraints and weightings, such as variable uncertainty in the location of each data point.

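To do so, you introduce one auxiliary variable per data point, each constrained to be at least the distance between the proposed center point and that data point.

You then minimize the sum of these auxiliary variables; at the optimum each constraint is tight, so each variable equals the corresponding distance. The result follows: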

import cvxpy as cp
import numpy as np
import matplotlib.pyplot as plt

#Generate random test data
POINT_NUM = 100
pts       = np.random.rand(POINT_NUM,2)

c_pt      = cp.Variable(2)           #The center point we wish to locate
distances = cp.Variable(POINT_NUM)   #Distance from the center point to each data point

#Generate constraints. These are used to hold distances.
constraints = []                     
for i in range(POINT_NUM):
    constraints.append( cp.norm(c_pt-pts[i,:])<=distances[i] ) 

objective = cp.Minimize(cp.sum(distances))

problem = cp.Problem(objective,constraints)

optimal_value = problem.solve()

print("Optimal value = {0}".format(optimal_value))
print("Optimal location = {0}".format(c_pt.value))

plt.scatter(x=pts[:,0], y=pts[:,1], s=1)
plt.scatter(c_pt.value[0], c_pt.value[1], s=10)
plt.show()
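
Written out, the model that the code above builds appears to correspond to the following program. The symbols are assumptions chosen to mirror the code: c stands for `c_pt`, t_i for `distances[i]`, and p_i for `pts[i,:]`.

```latex
\begin{aligned}
\min_{c \in \mathbb{R}^2,\; t \in \mathbb{R}^n} \quad & \sum_{i=1}^{n} t_i \\
\text{subject to} \quad & \lVert c - p_i \rVert_2 \le t_i, \qquad i = 1, \dots, n
\end{aligned}
```

Each constraint is a second-order cone constraint, and since the objective pushes every t_i down, t_i equals the distance to p_i at the optimum.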

SOCP is supported by a number of solvers, including CPLEX, Elemental, ECOS, ECOS_BB, GUROBI, MOSEK, CVXOPT, and SCS.

I've tested and the two approaches give the same answers to within tolerance.

Weiszfeld, E. (1937). "Sur le point pour lequel la somme des distances de n points donnés est minimum". Tohoku Mathematical Journal. 43: 355–386.

Fuel answered 30/7, 2019 at 17:58 Comment(0)
D
0

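If that point does not need to be from your sample, then the mean minimises the sum of squared Euclidean distances. Note that this differs from the sum of (unsquared) distances asked about in the question, which is minimised by the geometric median.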
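
As a quick check of which objective the mean actually minimises, the sketch below compares the mean with another candidate point on a tiny made-up data set. The points and the helper names `sum_dist` and `sum_sq_dist` are invented for illustration.

```python
import math

# Made-up collinear points: two copies of the origin and one outlier
pts = [(0.0, 0.0), (0.0, 0.0), (3.0, 0.0)]

def sum_dist(c):
    """Sum of Euclidean distances from c to the points."""
    return sum(math.hypot(c[0] - px, c[1] - py) for px, py in pts)

def sum_sq_dist(c):
    """Sum of squared Euclidean distances from c to the points."""
    return sum(math.hypot(c[0] - px, c[1] - py) ** 2 for px, py in pts)

mean  = (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
other = (0.0, 0.0)   # candidate point to compare against the mean

print(sum_dist(mean), sum_dist(other))        # 4.0 3.0
print(sum_sq_dist(mean), sum_sq_dist(other))  # 6.0 9.0
```

On this data the mean wins on squared distances but loses on plain distances, which matches the comment under the question about the center of gravity.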

Depredate answered 7/2, 2023 at 23:1 Comment(0)
K
-1

A third method would be to use a compact nonlinear programming formulation. An unconstrained NLP model would be:

  min sum(i,  ||x-p(i)|| )

This has just 2 variables (the coordinates of x).

There is a very good initial point available. Let p(i,c) be the coordinates of the data points. Then the mean is

  m(c) = sum(i, p(i,c)) / n

where n is the number of data points. This point is often very close to the optimal value of x. So we can use m as an excellent initial point for x.
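
To illustrate the idea of starting from the mean, here is a minimal sketch that minimizes the unconstrained objective with plain gradient descent and a backtracking line search. This is a stand-in chosen to keep the example dependency-free; it is not the NLP solver used in the linked post, and the function names, tolerances and test triangle are assumptions.

```python
import math

def objective(points, c):
    """Sum of distances from c to all points: sum(i, ||x - p(i)||)."""
    return sum(math.hypot(c[0] - px, c[1] - py) for px, py in points)

def gradient(points, c, eps=1e-12):
    """Gradient of the objective; terms with (near) zero distance are skipped."""
    gx = gy = 0.0
    for px, py in points:
        d = math.hypot(c[0] - px, c[1] - py)
        if d > eps:  # the norm has no gradient at distance zero
            gx += (c[0] - px) / d
            gy += (c[1] - py) / d
    return gx, gy

def minimize_from_mean(points, tol=1e-8, max_iter=500):
    n = len(points)
    # Initial point m: the coordinate-wise mean of the data
    c = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
    f = objective(points, c)
    for _ in range(max_iter):
        gx, gy = gradient(points, c)
        gnorm2 = gx * gx + gy * gy
        if gnorm2 < tol * tol:
            break
        step = 1.0
        while step > 1e-16:
            trial = (c[0] - step * gx, c[1] - step * gy)
            f_trial = objective(points, trial)
            if f_trial <= f - 1e-4 * step * gnorm2:  # sufficient-decrease test
                c, f = trial, f_trial
                break
            step *= 0.5
        else:
            break  # no acceptable step found; stop with the current point
    return c

# Isosceles triangle: by symmetry the minimizer lies at x = 1, y = 1/sqrt(3)
triangle = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
result = minimize_from_mean(triangle)
```

For this triangle the mean is (1, 1), and the descent moves down the symmetry axis toward the point from which each side subtends 120 degrees.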

Some limited experiments indicate this approach is considerably faster than a cone programming formulation for large n.

For details see Yet Another Math Programming Consultant - Finding the Central Point in a Point Cloud blog post.

Kerbstone answered 8/8, 2019 at 22:22 Comment(2)
Try to avoid putting critical information in links. (Awesome blog, though!)Fuel
Note the unconstrained version has a nonsmooth objective since the norm is not differentiable at zero. That may or may not cause issues.Tripe
