Find the diameter of a set of n points in d-dimensional space
Asked Answered
L

4

5

I am interested in finding the diameter of two point sets in 128 dimensions. The first has 10,000 points and the second 1,000,000. For that reason I would like to do something better than the naive approach, which takes O(n²). The algorithm should be able to handle any number of points and dimensions, but I am currently most interested in these two particular data sets.

I am much more interested in speed than accuracy, so, based on this, I would find the (approximate) bounding box of the point set by computing the min and max value per coordinate, in O(n*d) time. Then, if I find the diameter of this box, the problem is solved.
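A minimal sketch of this idea in plain Python (the function name is my own); note that the box diagonal only upper-bounds the true diameter:

```python
import math

def bounding_box_diameter(points):
    """Diagonal of the axis-aligned bounding box, in O(n*d) time.

    The Pythagorean theorem generalizes to d dimensions: for
    per-coordinate spans s_i = max_i - min_i, the diagonal is
    sqrt(s_1**2 + ... + s_d**2).
    """
    d = len(points[0])
    spans = [max(p[i] for p in points) - min(p[i] for p in points)
             for i in range(d)]
    return math.sqrt(sum(s * s for s in spans))

# Example: unit square -> diagonal sqrt(2)
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(bounding_box_diameter(pts))  # 1.4142135623730951
```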

In the 3D case, I could find the diagonal of one face, since I know its two edges, and then apply the Pythagorean theorem with the remaining edge, which is perpendicular to that face. I am not sure about this, however, and I certainly can't see how to generalize it to d dimensions.


An interesting answer can be found here, but it seems to be specific to 3 dimensions and I want a method for d dimensions.

Interesting paper: On computing the diameter of a point set in high dimensional Euclidean space. Link. However, implementing that algorithm seems too much for me at this stage.

Lowrey answered 11/12, 2014 at 1:47 Comment(1)
general solution requires computing the convex hull #2736790Drugi
P
4

The classic 2-approximation algorithm for this problem, with running time O(nd), is to choose an arbitrary point and then return the maximum distance to another point. The diameter is no smaller than this value and no larger than twice this value.
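A minimal Python sketch of this (the function name is mine):

```python
import math

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def diameter_2_approx(points):
    # One O(n*d) pass: the max distance r from an arbitrary point
    # satisfies r <= D <= 2r, where D is the true diameter.
    x = points[0]
    return max(dist(x, p) for p in points)

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
r = diameter_2_approx(pts)  # here D = sqrt(2) exactly
```

Even for 10⁶ points in 128 dimensions this is a single linear scan over the data.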

Pampero answered 11/12, 2014 at 3:2 Comment(13)
So you mean picking a point at random, finding its exact farthest neighbor, and that's it. Do you have any link that proves your final sentence?Lowrey
@G.Samaras The lower bound is obvious. The upper bound is one line with the triangle inequality.Pampero
Yeah, that makes sense. However, what I described above is O(n*d), and if computing the box diameter is not very costly, it should be a fair candidate over your approach in terms of the trade-off between accuracy and speed, right?Lowrey
@G.Samaras As I understand it, the competing algorithm is to find the minimum axis-aligned bounding box and compute its diameter. This approach suffers badly from the curse of dimensionality with respect to accuracy.Pampero
I see your point. However, let me wait a bit to see if any other answer arrives. :) Thank you very much. +1Lowrey
An optional extension: Pick random point x. Pick point y furthest from x. Pick point z furthest from y. The distance between y and z can then be used in your bound instead.Grenada
@TimothyShields I thought that when I woke up! :O By bound you mean the diameter, right?Lowrey
@G.Samaras Yes, you can use d(y,z) as the variable in your bounds instead of d(x,y). It won't give you better worst-case bounds, I believe, but it should give you a better distribution on your diameter bound samples.Grenada
@TimothyShields, can you explain maybe how I can prove that claim? About the bounds. If you feel you need to make another answer, feel free, since I hope we don't disturb David much in his answer. :)Lowrey
@G.Samaras It's more or less the same proof. Let D = d(p,q) be the diameter. Then d(y,z) ≤ D (since p,q is the maximum argument of d(*,*)), and D = d(p,q) ≤ d(p,y) + d(y,q) ≤ 2d(y,z) (triangle inequality and since z is the maximum argument of d(y,*) = d(*,y)).Pampero
@DavidEisenstat Right, the worst-case bounds don't improve at all. It's more an improvement on the distribution of the values you get out of the procedure.Grenada
@G.Samaras In particular, the worst-case example has all pairs of points except one at distance 1; that one is at distance 2. Unless we examine the distance for that specific pair, we'll never know. In practice, choosing y then z should hopefully bias the choice to the periphery.Pampero
Thank you both! @TimothyShields, you deserve some credit too, I am going to give a +1 to an answer of yours I like. :)Lowrey
S
3

I would like to add a comment, but I don't have enough reputation for that...

I just want to warn other readers that the "bounding box" solution is very inaccurate. Take for example the Euclidean ball of radius one. This set has diameter two, but its bounding box is [-1, 1]^d, which has diameter twice the square root of d. For d = 128, this is already a very bad approximation.
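To make the gap concrete, here is a small check (my own construction) using the 2d points ±e_i, which all lie on the unit sphere:

```python
import math
from itertools import combinations

d = 128
points = []
for i in range(d):
    e = [0.0] * d
    e[i] = 1.0
    points.append(tuple(e))              # +e_i
    points.append(tuple(-v for v in e))  # -e_i

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

true_diam = max(dist(p, q) for p, q in combinations(points, 2))
box_diag = 2 * math.sqrt(d)  # every per-coordinate span is 2

print(true_diam)  # 2.0 (between e_i and -e_i)
print(box_diag)   # ~22.63, off by a factor of sqrt(d)
```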

For a crude estimate, I would stay with David Eisenstat's answer.

Stem answered 26/1, 2017 at 11:46 Comment(0)
K
1

There is a precision-based algorithm which performs very well in any dimension, based on computing the diameter of an axis-aligned bounding box.

The idea is that it is possible to find lower and upper bounds for the axis-aligned bounding box length function, since its partial derivatives are bounded and depend on the angle between the axes.

The limit on the local-maximum derivative between two axes in 2D space can be computed as:

sin(a/2)*(1 + tan(a/2)). That means that, for example, for a 90° angle between the axes the bound is 1.42 (sqrt(2)).

This reduces to a/2 as a → 0, so the upper bound is proportional to the angle.

For the multidimensional case the formula varies slightly, but it is still easy to compute.

So, the search for local maxima converges in logarithmic time.

The good news is that we can run the search for such local maxima in parallel. Also, we can filter out regions of the search based on the best result achieved so far, as well as the points themselves which are below the lower limit of the search in the worst region.

The worst case of the algorithm is when all of the points are placed on the surface of a sphere.

This can be further improved: when we detect a local search operating on just a few points, we switch to brute force for that particular axis. It works fast, because we only need the points subject to that particular local search, which can be determined as the points actually bounded by two opposite spherical cones of a particular angle sharing the same axis.

It's hard to state the big-O complexity, because it depends on the desired precision and the distribution of points (bad when most of the points are on a sphere's surface).

The algorithm I use is:

  1. Set the initial angle a = pi/2.
  2. Take one axis per dimension. The angle and the axes form the initial 'bucket'.
  3. For each axis, compute the span on that axis by projecting all the points onto it and finding the min and max of the projected coordinates.
  4. Compute the upper and lower bounds of the diameter of interest. This is based on the formula sin(a/2)*(1 + tan(a/2)), multiplied by an asymmetry coefficient computed from the lengths of the current axis projections.
  5. For the next step, discard all points which fall under the lower bound in every dimension at the same time.
  6. For each axis, if the number of points above the upper bound is less than some reasonable amount (determined experimentally), compute by brute force (O(N²)) on the set of points in question, adjust the lower bound, and retire that axis for the next step.
  7. For the next step, discard all axes which have all of their points under the lower bound.
  8. If the precision is satisfactory ((upper bound - lower bound) < epsilon), return the upper bound as the result.
  9. For each surviving axis, there is a virtual cone on that axis (actually two opposite cones) which covers some area on a virtual sphere enclosing a face of the cube. If I'm not mistaken, its angle would be a * sqrt(2). Set the new angle to a / sqrt(2). Create a whole bucket of new axes (2 × the number of dimensions), so that the new cone areas cover the initial cone area. This is the hard part for me, as I don't have enough imagination for the n > 3-dimensional case.
  10. Continue from step (3).

You can parallelize the procedure, synchronizing the limits computed so far for the points in steps (5) through (7).
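Steps 2 and 3 (projecting onto a set of axes and taking per-axis spans) can be sketched as follows; the axis set and function names are my own, and the refinement in steps 4–9 is omitted:

```python
import math

def project_span(points, axis):
    """Span (max - min) of the points projected onto `axis` (step 3)."""
    norm = math.sqrt(sum(a * a for a in axis))
    proj = [sum(pi * ai for pi, ai in zip(p, axis)) / norm for p in points]
    return max(proj) - min(proj)

pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
axes = [(1.0, 0.0), (0.0, 1.0)]  # step 2: one axis per dimension
spans = [project_span(pts, ax) for ax in axes]
# Each single span is a lower bound on the diameter (the two extreme
# points on that axis are at least that far apart); the diagonal over
# all spans is an upper bound (the points fit inside that box).
lower = max(spans)                            # 4.0
upper = math.sqrt(sum(s * s for s in spans))  # 5.0
```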

Klepht answered 18/2, 2015 at 12:4 Comment(0)
D
1

I'm going to summarize the algorithm proposed by Timothy Shields.

  1. Pick random point x.
  2. Pick point y furthest from x.
  3. If not done, let x = y, and go to step 2

The more times you repeat, the more accurate the result will be... ??
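A minimal Python sketch of this loop (the naming and iteration count are my own); note it only yields a lower bound on the diameter:

```python
import math
import random

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def iterated_furthest_point(points, iterations=3, seed=0):
    # Heuristic lower bound on the diameter: hop to the furthest
    # point each round and keep the largest distance seen.
    rng = random.Random(seed)
    x = rng.choice(points)
    best = 0.0
    for _ in range(iterations):
        y = max(points, key=lambda p: dist(x, p))
        best = max(best, dist(x, y))
        x = y
    return best

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(iterated_furthest_point(pts))  # 1.4142135623730951
```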

EDIT: actually this algorithm is not very good. Think about a 2D rectangle with vertices ABCD. There are two local maxima, the diagonal pairs AC and BD, separated by a sizable valley. This algorithm will get stuck at one or the other with 50/50 probability. If AC is slightly longer than BD, you'll get the wrong answer 50% of the time no matter how many times you iterate. Other regular polygons have the same issue, and in higher dimensions it is even worse.

Drugi answered 20/7, 2022 at 14:22 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.