How to find the smallest ellipse covering a given fraction of a set of points in R?
Asked Answered
G

2

11

I'm wondering: Is there some function/clever way to find the smallest ellipse covering a given fraction of a set of 2d points in R? With smallest I mean the ellipse with the smallest area.

Clarification: I'm fine with an approximately correct solution if the number of points are large (as I guess an exact solution would have to try all combinations of subsets of points)

This question might sound like a duplicate of the question Ellipse containing percentage of given points in R but the way that question is phrased the resulting answer does not result in the smallest ellipse. For example, using the solution given to Ellipse containing percentage of given points in R:

require(car)
x <- runif(6)
y <- runif(6)
dataEllipse(x,y, levels=0.5)

The resulting ellipse is clearly not the smallest ellipse containing half of the points, Which, I guess, would be a small ellipse covering the three points up in the top-left corner.

enter image description here

Greiner answered 7/11, 2014 at 21:6 Comment(10)
If you don't have an algorithm to implement that does what you want, then you should get one, maybe at crossvalidated.com, then ask how to implement it here.Problematic
You can find the algorithm here with a quick google search: gis.stackexchange.com/questions/22562/…Horrible
@Problematic I would think this was an "algorithmic" problem rather than a "statistical" one. And surely there must exist an algorithm already?Bases
@Horrible No, that is not the algorithm I'm looking for. It is for finding the smallest ellipse covering 100% of the points. I'm looking for the smallest ellipse covering p % of the points where p could be any percentage.Bases
@RasmusBååth if you can find the smallest ellipse covering 100% percent of X points, you can find N smallest ellipses that cover 100% of p%*X points.Horrible
@Bernardo: yes, but exhaustively enumerating subsets comprising N points until you find the smallest ellipse might not be practical!Forseti
@BenBolker Agreed, but OP haven't even provided a algorithm (thus, this post is off-topic and should be migrated to stack exchange). A non-optimal solution is better than noneHorrible
@Horrible Hmmm, yeah but that's not really practical if you have to try all permutations of the subsets of points, right? (Unless you have very few points). I would be fine with an approximate algorithm, and have clarified this in the question.Bases
could you use k-means clustering to identify promising sets of points?Forseti
@BenBolker I guess you could use k-means as a way of reducing the dimensionality. By calculating the minimum ellipse of the centroids you could get a good guess what the ellipse should be, perhaps?Bases
G
4

I think I have a solution which requires two functions, cov.rob from the MASS package and ellipsoidhull from the cluster package. cov.rob(xy, quantile.used = 50, method = "mve") finds approximately the "best" 50 points out of the total number of 2d points in xy that are contained in the minimum volume ellipse. However, cov.rob does not directly return this ellipse but rather some other ellipse estimated from the best points (the goal being to robustly estimate the covariance matrix). To find the actuall minimum ellipse we can give the best points to ellipsoidhull which finds the minimum ellipse, and we can use predict.ellipse to get out the coordinates of the path defining the hull of the elllipse.

I'm not 100% certain this method is the easiest and/or that it works 100% (It feels like it should be possible to avoid the seconds step of using ellipsoidhull but I havn't figured out how.). It seems to work on my toy example at least....

Enough talking, here is the code:

library(MASS)
library(cluster)

# Using the same six points as in the question
xy <- cbind(x, y)
# Finding the 3 points in the smallest ellipse (not finding 
# the actual ellipse though...)
fit <- cov.rob(xy, quantile.used = 3, method = "mve")
# Finding the minimum volume ellipse that contains these three points
best_ellipse <- ellipsoidhull( xy[fit$best,] )
plot(xy)
# The predict() function returns a 2d matrix defining the coordinates of
# the hull of the ellipse 
lines(predict(best_ellipse), col="blue")

enter image description here

Looks pretty good! You can also inspect the ellipse object for more info

best_ellipse
## 'ellipsoid' in 2 dimensions:
##  center = ( 0.36 0.65 ); squared ave.radius d^2 =  2 
##  and shape matrix =
##         x      y
## x 0.00042 0.0065
## y 0.00654 0.1229
##   hence, area  =  0.018 

Here is a handy function which adds an ellipse to an existing base graphics plot:

plot_min_ellipse <- function(xy, points_in_ellipse, color = "blue") {
  fit <- cov.rob(xy, quantile.used = points_in_ellipse, method = "mve")
  best_ellipse <- ellipsoidhull( xy[fit$best,] )
  lines(predict(best_ellipse), col=color)
}

Let's use it on a larger number of points:

x <- runif(100)
y <- runif(100)
xy <- cbind(x, y)
plot(xy)
plot_min_ellipse(xy, points_in_ellipse = 50)

enter image description here

Greiner answered 8/11, 2014 at 0:10 Comment(1)
Should add that the approximate algorithm can be pretty "approximative", so to speak. If you have many (> 1000) points you might want to run the algorithm for longer than the standard settings (which has nsamp = 3000). For example: fit <- cov.rob(xy, quantile.used = 3, method = "mve", nsamp = 10000) This will take a couple of seconds though...Bases
A
0

This sounds very much like a 2D confidence interval. Try http://stat.ethz.ch/R-manual/R-devel/library/cluster/html/ellipsoidhull.html. You will probably need to run it on each combination of N points, then choose the smallest result.

Aggressor answered 7/11, 2014 at 21:16 Comment(3)
No, this is not directly related to confidence intervals. The like you propose leads to an algorithm used for finding the smallest ellipse covering 100% of the points. I'm looking for the smallest ellipse covering p % of the points where p could be any percentage.Bases
OP has dataEllipse in their original example -- doesn't do what they're looking forForseti
I meant, take each combination of N points where N= p/100 * total, then picking the smallest. But yes, this will be slow when there are many points.Aggressor

© 2022 - 2024 — McMap. All rights reserved.