Is there a way to compare two geometric shapes (or any two more generic data structures), without using the brute force when a tolerance is involved?
The brute force (that is comparing each value of each object against each value of the other object) works but it's slow, and I can't use it.
I tried sorting the data and comparing two sorted collections. It's fast, but it only works with zero tolerance. As soon as I add the tolerance I get lost. The problem is that two values can be identical when I compare and different when I sort.
Here are some details of my problem.
In my Excel VBA add-in I have a collection of Shape
objects made by a collection of Line
objects made by two Point
objects each. The add-in scans a CAD drawing via COM and creates the collection of Shape
objects.
An simplified version could generate this:
Shape 1 Shape 2
Point 1 0.0 5.0 0.0 4.9
Point 2 4.9 0.0 5.1 0.0
Point 3 5.0 5.0 5.0 5.0
I need to find which shapes are identical to which shapes, where identical means has the same shape, size and orientation, but not the same position (so far it's trivial) plus or minus a tolerance (not so trivial now!)
The Point.IsIdenticalTo(OtherPoint)
is defined as:
Function IsIdenticalTo(OtherPoint As Point) As Boolean
IsIdenticalTo = Abs(X - OtherPoint.X) < Tolerance And Abs(Y - OtherPoint.Y) < Tolerance
End Function
The brute force implementation of the Shape.IsIdenticalTo(OtherShape)
works but it's too slow: if each Line(I)
has an identical OtherShape.Line(J)
and viceversa, then the two shapes are identical. Sometimes I have hundreds of shapes with hundreds of lines each, so the brute force solution doesn't work for me.
I tried two approaches involving sorted collections. Both are fast because comparing two sorted collections is faster than the brute force way, but both fail in some conditions:
A
SortedValues
collection contains all the X and Y values of all the lines. The values are sorted, so the info about whether a value is an X or a Y is lost. I have used this approach for months without problems, but it fails for example when the only difference between two shapes is between the points(10, 20)
and(20, 10)
. I added the line angle to the list of values, things have improved, but there are still cases where this approach fails, because some info is lost when the values are sorted together. In the example above this approach would work with the following collections:Shape 1 Shape 2 0.0 0.0 0.0 0.0 4.9 4.9 5.0 5.0 5.0 5.0 5.0 5.1
A
SortedLines
collection contains all the lines sorted counter-clockwise and starting from the point closest to the origin. This approach doesn't lose any info, but it fails in the example above because the sorting algorithm doesn't agree with the equality comparison. If the tolerance is 0.5 they should be identical, but the sorting algorithm produces the collections shown below. Things get more difficult because my shapes contain sub-shapes, so there are many starting points on each shape.Shape 1 Shape 2 Point 1 4.9 0.0 0.0 4.9 Point 2 5.0 5.0 5.1 0.0 Point 3 0.0 5.0 5.0 5.0
EDIT:
Shapes are imported from an external graphical application via COM. A shape can be as simple as rectangle or as complex as any fancy outline with 10 circles inside, 20 internal shapes and 30 lines. They represent panels with holes and simple decorations, and sometimes they have a saw-tooth shape, which makes dozen of edges.
y
values if thex
values are comparable. – Insufficient