Joining unordered line segments

Asked 16/9, 2009 at 23:57 Answered 10/10, 2023 at 7:22

My algorithm produces a list of (usually) several thousand line segments (all 2D) which I need to join up into large polylines. These resulting polylines might be closed or open, but they are never self-intersecting. The line segments are not directed, i.e. it might be needed to flip a line segment before it can be joined to its neighbour.

What would be an extremely fast way of finding these polylines? I have to do this in real-time, so anything that takes longer than -say- 10ms is not a solution.

I'm doing this in C#, but I'm looking for ideas, not source.

Magnificence answered 16/9, 2009 at 23:57 Comment(1)

did you manage to solve this? Out of curiosity: where you using the AutoCAD .net API? – Cultus 9/2, 2023 at 6:55

Would a hash of endpoints work?

If the endpoints match exactly then you can just store each object twice in a hash, once by each endpoint. Then, for each object, look up both its end points. You will get any other objects that need to be joined.

If the endpoints have any kind of imprecision, then you will need a spatial index, and probably one that uses an R-tree. You can get a similar effect by just making a 2d hash grid. The hash grid contains buckets of nearby endpoints. You will need to check neighboring cells.

Brakeman answered 17/9, 2009 at 0:13 Comment(9)

Agreed. Perhaps he's having difficulty linearizing the sequence of connected points. That's easy; just store each undirected segment twice (once for each endpoint) and visit the point you didn't come from. – Massotherapy 17/9, 2009 at 0:19

Oh, definitely, I wanted each object stored twice, once for each endpoint. Good point, I should have said that in so many words. – Brakeman 17/9, 2009 at 0:27

It might work. I was just wondering if there are specialised algorithms that handle this particular problem very well. The endpoints match exactly, I luckily do not have to deal with tolerance issues. ps. I'll be sure to put them in twice :) – Magnificence 17/9, 2009 at 0:39

It should be possible to go a lot faster than this if there are some helpful invariants. For example, is the ordering of joinable segments in any way constrained? – Brakeman 17/9, 2009 at 1:2

My 2D space is divided into rectangular cells, and each cell contains 0, 1 or 2 segments. The end-points of each segment are on the cell border. – Magnificence 17/9, 2009 at 2:41

Oh yeah, I populate the grid in a nested loop (first columns, then rows). – Magnificence 17/9, 2009 at 2:42

I suppose a sort of marching squares approach for finding nearby segments might work faster than building a hash table (ps. sorry for putting up so many comments in a row). – Magnificence 17/9, 2009 at 2:43

Oh, heh, you already have a 2D grid. It does seem like you know for sure which cells are the only ones that could possibly contain a matching end, and that you have, at some point, a handle on both the cell and the segment at the same time. Does the structure of the program allow you to join the line segments at that point? – Brakeman 17/9, 2009 at 2:56

@DigitalRoss, did you find an alternative optimized solution or did you stick to the "store twice in hash"-method? – Arietta 5/4, 2015 at 21:53

Here is some working C++ code for joining vectors (i.e., sequences) of values, joining those with identical end values, reversing the sequences as necessary. I have tested it on thousands of polylines representing roads and other linear features in my map library package, CartoType, as a way of reducing the number of separate lines rendered by the GPU. It can be improved, no doubt, but this code is working and tested.

The performance is, as far as I can see, O(N log(N)); that is, far better than quadratic. The code searches the std::map once or twice per vector, and inserts the new vector's end points. The std::map::find and insert functions are logarithmic in the size of the container.

You supply vectors (which can represent polylines or anything else) one by one, and the algorithm is based on the idea of a map from vector end points to vectors. At any time, each end point is associated with one vector; that is an invariant. If a vector is supplied that duplicates one or two existing end points, it is joined to existing vectors, preserving the invariant, and it is cleared.

After running the joiner by calling VectorJoiner::Join on every vector, you can discard any vectors which have a size of zero.

If using the joiner on a vector of polylines, it's vital to create that vector before calling VectorJoiner::Join on the elements, so that element addresses, which are stored in the std::map, are stable. It is no doubt relatively simple to create a version using array indexes which doesn't suffer from that limitation.

Although this code is part of CartoType it is hereby released under the MIT license.

/*
cartotype_vector_joiner.h
Copyright (C) 2023 CartoType Ltd.
See www.cartotype.com for more information.
*/

#pragma once

#include <vector>
#include <map>

/**
Joins two vectors that share start or end members.

Joins aSource to this aDest by joining it at one end or other if possible and adding the members of aSource to aDest.
Reverses the added members if necessary. Returns true if the join was possible.

Does nothing if either vector is empty.
*/
template<typename T> bool JoinVectors(std::vector<T>& aDest,const std::vector<T>& aSource)
    {
    if (aDest.empty() || aSource.empty() || &aDest == &aSource)
        return false;

    if (aDest.back() == aSource.front())
        {
        aDest.insert(aDest.end(),aSource.begin() + 1,aSource.end());
        return true;
        }
    if (aDest.back() == aSource.back())
        {
        aDest.insert(aDest.end(),aSource.rbegin() + 1,aSource.rend());
        return true;
        }
    if (aDest.front() == aSource.back())
        {
        aDest.insert(aDest.begin(),aSource.begin(),aSource.end() - 1);
        return true;
        }
    if (aDest.front() == aSource.front())
        {
        aDest.insert(aDest.begin(),aSource.rbegin(),aSource.rend() - 1);
        return true;
        }

    return false;
    }

/**
A class to join two vectors that share start or end members.
Use it by creating all the vectors, then calling Join once on each vector.
After that, use only those vectors which are not empty.
*/

template<typename T> class VectorJoiner
    {
    public:
    void Join(std::vector<T>& aVector)
        {
        if (aVector.empty())
            return;

        std::vector<T>* cur_vector = &aVector;
        for (int pass = 0; pass < 2; pass++)
            {
            auto iter = m_end_map.find(cur_vector->front());
            if (iter == m_end_map.end())             // not found
                iter = m_end_map.find(cur_vector->back());
            if (iter == m_end_map.end())             // not found
                break;

            auto found_vector = iter->second;
            m_end_map.erase(found_vector->front());
            m_end_map.erase(found_vector->back());
            JoinVectors(*found_vector,*cur_vector);

            cur_vector->clear();
            cur_vector = found_vector;
            }

        m_end_map[cur_vector->front()] = cur_vector;
        m_end_map[cur_vector->back()] = cur_vector;
        }

    private:
    std::map<T,std::vector<T>*> m_end_map;
    };

Puttier answered 10/10, 2023 at 7:22 Comment(7)

isn't this a quadratic algorithm? have you tested its asymptotic behaviour? – Gregale 12/10, 2023 at 12:33

I haven't voted on your answer. Maybe add your notice into the answer itself? – Gregale 17/10, 2023 at 14:13

you mention "element addresses, which are stored in the std::map" does this mean that the comparison done when searching the map is not by value, but by address, i.e. uses pointer-equality? if so, who's to say that e.g. two separate X-Y points (1,1) and (1,1) will be considered the same? even if so, this means this approach can't be used for joining with a tolerance, isn't it? could you please clarify these points for me? – Gregale 18/10, 2023 at 8:7

Will, if you look at the way the std::map is used you'll see that the map is from the values at the start and end of the vectors to the addresses: it is std::map<T,std::vector<T>*>; so no address comparison is done. All that is required of the class T is that it should have an equality operator; that equality operator is used to compare the X-Y points, or whatever the T actually is. – Puttier 18/10, 2023 at 15:33

great, thanks for clarifying this. :) --- you might even say, "far better better than quadratic". – Gregale 20/10, 2023 at 17:54

Will, thanks for your latest edit, but it might be a good idea to move on to something else. It is starting to feel as if you are looking over my shoulder, which is off-putting. – Puttier 26/10, 2023 at 9:12

This is as much for yours and mine benefit as is for any future reader's. If the edit improves a post, I don't see a reason why it shouldn't be made or why is should be interpreted as anything personal. Best wishes. :) – Gregale 26/10, 2023 at 13:18

-1

A slight variation that would work for lines that need to be joined if they are close enough, say the ends closer than some distance dclose, is to create a grid of cell size less than dclose, and then hash the cell number (x,y,z), all integers, of each end. In Python this is a dictionary of (x, y,z) as keys and a reference to the line as value. In C++, this is a map. Dclose should be made smaller than the shortest line segment, so no single line has the same cell for both ends.

Selfrenunciation answered 2/2, 2023 at 1:5 Comment(1)

Hey Munther Hindi, welcome to Stack Overflow and thank you for your answer! The community would appreciate if you would update your answer to include more concrete steps (probably a code snippet?) which people can follow to use the approach you've described in your answer. – Diabolic 7/2, 2023 at 21:32

Would a hash of endpoints work?

Recommended topics

Hot tags