I need a spatial index in C
Asked Answered
P

4

8

I'm working on my gEDA fork and want to get rid of the existing simple tile-based system1 in favour of a real spatial index2.

An algorithm that efficiently finds points is not enough: I need to find objects with non-zero extent. Think in terms of objects having bounding rectangles, that pretty much captures the level of detail I need in the index. Given a search rectangle, I need to be able to efficiently find all objects whose bounding rectangles are inside, or that intersect, the search rectangle.

The index can't be read-only: gschem is a schematic capture program, and the whole point of it is to move things around the schematic diagram. So things are going to be a'changing. So while I can afford insertion to be a bit more expensive than searching, it can't be too much more expensive, and deleting must also be both possible and reasonably cheap. But the most important requirement is the asymptotic behaviour: searching should be O(log n) if it can't be O(1). Insertion / deletion should preferably be O(log n), but O(n) would be okay. I definitely don't want anything > O(n) (per action; obviously O(n log n) is expected for an all-objects operation).

What are my options? I don't feel clever enough to evaluate the various options. Ideally there'd be some C library that will do all the clever stuff for me, but I'll mechanically implement an algorithm I may or may not fully understand if I have to. gEDA uses glib by the way, if that helps to make a recommendation.

Footnotes:

1 Standard gEDA divides a schematic diagram into a fixed number (currently 100) of "tiles" which serve to speed up searches for objects in a bounding rectangle. This is obviously good enough to make most schematics fast enough to search, but the way it's done causes other problems: far too many functions require a pointer to a de-facto global object. The tiles geometry is also fixed: it would be possible to defeat this tiling system completely simply by panning (and possibly zooming) to an area covered by only one tile.

2 A legitimate answer would be to keep elements of the tiling system, but to fix its weaknesses: teaching it to span the entire space, and to sub-divide when necessary. But I'd like others to add their two cents before I autocratically decide that this is the best way.

Partheniaparthenocarpy answered 27/6, 2012 at 13:23 Comment(1)
Also check out boost.org/doc/libs/1_61_0/libs/geometry/doc/html/geometry/… And good luck on your project if it's still going!Procession
B
2

A nice data structure for a mix of points and lines would be an R-tree or one of its derivatives (e.g. R*-Tree or a Hilbert R-Tree). Given you want this index to be dynamic and serializable, I think using SQLite's R*-Tree module would be a reasonable approach.

If you can tolerate C++, libspatialindex has a mature and flexible R-tree implementation which supports dynamic inserts/deletes and serialization.

Bin answered 27/6, 2012 at 14:27 Comment(5)
Uhm, I don't need the index to be serializable in the sense of needing to write it to and load it from disk. Not sure if that's what you mean here.Partheniaparthenocarpy
Ah, I figured you'd be looking to save it out to disk with the file format for speed; perhaps that is just a bonus.Bin
No, definitely no saving of derived information. The file format is editable text (if a bit baroque), so I don't want to introduce a requirement for human editors to worry about a saved spatial index.Partheniaparthenocarpy
Then you can ignore those features and use these as "in memory" databases.Bin
Turns out I can apt-get install libspatialindex1, so that's a bonus. It seems to have C bindings too. Thanks for the reference to it; I was searching the package list too specifically (for r-tree and the like). I'll give that a try.Partheniaparthenocarpy
H
2

Your needs sound very similar to what is used in collision detection algorithms for games and physics simulations. There are several open source C++ libraries that handle this in 2-D (Box2D) or 3-D (Bullet physics). Although your question is for C, you may find their documentation and implementations useful.

Usually this is split into a two phases:

  1. A fast broad phase that approximates objects by their axis-aligned bounding box (AABB), and determines pairs of AABBs that touch or overlap.
  2. A slower narrow phase that calculates the points of geometric overlap for pairs of objects whose AABBs touch or overlap.

Physics engines also use spatial coherence to further reduce the pairs of objects that are compared, but this optimization probably won't help your application.

The broadphase is usually implemented with an O(N log N) algorithm like Sweep and prune. You may be able to accelerate this by using it in conjunction with the current tile approach (one of Nvidia's GPUGems describes this hybrid approach). The narrow phase is quite costly for each pair, and may be overkill for your needs. The GJK algorithm is often used for convex objects in this step, although faster algorithms exist for more specialized cases (e.g.: box/circle and box/sphere collisions).

Hobbes answered 27/6, 2012 at 14:25 Comment(1)
Thanks for adding nuance. As far as my question is concerned, the second, slower narrow phase doesn't exist. Approximating everything by it's AABB (and thanks for the jargon term) is quite alright.Partheniaparthenocarpy
B
2

A nice data structure for a mix of points and lines would be an R-tree or one of its derivatives (e.g. R*-Tree or a Hilbert R-Tree). Given you want this index to be dynamic and serializable, I think using SQLite's R*-Tree module would be a reasonable approach.

If you can tolerate C++, libspatialindex has a mature and flexible R-tree implementation which supports dynamic inserts/deletes and serialization.

Bin answered 27/6, 2012 at 14:27 Comment(5)
Uhm, I don't need the index to be serializable in the sense of needing to write it to and load it from disk. Not sure if that's what you mean here.Partheniaparthenocarpy
Ah, I figured you'd be looking to save it out to disk with the file format for speed; perhaps that is just a bonus.Bin
No, definitely no saving of derived information. The file format is editable text (if a bit baroque), so I don't want to introduce a requirement for human editors to worry about a saved spatial index.Partheniaparthenocarpy
Then you can ignore those features and use these as "in memory" databases.Bin
Turns out I can apt-get install libspatialindex1, so that's a bonus. It seems to have C bindings too. Thanks for the reference to it; I was searching the package list too specifically (for r-tree and the like). I'll give that a try.Partheniaparthenocarpy
D
0

This sounds to like an application well-suited to a quadtree (assuming you are interested only in 2D.) The quadtree is hierarchical (good for searching) and it's spatial resolution is dynamic (allowing higher resolution in areas that need it).

I've always rolled my own quadtrees, but here is a library that appears reasonable: http://www.codeproject.com/Articles/30535/A-Simple-QuadTree-Implementation-in-C

Disaccord answered 27/6, 2012 at 13:59 Comment(0)
N
-1

It is easy to do. It's hard to do fast. Sounds like a problem I worked on where there was a vast list of min,max values and given a value it had to return how many min,max pairs overlapped that value. You just have it in two dimensions. So you do it with two trees for each direction. Then do a intersection on the results. This is really fast.

#include <iostream>
#include <fstream>
#include <map>

using namespace std;

typedef unsigned int UInt;

class payLoad {
public:
    UInt    starts;
    UInt    finishes;
    bool    isStart;
    bool    isFinish;
    payLoad ()
    {
        starts = 0;
        finishes = 0;
        isStart = false;
        isFinish = false;
    }
};

typedef map<UInt,payLoad> ExtentMap;

//==============================================================================
class Extents
{
    ExtentMap   myExtentMap;

public:

    void ReadAndInsertExtents ( const char* fileName )
    {
        UInt start, finish;
        ExtentMap::iterator EMStart;
        ExtentMap::iterator EMFinish;

        ifstream efile ( fileName);
        cout << fileName << " filename" << endl;

        while (!efile.eof()) {
            efile >> start >> finish;
            //cout << start << " start " << finish << " finish" << endl;
            EMStart = myExtentMap.find(start);
            if (EMStart==myExtentMap.end()) {
                payLoad pay;
                pay.isStart = true;
                myExtentMap[start] = pay;
                EMStart = myExtentMap.find(start);
                }
            EMFinish = myExtentMap.find(finish);
            if (EMFinish==myExtentMap.end()) {
                payLoad pay;
                pay.isFinish = true;
                myExtentMap[finish] = pay;
                EMFinish = myExtentMap.find(finish);
            }
            EMStart->second.starts++;
            EMFinish->second.finishes++;
            EMStart->second.isStart = true;
            EMFinish->second.isFinish = true;

//          for (EMStart=myExtentMap.begin(); EMStart!=myExtentMap.end(); EMStart++)
//              cout << "| key " << EMStart->first << " count " << EMStart->second.value << " S " << EMStart->second.isStart << " F " << EMStart->second.isFinish << endl;

        }

        efile.close();

        UInt count = 0;
        for (EMStart=myExtentMap.begin(); EMStart!=myExtentMap.end(); EMStart++)
        {
                count += EMStart->second.starts - EMStart->second.finishes;
                EMStart->second.starts = count +  EMStart->second.finishes;
        }

//      for (EMStart=myExtentMap.begin(); EMStart!=myExtentMap.end(); EMStart++)
//          cout << "||| key " << EMStart->first << " count " << EMStart->second.starts << " S " << EMStart->second.isStart << " F " << EMStart->second.isFinish << endl;

    }

    void ReadAndCountNumbers ( const char* fileName )
    {
        UInt number, count;
        ExtentMap::iterator EMStart;
        ExtentMap::iterator EMTemp;

        if (myExtentMap.empty()) return;

        ifstream nfile ( fileName);
        cout << fileName << " filename" << endl;

        while (!nfile.eof()) 
        {
            count = 0;
            nfile >> number;
            //cout << number << " number ";

            EMStart = myExtentMap.find(number);
            EMTemp = myExtentMap.end();

            if (EMStart==myExtentMap.end()) {           // if we don't find the number then create one so we can find the nearest number.
                payLoad pay;
                myExtentMap[ number ] = pay;
                EMStart = EMTemp = myExtentMap.find(number);
                if ((EMStart!=myExtentMap.begin()) && (!EMStart->second.isStart)) 
                {
                    EMStart--;
                }
            }

            if (EMStart->first < number) {
                while (!EMStart->second.isFinish) {
                    //cout << "stepped through looking for end - key" << EMStart->first << endl;
                    EMStart++;
                    }
                if (EMStart->first >= number) {
                    count = EMStart->second.starts;
                    //cout << "found " << count << endl;
                    }
            }
            else if (EMStart->first==number) {
                count = EMStart->second.starts;
                }

            cout << count << endl;

            //cout << "| count " << count << " key " << EMStart->first << " S " << EMStart->second.isStart << " F " << EMStart->second.isFinish<< " V " << EMStart->second.value << endl;

            if (EMTemp != myExtentMap.end()) 
            {
                myExtentMap.erase(EMTemp->first);
            }
        }
        nfile.close();      
    }

};

//==============================================================================

int main (int argc,  char* argv[]) {
    Extents exts;

    exts.ReadAndInsertExtents ( "..//..//extents.txt" );
    exts.ReadAndCountNumbers ( "..//../numbers.txt" );

    return 0;
}

the extents test file was 1.5mb of:

0 200000
1 199999
2 199998
3 199997
4 199996
5 199995
....
99995 100005
99996 100004
99997 100003
99998 100002
99999 100001

The numbers file was like:

102731
104279
109316
104859
102165
105762
101464
100755
101068
108442
107777
101193
104299
107080
100958
.....

Even reading the two files from disk, extents were 1.5mb and numbers were 780k and the really large number of values and lookups, this runs in a fraction of a second. If in memory it would lightning quick.

Nelia answered 27/6, 2012 at 14:7 Comment(8)
In this intermediate stage of code gymnastics, I'm already doing it the brute force way, having eliminated tiles and simply iterating through the list of all objects. Already it's noticeably slow, even on modest schematic diagrams.Partheniaparthenocarpy
just added code above... this does one dimension... you'd do it for both for 2d and then intersect the results. You would need to add the deleting yourself but it is built on std C++ datastructures which already have this functionality.Nelia
std::map isn't a list, it's a tree! I suspect you got downvoted because a list is not an appropriate data structure for this sort of problem. Using std::map like you have is a lot more appropriate (bear in mind I need to do this in C, not C++), and what you've done here could be argued to be a spatial index in disguise!Partheniaparthenocarpy
trees are also a list.... I got down voted because I didn't have the code to hand as so could post it. the down votes were there before the code! Are you doing it in a C only compiler or doing it in C++ but written in C...Nelia
I'm just giving you the code I had already.... sure the source code to equivilent C MAP functionality is on the web somewhere. If it has to be pure C.Nelia
Oh no no trees are much more than a list. While you can find an item in even a sorted list (not an array) in O(n), you can do much better with a tree - generally O(log n). I really think it's because of that that someone (wasn't me!) voted you down. Don't worry about it. This project is written in C; the fact that it'd commonly be compiled with GCC which happens to also be able to compile C++, is basically irrelevant. (The source is largely C++-hostile anyway; while that isn't a big reason, it's simple to state.) And yes, I realize that a C version of std::map is just a matter of coding.Partheniaparthenocarpy
I think your getting a little carried away... I think I may know what a tree and a list are... all I was eluding to was a tree can represent a list if traversed correctly... I think your taking things a little literally.Nelia
Also the list I was originally refering to was clearly the list of values in the file.Nelia

© 2022 - 2024 — McMap. All rights reserved.