KD/Qtree Implementation

Asked 24/5, 2017 at 1:45 Answered 24/5, 2017 at 8:15

python pandas dataframe cluster-computing quadtree

I have a following path data:

    id1            p1        p2
 0   1             7.935     5.103
 1   1             7.934     5.112
 2   1             7.936     5.102
 3   1             7.938     5.145
 4   2             7.930     5.191
 5   2             7.945     5.161
 6   2             7.954     5.127

In the above data frame, (p1,p2) forms the coordinate data and all the points belonging to the same "id1" forms one separate path; in the above df rows(0-3) belonging to id1 = 1 is one path and so on.

I am trying to implement Quadtree for the analysis of these trajectories. To implement Quadtrees I am trying to use "pyqtree" https://github.com/karimbahgat/Pyqtree python package.

In the code "len(spindex)" are the total number of items, while bounding box, "bbox" is in the format (xmin, ymin, xmax, ymax), "testitem" is the intersection bounding box, while len(matches) will give the number of nodes in the intersection.

I am trying to use the above df to implement quadtree. Please let me know how to use the above df as "items" in the code. And then how to give different bounding box for these trajectories. Also, how will I know or query the tree to find which trajectories are located in which area of the quadtree.

Unstep answered 24/5, 2017 at 1:45 Comment(0)

So you want to query the location of each trajectory, which means you need to calculate and insert the bbox for each. Ususally this type of data would have one row for each trajectory with a geometry field describing the sequence of xy coordinates. But since your coordinates go downwards we must do a workaround to first group all xy points belonging to each trajectory id and then calculate the bbox.

Here is untested example code to popoulate the index (my pandas is quite rusty so prob some mistakes in there):

for group in df.groupby('voygid'):
    bbox = [ group['x'].min(), group['y'].min(), group['x'].max(), group['y'].max() ]
    spindex.insert(group['voygid'][0], bbox)

Not really sure how you plan to cluster, which would be a separate question. The main purpose of a quadtree is not to ask which quad an item is located in, but rather to ask which items intersect with any arbitrary bbox region.

So if you divide your coordinate area into separate cluster regions you can then query which voygid trajectories are located in each.

for clusterbbox in clusters:
    voygids = spindex.intersects(clusterbbox)

Note that an item can span across and be located in multiple quads, so you may or may not need need additional fleshing out afterwards.

Phantasm answered 24/5, 2017 at 8:15 Comment(5)

Thank you so muchh. I will try what you suggested and get back to you if I need further help. Thanks again – Unstep 24/5, 2017 at 17:55

@Liza, I just went to look at the library, the code above does what you need. He did write the library after all. Of course he also wrote that sample code which could use a bit of work. But regardless, the loop he has above is functionally equivalent to the loop I used, and he is constructing the bounding box effectively for the trajectory, instead of the phony box from the example, which my code uses. – Platt 24/5, 2017 at 20:44

@Liza, not sure i fully understand so correct me if im wrong. Inserting objects into the index requires that you calculate each object's bbox. And if trajectories/voyage ids are what you would like returned as matches (as you say) then trajectory bboxes + voygid is what you have to insert. Thanks for helping clarify StephenRauch. – Phantasm 25/5, 2017 at 22:12

@KarimBahgat I am really sorry for dropping in again and again. Say I have 50 trajectories, each with some random number of points, and each coordinate/ points is identified by an ID indicating the trajectory it belongs to. Now, my aim is to give a target area and know how many trajectories it contains also can I identify those trajectories with their ID? I hope the question is clear. Please advice me if this is possible and how to proceed with it. – Unstep 27/5, 2017 at 4:33

Thats alright. What you ask for is exactly what the answer does. The code that populates the index inserts the trajectory ID (voygid) along with its bbox (barring any pandas error on my part). The code that queries clusterbbox (what you describe as "target area") returns the trajectory IDs (that we inserted earlier) located in that area. I noticed the readme or docs dont have many examples or mention of IDs, so maybe I will expand it in the future. Try to implement it in code, and post the error msg if it fails. See also the docs: pythonhosted.org/Pyqtree – Phantasm 27/5, 2017 at 10:39

To turn the trajectories into a list of items per voygid, we can use pandas.groupby:

Code:

def get_items(group):
    return [Item(row[1].x, row[1].y) for row in group.iterrows()]

voygids = dict(df.groupby('voygid').apply(get_items))

How?

The groupby will collect the rows associated with a particular voygid. Then we can use pandas.DataFrame.apply() to call a function which will return a list of items for the x, y pairs in the group. The function uses a list comprehension to construct the list of Items().

Test Code:

df = pd.read_fwf(StringIO(u"""
       voygid         x             y
        1             -7.935513     5.103579
        1             -7.935781     5.103300
        1             -7.936354     5.102726
        1             -7.935915     5.102802
        2             -7.935306     5.103424
        2             -7.945678     5.119876
        2             -7.954764     5.128738"""), header=1)
print(df)

class Item:
    def __init__(self, x, y):
      left = x-1
      right = x+1
      top = y-1
      bottom = y+1
      self.bbox = [left, top, right, bottom]

    def __repr__(self):
        return '[%s]' % ' '.join('%.4f' % x for x in self.bbox)

def get_items(group):
    return [Item(row[1].x, row[1].y) for row in group.iterrows()]

voygids = dict(df.groupby('voygid').apply(get_items))

for voygid, items in voygids.items():
    print(voygid)
    for item in items:
        print('  ' + repr(item))

Results:

   voygid         x         y
0       1 -7.935513  5.103579
1       1 -7.935781  5.103300
2       1 -7.936354  5.102726
3       1 -7.935915  5.102802
4       2 -7.935306  5.103424
5       2 -7.945678  5.119876
6       2 -7.954764  5.128738

1
  [-8.9355 4.1036 -6.9355 6.1036]
  [-8.9358 4.1033 -6.9358 6.1033]
  [-8.9364 4.1027 -6.9364 6.1027]
  [-8.9359 4.1028 -6.9359 6.1028]
2
  [-8.9353 4.1034 -6.9353 6.1034]
  [-8.9457 4.1199 -6.9457 6.1199]
  [-8.9548 4.1287 -6.9548 6.1287]

Platt answered 24/5, 2017 at 2:19 Comment(4)

This seems like it does something different. It groups the voygids, but never aggregates them? What's needed is one bbox for each trajectory, but here a bbox is calculated for every coordinate. Even if querying coordinate points was the issue, that should be done by inserting the same value for the min and max coordinate, not by creating an arbitrary bbox -/+1 around the point. – Phantasm 24/5, 2017 at 8:26

@KarimBahgat, The abitrary +/-1 bounding on a single x,y pair and the resultant list of same is directly from the code in the question. And the OP asks how to use the above df as "items" ? So while you will know way more about the algorithm than I ever will, the code presented does answer the question asked. – Platt 24/5, 2017 at 14:2

@StephenRauch Thank you for your help. As per your suggestion, the "voygids" will contain all the coordinates grouped by "voygid". But once I have all the data in "voygids", when I am implementing this piece of code-- for item in voygids: spindex.insert(item, item.bbox) It gives an Attribute Error saying: 'numpy.int64' object has no attribute 'bbox' – Unstep 24/5, 2017 at 20:26

@StephenRauch you're right, I should have seen that the Item +- bbox code was from the question, which in turn was from my test file. And indeed, the use of the Item class in the test code might be confusing, it's mostly just meant to set up a context for scenarios where a user might have objects with a bbox attribute. My main point was to clarify that the Item class is not necessary, and that in order to query entire trajectories one has to insert bbox for each trajectory rather than each xy coord. – Phantasm 25/5, 2017 at 22:46

Recommended topics

Hot tags