Neo4j Spatial 'WithinDistance' Cypher query returns empty while REST call returns data

I have what appears to be a correctly configured spatial layer and index, and I can successfully query a node using the findGeometriesWithinDistance REST API call.

POST /db/data/ext/SpatialPlugin/graphdb/findGeometriesWithinDistance {"layer":"geom","pointX":15.0,"pointY":60.0,"distanceInKm":100.0}

However, when querying using Cypher, I get no results (I have tried reversing the order of 60.0 and 15.0 without luck):

START n=node:geom('withinDistance:[60.0, 15.0, 500.0]') return n;

Cypher returns:

==> +---+
==> | n |
==> +---+
==> +---+
==> 0 row
==> 
==> 13 ms

REST:

200 OK
==> [ {
==>   "paged_traverse" : "http://localhost:7474/db/data/node/14472/paged/traverse/{returnType}{?pageSize,leaseTime}",
==>   "outgoing_relationships" : "http://localhost:7474/db/data/node/14472/relationships/out",
==>   "data" : {
==>     "lon" : 15.2,
==>     "bbox" : [ 15.2, 60.1, 15.2, 60.1 ],
==>     "RaceName" : "Parador Es Muy Caliente",
==>     "lat" : 60.1,
==>     "gtype" : 1
==>   },
==>   "all_typed_relationships" : "http://localhost:7474/db/data/node/14472/relationships/all/{-list|&|types}",
==>   "traverse" : "http://localhost:7474/db/data/node/14472/traverse/{returnType}",
==>   "self" : "http://localhost:7474/db/data/node/14472",
==>   "all_relationships" : "http://localhost:7474/db/data/node/14472/relationships/all",
==>   "property" : "http://localhost:7474/db/data/node/14472/properties/{key}",
==>   "properties" : "http://localhost:7474/db/data/node/14472/properties",
==>   "outgoing_typed_relationships" : "http://localhost:7474/db/data/node/14472/relationships/out/{-list|&|types}",
==>   "incoming_relationships" : "http://localhost:7474/db/data/node/14472/relationships/in",
==>   "incoming_typed_relationships" : "http://localhost:7474/db/data/node/14472/relationships/in/{-list|&|types}",
==>   "extensions" : {
==>   },
==>   "create_relationship" : "http://localhost:7474/db/data/node/14472/relationships"
==> } ]

REST calls to reproduce:

Create Layer:

POST /db/data/ext/SpatialPlugin/graphdb/addSimplePointLayer { "layer":"geom", "lat":"lat", "lon":"lon" }

Create Index:

POST /db/data/index/node/ {"name":"geom", "config":{"provider":"spatial", "geometry_type":"point","lat":"lat","lon":"lon"}}

Create Node:

POST /db/data/node {"lat":60.2,"lon":15.1,"RaceName":"Parador Es Muy Caliente"}

(In response, examine "self" and find nodeid)

Index the node:

POST /db/data/ext/SpatialPlugin/graphdb/addNodeToLayer {"layer":"geom", "node":"http://localhost:7474/db/data/node/###NEW_NODE_ID###"}

Find:

POST /db/data/ext/SpatialPlugin/graphdb/findGeometriesWithinDistance {"layer":"geom","pointX":15.0,"pointY":60.0,"distanceInKm":100.0}
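
For reference, the reproduction steps above can be collected into one ordered list of requests. This is only a sketch: it builds the (method, path, body) tuples and sends nothing. The function name reproduction_requests is made up for illustration, and node_uri stands for the "self" URI you read from the create-node response, so in practice the fourth request can only be filled in after the third has run.

```python
import json

BASE = "/db/data"  # path prefix used by every call in the question
PLUGIN = BASE + "/ext/SpatialPlugin/graphdb"


def reproduction_requests(node_uri, layer="geom"):
    """Build the five reproduction requests in order, without sending them.

    node_uri is a placeholder for the "self" URI returned by the
    create-node call (hypothetical parameter, not part of any API).
    """
    return [
        # 1. Create the layer, naming the lat/lon properties.
        ("POST", PLUGIN + "/addSimplePointLayer",
         {"layer": layer, "lat": "lat", "lon": "lon"}),
        # 2. Create the spatial index with the same name and properties.
        ("POST", BASE + "/index/node/",
         {"name": layer,
          "config": {"provider": "spatial", "geometry_type": "point",
                     "lat": "lat", "lon": "lon"}}),
        # 3. Create the node.
        ("POST", BASE + "/node",
         {"lat": 60.2, "lon": 15.1, "RaceName": "Parador Es Muy Caliente"}),
        # 4. Add the node to the layer.
        ("POST", PLUGIN + "/addNodeToLayer",
         {"layer": layer, "node": node_uri}),
        # 5. Search: pointX is the longitude, pointY the latitude.
        ("POST", PLUGIN + "/findGeometriesWithinDistance",
         {"layer": layer, "pointX": 15.0, "pointY": 60.0,
          "distanceInKm": 100.0}),
    ]


if __name__ == "__main__":
    for method, path, body in reproduction_requests(
            "http://localhost:7474/db/data/node/###NEW_NODE_ID###"):
        print(method, path, json.dumps(body))
```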
Conga answered 31/7, 2013 at 9:34 Comment(0)

I investigated this, and it is related to an issue we have seen a few times. There is an inconsistency in the design of the spatial library: there are two ways to add a node to the spatial index. One is to add it to the Layer (using the addNodeToLayer REST call); this uses the underlying Java API, which connects the node directly into the RTree as part of the same graph. The other is to create a proxy node in the index graph so that your domain graph is not connected to the index graph. This second approach is only taken by the IndexProvider interface (using the /db/data/index/node/geom REST call).

If you call both methods, the node is added twice, once directly and once by proxy. The problem is that the Cypher withinDistance index query accesses only the IndexProvider interface, and will only return nodes that are NOT also directly connected to the index. So if you add the node in both ways, it will not be returned.

So you need to add it only one of the two ways. I did not see in your original email any mention of addNodeToLayer, so I suspect that SDN might be calling addNodeToLayer (perhaps Michael can comment), in which case you cannot use the cypher call.
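
To make "only one of the two ways" concrete, here is a small sketch that returns exactly one indexing request per node, depending on the route you pick. The function and the route names "layer" and "index" are invented for illustration. The two endpoints are the ones discussed in this thread, and the SEARCH_FOR_ROUTE table only records which search was seen to work for each route here, not documented behaviour.

```python
PLUGIN = "/db/data/ext/SpatialPlugin/graphdb"

# Which search was observed to work for each route in this thread.
# These are observations from the answer, not guaranteed behaviour.
SEARCH_FOR_ROUTE = {
    "layer": "REST findGeometriesWithinDistance",
    "index": "cypher withinDistance",
}


def indexing_request(node_uri, route, layer="geom"):
    """Return the single (method, path, body) request for the chosen route.

    route is a hypothetical switch: "layer" adds the node directly to the
    RTree, "index" goes through the IndexProvider and creates a proxy node.
    """
    if route == "layer":
        return ("POST", PLUGIN + "/addNodeToLayer",
                {"layer": layer, "node": node_uri})
    if route == "index":
        # key/value are dummies; the spatial index does not use them.
        return ("POST", "/db/data/index/node/" + layer,
                {"key": "dummy", "value": "dummy", "uri": node_uri})
    raise ValueError("route must be 'layer' or 'index', not both")


if __name__ == "__main__":
    uri = "http://localhost:7474/db/data/node/13071"
    print(indexing_request(uri, "index"))
    print("search with:", SEARCH_FOR_ROUTE["index"])
```

Forcing the caller to name one route is the point of the sketch: it makes it hard to add the same node both ways by accident.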

During my testing, I was able to manually remove the one index relationship using Cypher like this:

START n=node(13065) MATCH (n)<-[r:RTREE_REFERENCE]-() DELETE r

Replace the number 13065 with your node id for the original node.

I did the following in the neo4j browser (in 2.1.2):

:POST /db/data/ext/SpatialPlugin/graphdb/addSimplePointLayer { "layer":"geom", "lat":"lat", "lon":"lon" }
:POST /db/data/index/node/ {"name":"geom", "config":{"provider":"spatial", "geometry_type":"point","lat":"lat","lon":"lon"}}
:POST /db/data/node {"lat":60.2,"lon":15.1,"RaceName":"Parador Es Muy Caliente"}
:POST /db/data/index/node/geom {"value":"dummy","key":"dummy", "uri":"http://localhost:7474/db/data/node/13071"}

This created a graph with the node not directly connected to the index. In this case the REST call 'findGeometriesWithinDistance' does not work (it uses the standard Java API), while the Cypher 'withinDistance' does work. I tested with this command:

start n = node:geom("withinDistance:[60.2,15.1,100.0]") return n

Note that unfortunately this API puts the order as lat,lon, instead of the more standard lon,lat.
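
Because the two APIs take the coordinates in opposite orders, it can help to build both queries from one (lat, lon) pair. The helper below is a sketch with a made-up name. It only formats the Cypher string and the REST body seen in this thread and does not talk to the server.

```python
def within_distance_args(lat, lon, distance_km, layer="geom"):
    """Format the same search for both APIs from one (lat, lon) pair.

    Cypher withinDistance expects [lat, lon, distance]; the REST call
    expects pointX = lon and pointY = lat.
    """
    cypher = 'START n=node:%s("withinDistance:[%s,%s,%s]") RETURN n' % (
        layer, lat, lon, distance_km)
    rest_body = {"layer": layer, "pointX": lon, "pointY": lat,
                 "distanceInKm": distance_km}
    return cypher, rest_body


if __name__ == "__main__":
    cypher, rest_body = within_distance_args(60.2, 15.1, 100.0)
    print(cypher)
    print(rest_body)
```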

Then I also added the node to the layer (i.e. added it directly to the index graph):

:POST /db/data/ext/SpatialPlugin/graphdb/addNodeToLayer {"layer":"geom", "node":"http://localhost:7474/db/data/node/13071"}

Now when I search with the Cypher command I still get the same correct answer, but when I search with the REST command:

:POST /db/data/ext/SpatialPlugin/graphdb/findGeometriesWithinDistance {"layer":"geom","pointX":15.0,"pointY":60.0,"distanceInKm":100.0}

I find this returns the proxy node instead of the original node.

Kos answered 22/8, 2014 at 11:57 Comment(4)
That explains why it works only after restarting my database! Initially, I wasn't aware of the lat/lon inversion in the Cypher usage, so I wondered: "What if I add the node to the index manually too?" I ended up with both indexes, which prevented it from working EVEN when I used withinDistance the right way. After the restart, I kept one index (without doing anything) and called the API the right way. Thanks :) - Pelham
I've just tested the REST call, and it does return some items. In my case, both usages (REST and Cypher) work out of the box using SDN 3.1.2. I see links to the RTREE_REFERENCE, but these are not the domain nodes, just nodes containing the wkt information and the id of the real domain nodes. - Pelham
I did a bit more testing, and if I called the addNodeToLayer method and the index/node/geom method in a different order I got different results. If adding to the index first, both search methods work (although the findGeometriesWithinDistance returns the proxy). If I add to the layer first, then only the findGeometriesWithinDistance works. I've not investigated further why. This still supports the idea that it is best to use only one approach consistently. - Kos
Jim Biard reports that if you give your nodes an id=id property, then both methods work OK, but then if you remove a node from the index it is deleted, so you have to remove the id property first. I've not tested this. - Kos

This is a bug; see https://github.com/neo4j/spatial/issues/106. Feel free to investigate if you want; the problem seems to be in the iteration in SpatialRecordHits.java.

Meanwhile, make sure to add the node to the index before querying via the index, as that creates the proper node structure.

Bloxberg answered 31/7, 2013 at 16:8 Comment(11)
Thanks for your help Peter - as you pointed out in email, the correct solution was to add the node to an index, rather than just the layer: POST /db/data/index/node/geom {"value":"dummy","key":"dummy", "uri":"http://localhost:7474/db/data/node/1234"} Once I did that, the Cypher query works perfectly: start n = node:Places('withinDistance:[15.0, 60.0, 100.0]') return n; Not sure how I missed that in the test, but thanks again for pointing it out. - Conga
Troy, after discussing with Michael: you can search via Cypher if you add that node to the INDEX, not only to the layer. In that scenario, you will be able to find your node with Cypher, as the right information is created in the indexing process. We are considering adding an index configuration that covers your present use case, but you should be OK if you add the node to an index before your Cypher search, see github.com/neo4j/spatial/blob/master/src/test/java/org/neo4j/… /peter - Conga
Peter - One question that I have is whether or not I actually need to add the Node to the Layer. It seems that the spatial query works with or without that step. Can you give me or point me to any insights on that? Troy - Conga
Yes. The index step both adds the domain node to the layer and creates another geometry node that is added to the index and has a reference to the domain node, so you can find that via the index lookup. This is different from just adding to the layer, which does not do the second step. This is kind of annoying, so we are planning to make this a bit more consistent :-) /peter - Conga
Hey Peter - can you help me understand the need for Key and Value when adding the node to the spatial index? Say I have a node with RaceID, RaceName, City, etc. on it, and RaceID, RaceName, and Date are all indexed using Lucene full text. RaceID is the primary key of the object (we use SQL Server as our write repo). Would it make sense to set the Key to RaceID and Value to RaceID.ToString() when adding to the spatial index? Would this give me any benefits? Or is it truly a dummy key/val pair that is otherwise never used? Troy - Conga
In this form of index, there is no need for key/value; they are not used. In that respect, the spatial index works much like the new indexes on labels, gathering its information purely from the configuration. /peter - Conga
Here is how I understand it then: 1) Create the Spatial layer, defining the layer name and the keys I will use for Latitude and Longitude: {"layer":"Places","lat":"lat", "lon":"lon"} 2) Create my Spatial Index, referencing the name of my layer as name, and add config values for provider, geometry_type, lat, and lon: {"name":"Places", "config":{"provider":"spatial", "geometry_type":"point", "lat":"lat", "lon":"lon"}} 3) Create the node and ensure it has properties for lat and lon: {"RaceName":"My Really Big Marathon", "RaceID":1234, "lat":60.1,"lon":-112.2} 4) Add the new node to the index from #2. - Conga
1) In step 1 of defining the layer, I am assuming here that I am telling Neo what I am going to be calling my Lat and Lon fields, correct? Meaning, had I provided values in step 1 (layer creation) for lat and lon of, say, "cat" and "mouse", would that have worked throughout the solution? {"layer":"Places", "lat": "Latitude", "lon": "Longitude" } OR {"layer":"Places", "lat": "Cat", "lon": "Mouse" } - Conga
Yes, as far as I remember you can configure the node property names for lon/lat both in the layer setup and in the index setup, and you should use the same names in both, since this configuration ultimately ends up configuring an adapter class. The layer reads these properties for the configuration of the SimplePointEncoder (you can build more advanced encoders/decoders too, e.g. ones that project polygons from Cypher queries into the GIS world, and those could probably have a different configuration, e.g. Cypher Query Parameters). Hope that makes sense? /peter neubauer - Conga
I was also able to do the "join"-ish query in Cypher this way: start g = node:Places('withinDistance:[38.0,-122.0,300.0]'), n = node:Events('sresultcount:[000001 TO 000100]') where g = n return g.RaceName; This returns nodes withinDistance as well as nodes with a resultcount between 1 and 100 and then sort of joins them in the where (pulling only those from g and n that are the same node). Can you see a better way to do this? It works perfectly, but seems clunky as it would require that I return two potentially huge data sets to compare. Thanks. Troy - Conga
This is how it works at the moment: the two index lookups are joined in JVM memory. However, since the Spatial Index is built as an RTree in the graph, with more advanced logic one could devise a traversal that checks e.g. the sresultcount on a domain node while traversing the RTree with spatial bounding boxes. We don't have an example of that in the codebase right now, but Craig has been experimenting with that. Another way would be to extend the 2-dimensional RTree to n-dimensional scalar property indexes, thus incorporating e.g. the sresultcount into the bounding box search. /peter - Conga
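
The in-memory join described in the last two comments amounts to intersecting two result sets by node id. Here is a minimal sketch of that idea outside Neo4j, assuming each hit is a dict with a hypothetical "id" field. The function name and the sample data are invented for illustration, and this is not how the server implements it.

```python
def join_by_node_id(spatial_hits, range_hits):
    """Keep the spatial hits whose node id also appears in the second lookup.

    Both arguments are lists of dicts with an "id" key (an assumed shape).
    Building a set first keeps the join linear in the size of both inputs.
    """
    wanted = {hit["id"] for hit in range_hits}
    return [hit for hit in spatial_hits if hit["id"] in wanted]


if __name__ == "__main__":
    # Made-up stand-ins for the withinDistance and sresultcount lookups.
    g = [{"id": 1, "RaceName": "A"}, {"id": 2, "RaceName": "B"}]
    n = [{"id": 2}, {"id": 3}]
    print([hit["RaceName"] for hit in join_by_node_id(g, n)])
```

Both result sets still have to be fully materialized before they can be compared, which is the cost Troy is worried about.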
