PostGis nearest neighbours query
Asked Answered
F

3

8

I would like to retrieve all points within a given range of another set of points. Let's say, find all shops within 500m of any subway station.

I wrote this query, which is quite slow, and would like to optimize it:

SELECT DISCTINCT ON(locations.id) locations.id FROM locations, pois
WHERE pois.poi_kind = 'subway'
AND ST_DWithin(locations.coordinates, pois.coordinates, 500, false);

I'm running on latest versions of Postgres and PostGis (Postgres 9.5, PostGis 2.2.1)

Here is the table metadata:

                                         Table "public.locations"
       Column       |            Type             |                       Modifiers
--------------------+-----------------------------+--------------------------------------------------------
 id                 | integer                     | not null default nextval('locations_id_seq'::regclass)
 coordinates        | geometry                    |
Indexes:
    "locations_coordinates_index" gist (coordinates)


                                      Table "public.pois"
   Column    |            Type             |                     Modifiers
-------------+-----------------------------+---------------------------------------------------
 id          | integer                     | not null default nextval('pois_id_seq'::regclass)
 coordinates | geometry                    |
 poi_kind_id | integer                     |
Indexes:
    "pois_pkey" PRIMARY KEY, btree (id)
    "pois_coordinates_index" gist (coordinates)
    "pois_poi_kind_id_index" btree (poi_kind_id)
Foreign-key constraints:
    "pois_poi_kind_id_fkey" FOREIGN KEY (poi_kind_id) REFERENCES poi_kinds(id)

Here is the result of EXPLAIN (ANALYZE, BUFFERS):

Unique  (cost=2407390.71..2407390.72 rows=2 width=4) (actual time=3338.080..3338.252 rows=918 loops=1)
Buffers: shared hit=559
->  Sort  (cost=2407390.71..2407390.72 rows=2 width=4) (actual time=3338.079..3338.145 rows=963 loops=1)
      Sort Key: locations.id
      Sort Method: quicksort  Memory: 70kB
      Buffers: shared hit=559
      ->  Nested Loop  (cost=0.00..2407390.71 rows=2 width=4) (actual time=2.466..3337.835 rows=963 loops=1)
            Join Filter: (((pois.coordinates)::geography && _st_expand((locations.coordinates)::geography, 500::double precision)) AND ((locations.coordinates)::geography && _st_expand((pois.coordinates)::geography, 500::double precision)) AND _st_dwithin((pois.coordinates)::geography, (locations.coordinates)::geography, 500::double precision, false))
            Rows Removed by Join Filter: 4531356
            Buffers: shared hit=559
            ->  Seq Scan on locations  (cost=0.00..791.68 rows=24168 width=36) (actual time=0.005..3.100 rows=24237 loops=1)
                  Buffers: shared hit=550
            ->  Materialize  (cost=0.00..10.47 rows=187 width=32) (actual time=0.000..0.009 rows=187 loops=24237)
                  Buffers: shared hit=6
                  ->  Seq Scan on pois  (cost=0.00..9.54 rows=187 width=32) (actual time=0.015..0.053 rows=187 loops=1)
                        Filter: (poi_kind_id = 3)
                        Rows Removed by Filter: 96
                        Buffers: shared hit=6
Planning time: 0.184 ms
Execution time: 3338.304 ms
(20 rows)
Ferryman answered 14/1, 2016 at 9:52 Comment(10)
Are they geometry or geography?Compossible
wiki.postgresql.org/wiki/Slow_Query_QuestionsImmotile
@FrancescoD'Alesio geometryFerryman
Are you using a metric coordinate system? Is the result slow but correct?Compossible
@FrancescoD'Alesio yes it's a metric system. Yes current result is correct but too slow (about 3 sec to match 100.000 shops with 200 metro stations)Ferryman
I deleted my answer, since it was wrong. ST_DWithin() should be able to use the spatial indexes in any case. Please provide the output of EXPLAIN (ANALYZE, BUFFERS). Also your actual version numbers. "latest" version is subject to bit rot. I suspect that you just get lots of results. 200 subway stations, 500 meter radius ... might return a lot.Pilose
It shouldn't take so much time..Compossible
I'd try to write x and y coordinates in a double precision col, then try to get results using a bbox query like x > minx and x < maxx ...Compossible
After the EXPLAIN ANALYZE of course! :)Compossible
Ok guys, just added the EXPLAIN ANALYZE ! :)Ferryman
F
0

I eventually came to the conclusion I could not live-compute distance between thousands of point of interest and thousands of locations within a realistic amount of time (< 1sec).

So instead I precompute everything: each time a location or a POI is created/updated, I store the minimum distance between each location and each kind of POI in order to be able to answer the question "which locations are closer than X meters from this kind of POI".

Here is the module I coded for this purpose (it's in Elixir, but the main part is raw SQL)

defmodule My.POILocationDistanceService do

  alias Ecto.Adapters.SQL
  alias My.Repo

  def delete_distance_for_location(location_id) do
    run_query!("DELETE FROM poi_location_distance WHERE location_id = $1::integer", [location_id])
  end

  def delete_distance_for_poi_kind(poi_kind_id) do
    run_query!("DELETE FROM poi_location_distance WHERE poi_kind_id = $1::integer", [poi_kind_id])
  end

  def insert_distance_for_location(location_id) do
    sql = """
    INSERT INTO poi_location_distance(poi_kind_id, location_id, poi_id, distance)
    SELECT
      DISTINCT ON (p.poi_kind_id)
      p.poi_kind_id as poi_kind_id,
      l.id as location_id,
      p.id as poi_id,
      MIN(ST_Distance_Sphere(l.coordinates, p.coordinates)) as distance
    FROM locations l, pois p
    WHERE
      l.id = $1
      AND ST_DWithin(l.coordinates, p.coordinates, $2, FALSE)
    GROUP BY p.poi_kind_id, p.id, l.id
    ORDER BY p.poi_kind_id, distance;
    """

    run_query!(sql, [location_id, max_distance])
  end

  def insert_distance_for_poi_kind(poi_kind_id, offset \\ 0, limit \\ 10_000_000) do
    sql = """
    INSERT INTO poi_location_distance(poi_kind_id, location_id, poi_id, distance)
    SELECT
      DISTINCT ON(l.id, p.poi_kind_id)
      p.poi_kind_id as poi_kind_id,
      l.id as location_id,
      p.id as poi_id,
      MIN(ST_Distance_Sphere(l.coordinates, p.coordinates)) as distance
    FROM pois p, (SELECT * FROM locations OFFSET $1 LIMIT $2) as l
    WHERE
      p.poi_kind_id = $3
      AND ST_DWithin(l.coordinates, p.coordinates, $4, FALSE)
    GROUP BY l.id, p.poi_kind_id, p.id;
    """

    run_query!(sql, [offset, limit, poi_kind_id, max_distance])
  end

  defp run_query!(query, params) do
    SQL.query!(Repo, query, params)
  end

  def max_distance, do: 5000

end
Ferryman answered 1/2, 2016 at 10:10 Comment(0)
C
0

I think that you are using the geography version of st_dwithin, because of the fourth parameter.

Try changing your query to this one:

SELECT DISCTINCT ON(locations.id) locations.id FROM locations, pois
WHERE pois.poi_kind = 'subway'
AND ST_DWithin(locations.coordinates, pois.coordinates, 500);

If it doesn't solve, please post the explain analyze again.

Compossible answered 15/1, 2016 at 10:19 Comment(1)
Removing the FALSE parameter made the query run 8.6sec (instead of 3sec). Here is the new explain analyze: gist.github.com/cblavier/726139eda4cd574340bdFerryman
F
0

I eventually came to the conclusion I could not live-compute distance between thousands of point of interest and thousands of locations within a realistic amount of time (< 1sec).

So instead I precompute everything: each time a location or a POI is created/updated, I store the minimum distance between each location and each kind of POI in order to be able to answer the question "which locations are closer than X meters from this kind of POI".

Here is the module I coded for this purpose (it's in Elixir, but the main part is raw SQL)

defmodule My.POILocationDistanceService do

  alias Ecto.Adapters.SQL
  alias My.Repo

  def delete_distance_for_location(location_id) do
    run_query!("DELETE FROM poi_location_distance WHERE location_id = $1::integer", [location_id])
  end

  def delete_distance_for_poi_kind(poi_kind_id) do
    run_query!("DELETE FROM poi_location_distance WHERE poi_kind_id = $1::integer", [poi_kind_id])
  end

  def insert_distance_for_location(location_id) do
    sql = """
    INSERT INTO poi_location_distance(poi_kind_id, location_id, poi_id, distance)
    SELECT
      DISTINCT ON (p.poi_kind_id)
      p.poi_kind_id as poi_kind_id,
      l.id as location_id,
      p.id as poi_id,
      MIN(ST_Distance_Sphere(l.coordinates, p.coordinates)) as distance
    FROM locations l, pois p
    WHERE
      l.id = $1
      AND ST_DWithin(l.coordinates, p.coordinates, $2, FALSE)
    GROUP BY p.poi_kind_id, p.id, l.id
    ORDER BY p.poi_kind_id, distance;
    """

    run_query!(sql, [location_id, max_distance])
  end

  def insert_distance_for_poi_kind(poi_kind_id, offset \\ 0, limit \\ 10_000_000) do
    sql = """
    INSERT INTO poi_location_distance(poi_kind_id, location_id, poi_id, distance)
    SELECT
      DISTINCT ON(l.id, p.poi_kind_id)
      p.poi_kind_id as poi_kind_id,
      l.id as location_id,
      p.id as poi_id,
      MIN(ST_Distance_Sphere(l.coordinates, p.coordinates)) as distance
    FROM pois p, (SELECT * FROM locations OFFSET $1 LIMIT $2) as l
    WHERE
      p.poi_kind_id = $3
      AND ST_DWithin(l.coordinates, p.coordinates, $4, FALSE)
    GROUP BY l.id, p.poi_kind_id, p.id;
    """

    run_query!(sql, [offset, limit, poi_kind_id, max_distance])
  end

  defp run_query!(query, params) do
    SQL.query!(Repo, query, params)
  end

  def max_distance, do: 5000

end
Ferryman answered 1/2, 2016 at 10:10 Comment(0)
C
-1

I think you should change a solution, postgis is still running a query in Structured Database, it is powerful, but not quick in special requirement,may be you need elasticsearch.

elasticsearch is good at Geo location search ,but not good at Geo data process,I think you need them both.

https://www.elastic.co/blog/geo-location-and-search

Cuckoopint answered 14/1, 2016 at 17:17 Comment(2)
Thanks for your answer, but I'd really like to stick to postgres. If there is no way to cover my need, I will consider using an additional storage engineFerryman
postgis can be very fast if tuned accordingly. It is a specialized GIS db extension of pgHay

© 2022 - 2024 — McMap. All rights reserved.