Reverse geocoding with big array is fastest way? - javascript and performance
Asked Answered
A

4

2

I have many points on Google Maps and I want to show for each point the nearest city (so a reverse geocoding). I have a multidimensional array like this:

citta_vicine = [];

var comuni = [
 ["Abano Terme (PD)",45.3594,11.7894],
 ["Abbadia Cerreto (LO)",45.3122,9.5928],
 ["Abbadia Lariana (LC)",45.8992,9.3336],
 ["Abbadia San Salvatore (SI)",42.8800,11.6775],
 ["Abbasanta (OR)",40.1250,8.8200]
]

//city, latitude, longitude

The problem is that my array has all city of Italy (8000 !) and is 300 Kb.

To get nearest city i can use this:

//this line will be inside for loop of points
 var city_near= estrapola_paese(50,lat,lng); //lat and lng are coordinates of these points
//


 function estrapola_paese(distanza,latB,longB){ 
  citta_vicine = [];
  for(var i= 0; i < comuni.length; i++){
    var dist_eqcity= dist_coords(comuni[i][1],comuni[i][2],latB,longB);
    if(dist_eqcity < distanza){
        citta_vicine.push([dist_eqcity, comuni[i][0]]);
    }
  }
  if(citta_vicine.length > 0){
    citta_vicine.sort(function(a, b) {
        return a - b;
    }); 
    return citta_vicine[0][1];
  }
  else{
    distanza = distanza+50;
    estrapola_paese(distanza,latB,longB);   
  }
 }

//calculate distance in Km between city and "point"
function dist_coords(latA,longA,latB,longB) {
 var R = 6372.795477598;
 var laA = latA * Math.PI/180;
 var laB = latB * Math.PI/180;
 var loA = longA * Math.PI/180;
 var loB = longB * Math.PI/180;
 var distanza = R * Math.acos(Math.sin(laA)*Math.sin(laB) + Math.cos(laA)*Math.cos(laB) * Math.cos(loA-loB));
 if(isNaN(distanza) == true){   
  distanza = 0;
 }
 return distanza;
} 

In short, for a question of performance, I consider (at first) only cities within a radius of 50 km from the point. If there are cities within 50 km, I add the city (and the distance) in the "citta_vicine" array and order the latter array from the lowest to the highest value. Therefore from the city closest to the most distant.

If instead there are no cities within 50 km then I perform the function "estrapola_paese" again but increase the radius to be considered of another 50 km.


I think CODE WORKS but I have many doubts:

1) The file weighs 459 KB: is it too much?

2) Is there a better way to do all this?

3) sorting of array citta_vicine is correct? If is not empty is like this:

   [
    ["tokyo",34],
    ["rome",24],
    ["paris",54]
   ]

using this:

   citta_vicine.sort(function(a, b) {
     return a - b;
   }); 

i will have this output:

   [        
    ["rome",24],
    ["tokyo",34],
    ["paris",54]
   ]

I hope you can help me and sorry for my English.

Altruist answered 26/3, 2018 at 15:2 Comment(4)
how ? points in my case are coordinates of earthquakes and every moment can occured a quake in the worldAltruist
You need some kind of spatial index. This is a K-D tree implementation in Javascript. There are other implementations, as well as other data structures... Also, perhaps you don't need Harvesine distance, but just (squared) Pythagorean one. Could you please paste somewhere your list of cities for experiments?Oberstone
@StanislavKralin i paste a short example of list.. the file is really big (is 400 kb). Why do you think i need just Pythagorean method and not harvesine distance ? can you show me some code? Thanks a lotAltruist
in future, when i will understand that code XD, i will can use it to get near city for other countries, for example even for usa or china that is country very large. So in this case is better to use harversine no?Altruist
D
1

Step #1 Calculate distance of all locations.

Step #2 Sort the result by the distance value

Step #3 Find locations, repeat it at least one record is found.

Check the DEMO, to calculate a list with 7320 records took ~ 17.22119140625ms

const citites = [
  [`Abano Terme (PD)`, 45.3594, 11.7894],
  [`Abbadia Cerreto (LO)`, 45.3122, 9.5928],
  [`Abbadia Lariana (LC)`, 45.8992, 9.3336],
  [`Abbadia San Salvatore (SI)`, 42.8800, 11.6775],
  [`Abbasanta (OR)`, 40.1250, 8.8200]
]

function distance(lat, long) {
  const R = 6372.795477598
  const PI = Math.PI / 180
  return cities
    .map(city => {
      const laA = city[1] * PI
      const laB = lat * PI
      const loA = city[2] * PI
      const loB = long * PI
      const dist = R * Math.acos(
        Math.sin(laA) * Math.sin(laB) +
        Math.cos(laA) * Math.cos(laB) * Math.cos(loA - loB)
      ) || 0
      return { dist, city }
    })
    .sort((a, b) => a.dist - b.dist)
}


function nearest(dist, lat, long) {
  const locations = distance(lat, long)
  function find(delta) {
    const result = []
    for (let location of locations) {
      if (location.dist > delta) break
      result.push(location.city)
    }
    return result.length > 0
      ? result
      : find(delta + 50)
  }
  return find(dist)
}

const result = nearest(50, 41.89595563, 12.48325842)
Diplostemonous answered 1/4, 2018 at 13:13 Comment(9)
@Altruist can you share with the list of cities (8000)? I want to make some benchmark of calculations.Diplostemonous
could you modify the code not using ternary operators? thanks a lot :)Altruist
@Altruist The ternary operator makes the code less and clean. You can replace it if (result.length) return result else return find(delta + toLat(50)). I have updated the demo code with your full city list.Diplostemonous
it works well, even without node or other modules right? you use "const" and "let"...like ECMAscript 6 ? is possible to see in your example also time performance ?Altruist
Yes, it works I have provided the demo of it. If you check console.log you will see the processed time snag.gy/1vO2JQ.jpgDiplostemonous
i have only doubt.. is possible to use your code but using "my" function to calculate distance ?Altruist
@Altruist Updated with your calculationsDiplostemonous
how i can show distance of first city ? i know that this isn't the first question....Altruist
@Altruist replace result.push(location.city) to result.push(location)en in result you will get dist info too.Diplostemonous
P
4

Since the city data is not dynamically changed and you need to calculate the distance / nearest neighbour frequently, using a geospatial index (KD-Tree, R-Tree etc) would make sense.

Here's an example implementation using geokdbush which is based on a static spatial index implemented using a KD-Tree. It takes Earth curvature and date line wrapping into account.

const kdbush = require('kdbush');
const geokdbush = require('geokdbush');

// I've stored the data points as objects to make the values unambiguous
const cities = [
  { name: "Abano Terme (PD)", latitude: 45.3594, longitude: 11.7894 },
  { name: "Abbadia Cerreto (LO)", latitude: 45.3122, longitude: 9.5928 },
  { name: "Abbadia Lariana (LC)", latitude: 45.8992, longitude: 9.3336 },
  { name: "Abbadia San Salvatore (SI)", latitude: 42.8800, longitude: 11.6775 },
  { name: "Abbasanta (OR)", latitude: 40.1250, longitude: 8.8200 }
];

// Create the index over city data ONCE
const index = kdbush(cities, ({ longitude }) => longitude, ({ latitude }) => latitude);

// Get the nearest neighbour in a radius of 50km for a point with latitude 43.7051 and longitude 11.4363
const nearest = geokdbush.around(index, 11.4363, 43.7051, 1, 50);

Once again, bear in mind that kdbush is a static index and cannot be changed (you cannot add or remove cities from it). If you need to change the city data after initialisation, depending on how often you do it, using an index might prove too costly.

Pskov answered 28/3, 2018 at 18:56 Comment(10)
i don't underatand.... to use this code i need to have database on server ? can you explain better your example ? thanks a lotAltruist
You don't need a database, the index is created in memory. You can run this example in node as is or you can use gulp / browserify or webpack or whatever to pack it for front-end use. Both kdbush and geokdbush npm packages are lightweight with only one transitive dependency so a compiled script should not be too bigCavorelievo
Also, depending on the level of browser support you want, you should probably babelify it as well.Cavorelievo
the priority is to have a faster code than my example pasted in top question (that works perfectly).... do you think is more faster k-d tree ?Altruist
i have only one problem... i'm not able to us gitHub :( (for me is already difficult here because i'm italian and don't speak english very well)... to use your code where i can download "kdbush" and "geokdbush" packages ?Altruist
It almost certainly will be faster as your approach scans the whole dataset every time (sometimes multiple times). Take a look at the benchmarks on geokdbush and you'll see that it's significantly faster than a naive approach but I cannot verify that for your case as I don't have your full dataset. You can run some benchmarks yourself.Cavorelievo
i'm tring your example but require() doesn't work. i tried to use requireJS but i don't know why doesn't work... can you say me how can i add these libraries ?Altruist
This is out of scope of the original question. Both kdbush and geokdbush are npm modules which can also be used in the browser as they have no native dependencies. Just google npm modules in browser browserify. There are plenty of resources to get you started.Cavorelievo
only one last doubt: to use nmp module i have to install node.js right ? is possible to use thise modules in a local version on my pc ? ThanksAltruist
i don't know if your example works... i tried to install node.js but not works...i hope can i use a k-d tree without modulesAltruist
O
1

You may want to sort after the second array element:

 citta_vicine.sort(function(a, b) {
   return a[1] - b[1];
 }); 
Outdare answered 26/3, 2018 at 15:5 Comment(0)
M
1

To get nearest city...

If you're only interested in the nearest city, you don't need to sort the whole list. That's your first performance gain in one line of code!

// Unneeded sort:
const closest = cityDistancePairs.sort((a, b) => a[1] - b[2])[0];

// Only iterates once instead:
const closestAlt = cityDistancePairs.reduce(
  (closest, current) => current[1] < closest[1] ? current : closest
);

To further optimize you'll need to benchmark which parts of the code take the longest to run. Some ideas:

  • Do a quick check on the lat/lon difference before calculating the precise value. If coordinates are further than a certain delta apart, you already know they are out of range.
  • Cache the calculated distances by implementing a memoization pattern to make sure that on a second pass with a different limit (50 -> 100), you don't recalculate the distances

However, I can't imagine that a loop of 8000 distance calculations is the real performance drain... I'm guessing that the parsing of 300kb of javascript is the real bottleneck. How are you initializing the city array?

Make sure you strip down the data set to only the properties and values you actually use. If you know you're only going to use the name and lat/lon, you can preprocess the data to only include those. That'll probably make it much smaller than 300kb and easier to work with.

Marathon answered 30/3, 2018 at 14:59 Comment(1)
i don't understand your code :/ it seems is a part of a bigger code...Anyway i can't calculate delta because i could use this code even to get nearest city not only for Italy but for all places of the world and for example, in a desert, the nearest city from "point" can be very far, also more our "delta". How can i initializing the city array ? can you show me an example ? thanks a lot!Altruist
D
1

Step #1 Calculate distance of all locations.

Step #2 Sort the result by the distance value

Step #3 Find locations, repeat it at least one record is found.

Check the DEMO, to calculate a list with 7320 records took ~ 17.22119140625ms

const citites = [
  [`Abano Terme (PD)`, 45.3594, 11.7894],
  [`Abbadia Cerreto (LO)`, 45.3122, 9.5928],
  [`Abbadia Lariana (LC)`, 45.8992, 9.3336],
  [`Abbadia San Salvatore (SI)`, 42.8800, 11.6775],
  [`Abbasanta (OR)`, 40.1250, 8.8200]
]

function distance(lat, long) {
  const R = 6372.795477598
  const PI = Math.PI / 180
  return cities
    .map(city => {
      const laA = city[1] * PI
      const laB = lat * PI
      const loA = city[2] * PI
      const loB = long * PI
      const dist = R * Math.acos(
        Math.sin(laA) * Math.sin(laB) +
        Math.cos(laA) * Math.cos(laB) * Math.cos(loA - loB)
      ) || 0
      return { dist, city }
    })
    .sort((a, b) => a.dist - b.dist)
}


function nearest(dist, lat, long) {
  const locations = distance(lat, long)
  function find(delta) {
    const result = []
    for (let location of locations) {
      if (location.dist > delta) break
      result.push(location.city)
    }
    return result.length > 0
      ? result
      : find(delta + 50)
  }
  return find(dist)
}

const result = nearest(50, 41.89595563, 12.48325842)
Diplostemonous answered 1/4, 2018 at 13:13 Comment(9)
@Altruist can you share with the list of cities (8000)? I want to make some benchmark of calculations.Diplostemonous
could you modify the code not using ternary operators? thanks a lot :)Altruist
@Altruist The ternary operator makes the code less and clean. You can replace it if (result.length) return result else return find(delta + toLat(50)). I have updated the demo code with your full city list.Diplostemonous
it works well, even without node or other modules right? you use "const" and "let"...like ECMAscript 6 ? is possible to see in your example also time performance ?Altruist
Yes, it works I have provided the demo of it. If you check console.log you will see the processed time snag.gy/1vO2JQ.jpgDiplostemonous
i have only doubt.. is possible to use your code but using "my" function to calculate distance ?Altruist
@Altruist Updated with your calculationsDiplostemonous
how i can show distance of first city ? i know that this isn't the first question....Altruist
@Altruist replace result.push(location.city) to result.push(location)en in result you will get dist info too.Diplostemonous

© 2022 - 2024 — McMap. All rights reserved.