I got a dataframe that contains places with their latitude and longitude. Imagine for example cities.
df = pd.DataFrame([{'city':"Berlin", 'lat':52.5243700, 'lng':13.4105300},
{'city':"Potsdam", 'lat':52.3988600, 'lng':13.0656600},
{'city':"Hamburg", 'lat':53.5753200, 'lng':10.0153400}]);
Now I'm trying to get all cities in a radius around another. Let's say all cities in a distance of 500km from Berlin, 500km from Hamburg and so on. I would do this by duplicating the original dataframe and joining both with a distance-function.
The intermediate result would be somewhat like this:
Berlin --> Potsdam
Berlin --> Hamburg
Potsdam --> Berlin
Potsdam --> Hamburg
Hamburg --> Potsdam
Hamburg --> Berlin
This final result after grouping (reducing) should be like this. Remark: Would be cool if the list of values includes all columns of the city.
Berlin --> [Potsdam, Hamburg]
Potsdam --> [Berlin, Hamburg]
Hamburg --> [Berlin, Potsdam]
Or just the count of cities 500km around one city.
Berlin --> 2
Potsdam --> 2
Hamburg --> 2
Since I'm quite new to Python, I would appreciate any starting point. I'm familiar with haversine distance. But not sure if there are useful distance/spatial methods in Scipy or Pandas.
Glad if you can give me a starting point. Up to now I tried following this post.
Update: The original idea behind this question comes from the Two Sigma Connect Rental Listing Kaggle Competition. The idea is to get those listing 100m around another listing. Which a) indicates a density and therefore a popular area and b) if the addresses are compares, you can find out if there is a crossing and therefore a noisy area. Therefore you not need the full item to item relation since you need to compare not only the distance but also the address and other meta-data. PS: I'm not uploading a solution to Kaggle. I just want to learn.
scipy
orpandas
. Define a function in the code. This one seems like it should work: #4913849 – Wassail