Calculate geographical distance between 5 cities with all the possible combinations of each city

I have a CSV file with three columns (City, Latitude, Longitude), and I created a DataFrame from it in Python with this code:

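import pandas as pd

data = pd.read_csv("lat_long.csv", nrows=10)
Lat = data.lat.tolist()
Lon = data.lon.tolist()
suburb = data.suburb.tolist()
# Named `coords` rather than `dict` so the built-in dict is not shadowed.
coords = {'Latitude': Lat, 'Longitude': Lon}
df = pd.DataFrame(coords, index=suburb)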

And the output is this

                                 Latitude   Longitude
AUSTRALIAN NATIONAL UNIVERSITY -35.277272  149.117136
BARTON                         -35.201372  149.095065
DARWIN                         -12.801028  130.955789
DARWIN                         -12.801028  130.955789
PARAP                          -12.432181  130.843310
ALAWA                          -12.378451  130.877014
BRINKIN                        -12.367769  130.869808
CASUARINA                      -12.376597  130.850489
JINGILI                        -12.385761  130.873726
LEE POINT                      -12.360865  130.891349

What I want is the distance for every possible pair of cities, i.e. from each city to the other 9. The result should look like this:

                                              DISTANCE
AUSTRALIAN NATIONAL UNIVERSITY - BARTON
AUSTRALIAN NATIONAL UNIVERSITY - DARWIN
AUSTRALIAN NATIONAL UNIVERSITY - DARWIN
AUSTRALIAN NATIONAL UNIVERSITY - PARAP

I have done this with nested for loops and it works, but I would like something a bit faster.

Colombo answered 31/7, 2019 at 5:50 Comment(0)

I start with this DataFrame:

    city         Latitude   Longitude
0   AUSTRAL.    -35.277272  149.117136
1   BARTON      -35.201372  149.095065
2   DARWIN      -12.801028  130.955789
3   DARWIN      -12.801028  130.955789
4   PARAP       -12.432181  130.843310
5   ALAWA       -12.378451  130.877014
6   BRINKIN     -12.367769  130.869808
7   CASUARINA   -12.376597  130.850489
8   JINGILI     -12.385761  130.873726
9   LEE_POINT   -12.360865  130.891349

Then I create a new column that serves only as a helper: merging the DataFrame with itself on that column gives the cartesian product of its rows.

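from haversine import haversine  # third-party package: pip install haversine

# Constant helper column: merging on it pairs every row with every row.
df['join'] = 1
df_joined = pd.merge(df, df, on='join')

# Distance in km between the two coordinate pairs in each row.
df_joined['haversine_dist'] = df_joined.apply(
    lambda x: haversine((x.Latitude_x, x.Longitude_x), (x.Latitude_y, x.Longitude_y)), axis=1)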

Result (first 6 rows only):

    city_x    Latitude_x  Longitude_x  join  city_y    Latitude_y  Longitude_y  haversine_dist
0   AUSTRAL.  -35.277272   149.117136     1  AUSTRAL.  -35.277272   149.117136        0.000000
1   AUSTRAL.  -35.277272   149.117136     1  BARTON    -35.201372   149.095065        8.674473
2   AUSTRAL.  -35.277272   149.117136     1  DARWIN    -12.801028   130.955789     3093.972598
3   AUSTRAL.  -35.277272   149.117136     1  DARWIN    -12.801028   130.955789     3093.972598
4   AUSTRAL.  -35.277272   149.117136     1  PARAP     -12.432181   130.843310     3135.034018
5   AUSTRAL.  -35.277272   149.117136     1  ALAWA     -12.378451   130.877014     3138.077950
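
As a possible variation on the merge above: pandas 1.2 and later accept how="cross" in merge, which makes the helper column unnecessary, and the row-wise apply can be swapped for column arithmetic. The sketch below is not the code that produced the table; haversine_km is a hypothetical helper written here with NumPy, and the Earth radius of 6371.0088 km is an assumed mean value.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; works on scalars, arrays or Series (hypothetical helper)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Three rows taken from the DataFrame above, to keep the example small.
df = pd.DataFrame({
    "city": ["AUSTRAL.", "BARTON", "DARWIN"],
    "Latitude": [-35.277272, -35.201372, -12.801028],
    "Longitude": [149.117136, 149.095065, 130.955789],
})

# Cartesian product of the rows without a helper column (needs pandas >= 1.2).
pairs = df.merge(df, how="cross", suffixes=("_x", "_y"))

# One vectorized call over whole columns instead of a Python call per row.
pairs["haversine_dist"] = haversine_km(
    pairs["Latitude_x"], pairs["Longitude_x"],
    pairs["Latitude_y"], pairs["Longitude_y"],
)
print(pairs[["city_x", "city_y", "haversine_dist"]])
```

Rows where city_x equals city_y are the zero-distance self pairs; they can be filtered out afterwards if only distinct cities are of interest.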
Calvincalvina answered 31/7, 2019 at 6:0 Comment(1)
This should be the solution. I don't have a chance to test it right now, but it should work. Ping me if it doesn't and I'll check it this evening. - Calvincalvina

In order to test, I construct the original DataFrame by hand

import pandas as pd 
import itertools
from haversine import haversine
x = {'city':['AUSTRALIAN NATIONAL UNIVERSITY', 'BARTON', 'DARWIN', 'DARWIN', 'PARAP', 'ALAWA', 'BRINKIN', 'CASUARINA', 'JINGILI', 'LEE_POINT' ]}
la = {'Latitude':[-35.277272,-35.201372, -12.801028 , -12.801028, -12.432181, -12.378451, -12.367769, -12.376597, -12.385761, -12.360865]}
lo = {'Longitude':[149.117136,149.095065, 130.955789 , 130.955789, 130.843310,  130.877014, 130.869808, 130.850489, 130.873726, 130.891349]}
data = {**x, **la, **lo}
df = pd.DataFrame(data)

Drop the duplicate rows.

df = df.drop_duplicates()

List out all the cities.

city = list(df["city"])

Build every unordered pair of cities.

TwoCity = list(itertools.combinations(city, 2))
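
To illustrate what this step produces, here is a small standalone sketch using four of the city names (the short list is only for illustration). itertools.combinations yields each unordered pair exactly once, in the order of the input list, so n cities give n * (n - 1) / 2 pairs and no city is paired with itself.

```python
from itertools import combinations

# Four city names from the data above, just to keep the output short.
cities = ["BARTON", "DARWIN", "PARAP", "ALAWA"]

# Each unordered pair appears once; (DARWIN, BARTON) is not repeated.
pairs = list(combinations(cities, 2))

for first, second in pairs:
    print(first, "-", second)

print(len(pairs))  # 4 * 3 / 2 = 6 pairs
```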

Construct the new DataFrame

df1 = pd.DataFrame({'TwoCity':TwoCity})
# Look up coordinates by city name (unique here after drop_duplicates),
# so haversine receives plain numbers rather than one-element Series.
coords = df.set_index('city')[['Latitude', 'Longitude']]
df1['Distance(km)'] = df1['TwoCity'].apply(
    lambda pair: haversine(tuple(coords.loc[pair[0]]), tuple(coords.loc[pair[1]])))
print(df1.to_string(index=False))

The final result of df1 is (with a little adjustment by hand):

   TwoCity                                     Distance(km)
   (AUSTRALIAN NATIONAL UNIVERSITY, BARTON)      8.674473
   (AUSTRALIAN NATIONAL UNIVERSITY, DARWIN)   3093.972598
    (AUSTRALIAN NATIONAL UNIVERSITY, PARAP)   3135.034018
    (AUSTRALIAN NATIONAL UNIVERSITY, ALAWA)   3138.077950
  (AUSTRALIAN NATIONAL UNIVERSITY, BRINKIN)   3139.500311
(AUSTRALIAN NATIONAL UNIVERSITY, CASUARINA)   3139.808790
  (AUSTRALIAN NATIONAL UNIVERSITY, JINGILI)   3137.587038
(AUSTRALIAN NATIONAL UNIVERSITY, LEE_POINT)   3138.882795
                           (BARTON, DARWIN)   3086.264122
                            (BARTON, PARAP)   3127.309536
                            (BARTON, ALAWA)   3130.345201
                          (BARTON, BRINKIN)   3131.767583
                        (BARTON, CASUARINA)   3132.079061
                          (BARTON, JINGILI)   3129.855257
                        (BARTON, LEE_POINT)   3131.146957
                            (DARWIN, PARAP)     42.791471
                            (DARWIN, ALAWA)     47.759804
                          (DARWIN, BRINKIN)     49.071577
                        (DARWIN, CASUARINA)     48.558395
                          (DARWIN, JINGILI)     47.026561
                        (DARWIN, LEE_POINT)     49.441057
                             (PARAP, ALAWA)      7.006568
                           (PARAP, BRINKIN)      7.718791
                         (PARAP, CASUARINA)      6.229645
                           (PARAP, JINGILI)      6.128079
                         (PARAP, LEE_POINT)      9.492285
                           (ALAWA, BRINKIN)      1.422460
                         (ALAWA, CASUARINA)      2.888261
                           (ALAWA, JINGILI)      0.887821
                         (ALAWA, LEE_POINT)      2.499614
                       (BRINKIN, CASUARINA)      2.316553
                         (BRINKIN, JINGILI)      2.045378
                       (BRINKIN, LEE_POINT)      2.462424
                       (CASUARINA, JINGILI)      2.721699
                     (CASUARINA, LEE_POINT)      4.770298
                       (JINGILI, LEE_POINT)      3.365596
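
Since the question asks for something faster than loops, one option is to compute the whole distance matrix with NumPy broadcasting and then keep only the upper triangle. This is a sketch rather than the code that produced the table above: it writes the haversine formula out by hand instead of calling the haversine package, and it assumes a mean Earth radius of 6371.0088 km.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius

# Four rows from the DataFrame above, duplicates already removed.
df = pd.DataFrame({
    "city": ["AUSTRALIAN NATIONAL UNIVERSITY", "BARTON", "DARWIN", "PARAP"],
    "Latitude": [-35.277272, -35.201372, -12.801028, -12.432181],
    "Longitude": [149.117136, 149.095065, 130.955789, 130.843310],
})

lat = np.radians(df["Latitude"].to_numpy())
lon = np.radians(df["Longitude"].to_numpy())

# Broadcasting a column vector against a row vector gives n x n matrices
# of pairwise differences without any Python-level loop.
dlat = lat[:, None] - lat[None, :]
dlon = lon[:, None] - lon[None, :]
a = (np.sin(dlat / 2) ** 2
     + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# The strict upper triangle holds each unordered pair exactly once,
# in the same order itertools.combinations would produce.
i, j = np.triu_indices(len(df), k=1)
names = df["city"].to_numpy()
df1 = pd.DataFrame({
    "TwoCity": list(zip(names[i], names[j])),
    "Distance(km)": dist[i, j],
})
print(df1.to_string(index=False))
```

The full matrix needs memory proportional to n squared, which is negligible for ten cities but worth keeping in mind for much larger inputs.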
Gregoriogregorius answered 31/7, 2019 at 9:32 Comment(0)
