Distance betweeen coordinates Python vs R computation time
Asked Answered
S

1

0

I am trying to calculate the distance between one point and many others on a WGS84 ellipsoid - not the haversine approximation as explained in other answers. I would like to do it in Python but the computation time is very long with respect to R. My Python script below takes almost 23 seconds while the equivalent one in R takes 0.13 seconds. Any suggestion for speeding up my python code?

Python script:

import numpy as np
import pandas as pd
import xarray as xr
from geopy.distance import geodesic
from timeit import default_timer as timer
df = pd.DataFrame()
city_coord_orig = (4.351749, 50.845701) 
city_coord_orig_r = tuple(reversed(city_coord_orig))
N = 100000
np.random.normal()
df['or'] = [city_coord_orig_r] * N
df['new'] = df.apply(lambda x: (x['or'][0] + np.random.normal(), x['or'][1] + np.random.normal()), axis=1)
start = timer()
df['d2city2'] = df.apply(lambda x: geodesic(x['or'], x['new']).km, axis=1)
end = timer()
print(end - start)

R script

# clean up
rm(list = ls())
# read libraries
library(geosphere)

city.coord.orig <- c(4.351749, 50.845701)
N<-100000
many <- data.frame(x=rep(city.coord.orig[1], N) + rnorm(N), 
                   y=rep(city.coord.orig[2], N) + rnorm(N))
city.coord.orig <- c(4.351749, 50.845701)
start_time <- Sys.time()
many$d2city <- distGeo(city.coord.orig, many[,c("x","y")]) 
end_time <- Sys.time()
end_time - start_time
Stirring answered 1/3, 2019 at 14:27 Comment(4)
@Marijn Pieters Thank you for the suggestions. I had looked at geopandas but I could not find a solution. I do not want to switch to UTM coordinates as the actual area I am looking at is large and ideally I would like the same output as the R function (which like geodesic in python also considers the WGS84). I had looked also at the scipy library but I could not find WGS84 mentioned anywhere.Stirring
It doesn't support WGS84 distances directly, no, but can be given a custom distance function.Irmairme
I see but that does not really help, I really need WGS84. I am not an experienced programmer, someone out here must have had the same issue.Stirring
Yeah, this isn't covered too well, reopened. I think pyproj is the better choice here anyway.Irmairme
I
4

You are using .apply(), which uses a simple loop to run your function for each and every row. The distance calculation is done entirely in Python (geopy uses geographiclib which appears to be written in Python only). Non-vectorised distance calculations are slow, what you need is a vectorised solution using compiled code, just like when calculating the Haversine distance.

pyproj offers verctorised WSG84 distance calculations (the methods of the pyproj.Geod class accept numpy arrays) and wraps the PROJ4 library, meaning it runs these calculations in native machine code:

from pyproj import Geod

# split out coordinates into separate columns
df[['or_lat', 'or_lon']] = pd.DataFrame(df['or'].tolist(), index=df.index)
df[['new_lat', 'new_lon']] = pd.DataFrame(df['new'].tolist(), index=df.index)

wsg84 = Geod(ellps='WGS84')
# numpy matrix of the lon / lat columns, iterable in column order
or_and_new = df[['or_lon', 'or_lat', 'new_lon', 'new_lat']].to_numpy().T
df['d2city2'] = wsg84.inv(*or_and_new)[-1] / 1000  # as km

This clocks in at considerably better times:

>>> from timeit import Timer
>>> count, total = Timer(
...     "wsg84.inv(*df[['or_lon', 'or_lat', 'new_lon', 'new_lat']].to_numpy().T)[-1] / 1000",
...     'from __main__ import wsg84, df'
... ).autorange()
>>> total / count * 10 ** 3  # milliseconds
66.09873340003105

66 milliseconds to calculate 100k distances, not bad!

To make the comparison objective, here is your geopy / df.apply() version on the same machine:

>>> count, total = Timer("df.apply(lambda x: geodesic(x['or'], x['new']).km, axis=1)", 'from __main__ import geodesic, df').autorange()
>>> total / count * 10 ** 3  # milliseconds
25844.119450000107

25.8 seconds, not even in the same ballpark.

Irmairme answered 1/3, 2019 at 16:15 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.