I am trying to calculate the distance between one point and many others on the WGS84 ellipsoid, not with the haversine approximation explained in other answers. I would like to do it in Python, but the computation is very slow compared with R: my Python script below takes almost 23 seconds, while the equivalent R script takes 0.13 seconds. Any suggestions for speeding up my Python code?
Python script:
import numpy as np
import pandas as pd
from geopy.distance import geodesic
from timeit import default_timer as timer
df = pd.DataFrame()
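# Reference point as (longitude, latitude)
city_coord_orig = (4.351749, 50.845701)
# geopy expects (latitude, longitude), so reverse the tuple
city_coord_orig_r = tuple(reversed(city_coord_orig))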
N = 100000
df['or'] = [city_coord_orig_r] * N
df['new'] = df.apply(lambda x: (x['or'][0] + np.random.normal(), x['or'][1] + np.random.normal()), axis=1)
start = timer()
df['d2city2'] = df.apply(lambda x: geodesic(x['or'], x['new']).km, axis=1)
end = timer()
print(end - start)
R script:
# clean up
rm(list = ls())
# read libraries
library(geosphere)
city.coord.orig <- c(4.351749, 50.845701)
N <- 100000
many <- data.frame(x = rep(city.coord.orig[1], N) + rnorm(N),
                   y = rep(city.coord.orig[2], N) + rnorm(N))
start_time <- Sys.time()
many$d2city <- distGeo(city.coord.orig, many[,c("x","y")])
end_time <- Sys.time()
end_time - start_time
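A likely cause of the gap is that df.apply(..., axis=1) calls geodesic once per row from Python, whereas distGeo handles all points in a single call. The sketch below illustrates the vectorized idea in NumPy using Vincenty's inverse formula on the WGS84 ellipsoid. Several things here are assumptions: Vincenty's formula is a swapped-in technique, not the algorithm behind geopy's geodesic (which uses Karney's method); the function name vincenty_km and its parameters are hypothetical; and the iteration can fail to converge for nearly antipodal points, which should not matter for points within a few degrees of each other.

```python
import numpy as np

# WGS84 ellipsoid constants
WGS84_A = 6378137.0                  # semi-major axis in metres
WGS84_F = 1 / 298.257223563          # flattening
WGS84_B = (1 - WGS84_F) * WGS84_A    # semi-minor axis in metres


def vincenty_km(lat1, lon1, lat2, lon2, max_iter=200, tol=1e-12):
    """Ellipsoidal distance in km (Vincenty inverse), vectorized over arrays.

    Hypothetical helper for illustration; inputs are in degrees and are
    broadcast against each other.
    """
    lat1, lon1, lat2, lon2 = np.broadcast_arrays(
        *(np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2))
    )
    f = WGS84_F
    # Reduced latitudes
    U1 = np.arctan((1 - f) * np.tan(lat1))
    U2 = np.arctan((1 - f) * np.tan(lat2))
    sinU1, cosU1 = np.sin(U1), np.cos(U1)
    sinU2, cosU2 = np.sin(U2), np.cos(U2)

    L = lon2 - lon1
    lam = L.copy()
    for _ in range(max_iter):
        sin_lam, cos_lam = np.sin(lam), np.cos(lam)
        sin_sig = np.hypot(cosU2 * sin_lam,
                           cosU1 * sinU2 - sinU1 * cosU2 * cos_lam)
        cos_sig = sinU1 * sinU2 + cosU1 * cosU2 * cos_lam
        sig = np.arctan2(sin_sig, cos_sig)
        # Guard against 0/0 for coincident points
        safe_sin_sig = np.where(sin_sig == 0, 1.0, sin_sig)
        sin_alpha = cosU1 * cosU2 * sin_lam / safe_sin_sig
        cos2_alpha = 1 - sin_alpha ** 2
        # Guard against 0/0 for lines along the equator
        safe_cos2 = np.where(cos2_alpha == 0, 1.0, cos2_alpha)
        cos_2sm = np.where(cos2_alpha == 0, 0.0,
                           cos_sig - 2 * sinU1 * sinU2 / safe_cos2)
        C = f / 16 * cos2_alpha * (4 + f * (4 - 3 * cos2_alpha))
        lam_new = L + (1 - C) * f * sin_alpha * (
            sig + C * sin_sig * (cos_2sm + C * cos_sig * (-1 + 2 * cos_2sm ** 2))
        )
        converged = np.max(np.abs(lam_new - lam)) < tol
        lam = lam_new
        if converged:
            break

    u2 = cos2_alpha * (WGS84_A ** 2 - WGS84_B ** 2) / WGS84_B ** 2
    A = 1 + u2 / 16384 * (4096 + u2 * (-768 + u2 * (320 - 175 * u2)))
    B = u2 / 1024 * (256 + u2 * (-128 + u2 * (74 - 47 * u2)))
    d_sig = B * sin_sig * (
        cos_2sm + B / 4 * (
            cos_sig * (-1 + 2 * cos_2sm ** 2)
            - B / 6 * cos_2sm * (-3 + 4 * sin_sig ** 2) * (-3 + 4 * cos_2sm ** 2)
        )
    )
    return WGS84_B * A * (sig - d_sig) / 1000.0


# Same setup as the question, but with coordinates held in plain arrays
N = 100000
rng = np.random.default_rng(0)
lat0, lon0 = 50.845701, 4.351749
lats = lat0 + rng.normal(size=N)
lons = lon0 + rng.normal(size=N)
d2city = vincenty_km(lat0, lon0, lats, lons)  # one call for all N points
```

Keeping latitudes and longitudes in separate numeric arrays, rather than tuples in a DataFrame column, is what makes the single call possible. If agreement with geopy's algorithm matters, pyproj's Geod(ellps="WGS84").inv also accepts arrays of longitudes and latitudes and may be a closer equivalent; I would check its results against geodesic on a few rows before relying on either.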