I am trying to calculate geodesic distance from a dataframe which consists of four columns of latitude and longitude data with around 3 million rows. I used the apply lambda method to do it but it took 18 minutes to finish the task. Is there a way to use Vectorization with NumPy arrays to speed up the calculation? Thank you for answering.
My code using apply and lambda method:
from geopy import distance
df['geo_dist'] = df.apply(lambda x: distance.distance(
(x['start_latitude'], x['start_longitude']),
(x['end_latitude'], x['end_longitude'])).miles, axis=1)
Updates:
I am trying this code but it gives me the error: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). Appreciate if anyone can help.
df['geo_dist'] = distance.distance(
(df['start_latitude'].values, df['start_longitude'].values),
(df['end_latitude'].values, df['end_longitude'].values)).miles