I have a list of about 1 million addresses and a function that looks up their latitudes and longitudes. Because some of the records are improperly formatted (or for some other reason), the function sometimes fails to return a latitude and longitude, which would break the for loop. So, for each address whose latitude and longitude are successfully retrieved, I want to write it to the output CSV file; alternatively, writing in small chunks instead of line by line would also work. For this, I am using df.to_csv in append mode (mode='a') as shown below:
    for i in range(len(df)):
        place = df['ADDRESS'][i]
        try:
            lat, lon, res = gmaps_geoencoder(place)
        except:
            pass
        df['Lat'][i] = lat
        df['Lon'][i] = lon
        df['Result'][i] = res
        df.to_csv(output_csv_file,
                  index=False,
                  header=False,
                  mode='a',             # append data to csv file
                  chunksize=chunksize)  # size of data to append for each loop
But the problem with this is that it appends the whole DataFrame on every iteration. So for n rows, it ends up writing on the order of n^2 lines in total. How can I fix this?
Why not just set the value to NaN or something in the except case, and then just write the entire DataFrame at the end? You can even subset it to where it's not null if you don't want to include the bad data in the csv. – Gorrian

Use df.iloc[i:i+1].to_csv(...) to write only the single line you are working with, if you truly need to do it line by line. – Gorrian

You should initialize lat, lon and res before your try block. – Mastiff
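Putting the commenters' suggestions together, a hedged sketch of the per-row version: initialize lat, lon and res before the try block, and append only the current row with df.iloc[i:i+1]. Again `fake_geocoder` and the sample data are hypothetical stand-ins, and .loc is used instead of the chained assignment df['Lat'][i] = lat.

```python
import pandas as pd

def fake_geocoder(place):
    # Hypothetical stand-in for gmaps_geoencoder; raises on bad input.
    if not place:
        raise ValueError("unparseable address")
    return 40.0, -75.0, "OK"

df = pd.DataFrame({"ADDRESS": ["a st", "", "b ave"]})
df["Lat"] = float("nan")
df["Lon"] = float("nan")
df["Result"] = None
output_csv_file = "geocoded_rows.csv"

open(output_csv_file, "w").close()  # start with an empty file

for i in range(len(df)):
    # Defaults, so the names exist even when the geocoder raises
    lat, lon, res = float("nan"), float("nan"), None
    try:
        lat, lon, res = fake_geocoder(df["ADDRESS"].iloc[i])
    except Exception:
        pass
    # .loc avoids the chained-assignment pitfall of df['Lat'][i] = lat
    df.loc[i, ["Lat", "Lon", "Result"]] = [lat, lon, res]
    # Append only the single row just processed, not the whole DataFrame
    df.iloc[i:i+1].to_csv(output_csv_file, index=False,
                          header=False, mode="a")
```

Failed rows are written with empty fields; drop the df.iloc write inside the loop and call df.dropna().to_csv(...) once at the end if you prefer the write-everything-at-the-end approach from the first comment.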