How to write a pandas dataframe to CSV file line by line, one line at a time?

I have a list of about 1 million addresses and a function that finds their latitudes and longitudes. Some of the records are improperly formatted, so the function sometimes fails to return the latitude and longitude for an address, which breaks the for loop. So, for each address whose latitude and longitude are successfully retrieved, I want to write the row to the output CSV file. Alternatively, instead of writing line by line, writing in small chunks would also work. For this, I am using df.to_csv in append mode (mode='a') as shown below:

for i in range(len(df)):
    place = df['ADDRESS'][i]
    try:
        lat, lon, res = gmaps_geoencoder(place)
    except:
        pass

    df['Lat'][i] = lat
    df['Lon'][i] = lon
    df['Result'][i] = res

    df.to_csv(output_csv_file,
          index=False,
          header=False,
          mode='a', #append data to csv file
          chunksize=chunksize) #size of data to append for each loop

The problem is that this writes the whole DataFrame on every append. So, for n rows, it writes the whole DataFrame n times, which is about n^2 rows in total. How can I fix this?

Avar answered 12/7, 2018 at 3:8 Comment(3)
Why not just assign NaN or something in the except case, and then just write the entire DataFrame at the end? You can even subset it to where it's not null if you don't want to include the bad data in the csv. - Gorrian
Regardless, you can use df.iloc[i:i+1].to_csv(...) to write only the single line you are working with if you truly need to do it line by line. - Gorrian
Just declare default values for lat, lon and res before your try block. - Mastiff
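The two suggestions in the comments above (default values before the try block, and writing only the current row) can be combined. The following is a minimal sketch, not a definitive implementation: `fake_geocoder` is a hypothetical stand-in for the asker's `gmaps_geoencoder`, and the `"FAILED"` marker and the function name `geocode_row_by_row` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def fake_geocoder(place):
    """Hypothetical stand-in for gmaps_geoencoder; raises on an empty address."""
    if not place:
        raise ValueError("cannot geocode an empty address")
    return 1.0, 2.0, "OK"


def geocode_row_by_row(df, output_csv_file, geocoder=fake_geocoder):
    """Geocode each address and append exactly one row per iteration."""
    for i in range(len(df)):
        place = df["ADDRESS"].iloc[i]
        # Defaults, so a failed lookup never reuses the previous row's values.
        lat, lon, res = np.nan, np.nan, "FAILED"
        try:
            lat, lon, res = geocoder(place)
        except Exception:
            pass  # keep the defaults for this row
        # df.iloc[i:i+1] is a one-row DataFrame; only that row is written.
        row = df.iloc[i:i + 1].assign(Lat=lat, Lon=lon, Result=res)
        row.to_csv(output_csv_file, index=False, header=False, mode="a")
```

Because each call to `to_csv` receives a one-row DataFrame, the file grows by one line per address instead of by the whole DataFrame.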

If you really want to write line by line (you should not):

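for i in range(len(df)):
    # df.loc[[i]] selects the row labelled i as a one-row DataFrame
    # (this relies on the default 0..n-1 index).
    df.loc[[i]].to_csv(output_csv_file,
        index=False,
        header=False,
        mode='a')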
Cacoepy answered 12/7, 2018 at 3:35 Comment(3)
If printing line by line is not a good idea, what would you suggest? Printing in chunks? That's what I was trying in my code, but it was appending the whole dataframe every time, that was the issue. - Avar
Save the entire data frame at the end of the loop? And instead of a python loop use something like this: #46799734 - Cacoepy
But as I explained in the question, if there is some problem, any problem, due to which the code breaks, then the whole time spent until then is basically wasted (for a million records, it would take 6 days to process the entire dataset). I want to print in chunks or line-by-line only to avoid this. - Avar
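For the chunked variant the asker describes, one possible approach is to geocode a slice of the DataFrame at a time and append only that slice, so that a crash loses at most one chunk of work. This is a sketch under stated assumptions: `fake_geocoder` is a hypothetical stand-in for `gmaps_geoencoder`, and `geocode_in_chunks`, `chunk_size`, and the `"FAILED"` marker are illustrative names, not part of the original code.

```python
import os

import numpy as np
import pandas as pd


def fake_geocoder(place):
    """Hypothetical stand-in for gmaps_geoencoder; raises on an empty address."""
    if not place:
        raise ValueError("cannot geocode an empty address")
    return 1.0, 2.0, "OK"


def geocode_in_chunks(df, output_csv_file, geocoder=fake_geocoder, chunk_size=1000):
    """Geocode df['ADDRESS'] and append each finished chunk to the CSV file."""
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start:start + chunk_size]
        lats, lons, results = [], [], []
        for place in chunk["ADDRESS"]:
            try:
                lat, lon, res = geocoder(place)
            except Exception:
                # Record the failure instead of letting it stop the loop.
                lat, lon, res = np.nan, np.nan, "FAILED"
            lats.append(lat)
            lons.append(lon)
            results.append(res)
        chunk = chunk.assign(Lat=lats, Lon=lons, Result=results)
        # Write the header only when the file does not exist yet.
        write_header = not os.path.exists(output_csv_file)
        chunk.to_csv(output_csv_file, index=False, header=write_header, mode="a")
```

A smaller `chunk_size` means less work is lost on a crash but more file opens; the value 1000 here is arbitrary.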
