Pandas to_csv() checking for overwrite
I

4

41

When I am analyzing data, I save my dataframes to CSV files with df.to_csv(). However, the function silently overwrites an existing file without checking whether one with the same name already exists. Is there a way to check whether the file already exists and, if so, ask for a new filename?

I know I can add the system's datetime to the filename, which will prevent any overwriting, but I would like to know when I made the mistake.

Intendment answered 2/11, 2016 at 8:28 Comment(3)
Feedback on how I can improve the question is welcome. Could the voter please explain the down-vote? I'll gladly make some adjustments. - Intendment
I'm not the one who downvoted you, but I would guess it's because the answer would likely have come up in a Google search? - Brownie
Unfortunately it did not, but I must say that I was searching for a Pandas built-in or something. I had not thought about a simple if-statement. - Intendment
S
13

Try the following:

import glob
import pandas as pd

# df is the DataFrame you want to save (placeholder data here)
df = pd.DataFrame({'a': [1, 2]})

# Give the filename you wish to save the file to
filename = 'Your_filename.csv'

# Search for any existing files which match your filename
files_present = glob.glob(filename)

# If there are no matching files, write the csv; otherwise print a warning
if not files_present:
    df.to_csv(filename)
else:
    print('WARNING: This file already exists!')

I have not tested this but it has been lifted and compiled from some previous code which I have written. This will simply STOP files overwriting others. N.B. you will have to change the filename variable yourself to then save the file, or use some datetime variable as you suggested. I hope this helps in some way.

Snowcap answered 2/11, 2016 at 8:44 Comment(2)
Thank you so much. That is a pretty easy solution :) - Intendment
os.path.exists() is a simpler way to check whether a path exists. But this approach is a classic source of time-of-check to time-of-use (TOCTOU) bugs, because the file can appear between the check and the write. Try using df.to_csv(filename, mode='x'), which will raise an exception if the target file exists. - Hammerfest
G
15

For Python 3.3+, pass mode='x' to to_csv().

From the docs:

open for exclusive creation, failing if the file already exists

try:
    df.to_csv('abc.csv', mode='x')
except FileExistsError:
    df.to_csv('unique_name.csv')
Gadid answered 28/11, 2022 at 10:32 Comment(0)
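If a single fallback name is not enough, the same mode='x' idea can be repeated with numbered names. The following is only a minimal sketch: the function name save_csv_exclusive, the "_1", "_2" suffix scheme, and the max_tries limit are assumptions made for illustration, not part of pandas. It relies on to_csv() accepting mode='x', as in the answer above.

```python
import os


def save_csv_exclusive(df, filename, max_tries=100):
    """Write df to filename, or to a numbered variant, without overwriting.

    Hypothetical helper: returns the path that was actually written.
    """
    stem, ext = os.path.splitext(filename)
    for i in range(max_tries):
        # Try the requested name first, then name_1.csv, name_2.csv, ...
        candidate = filename if i == 0 else f"{stem}_{i}{ext}"
        try:
            # mode='x' makes the existence check and the creation a single step
            df.to_csv(candidate, mode="x")
            return candidate
        except FileExistsError:
            continue
    raise FileExistsError(
        f"no free name found for {filename!r} after {max_tries} tries"
    )
```

With a DataFrame df, calling save_csv_exclusive(df, 'abc.csv') twice should leave abc.csv untouched and write the second copy to abc_1.csv. Because the check happens inside the open call itself, there is no gap between checking and writing.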
I
5

Based on TaylorDay's suggestion I made some adjustments to the function. With the following code you are asked whether you would like to overwrite an existing file. If not, you are allowed to type in another name. Then, the same write-function is called, which will again check whether the new_filename exists.

import os
import pandas as pd

def write_csv_df(path, filename, df):
    # Build the full, normalized path of the target file
    pathfile = os.path.normpath(os.path.join(path, filename))

    # Check whether a file with that name already exists
    files_present = os.path.isfile(pathfile)
    # If not, write the csv; otherwise ask before overwriting
    if not files_present:
        df.to_csv(pathfile, sep=';')
    else:
        overwrite = input("WARNING: " + pathfile + " already exists! Do you want to overwrite <y/n>? \n ")
        if overwrite == 'y':
            df.to_csv(pathfile, sep=';')
        elif overwrite == 'n':
            new_filename = input("Type new filename: \n ")
            write_csv_df(path, new_filename, df)
        else:
            print("Not a valid input. Data is NOT saved!\n")
Intendment answered 4/11, 2016 at 8:13 Comment(0)
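The same prompt-and-retry flow can also be written as a loop instead of a recursive call. This is a rough Python 3 sketch, not a tested drop-in replacement: the name write_csv_checked and the ask parameter are assumptions added here (ask defaults to the built-in input and exists only so the prompts can be scripted).

```python
import os


def write_csv_checked(directory, filename, df, ask=input):
    """Write df to directory/filename, prompting before any overwrite.

    Hypothetical helper: returns the written path, or None if nothing was saved.
    """
    while True:
        pathfile = os.path.normpath(os.path.join(directory, filename))
        if not os.path.isfile(pathfile):
            # Nothing in the way: write and stop
            df.to_csv(pathfile, sep=";")
            return pathfile
        answer = ask(f"WARNING: {pathfile} already exists! Overwrite <y/n>? ")
        if answer == "y":
            df.to_csv(pathfile, sep=";")
            return pathfile
        if answer == "n":
            # Ask for another name and run the existence check again
            filename = ask("Type new filename: ")
            continue
        print("Not a valid input. Data is NOT saved!")
        return None
```

Returning the path (or None) lets the caller see where the data ended up. Like the original, this still checks first and writes afterwards, so it does not protect against another process creating the file in between.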
B
0

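os.path.isfile() returns a Boolean indicating whether the file already exists on your system. If it does, you can write to a different file name instead.

import os

file_path = 'abc.csv'  # example path; replace with your own
if os.path.isfile(file_path):
    # the file already exists: pick a different name before writing
    file_path = 'unique_name.csv'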
Brunhilda answered 29/5, 2023 at 17:54 Comment(0)
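One way to pick the new name is to append a timestamp only when the requested file is already present, which also shows at a glance when a collision happened. This is a small sketch under assumptions: the function name timestamped_name, the now parameter, and the "_YYYYmmdd_HHMMSS" suffix format are all invented for illustration.

```python
import os
from datetime import datetime


def timestamped_name(file_path, now=None):
    """Return file_path if it is free, else a timestamped variant of it."""
    if not os.path.isfile(file_path):
        return file_path
    # Hypothetical suffix format: data.csv -> data_20230529_175400.csv
    now = now or datetime.now()
    stem, ext = os.path.splitext(file_path)
    return f"{stem}_{now:%Y%m%d_%H%M%S}{ext}"
```

The result can then be passed to df.to_csv(). This remains a check followed by a separate write, so combining it with mode='x' would be the safer choice if several processes may write at once.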
