Combining multiple csv files into one csv file

2

3

I am trying to combine multiple csv files into one, and have tried a number of methods but I am struggling.

I import the data from multiple csv files, and when I compile them into one csv file, the first few rows are filled in correctly. After that, a varying number of blank lines starts appearing between the rows, and the combined file never finishes: data keeps being added to it, which does not make sense to me because I am compiling a finite amount of data.

I have already tried adding close statements for the file, and I still get the same result: the combined csv file never stops receiving data, and the rows are spaced irregularly throughout the file. I just want a normally compiled csv.

Is there an error in my code? Is there any explanation as to why my csv file is behaving this way?

csv_file_list = glob.glob(Dir + '/*.csv') #returns the file list
print (csv_file_list)
with open(Avg_Dir + '.csv','w') as f:
    wf = csv.writer(f, delimiter = ',')
    print (f)
    for files in csv_file_list:
        rd = csv.reader(open(files,'r'),delimiter = ',')
        for row in rd:
            print (row)
            wf.writerow(row)
Congratulant answered 24/5, 2019 at 14:15 Comment(2)
Are you sure the csv files you're trying to combine don't have empty space at the end of them? Also, if you have a lot of files with a lot of lines, maybe it's just taking a long time to run, which is why it seems like it never stops getting data. - Peay
@Peay the files don't have empty space at the end, and the combined file should be about 46,000 KB. However, when I run the program it sometimes even hits 5,000,000 KB before I terminate the program, because I know something is wrong. - Congratulant
3

Your code works for me.

Alternatively, you can merge files as follows:

csv_file_list = glob.glob(Dir + '/*.csv')
with open(Avg_Dir + '.csv','w') as wf:
    for file in csv_file_list:
        with open(file) as rf:
            for line in rf:
                if line.strip(): # if line is not empty
                    if not line.endswith("\n"):
                        line+="\n"
                    wf.write(line)

Or, if the files are not too large, you can read each file at once. But in this case all empty lines and headers inside the files will be copied:

csv_file_list = glob.glob(Dir + '/*.csv')
with open(Avg_Dir + '.csv','w') as wf:
    for file in csv_file_list:
        with open(file) as rf:
            wf.write(rf.read().strip()+"\n")
Mascot answered 24/5, 2019 at 15:6 Comment(9)
This resolves the issue I had with the spacing, thank you! Now the data is all nicely formatted together, but one issue still remains: the combined csv file never seems to finish filling. It is only supposed to be about 46,000 kilobytes, but it never stops growing, so I'm confused. Do you happen to have any idea why this is? - Congratulant
How do you limit the combined file? Without your data it is quite complicated to answer your question. - Mascot
This is the only block of code; there are no other processes you can't see other than what I link to Dir and Avg_Dir. Dir is linked to a file containing multiple csv files (that's it) and Avg_Dir links to a file with a single csv file that is being used as the csv file with all of the combined data @Parfait - Congratulant
^i also responded to your question i think @Aray - Congratulant
Next question: how are you running the Python script? Through an IDE, at command line, web notebook? Try running at command line to avoid environment issues: python myscript.py or "C:\path\to\bin\python.exe" "C:\path\to\myscript.py". - Demarcate
The only idea I have is that you did not reach the end of your data. Just try to count the lines of all your files manually and compare the total with a count made in the code (create a counter). - Mascot
@Demarcate I am running it through Pycharm - Congratulant
You can also read each file at once. I've updated my answer. If it helps, don't forget to accept the answer. Cheers! - Mascot
If you're combining csv files with the same header, use the first approach to enumerate over csv_file_list with for i, file in enumerate(csv_file_list): ... and skip the header after the first csv file in the list by putting if i != 0: next(rf) before the second loop. - Rudderhead
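The header-skipping suggestion in the last comment can be sketched as follows. This is a minimal sketch, assuming every input file begins with the same one-line header; the function name merge_csv_keep_one_header and its parameters are hypothetical and do not come from the original code. It also returns the number of lines written, so the total can be compared with a manual count as suggested in the comments above.

```python
import glob
import os


def merge_csv_keep_one_header(source_dir, output_path):
    """Merge all CSV files in source_dir into output_path, keeping one header.

    Hypothetical helper: assumes each file starts with the same one-line header.
    Returns the number of lines written (header included).
    """
    csv_file_list = sorted(glob.glob(os.path.join(source_dir, '*.csv')))
    lines_written = 0
    with open(output_path, 'w') as wf:
        for i, file in enumerate(csv_file_list):
            with open(file) as rf:
                if i != 0:
                    next(rf, None)  # skip the header of every file after the first
                for line in rf:
                    if line.strip():  # ignore empty lines
                        if not line.endswith("\n"):
                            line += "\n"
                        wf.write(line)
                        lines_written += 1
    return lines_written
```

Passing None as the second argument to next() keeps an empty input file from raising StopIteration.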
2

Consider several adjustments:

  1. Use a context manager (with) for both the read and write processes. This avoids the need to call close() on file objects, which your code does not do for the files it reads.
  2. For the blank-line issue: use either the newline='' argument in open() or the lineterminator="\n" argument in csv.writer(). See SO answers for former and latter.
  3. Use os.path.join() to concatenate folder and file paths properly. This method is OS-agnostic, so it works on both Windows and Unix machines regardless of whether they use backslashes or forward slashes.

Adjusted script:

import os
import csv, glob

Dir = r"C:\Path\To\Source"
Avg_Dir = r"C:\Path\To\Destination\Output"

csv_file_list = glob.glob(os.path.join(Dir, '*.csv')) # returns the file list
print (csv_file_list)

with open(os.path.join(Avg_Dir, 'Output.csv'), 'w', newline='') as f:
    wf = csv.writer(f, lineterminator='\n')

    for file in csv_file_list:
        with open(file, 'r', newline='') as r:
            rr = csv.reader(r)
            next(rr, None)            # SKIP HEADERS (no error on an empty file)
            for row in rr:
                wf.writerow(row)
Demarcate answered 24/5, 2019 at 15:36 Comment(8)
Thank you, this helps a lot with compiling the data and getting rid of headers. I still have an issue with how my combined csv file never stops adding data. I don't think this is an issue with the code; I think it might be an issue with how our csv files are saved to the computer, and I am going to look into this. But thank you, your code definitely helped a lot. - Congratulant
This solution works great on my end using 10 csv files of 50 rows to output 1 csv of 500 rows on a Windows machine. Hope you can find your issue. Good luck! - Demarcate
Just ran the command and didn't get any output using 16 csv files with a total of 1.25 GB on my Windows machine. - Untraveled
Console output or the single CSV output? Did the print statement output the list of csv files? - Demarcate
The single CSV output. The list was printed, yes. - Untraveled
Now the output appears, just not in the right folder. - Untraveled
Simply add the path to the file name in open: with open(os.path.join(mydestinationfolder, 'Output.csv'), 'w', newline=''). - Demarcate
Thank you @Demarcate, like that it is perfect! - Untraveled
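One possible explanation for a combined file that keeps growing, offered here as an assumption rather than a confirmed diagnosis: if the output file sits in the same folder that glob searches, a re-run lists the previous output among the inputs, and the script then reads from the very file it is writing to. The sketch below guards against this by excluding the output path from the input list. The function name combine_csv_excluding_output and its parameters are hypothetical.

```python
import csv
import glob
import os


def _canonical(path):
    # Normalise a path so that two spellings of the same file compare equal.
    return os.path.normcase(os.path.abspath(path))


def combine_csv_excluding_output(source_dir, output_path):
    """Combine CSV files from source_dir into output_path.

    Hypothetical helper: the output file itself is never used as an input,
    even when it lives inside source_dir. Returns the list of inputs read.
    """
    output_canonical = _canonical(output_path)
    inputs = [
        path
        for path in sorted(glob.glob(os.path.join(source_dir, '*.csv')))
        if _canonical(path) != output_canonical
    ]
    with open(output_path, 'w', newline='') as f:
        wf = csv.writer(f, lineterminator='\n')
        for path in inputs:
            with open(path, 'r', newline='') as r:
                for row in csv.reader(r):
                    wf.writerow(row)
    return inputs
```

Printing the returned list after a run shows whether the combined file was ever among the files being read.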
