I have a Python script that reads in a large CSV file, counts the number of occurrences of each word in the file, and then exports the counts to another CSV file.
But once the counting part is finished and the exporting begins, it says `Killed` in the terminal.
I don't think this is a memory problem (if it were, I assume I would be getting a memory error rather than `Killed`).
Could it be that the process is taking too long? If so, is there a way to extend the time-out period so I can avoid this?
Here is the code:
import csv
import sys

csv.field_size_limit(sys.maxsize)

# count occurrences of each pair from the first two columns
counter = {}
with open("/home/alex/Documents/version2/cooccur_list.csv", 'rb') as file_name:
    reader = csv.reader(file_name)
    for row in reader:
        if len(row) > 1:
            pair = row[0] + ' ' + row[1]
            if pair in counter:
                counter[pair] += 1
            else:
                counter[pair] = 1

print 'finished counting'

# export the counts to another CSV file
writer = csv.writer(open('/home/alex/Documents/version2/dict.csv', 'wb'))
for key, value in counter.items():
    writer.writerow([key, value])
And the `Killed` happens after `finished counting` has printed; the full message is:
killed (program exited with code: 137)
I'm not sure where the `killed` message comes from, but if it is due to going over some kind of system memory limit, you might be able to fix that by using `counter.iteritems()` instead of `counter.items()` in your final loop. In Python 2, `items` returns a list of the keys and values in the dictionary, which might require a lot of memory if it is very large. In contrast, `iteritems` is a generator that only requires a small amount of memory at any given time. – Mullet