The most memory-efficient way to clean up the cells after parsing is with generators, which hand back one row at a time instead of building the whole table in memory. Something like:
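import csv

def rows(self):  # hypothetical method name; this lives in your class
    with open(self.filename, 'r', newline='') as f:
        reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
        for row in reader:
            # lazily strip surrounding whitespace from every cell
            yield (cell.strip() for cell in row)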
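To see the generator in use, here is a minimal sketch that takes any open text file instead of `self.filename`; `stripped_rows` is a hypothetical name, and `io.StringIO` stands in for a real file. Summing a column this way consumes the parsed rows one at a time rather than collecting them into a list first.

```python
import csv
import io


def stripped_rows(f):
    # Hypothetical helper mirroring the snippet above, but taking an open file.
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        # `row` is already a list, so this inner generator is safe to consume later.
        yield (cell.strip() for cell in row)


# io.StringIO stands in for a real file here.
sample = io.StringIO("a, 1\nb , 2\n c,3 \n")

# Rows are pulled from the reader one by one while summing the second column.
total = sum(int(list(row)[1]) for row in stripped_rows(sample))
print(total)  # 6
```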
But it may be worth moving the cleanup into a function of its own, so you can keep munging the data in the same pass instead of iterating over the rows again later. For instance:
# strings treated as missing values
nulls = {'NULL', 'null', 'None', ''}

def clean(reader):
    def clean_row(row):
        for cell in row:
            cell = cell.strip()
            # map null markers to None, keep everything else
            yield None if cell in nulls else cell
    for row in reader:
        yield clean_row(row)
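As an illustration of continuing to munge in the same pass, a further generator stage can be chained onto `clean`. This is a sketch under stated assumptions: `to_numbers` is a hypothetical stage that converts digit-only cells to `int`, and the input is an in-memory sample rather than a real file.

```python
import csv
import io

NULLS = {'NULL', 'null', 'None', ''}


def clean(reader):
    # First stage: strip cells and map null markers to None.
    def clean_row(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in NULLS else cell
    for row in reader:
        yield clean_row(row)


def to_numbers(rows):
    # Hypothetical second stage: materialize each row as a list,
    # converting digit-only cells to int along the way.
    for row in rows:
        yield [int(c) if c is not None and c.isdigit() else c for c in row]


data = io.StringIO("x, NULL ,3\n,None, 42\n")
# Both stages run inside a single pass over the reader.
result = list(to_numbers(clean(csv.reader(data))))
print(result)  # [['x', None, 3], [None, None, 42]]
```

Each extra stage is just another generator wrapped around the previous one, so adding steps does not add passes over the file.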
Or the same idea can serve as a factory that turns each data row into a dict keyed by the header row:
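def factory(reader):
    # the first row holds the column names
    fields = [name.strip() for name in next(reader)]
    def clean(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell  # `nulls` as defined above
    for row in reader:
        # one dict per data row, keyed by column name
        yield dict(zip(fields, clean(row)))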
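If you would rather get instances of an actual class than dicts, `collections.namedtuple` can build one from the header row. This is a swapped-in variant, not the approach above; `record_factory` and `Record` are hypothetical names, and `rename=True` tells namedtuple to replace header names that are not valid Python identifiers.

```python
import csv
import io
from collections import namedtuple

NULLS = {'NULL', 'null', 'None', ''}


def record_factory(reader):
    """Yield one namedtuple per data row, with the class built from the header.

    `record_factory` and `Record` are hypothetical names for this sketch.
    """
    fields = [name.strip() for name in next(reader)]
    # rename=True swaps invalid or duplicate field names for positional ones
    Record = namedtuple('Record', fields, rename=True)

    for row in reader:
        cells = (cell.strip() for cell in row)
        yield Record(*(None if c in NULLS else c for c in cells))


data = io.StringIO("id, name ,score\n1, Ann ,NULL\n2,Bob, 7\n")
records = list(record_factory(csv.reader(data)))
print(records[0])  # Record(id='1', name='Ann', score=None)
```

Unlike `dict(zip(...))`, which silently truncates to the shorter of row and header, `Record(*...)` raises `TypeError` when a row has the wrong number of cells, which may or may not be what you want.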
reader = csv.reader(f, skipinitialspace=True, delimiter=',', quoting=csv.QUOTE_NONE), right? – Wealthy
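Checking that suggestion against the standard-library `csv` module: `skipinitialspace=True` only ignores spaces immediately following a delimiter, so trailing whitespace survives and `cell.strip()` is still needed for it. A small check on an in-memory string:

```python
import csv
import io

text = "a, b ,c\n"

# Default dialect: whitespace on both sides of " b " is kept.
plain = next(csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONE))

# skipinitialspace=True drops only the space right after each comma.
skipping = next(csv.reader(io.StringIO(text), skipinitialspace=True,
                           quoting=csv.QUOTE_NONE))

print(plain)     # ['a', ' b ', 'c']
print(skipping)  # ['a', 'b ', 'c']
```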