Python/Numpy - Save Array with Column AND Row Titles

I want to save a 2D array to a CSV file with row and column "header" information (like a table). I know that I could use the header argument to numpy.savetxt to save the column names, but is there any easy way to also include some other array (or list) as the first column of data (like row titles)?

Below is an example of how I currently do it. Is there a better way to include those row titles, perhaps some trick with savetxt I'm unaware of?

import csv
import numpy as np

data = np.arange(12).reshape(3,4)
# Add a '' for the first column because the row titles go there...
cols = ['', 'col1', 'col2', 'col3', 'col4']
rows = ['row1', 'row2', 'row3']

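# Python 3: open in text mode with newline='', as the csv docs recommend ('wb' only works on Python 2)
with open('test.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(cols)
    for row_title, data_row in zip(rows, data):
        writer.writerow([row_title] + data_row.tolist())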
Hentrich asked 28/3, 2012 at 17:26

Maybe you'd prefer to do something like this:

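# Column of row titles. Use a unicode dtype ('U20') rather than bytes ('|S20'):
# under Python 3, savetxt would otherwise write the titles as b'row1'
rows = np.array(['row1', 'row2', 'row3'], dtype='U20')[:, np.newaxis]
with open('test.csv', 'w') as f:
    np.savetxt(f, np.hstack((rows, data.astype(str))), delimiter=', ', fmt='%s')

This converts data to an array of strings, which takes about 200 ms for every million items on my computer.

The dtype 'U20' means strings of up to twenty characters. A fixed-width string dtype silently truncates anything longer, so if the width is too low your titles (or your numbers, if you cast them to a fixed width yourself) will get chopped:

>>> np.asarray([123], dtype='U2')
array(['12'], dtype='<U2')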
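The question also asks for the column names. A minimal sketch combining this approach with the header argument of np.savetxt, reusing the question's data, cols and rows; comments='' is needed because savetxt otherwise prefixes the header line with '# '.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)
cols = ['', 'col1', 'col2', 'col3', 'col4']
rows = np.array(['row1', 'row2', 'row3'])[:, np.newaxis]

# Prepend the row titles as a string column; everything is then written with '%s'.
table = np.hstack((rows, data.astype(str)))

# header= writes the column names as the first line;
# comments='' stops savetxt from prefixing that line with '# '.
np.savetxt('test.csv', table, delimiter=', ', fmt='%s',
           header=', '.join(cols), comments='')
```

The first line of test.csv is then ", col1, col2, col3, col4", followed by one line per titled row.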

Another option, which in my limited testing is slower but gives you much more control over the number format and doesn't have the truncation issue, is np.char.mod:

# Column of row titles
rows = np.array(['row1', 'row2', 'row3'])[:, np.newaxis]
str_data = np.char.mod("%10.6f", data)
with open('test.csv', 'w') as f:
    np.savetxt(f, np.hstack((rows, str_data)), delimiter=', ', fmt='%s')
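If building a full string copy of data is a concern for very large arrays, one possible variant (a sketch, not benchmarked) skips hstack entirely: write the header and each title yourself and let savetxt format a single numeric row at a time. Calling savetxt once per row is likely slower when there are many rows, so this trades speed for memory. The fmt='%d' choice assumes integer data, as in the question.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)
cols = ['', 'col1', 'col2', 'col3', 'col4']
rows = ['row1', 'row2', 'row3']

with open('test.csv', 'w') as f:
    f.write(', '.join(cols) + '\n')
    for title, data_row in zip(rows, data):
        # Write the title, then let savetxt format just this one row
        # (reshaped to 1 x ncols); only one row is converted at a time.
        f.write(title + ', ')
        np.savetxt(f, data_row[np.newaxis], delimiter=', ', fmt='%d')
```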
Dampproof answered 29/3, 2012 at 16:50
The use of hstack has to re-create the array in memory, though, right? So if data is very large, we must allocate that memory again. For my specific application that is unlikely to be a real issue, but it's a point worth mentioning, and there probably isn't any way around it. It seems like something savetxt should implement internally, even if it must use a solution similar to mine (but in the underlying C code). – Hentrich
Yes, you're right. I think all this overhead could maybe be avoided with a record array, using the fact that fmt accepts a list of format specifiers, like fmt=['%s', '%f', ...], but I'm not familiar with them, so this is just a guess. – Dampproof
Yeah, I did consider a record array as well. I think you're right that it could be used... but I was hoping to avoid them. I guess I'll just choose whichever seems to be the lesser of two evils. – Hentrich
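The record-array idea from the comments can be sketched as follows, assuming a structured array with one string field for the title and one integer field per data column; the field name 'title' and the 'U10' width are arbitrary choices for illustration. np.savetxt accepts a list for fmt with one specifier per field. Note that this still copies data into a new array, but the numbers stay numeric instead of being converted to strings first.

```python
import numpy as np

data = np.arange(12).reshape(3, 4)
cols = ['', 'col1', 'col2', 'col3', 'col4']
rows = ['row1', 'row2', 'row3']

# One string field for the row title, then one field per data column
# (named after the column headers, keeping the data's own dtype).
dtype = [('title', 'U10')] + [(name, data.dtype) for name in cols[1:]]
table = np.empty(len(rows), dtype=dtype)
table['title'] = rows
for i, name in enumerate(cols[1:]):
    table[name] = data[:, i]

# One format specifier per field: '%s' for the title, '%d' for each integer column.
np.savetxt('test.csv', table, delimiter=', ',
           fmt=['%s'] + ['%d'] * data.shape[1],
           header=', '.join(cols), comments='')
```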

© 2022 - 2024 — McMap. All rights reserved.