PyBrain: Loading data with numpy.loadtxt?

I have some working code which correctly loads data from a CSV file into a PyBrain dataset:

def old_get_dataset():

    reader = csv.reader(open('test.csv', 'rb'))

    header = reader.next()
    fields = dict(zip(header, range(len(header))))
    print header

    # assume last field in csv is single target variable
    # and all other fields are input variables
    dataset = SupervisedDataSet(len(fields) - 1, 1)

    for row in reader:
        #print row[:-1]
        #print row[-1]
        dataset.addSample(row[:-1], row[-1])

    return dataset

Now I'm trying to rewrite this code to use numpy's loadtxt function instead. I believe addSample can take numpy arrays rather than having to add the data one row at a time.

Assuming my loaded numpy array is m x n dimensional, how do I pass in the first m x (n-1) block of data as the first parameter, and the last column of data as the second parameter? This is what I'm trying:

def get_dataset():

    array = numpy.loadtxt('test.csv', delimiter=',', skiprows=1)

    # assume last field in csv is single target variable
    # and all other fields are input variables
    number_of_columns = array.shape[1]
    dataset = SupervisedDataSet(number_of_columns - 1, 1)

    #print array[0]
    #print array[:,:-1]
    #print array[:,-1]
    dataset.addSample(array[:,:-1], array[:,-1])

    return dataset

But I'm getting the following error:

Traceback (most recent call last):
  File "C:\test.py", line 109, in <module>
    (d, n, t) = main()
  File "C:\test.py", line 87, in main
    ds = get_dataset()
  File "C:\test.py", line 45, in get_dataset
    dataset.addSample(array[:,:-1], array[:,-1])
  File "C:\Python27\lib\site-packages\pybrain\datasets\supervised.py",
       line 45, in addSample self.appendLinked(inp, target)
  File "C:\Python27\lib\site-packages\pybrain\datasets\dataset.py",
       line 215, in appendLinked self._appendUnlinked(l, args[i])
  File "C:\Python27\lib\site-packages\pybrain\datasets\dataset.py",
       line 197, in _appendUnlinked self.data[label][self.endmarker[label], :] = row
ValueError: output operand requires a reduction, but reduction is not enabled

How can I fix this?

Cia answered 12/4, 2012 at 23:43 Comment(1)

I think the problem might be related to addSample() expecting a 2-dimensional array for both parameters; however, I'm passing in a 1-dimensional array. I'm a little confused as to how to make the target array 2-dimensional, as there is only a single target variable per training example. – Cia 13/4, 2012 at 0:28

After a lot of experimenting and re-reading the dataset documentation, the following runs without error:

import numpy
from pybrain.datasets import SupervisedDataSet

def get_dataset():
    array = numpy.loadtxt('test.csv', delimiter=',', skiprows=1)

    # assume last field in csv is single target variable
    # and all other fields are input variables
    number_of_columns = array.shape[1]
    dataset = SupervisedDataSet(number_of_columns - 1, 1)

    # addSample() adds one sample per call, so these attempts failed:
    #dataset.addSample(array[:,:-1], array[:,-1])
    #dataset.addSample(array[:,:-1], array[:,-2:-1])

    # set whole fields at once instead; array[:,-1:] keeps the target as an
    # m x 1 column, matching the declared target dimension of 1
    dataset.setField('input', array[:,:-1])
    dataset.setField('target', array[:,-1:])

    return dataset

I have to double check that it's doing the right thing.
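One way to check the slicing without involving PyBrain is to look at the shapes numpy produces. The sketch below is only an illustration: it uses an in-memory string as a hypothetical stand-in for test.csv, and it imitates the single-row assignment shown in the traceback (`self.data[label][self.endmarker[label], :] = row`) with a plain numpy buffer rather than calling PyBrain itself.

```python
import io
import numpy

# Hypothetical stand-in for test.csv: header row, two input columns, one target column.
csv_text = u"a,b,target\n1,2,3\n4,5,6\n7,8,9\n"
array = numpy.loadtxt(io.StringIO(csv_text), delimiter=',', skiprows=1)

inputs = array[:, :-1]      # every column except the last: shape (m, n-1)
target_1d = array[:, -1]    # an integer index drops the axis: shape (m,)
target_2d = array[:, -1:]   # a slice keeps the axis: shape (m, 1)

print("inputs %s, target_1d %s, target_2d %s"
      % (inputs.shape, target_1d.shape, target_2d.shape))

# Imitation of the assignment in the traceback: writing a whole
# (m, n-1) block into a single row of a buffer cannot work for m > 1.
row_buffer = numpy.zeros((3, 2))
assignment_failed = False
try:
    row_buffer[0, :] = inputs
except ValueError:
    assignment_failed = True

print("single-row assignment failed: %s" % assignment_failed)
```

If the shapes come out as m x (n-1) for the inputs and m x 1 for `array[:, -1:]`, the two setField calls are receiving the columns in the intended layout.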

Cia answered 13/4, 2012 at 2:23 Comment(0)

I've written a little function to do this:

from numpy import loadtxt

def load_csv(filename, cols, sep=',', skip=0):
    # cols: indices of the columns to read; skip: number of leading rows to ignore
    return loadtxt(filename, delimiter=sep, usecols=cols, skiprows=skip)
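As a rough usage sketch, the helper could be called twice to read the input columns and the target column separately. The data and column indices here are made up for illustration, an in-memory string stands in for the file, and the helper is repeated so the snippet runs on its own.

```python
import io
from numpy import loadtxt

def load_csv(filename, cols, sep=',', skip=0):
    # Same helper as above, repeated so this sketch is self-contained.
    return loadtxt(filename, delimiter=sep, usecols=cols, skiprows=skip)

# Hypothetical file contents: header row, two input columns, one target column.
csv_text = u"a,b,target\n1,2,3\n4,5,6\n"

# loadtxt also accepts file-like objects, so StringIO stands in for a path.
inputs = load_csv(io.StringIO(csv_text), cols=(0, 1), skip=1)
target = load_csv(io.StringIO(csv_text), cols=(2,), skip=1)

# Reading a single column gives a 1-dimensional array, so reshape it
# into an m x 1 column if a 2-dimensional target is needed.
target_column = target.reshape(-1, 1)

print("inputs %s, target %s, target_column %s"
      % (inputs.shape, target.shape, target_column.shape))
```

This reads the file twice, so slicing a single loaded array is likely the simpler choice when all columns are wanted anyway.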

Mcnally answered 4/12, 2013 at 18:28 Comment(0)
