AttributeError using PyBrain splitWithProportion - object type changed?

I'm testing out PyBrain, following the basic classification tutorial and a different take on it that uses more realistic data. However, calling trndata._convertToOneOfMany() fails with this error:

AttributeError: 'SupervisedDataSet' object has no attribute '_convertToOneOfMany'

The data set is created as a classification.ClassificationDataSet object, but calling splitWithProportion seems to change it to a supervised.SupervisedDataSet object. Being fairly new to Python, I don't find the error surprising, since supervised.SupervisedDataSet doesn't have that method and classification.ClassificationDataSet does. The code is below.

However, the exact same code is used across so many tutorials that I feel I must be missing something, since plenty of other people have it working. I've looked at changes to the codebase on GitHub and there's nothing around this function. I've also tried running under Python 3 instead of 2.7, but it made no difference. If anyone has any pointers to get me back on the right path, that would be very much appreciated.

import numpy as np
from pybrain.datasets import ClassificationDataSet

# X (400 arrays of 64x64) and y (400 class labels) are assumed to be loaded already
# flatten the 64x64 data into one-dimensional 4096
ds = ClassificationDataSet(4096, 1, nb_classes=40)
for k in xrange(len(X)):  # length of X is 400
    ds.addSample(np.ravel(X[k]), y[k])
    # a new sample consisting of input and target

print(type(ds))
tstdata, trndata = ds.splitWithProportion(0.25)
print(type(trndata))

trndata._convertToOneOfMany()
tstdata._convertToOneOfMany()
Fulford answered 11/1, 2015 at 14:3 Comment(0)
B
19

I had the same problem. I added the following code to make it work on my machine.

tstdata_temp, trndata_temp = alldata.splitWithProportion(0.25)

tstdata = ClassificationDataSet(2, 1, nb_classes=3)
for n in xrange(0, tstdata_temp.getLength()):
    tstdata.addSample( tstdata_temp.getSample(n)[0], tstdata_temp.getSample(n)[1] )

trndata = ClassificationDataSet(2, 1, nb_classes=3)
for n in xrange(0, trndata_temp.getLength()):
    trndata.addSample( trndata_temp.getSample(n)[0], trndata_temp.getSample(n)[1] )

This converts tstdata and trndata back to the ClassificationDataSet type.
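
The same copy can be wrapped in a small helper. The sketch below is only an illustration, not part of PyBrain: `copy_samples` is a hypothetical name, and it relies only on the `getLength`, `getSample`, and `addSample` calls used above, so the destination dataset is passed in already constructed.

```python
def copy_samples(src, dst):
    """Copy every (input, target) pair from src into dst and return dst.

    Hypothetical helper: src and dst only need the getLength, getSample,
    and addSample methods used in the workaround above.
    """
    for n in range(src.getLength()):
        sample = src.getSample(n)  # fetch once instead of twice per sample
        dst.addSample(sample[0], sample[1])
    return dst
```

With PyBrain this would presumably be called as `tstdata = copy_samples(tstdata_temp, ClassificationDataSet(2, 1, nb_classes=3))`, and likewise for `trndata`.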

Brochure answered 25/1, 2015 at 16:48 Comment(1)
Nice workaround, thanks for posting. I've moved on to try other frameworks in the meantime, but will come back and give this a go at some point. - Fulford
C
3

The implementation of splitWithProportion changed between PyBrain versions 0.3.2 and 0.3.3, introducing this bug that breaks polymorphism.
As of now, the library hasn't been updated since January 2015, so using some kind of workaround is the only course of action at the moment.

You can check the responsible commit here: https://github.com/pybrain/pybrain/commit/2f02b8d9e4e9d6edbc135a355ab387048a00f1af
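
To illustrate what "breaks polymorphism" means here, the sketch below uses two made-up classes, `Base` and `Derived` (stand-ins, not PyBrain code). It assumes the regression comes from a split method that constructs the parent class by name, so a subclass instance gets parent-class halves back. Building the halves with `type(self)` keeps the subclass.

```python
class Base(object):
    def __init__(self, items=None):
        self.items = list(items or [])

    def split_hardcoded(self, proportion):
        # Names the parent class explicitly: subclasses get Base objects back.
        cut = int(len(self.items) * proportion)
        return Base(self.items[:cut]), Base(self.items[cut:])

    def split_polymorphic(self, proportion):
        # Uses the runtime type, so a Derived instance yields Derived halves.
        cls = type(self)
        cut = int(len(self.items) * proportion)
        return cls(self.items[:cut]), cls(self.items[cut:])


class Derived(Base):
    def convert(self):
        # Stand-in for a subclass-only method such as _convertToOneOfMany.
        return "converted"


d = Derived(range(8))
left, right = d.split_hardcoded(0.25)
print(type(left).__name__)   # Base, so left.convert() raises AttributeError
left, right = d.split_polymorphic(0.25)
print(type(left).__name__)   # Derived
```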

Centipoise answered 16/6, 2015 at 13:40 Comment(2)
Right now I'm using Muhammed Miah's workaround. But I opened an issue here: github.com/pybrain/pybrain/issues/169 - Centipoise
Please excuse me, I'm new here as an active user. But what would you have me add? Someone changed something in the code with unforeseen consequences. Hopefully, that will be fixed soon. But for now, the only important thing to take away from this is: "Yes, the library has a bug. Use the workaround." - Centipoise
M
1

I have the same issue and think I fixed it: See this pull request.

(Python 2.7.6, PyBrain 0.3.3, OS X 10.9.5)

Mistrot answered 19/5, 2015 at 8:50 Comment(0)
S
1

I tried the suggested workaround from Muhammed Miah, but I still was tripped up when running the tutorial at the line:

print( trndata['input'][0], trndata['target'][0], trndata['class'][0])

trndata['class'] was an empty array, so index [0] threw a fault.

I was able to work around this by writing my own function, ConvertToOneOfMany:

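import numpy as np
from pybrain.datasets import ClassificationDataSet

def ConvertToOneOfMany(d, nb_classes, bounds=(0, 1)):
    # Rebuild a ClassificationDataSet from the dataset returned by the split.
    d2 = ClassificationDataSet(d.indim, d.outdim, nb_classes=nb_classes)
    for n in range(d.getLength()):
        d2.addSample(d.getSample(n)[0], d.getSample(n)[1])
    # One-hot encode the integer class labels: bounds[0] is "off", bounds[1] is "on".
    oldtarg = d.getField('target')
    newtarg = np.zeros([len(d), nb_classes], dtype='int32') + bounds[0]
    for i in range(len(d)):
        newtarg[i, int(oldtarg[i])] = bounds[1]
    # Keep the original labels in 'class' and store the encoded rows in 'target'.
    d2.setField('class', oldtarg)
    d2.setField('target', newtarg)
    return d2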
Scow answered 24/5, 2016 at 0:48 Comment(3)
Do you mind explaining how ConvertToOneOfMany is helpful in the context of this tutorial? It states that "For neural network classification, it is highly advisable to encode classes with one output neuron per class", but I do not really understand that statement. - Stylistic
That statement refers to one-hot encoding. For example, if you have an MNIST classifier that does handwriting classification of digits 0-9, then the output layer should have 10 neurons, and each digit maps to a vector with a single one and zeros in the other positions (e.g. 7 = [0,0,0,0,0,0,0,1,0,0]). - Scow
I am encountering the same index [0] problem when printing trndata['class'][0]. Could you show an example of how this function fixes the issue? Where is the np variable defined? It is not part of the PyBrain tutorial discussed in this thread. - Funerary
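
The one-hot mapping described in the comments above can be sketched in plain Python. `one_hot` is a hypothetical helper, not a PyBrain function; the `bounds` pair mirrors the off/on values used elsewhere in this thread.

```python
def one_hot(label, nb_classes, bounds=(0, 1)):
    """Return a list of length nb_classes with bounds[1] at index label."""
    if not 0 <= label < nb_classes:
        raise ValueError("label %d is outside 0..%d" % (label, nb_classes - 1))
    row = [bounds[0]] * nb_classes  # every position starts "off"
    row[label] = bounds[1]          # switch on the position of this class
    return row


print(one_hot(7, 10))  # [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]
```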
F
0

So, I did the following without getting an error:

from pybrain.datasets import ClassificationDataSet
ds = ClassificationDataSet(4096, 1 , nb_classes=40)
for k in range(400):
    ds.addSample(k,k%4)
print(type(ds))
# <class 'pybrain.datasets.classification.ClassificationDataSet'>
tstdata, trndata = ds.splitWithProportion(0.25)
print(type(trndata))
# <class 'pybrain.datasets.classification.ClassificationDataSet'>
print(type(tstdata))
# <class 'pybrain.datasets.classification.ClassificationDataSet'>
trndata._convertToOneOfMany()
tstdata._convertToOneOfMany()

The only difference I see between my code and yours is your use of X. Perhaps you can confirm that my code works on your machine, and if so, we could look into what about X is confusing things?

Fluke answered 12/1, 2015 at 18:4 Comment(1)
Thanks so much for the idea. Unfortunately it still fails, so it must be something strange about my setup. I'm using the latest version of pybrain from GitHub (same as you?) and Python 2.7.5 (also tried 3) on Ubuntu 13.10. I'll try setting up a virtual machine and see if it works there! - Fulford
S
0

The simplest workaround I found was to call splitWithProportion() first, update the number of classes, and then call _convertToOneOfMany().

tstdata, trndata = alldata.splitWithProportion( 0.25 )
tstdata.nClasses = alldata.nClasses
trndata.nClasses = alldata.nClasses
tstdata._convertToOneOfMany(bounds=[0, 1])
trndata._convertToOneOfMany(bounds=[0, 1])

With nClasses updated on both tstdata and trndata, you are guaranteed not to get different dimensions in the target fields.
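
A small sketch of why the class count matters, under the assumption that a split with no class count set infers it from the distinct labels it happens to contain. The labels below are made up and `inferred_width` is a hypothetical helper, not PyBrain code.

```python
def inferred_width(labels):
    """Number of one-hot columns if the class count is inferred from the data."""
    return len(set(labels))


all_labels = [0, 1, 2, 0, 1, 2, 0, 1]
tst_labels = all_labels[:2]   # [0, 1]: class 2 never appears in this split
trn_labels = all_labels[2:]   # [2, 0, 1, 2, 0, 1]

print(inferred_width(tst_labels))  # 2
print(inferred_width(trn_labels))  # 3

# Copying the class count from the full dataset keeps both splits the same width.
nb_classes = inferred_width(all_labels)
print(nb_classes)  # 3
```

This is the kind of mismatch that setting nClasses on both splits is meant to avoid.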

I was getting errors whether I called _convertToOneOfMany first and splitWithProportion second or the other way around when working with a ClassificationDataSet. So I suggested an update to the splitWithProportion function. You can see the whole code in this pull request.

Shamrock answered 15/10, 2016 at 21:33 Comment(0)
