numpy array from csv file for lasagne

...
with gzip.open(filename, 'rb') as f:
    data = pickle_load(f, encoding='latin-1')

# The MNIST dataset we have here consists of six numpy arrays:
# Inputs and targets for the training set, validation set and test set.
X_train, y_train = data[0]
X_val, y_val = data[1]
X_test, y_test = data[2]

...

# We just return all the arrays in order, as expected in main().
# (It doesn't matter how we do this as long as we can read them again.)
return X_train, y_train, X_val, y_val, X_test, y_test

You can use numpy.genfromtxt() or numpy.loadtxt() as follows:

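import numpy
# KFold lives in sklearn.model_selection
# (older releases: sklearn.cross_validation)
from sklearn.model_selection import KFold

Xy = numpy.genfromtxt('yourfile.csv', delimiter=",")

# the next section provides the required
# training-validation set splitting but
# you can do it manually too, if you want

# 3 folds, the default of the old KFold(len(Xy)) call
skf = KFold(n_splits=3)

# take only the first train/validation split
for train_index, valid_index in skf.split(Xy):
    ind_train, ind_valid = train_index, valid_index
    break

Xy_train, Xy_valid = Xy[ind_train], Xy[ind_valid]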

# the first column is the target, the remaining columns are the features
X_train = Xy_train[:, 1:]
y_train = Xy_train[:, 0]

X_val = Xy_valid[:, 1:]
y_val = Xy_valid[:, 0]


...

# you can simply ignore the test sets in your case
return X_train, y_train, X_val, y_val #, X_test, y_test

The code snippet above leaves the test set out of the return value.
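The manual split mentioned in the comments can be sketched with NumPy alone, without scikit-learn. This is only a minimal sketch: the function name `load_csv_dataset`, the `valid_fraction` and `seed` parameters, and the in-memory demo data are all made up for illustration, and it assumes the same layout as above (label in the first column, features in the rest).

```python
import io
import numpy as np


def load_csv_dataset(source, valid_fraction=0.2, seed=0):
    """Load a label-first CSV and split it into training and validation arrays.

    `source` may be a filename or a file-like object. The function name,
    `valid_fraction` and `seed` are hypothetical, chosen for this sketch.
    """
    Xy = np.loadtxt(source, delimiter=",", ndmin=2)

    # Shuffle the row indices reproducibly, then cut off the validation rows.
    rng = np.random.RandomState(seed)
    indices = rng.permutation(len(Xy))
    n_valid = int(round(len(Xy) * valid_fraction))
    valid_idx, train_idx = indices[:n_valid], indices[n_valid:]

    # Column 0 holds the label, the remaining columns hold the features.
    X_train, y_train = Xy[train_idx, 1:], Xy[train_idx, 0]
    X_val, y_val = Xy[valid_idx, 1:], Xy[valid_idx, 0]
    return X_train, y_train, X_val, y_val


# Tiny in-memory CSV standing in for 'yourfile.csv' (made-up values).
demo_csv = io.StringIO("\n".join(
    "{},{},{}".format(i % 2, i, i * 10) for i in range(10)))
X_train, y_train, X_val, y_val = load_csv_dataset(demo_csv)
print(X_train.shape, y_train.shape, X_val.shape, y_val.shape)
```

Unlike the unshuffled KFold split, this variant shuffles the rows first, which is usually what you want if the CSV is sorted by label.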

Now you can import your dataset into the main module or script, but remember to remove everything that refers to the test set there too.

Alternatively, you can simply pass the validation set as the test set:

# you can simply pass the valid sets as `test` set
return X_train, y_train, X_val, y_val, X_val, y_val

In the latter case you don't have to touch the sections of the main module that refer to the expected test set, but any test scores it reports will just be the validation scores a second time.

Note: I don't know which MNIST example this is, but after preparing your data as above you will probably have to modify your trainer module as well to suit your data, for example the input shape and the output shape (i.e. the number of classes). In your case the former is 773 and the latter is 2.
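As a rough sketch of that last step, the two sizes can be read off the loaded arrays instead of being hard-coded. The helper name `describe_and_cast` and the random demo data are hypothetical, and the dtypes are an assumption (float32 inputs and int32 targets are common with Theano-based code, but check what your trainer expects).

```python
import numpy as np


def describe_and_cast(X, y):
    """Cast the arrays and report the sizes the network layers need."""
    # Assumption: the trainer wants float32 inputs and int32 class targets.
    X = np.asarray(X, dtype=np.float32)
    y = np.asarray(y, dtype=np.int32)

    n_features = X.shape[1]        # width of the input layer
    n_classes = len(np.unique(y))  # number of units in the output layer
    return X, y, n_features, n_classes


# Made-up stand-in data with the dimensions mentioned in the note above.
rng = np.random.RandomState(0)
X_demo = rng.rand(6, 773)
y_demo = np.array([0.0, 1.0, 1.0, 0.0, 1.0, 0.0])

X_demo, y_demo, n_features, n_classes = describe_and_cast(X_demo, y_demo)
print(n_features, n_classes)
```

The input layer of the network would then take `n_features` values per sample, and the output layer would have `n_classes` units.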
