I'm trying to use Keras with TensorFlow to train a network on SURF features that I obtained from several images. I have all these features stored in a CSV file with the following columns:
[ID, Code, PointX, PointY, Desc1, ..., Desc64]
The "ID" column is an autoincremental index created by pandas when I store all the values. The "Code" column is the label of the point, this would be just a number that I got by pairing the actual code (which is a string) with a number. "PointX/Y" are the coordinates of the point found in an image of a given class, and "Desc#" is the float value of the corresponding descriptor of that point.
The CSV file contains all the keypoints and descriptors found in all 20,000 images, which adds up to almost 60 GB on disk, so I obviously can't fit it all into memory.
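For reference, a quick way to preview the layout without loading the whole file (just a sketch, assuming the same filename as below):

import pandas as pd

# Peek at the header and the first few rows only
preview = pd.read_csv("surf_kps.csv", nrows=3)
print(preview.columns.tolist())   # expected: ID, Code, PointX, PointY, Desc1, ..., Desc64
print(preview.head())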
I've been trying to load the file in chunks using pandas, put the values in a numpy array, and then fit my model (a Sequential model with only 3 layers). I've used the following code to do so:
import numpy as np
import pandas as pd

chunksize = 10 ** 6
for chunk in pd.read_csv("surf_kps.csv", chunksize=chunksize):
    dataset_chunk = chunk.to_numpy(dtype=np.float32, copy=False)
    print(dataset_chunk)
    # Divide the chunk into data (descriptor columns) and labels (the "Code" column)
    X = dataset_chunk[:, 9:]
    Y = dataset_chunk[:, 1]
    # Train model on this chunk
    model.fit(x=X, y=Y, batch_size=200, epochs=20)
    # Evaluate model on the same chunk
    scores = model.evaluate(X, Y)
    print("\n%s: %.2f%%" % (model.metrics_names[1], scores[1] * 100))
This works fine for the first chunk, but when the loop moves on to the next chunk, accuracy and loss get stuck at 0.
Am I loading all this information the wrong way?
Thanks in advance!
------ EDIT ------
Ok, now I made a simple generator like this:
def read_csv(filename):
    # Stream the CSV line by line and yield (features, label) pairs
    with open(filename, 'r') as f:
        for line in f.readlines():
            record = line.rstrip().split(',')
            features = [np.float32(n) for n in record[9:73]]
            label = int(record[1])
            print("features: ", type(features[0]), " ", type(label))
            yield np.array(features), label
and used fit_generator with it:
tf_ds = read_csv("mini_surf_kps.csv")
model.fit_generator(tf_ds, steps_per_epoch=1000, epochs=20)
I don't know why, but I keep getting an error just before the first epoch starts:
ValueError: Error when checking input: expected dense_input to have shape (64,) but got array with shape (1,)
The first layer of the model has input_dim=64, and the features array yielded by the generator also has shape (64,).
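For context, a minimal stand-in for the model I'm describing (only the 3-layer Sequential structure and input_dim=64 match my actual setup; the hidden sizes, activations, and compile options here are placeholders):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

n_classes = 50  # placeholder: number of distinct "Code" labels
model = Sequential([
    Dense(128, input_dim=64, activation='relu'),  # expects 64 descriptor values per sample
    Dense(64, activation='relu'),
    Dense(n_classes, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])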