The article here describes a clever way to generate triplets for a convolutional neural network that produces face embeddings.
For a mini-batch with n images, only the semi-hard triplets are used for learning: triplets whose negative is farther from the anchor than the positive is, but still within the margin.
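The semi-hard condition can be written down directly. A minimal sketch with NumPy, where the distances and the margin `alpha` are made-up illustrative values, not numbers from the article:

```python
import numpy as np

alpha = 0.2                           # margin (illustrative value)
d_ap = 0.5                            # distance(anchor, positive)
d_an = np.array([0.3, 0.6, 0.9])      # distances(anchor, each negative)

# Semi-hard negatives lie farther from the anchor than the positive,
# but still inside the margin: d_ap < d_an < d_ap + alpha.
semi_hard = (d_an > d_ap) & (d_an < d_ap + alpha)
# Here only the middle negative (0.6) qualifies.
```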
- How is the training set created? What does a batch contain?
In our experiments we sample the training data such that around 40 faces are selected per identity per mini-batch. Additionally, randomly sampled negative faces are added to each mini-batch.
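As I understand that quote, a mini-batch groups many images per identity plus some random other faces. A sketch of such a sampler under my own assumptions (the function name, parameters, and defaults are mine, not from the paper):

```python
import random

def sample_minibatch(images_by_identity, ids_per_batch=2, faces_per_id=40,
                     extra_negatives=5, rng=random.Random(0)):
    """Take ~faces_per_id images for each of a few identities, then pad
    the batch with randomly sampled faces of other people."""
    identities = rng.sample(sorted(images_by_identity), ids_per_batch)
    batch = []
    for ident in identities:
        imgs = images_by_identity[ident]
        take = min(faces_per_id, len(imgs))  # some people have few images
        batch += [(ident, img) for img in rng.sample(imgs, take)]
    # Random negatives: images of identities not already in the batch.
    others = [(i, img) for i, imgs in images_by_identity.items()
              if i not in identities for img in imgs]
    batch += rng.sample(others, min(extra_negatives, len(others)))
    return batch
```

Every image keeps its identity label, so anchor/positive/negative roles can be assigned later, inside the batch.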
What I did
I used the Labeled Faces in the Wild dataset for training (13,233 images, 5,749 people, 1,680 people with two or more images). For each batch I chose one anchor and some positives (which means I could only build 1,680 batches, because I need more than one image of ONE person), plus negatives: randomly selected images of other people.
Something is wrong with my training set. Should a mini-batch contain more anchors?
Instead of picking the hardest positive, we use all anchor-positive pairs in a mini-batch while still selecting the hard negatives
- Online triplet generation? How is it done? (technical details are welcome)
Generate triplets online. This can be done by selecting the hard positive/negative exemplars from within a mini-batch.
To select the semi-hard negatives I need to compute the embeddings of my triplets. So I need to make a forward pass through the triplet network, compare the embeddings, and then compute the loss using only the semi-hard triplets. That's what I think I have to do.
I used three convolutional neural networks with shared parameters (four convolutional layers with max pooling and one fully connected layer). I haven't used online triplet generation yet, because I can't understand how it's done. The result is no more than 70% accuracy.
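One thing worth noting: "three networks with shared parameters" is just one embedding function applied three times. A toy sketch (a single linear layer with L2 normalisation standing in for the conv + FC stack; everything here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))   # stand-in for the shared conv/FC weights

def embed(x):
    """Forward pass of the one shared network, L2-normalised output."""
    e = x @ W
    return e / np.linalg.norm(e)

# The "three networks" are the same function on three inputs:
anchor, positive, negative = (rng.standard_normal(8) for _ in range(3))
e_a, e_p, e_n = embed(anchor), embed(positive), embed(negative)
```

So in code there is a single set of weights and one forward function; the triplet structure only appears in the loss.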