Siamese Network

Triplet Loss

To learn the parameters of a Siamese network so that it produces a good encoding of an input image, we apply gradient descent to the triplet loss function.

For every image (called the anchor image A), we consider a positive image P and a negative image N. P is similar to A, while N is not similar to A (i.e. A and P are pictures of the same person, while N is a picture of a different person).

Our aim is to satisfy the following equation:

d(A, P) + α ≤ d(A, N)

where d(A, P) = ||f(A) − f(P)||² is the squared distance between the encodings of A and P, and α is the margin, which prevents the trivial solution of making all encodings identical.

The triplet loss function is given by:

L(A, P, N) = max(d(A, P) − d(A, N) + α, 0)

and the cost function will be the sum of the triplet losses over all m training triplets:

J = Σᵢ L(A⁽ⁱ⁾, P⁽ⁱ⁾, N⁽ⁱ⁾), for i = 1, ..., m
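The following is a minimal NumPy sketch of the loss and cost defined above. The function names, the encoding format (1-D arrays produced by the network), and the margin value alpha = 0.2 are illustrative assumptions, not taken from the source.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Triplet loss for one (anchor, positive, negative) triple of encodings.

    f_a, f_p, f_n: 1-D arrays holding the encodings f(A), f(P), f(N).
    alpha: the margin.
    """
    d_ap = np.sum((f_a - f_p) ** 2)       # d(A, P) = ||f(A) - f(P)||^2
    d_an = np.sum((f_a - f_n) ** 2)       # d(A, N) = ||f(A) - f(N)||^2
    return max(d_ap - d_an + alpha, 0.0)  # hinge: loss is zero once the margin is satisfied

def triplet_cost(anchors, positives, negatives, alpha=0.2):
    """Cost J: sum of the per-triplet losses over the training set."""
    return sum(
        triplet_loss(a, p, n, alpha)
        for a, p, n in zip(anchors, positives, negatives)
    )
```

In practice these distances would be computed on the encodings produced by the Siamese network, and the cost J would be minimized with gradient descent over the network's parameters.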

Note that while training, we must not choose A, P, and N randomly. Instead, we should choose triplets for which d(A, P) is very close to d(A, N). Such "hard" triplets produce a useful gradient, forcing gradient descent to find parameters that push d(A, N) at least a margin α above d(A, P), even for similar-looking A and N images. A sketch of this selection step follows below.
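The sketch below filters candidate triplets down to the "semi-hard" ones, i.e. those where d(A, P) is close to d(A, N) but the margin is not yet satisfied. The function name and the exact selection rule are assumptions for illustration; it reuses the hypothetical encoding format and alpha from the sketch above.

```python
import numpy as np

def select_hard_triplets(anchors, positives, negatives, alpha=0.2):
    """Keep only triplets where d(A, P) is close to d(A, N):
    the negative is farther than the positive, but still within the margin."""
    hard = []
    for a, p, n in zip(anchors, positives, negatives):
        d_ap = np.sum((a - p) ** 2)
        d_an = np.sum((a - n) ** 2)
        if d_ap < d_an < d_ap + alpha:  # close, but the margin is not yet met
            hard.append((a, p, n))
    return hard
```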

Also, since A and P are images of the same person, we will need multiple images of the same person while training (as opposed to one-shot learning).
