To learn better features, this paper trains with both an identification objective and a verification objective, that is, a combination of two losses, so that the learned features can be used not only for classification but also to judge whether two faces belong to the same person.
The identification loss judges which identity a sample belongs to, with the class probabilities computed by softmax:

$p_i = \frac{e^{z_i}}{\sum_j e^{z_j}}, \qquad \text{Ident}(f, t) = -\log p_t$

where $t$ is the ground-truth identity.
For the verification signal, the author also uses cosine similarity:

$\text{sim}(f_1, f_2) = \frac{f_1 \cdot f_2}{\lVert f_1 \rVert \, \lVert f_2 \rVert}$
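The cosine similarity used for verification can be sketched in a few lines of numpy (the function name and example vectors are mine, not from the paper):

```python
import numpy as np

def cosine_similarity(f1, f2):
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([-2.0, 1.0, 0.0])  # orthogonal to a

same = cosine_similarity(a, a)  # identical direction -> 1
orth = cosine_similarity(a, b)  # orthogonal -> 0
```

A value near 1 means the two features point the same way (likely the same person), near 0 or below means they do not.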
The training algorithm is as follows:
Note that each step of the algorithm must sample two images at a time, since the verification loss is defined on pairs.
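One training step in the spirit of this two-loss scheme can be sketched as follows. This is only an illustration: the contrastive-style verification term, the weight `lam`, and all the toy inputs are my assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def ident_loss(logits, label):
    """Cross-entropy identification loss on one sample."""
    return -np.log(softmax(logits)[label])

def verif_loss(f1, f2, same, margin=1.0):
    """Contrastive-style verification loss on a feature pair."""
    d2 = np.sum((f1 - f2) ** 2)
    if same:
        return 0.5 * d2
    return 0.5 * max(0.0, margin - np.sqrt(d2)) ** 2

# one "training step": draw a pair of samples, combine both objectives
f1, f2 = rng.normal(size=4), rng.normal(size=4)
logits1, logits2 = rng.normal(size=10), rng.normal(size=10)
lam = 0.05  # relative weight of the verification term (a hyperparameter)
loss = ident_loss(logits1, 3) + ident_loss(logits2, 3) + lam * verif_loss(f1, f2, same=True)
```

The identification term pushes each sample toward its class; the verification term pulls same-person pairs together and pushes different-person pairs apart.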
This paper identifies three important properties of the learned representations that affect performance: sparseness, selectiveness, and robustness.
The author puts forward DeepID2+, as shown below:
This paper questions the commonly used Euclidean distance. In an ordinary classification task, we extract features and obtain a score for each class through a final fc layer; the class with the highest score is the prediction. For a binary classification task, the decision boundary is

$W_1^{\top} x + b_1 = W_2^{\top} x + b_2$

where $W_i$ and $b_i$ are the parameters of the classification layer.

If each $W_i$ is normalized to unit length and the biases are set to 0, the boundary becomes

$\lVert x \rVert (\cos\theta_1 - \cos\theta_2) = 0$

where $\theta_i$ is the angle between $x$ and $W_i$. As shown in the figure below:
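Once the weights are unit length and the biases are zero, the decision depends only on the angles, since the common factor $\lVert x \rVert$ cannot change which score is larger. A tiny numeric check (the vectors here are arbitrary examples):

```python
import numpy as np

w1 = np.array([1.0, 0.0])
w2 = np.array([0.0, 1.0])  # both already unit length, biases set to 0
x = np.array([3.0, 1.0])

score1, score2 = w1 @ x, w2 @ x
cos1 = score1 / np.linalg.norm(x)
cos2 = score2 / np.linalg.norm(x)

# the score gap equals ||x|| * (cos theta1 - cos theta2)
gap = np.linalg.norm(x) * (cos1 - cos2)
```

Because `x` is closer in angle to `w1`, class 1 wins, and the gap matches the angular form exactly.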
In this paper, the author normalizes both the features and the weights of the classification (fc) layer, but finds that even after many rounds of training the network still does not converge. The author therefore analyzes the reasons for this phenomenon and proposes a method to train such a network.
The author answers the following questions:
Answer:
In this paper, the author normalizes the features and the classification-layer weights as follows:

$\tilde{x} = \frac{x}{\sqrt{\sum_i x_i^2 + \epsilon}}$

where the small constant $\epsilon$ avoids division by zero.
The gradient with respect to the unnormalized input is as follows:

$\frac{\partial L}{\partial x} = \frac{1}{\lVert x \rVert}\left(\frac{\partial L}{\partial \tilde{x}} - \tilde{x}\,\Big\langle \frac{\partial L}{\partial \tilde{x}},\, \tilde{x} \Big\rangle\right)$
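The backward pass of L2 normalization can be verified numerically against finite differences (function names and test values are mine):

```python
import numpy as np

def normalize(x):
    return x / np.linalg.norm(x)

def backward(x, grad_out):
    """dL/dx when x_tilde = x / ||x||, given grad_out = dL/dx_tilde."""
    xt = normalize(x)
    return (grad_out - xt * np.dot(grad_out, xt)) / np.linalg.norm(x)

# check against central finite differences for L = g . normalize(x)
x = np.array([1.0, -2.0, 0.5])
g = np.array([0.3, 0.7, -0.2])
analytic = backward(x, g)

eps = 1e-6
numeric = np.array([
    (g @ normalize(x + eps * e) - g @ normalize(x - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
```

The subtraction of the component along $\tilde{x}$ is what makes the gradient tangent to the sphere: scaling $x$ does not change $\tilde{x}$, so the radial direction gets no gradient.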
The author also studies normalized versions of metric-learning losses such as the contrastive loss and the triplet loss. After normalization, the inner product is equivalent to the Euclidean distance, since for unit vectors

$\lVert \tilde{x} - \tilde{y} \rVert^2 = 2 - 2\,\tilde{x}^{\top}\tilde{y}$
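The identity follows from expanding the square with $\lVert \tilde{x} \rVert = \lVert \tilde{y} \rVert = 1$, and is easy to confirm numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=8)
y = rng.normal(size=8)
x /= np.linalg.norm(x)  # project both onto the unit sphere
y /= np.linalg.norm(y)

dist2 = np.sum((x - y) ** 2)
inner = float(x @ y)
# on the unit sphere: ||x - y||^2 = 2 - 2 <x, y>
```

So on the hypersphere, minimizing Euclidean distance and maximizing cosine similarity are the same objective.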
This loss has a drawback: it exerts no gradient on negative pairs that already lie outside the margin. As shown in the figure below:
The red points in the shaded region are the ones the loss never affects.
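The dead zone is visible directly from the loss formula; a minimal sketch, assuming the standard contrastive loss with margin (names and example points are mine):

```python
import numpy as np

def contrastive_loss(f1, f2, same, margin=1.0):
    """0.5*d^2 for positive pairs, 0.5*max(0, margin - d)^2 for negatives."""
    d = np.linalg.norm(f1 - f2)
    if same:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

# a negative pair inside the margin is still pushed apart...
near = contrastive_loss(np.zeros(2), np.array([0.5, 0.0]), same=False)
# ...but one already beyond the margin contributes zero loss and zero gradient
far = contrastive_loss(np.zeros(2), np.array([5.0, 0.0]), same=False)
```

Once a negative pair clears the margin, the `max(0, ...)` clamps to zero and those points stop contributing to training.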
L-Softmax designs a new classification layer to improve the discriminative power of the features: it replaces the target-class logit $\lVert W_y \rVert\,\lVert x \rVert \cos\theta_y$ with $\lVert W_y \rVert\,\lVert x \rVert \cos(m\theta_y)$, which enforces an angular margin between classes. As shown in the figure below:
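A toy sketch of the margined target logit (the vectors and $m$ here are arbitrary examples, and this only covers the simple case $\theta \le \pi/m$, ignoring the paper's piecewise extension of $\cos(m\theta)$):

```python
import numpy as np

def lsoftmax_target_logit(w, x, m=2):
    """Target-class logit with an L-Softmax-style angular margin, for theta <= pi/m."""
    cos_t = w @ x / (np.linalg.norm(w) * np.linalg.norm(x))
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))
    return np.linalg.norm(w) * np.linalg.norm(x) * np.cos(m * theta)

w = np.array([1.0, 0.0])
x = np.array([2.0, 1.0])
plain = w @ x                                  # ordinary logit ||w||*||x||*cos(theta)
margined = lsoftmax_target_logit(w, x, m=2)    # margined logit ||w||*||x||*cos(m*theta)
```

Because $\cos(m\theta) \le \cos\theta$ in this range, the target class is scored more harshly, so the network must shrink $\theta_y$ further to classify correctly, which widens the angular gap between classes.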
In this paper, the author uses a neural network (Inception, ResNet, etc.) as a feature extractor and trains the output features directly with the triplet loss. This structure needs no classification layer, so the model can be very small.
Triplet loss:

$L = \sum_i \left[\, \lVert f(x_i^a) - f(x_i^p) \rVert_2^2 - \lVert f(x_i^a) - f(x_i^n) \rVert_2^2 + \alpha \,\right]_+$

where $a$, $p$, $n$ denote the anchor, a positive sample of the same identity, and a negative sample of a different identity, and $\alpha$ is the margin.
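A minimal numpy version of the triplet loss (the example points are mine):

```python
import numpy as np

def triplet_loss(anchor, pos, neg, alpha=0.2):
    """max(0, ||a-p||^2 - ||a-n||^2 + alpha): pull positives in, push negatives out."""
    d_ap = np.sum((anchor - pos) ** 2)
    d_an = np.sum((anchor - neg) ** 2)
    return max(0.0, d_ap - d_an + alpha)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])   # same identity, already close
n = np.array([1.0, 0.0])   # different identity, already far

easy = triplet_loss(a, p, n)   # margin satisfied -> zero loss
hard = triplet_loss(a, n, p)   # violated triplet -> positive loss
```

Like the contrastive loss, it is zero for triplets that already satisfy the margin, which is why triplet mining (choosing hard triplets) matters so much in practice.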
The ArcFace loss is as follows:

$L = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{e^{s\cos(\theta_{y_i}+m)}}{e^{s\cos(\theta_{y_i}+m)} + \sum_{j\neq y_i} e^{s\cos\theta_j}}$

where $\theta_j$ is the angle between the feature and the $j$-th class weight, $m$ is the additive angular margin, and $s$ is the scale.
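The margin can be implemented as a small modification of the logits before the ordinary softmax cross-entropy; a sketch with toy cosines (the specific numbers are mine, while $s=64$, $m=0.5$ follow the paper's common setting):

```python
import numpy as np

def arcface_logits(cos_all, label, s=64.0, m=0.5):
    """Add the additive angular margin m to the target class only, then scale by s."""
    logits = s * cos_all.copy()
    theta = np.arccos(np.clip(cos_all[label], -1.0, 1.0))
    logits[label] = s * np.cos(theta + m)
    return logits

def softmax_ce(logits, label):
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

cos_all = np.array([0.8, 0.3, -0.1])  # cosines between the feature and each class weight
plain = softmax_ce(64.0 * cos_all, 0)
margined = softmax_ce(arcface_logits(cos_all, 0), 0)
```

Adding $m$ to $\theta_y$ lowers the target logit, so the same feature incurs a higher loss; the network compensates by pulling features closer to their class center on the hypersphere.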
There are still many papers on face recognition and metric learning that I have not read carefully, and even these introductory ones I have not fully understood. Learning discriminative features for large-scale face recognition requires careful work in many places: even a more effective loss, with poorly chosen hyperparameters, may well perform worse than a plain fc classifier.