Objects with arbitrary orientations are everywhere in detection datasets. Compared with horizontal object detection, rotated object detection is still in its infancy. At present, most SOTA research focuses on regressing the rotation angle of the object, but regressing the angle brings new problems: 1) the evaluation metric is inconsistent with the loss; 2) the regression interval of the rotation angle is discontinuous; 3) the square problem. So far there is no good solution to these problems, and they significantly affect model performance, especially when the angle is at the boundary of its range.
To solve these problems, the paper proposes the GWD method. First, the rotated object is modeled as a two-dimensional Gaussian distribution. Then the non-differentiable rotated IoU is replaced by the Gaussian Wasserstein distance (GWD), and the loss is computed from the GWD, so that model training is aligned with the evaluation metric.
The main contributions of the paper are as follows:
Figure 2 gives two definitions of a rotated bbox: the OpenCV form and the long side form. In the former, the angle is measured between the box's width edge and the abscissa (x-axis); in the latter, it is measured between the long side and the abscissa. The two definitions can be converted into each other (ignoring the center point):
The main difference between the two representations lies in the edge order and the angle. The same bbox can be expressed in both ways, and converting between them may require swapping the two edges or shifting the angle by 90°. In many studies the model design is coupled to the bbox definition in order to avoid a specific problem: for example, the OpenCV form avoids the square problem, and the long side form avoids the edge exchange problem.
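The conversion between the two definitions can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the OpenCV form is taken as (w, h, θ) with θ in [-90°, 0), the long side form as (long side, short side, θ) with θ in [-90°, 90), and both function names are hypothetical.

```python
def opencv_to_long_side(w, h, theta):
    """OpenCV form (theta in [-90, 0)) -> long side form (theta in [-90, 90)).

    Assumption: theta is the angle of the w edge to the x-axis, in degrees.
    """
    if w >= h:
        # The w edge is already the long side; nothing changes.
        return (w, h, theta)
    # The h edge is the long side: swap the edges and shift the angle by 90.
    return (h, w, theta + 90.0)


def long_side_to_opencv(long_side, short_side, theta):
    """Long side form (theta in [-90, 90)) -> OpenCV form (theta in [-90, 0))."""
    if -90.0 <= theta < 0.0:
        return (long_side, short_side, theta)
    # Angle outside the OpenCV range: swap the edges and shift back by 90.
    return (short_side, long_side, theta - 90.0)


# Round trip for a box whose h edge is the long side.
box_oc = (10.0, 70.0, -30.0)
box_le = opencv_to_long_side(*box_oc)
box_back = long_side_to_opencv(*box_le)
```

The sketch makes the coupling visible: the same box changes both its edge order and its angle when it crosses from one definition to the other.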
IoU is an important evaluation metric in detection, but the regression loss used in training (such as an ℓn-norm) is often inconsistent with it; that is, a smaller loss value does not necessarily mean higher performance. For horizontal object detection there are already measures that address this inconsistency, such as DIoU and GIoU. In rotated object detection the added angle regression makes the inconsistency more prominent, and there is still no good solution. The paper also lists some examples comparing IoU loss and smooth L1 loss:
From the above analysis, an IoU loss would better bridge the gap between the evaluation metric and the regression loss in rotated object detection. Unfortunately, the IoU between two rotated bboxes is non-differentiable and cannot be used for training. Therefore, the paper proposes a differentiable loss based on the Wasserstein distance to replace the IoU loss, which also resolves the discontinuity of the rotation angle regression interval and the square problem.
Cases 1 and 2 in the figure above summarize the discontinuity of the rotation angle regression interval. Take Case 2 in the OpenCV form as an example. There are two ways to regress the anchor to the GT:
These problems usually occur when the angles of the anchor and the GT are at the boundary of the angle range. When they are not at the boundary, way 1 does not produce a huge loss value. Therefore, under smooth L1 the optimal regression for boundary angles and for non-boundary angles is inconsistent, which also hinders model training.
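The boundary effect can be illustrated with a small numeric sketch. It assumes the long side form with a 180° period for simplicity (in the OpenCV form, crossing the boundary additionally swaps the two edges), and it uses a standard smooth L1 definition with a hypothetical `beta` parameter; none of this is the paper's code.

```python
import math


def smooth_l1(diff, beta=1.0):
    """Standard smooth L1 penalty on a scalar difference."""
    diff = abs(diff)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def angle_gap_deg(a, b, period=180.0):
    """Smallest rotation (degrees) that aligns two angles, given the period."""
    d = abs(a - b) % period
    return min(d, period - d)


# Two nearly identical boxes on opposite sides of the angle boundary.
anchor_theta, gt_theta = 89.0, -89.0

# Difference seen by a naive parameter regression.
naive_diff = abs(anchor_theta - gt_theta)
# Rotation that is actually needed to align the boxes.
true_gap = angle_gap_deg(anchor_theta, gt_theta)

loss_naive = smooth_l1(math.radians(naive_diff))
loss_true = smooth_l1(math.radians(true_gap))
print(naive_diff, true_gap, loss_naive, loss_true)
```

A small geometric change near the boundary is therefore penalized as if it were a near half-turn, while the same change away from the boundary is penalized lightly.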
The square problem mainly appears in detection methods that use the long side form. Because a square object has no absolute long side, the long side form does not represent a square object uniquely. Take Case 3 as an example: given an anchor and a GT, way 1 rotates the anchor clockwise by a small angle so that its position matches the GT. However, because the angle difference between the two parameterizations is large, way 1 produces a high regression loss, so the model has to rotate counterclockwise by a large angle as in way 2. The main cause of the square problem is not the periodicity of angle (PoA) or the exchangeability of edges (EoE) mentioned above, but the inconsistency between the evaluation metric and the loss calculation.
Following the above analysis, the paper requires the regression loss of a new rotated object detection method to satisfy the following points:
At present, most IoU losses can be regarded as distance functions. Based on this, the paper proposes a new regression loss built on the Wasserstein distance. First, the rotated bbox (x, y, w, h, θ) is converted into a two-dimensional Gaussian distribution whose mean is the box center (x, y) and whose covariance Σ satisfies Σ^(1/2) = R S R^T, where R is the rotation matrix for θ and S = diag(w/2, h/2) is the diagonal matrix of eigenvalues.
For any two probability measures μ and ν on R^n, the Wasserstein distance can be expressed as:
Equation 2 ranges over all pairs of random vectors that follow the two given distributions. Substituting two Gaussian distributions converts it into:
Note in particular that:
In the commutative case, where the two covariance matrices commute (horizontal object detection), Equation 3 simplifies to:
Here ‖·‖_F denotes the Frobenius norm. The bboxes in this case are horizontal, and Equation 5 is similar to the ℓ2-norm loss, which shows that the Wasserstein distance is consistent with the loss commonly used in horizontal detection and can serve as a regression loss. The derivation is involved; interested readers can consult the paper's references.
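The steps above (box to Gaussian, then the distance between two Gaussians, then the horizontal special case) can be sketched numerically. This is a hedged illustration rather than the paper's implementation: it assumes angles in degrees, S = diag(w/2, h/2), and the standard closed form of the squared 2-Wasserstein distance between Gaussians; all function names are hypothetical.

```python
import numpy as np


def bbox_to_gaussian(x, y, w, h, theta_deg):
    """Rotated bbox -> (mean, covariance), assuming Sigma^(1/2) = R S R^T."""
    t = np.deg2rad(theta_deg)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    S = np.diag([w / 2.0, h / 2.0])
    sqrt_sigma = R @ S @ R.T
    return np.array([x, y], dtype=float), sqrt_sigma @ sqrt_sigma


def sqrtm_psd(A):
    """Square root of a symmetric positive semi-definite matrix."""
    A = (A + A.T) / 2.0                      # remove round-off asymmetry
    vals, vecs = np.linalg.eigh(A)
    vals = np.clip(vals, 0.0, None)          # guard tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def gwd_squared(m1, S1, m2, S2):
    """Squared Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    r1 = sqrtm_psd(S1)
    cross = sqrtm_psd(r1 @ S2 @ r1)
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))


# Horizontal boxes: the distance should reduce to an l2-style expression.
b1 = (0.0, 0.0, 4.0, 2.0, 0.0)
b2 = (1.0, 2.0, 6.0, 4.0, 0.0)
d2 = gwd_squared(*bbox_to_gaussian(*b1), *bbox_to_gaussian(*b2))
l2_style = ((b1[0] - b2[0]) ** 2 + (b1[1] - b2[1]) ** 2
            + ((b1[2] - b2[2]) ** 2 + (b1[3] - b2[3]) ** 2) / 4.0)
print(np.isclose(d2, l2_style))
```

For horizontal boxes the two values agree, which matches the claim that the distance behaves like the loss already used in horizontal detection.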
The paper applies a nonlinear function f to the squared GWD d² and maps it to 1/(τ + f(d²)), which gives a loss similar to the IoU loss, L_gwd = 1 − 1/(τ + f(d²)) with τ ≥ 1:
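A minimal sketch of such a mapping follows. It assumes the form 1 − 1/(τ + f(d²)) with τ ≥ 1; the function name and the two choices of f (square root and log(1 + x)) are illustrative assumptions, not a statement of what the paper's code uses.

```python
import math


def gwd_loss(d2, tau=1.0, f=math.sqrt):
    """Map a squared distance d2 >= 0 to a bounded, IoU-like loss in [0, 1)."""
    if d2 < 0 or tau < 1:
        raise ValueError("expected d2 >= 0 and tau >= 1")
    return 1.0 - 1.0 / (tau + f(d2))


# The loss is 0 for identical boxes and saturates towards 1 as d2 grows.
for d2 in (0.0, 1.0, 7.0, 1e4):
    print(d2, gwd_loss(d2), gwd_loss(d2, f=math.log1p))
```

Compressing the raw distance this way keeps the loss bounded, so far-apart boxes do not dominate the gradient the way an unnormalized distance would.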
The previous figure also shows the loss curves under different nonlinear functions. Equation 6 is very close to the IoU loss curve and can also measure non-overlapping bboxes. Therefore, Equation 6 clearly satisfies Requirement 1 and Requirement 2. To analyze Requirement 3, first consider the properties of Equation 1:
According to Property 1, the GWD loss is identical under the OpenCV form and the long side form, so the model does not need to be trained with a specific bbox definition. Take way 1 in Case 2 as an example: the GT and the prediction have the same mean and covariance, so the GWD loss does not output a large value. According to Properties 2 and 3, way 1 in Cases 2 and 3 does not produce a large loss value either, so the GWD loss also satisfies Requirement 3.
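These invariances can be checked numerically. The sketch below assumes Σ^(1/2) = R S R^T with S = diag(w/2, h/2) and interprets the three properties as: swapping the edges together with a 90° shift, shifting the angle by 180°, and rotating a square by 90°. The helper name is hypothetical and the box values are arbitrary examples.

```python
import numpy as np


def sqrt_sigma(w, h, theta_deg):
    """Sigma^(1/2) = R S R^T for a box of size (w, h) rotated by theta."""
    t = np.deg2rad(theta_deg)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    return R @ np.diag([w / 2.0, h / 2.0]) @ R.T


# Edge swap plus a 90 degree shift describes the same box.
same_after_swap = np.allclose(sqrt_sigma(70, 10, -90), sqrt_sigma(10, 70, 0))

# A 180 degree shift of the angle describes the same box.
same_after_half_turn = np.allclose(sqrt_sigma(70, 10, 30),
                                   sqrt_sigma(70, 10, -150))

# A square is unchanged by a 90 degree rotation.
square_invariant = np.allclose(sqrt_sigma(20, 20, 15), sqrt_sigma(20, 20, -75))

print(same_after_swap, same_after_half_turn, square_invariant)
```

Because the Gaussian is identical in each pair, any loss computed from the Gaussian parameters assigns these pairs zero distance, regardless of which bbox definition produced them.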
Overall, the advantages of GWD in rotated object detection are as follows:
The paper uses RetinaNet as the base detector, represents bboxes in the OpenCV form, and defines the regression targets as:
In these targets, the three variants of each variable denote the GT, the anchor, and the prediction, respectively, and the final multi-task loss function is:
The terms of this loss are, in order: the number of anchors, a foreground/background indicator, the predicted bbox, the GT bbox, the GT label, the predicted label, the weighting hyperparameters, and the focal loss used for classification.
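As a rough sketch of how these pieces fit together, the code below uses a common anchor-based offset parameterization and a weighted sum of a foreground-only regression term and a classification term. Both are assumptions made for illustration (the exact equations are not reproduced in this text), and every name, including `lambda_reg` and `lambda_cls`, is hypothetical.

```python
import math


def encode(box, anchor):
    """Offsets of a box (x, y, w, h, theta) relative to an anchor."""
    x, y, w, h, t = box
    xa, ya, wa, ha, ta = anchor
    return ((x - xa) / wa, (y - ya) / ha,
            math.log(w / wa), math.log(h / ha), t - ta)


def decode(offsets, anchor):
    """Inverse of encode: recover the box from offsets and the anchor."""
    tx, ty, tw, th, tt = offsets
    xa, ya, wa, ha, ta = anchor
    return (tx * wa + xa, ty * ha + ya,
            math.exp(tw) * wa, math.exp(th) * ha, tt + ta)


def multi_task_loss(reg_losses, is_foreground, cls_losses,
                    lambda_reg, lambda_cls):
    """Weighted sum over N anchors; regression counts only for foreground."""
    n = len(cls_losses)
    reg = sum(o * r for o, r in zip(is_foreground, reg_losses))
    return lambda_reg * reg / n + lambda_cls * sum(cls_losses) / n


anchor = (50.0, 50.0, 40.0, 20.0, -45.0)
gt = (54.0, 47.0, 60.0, 15.0, -30.0)
recovered = decode(encode(gt, anchor), anchor)
```

In a GWD-based detector the per-anchor regression loss passed to `multi_task_loss` would be computed on the decoded boxes rather than on the raw offsets, which is what decouples the loss from the bbox definition.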
Comparison with other solutions to specific problems.
Comparison of multiple models on the DOTA dataset. The paper contains many other experiments; interested readers can refer to it.
The paper explains the main problems of rotated object detection in detail. The rotated regression target is modeled as a Gaussian distribution, and the Wasserstein distance between Gaussian distributions is used as the training loss. Conventional object detection already has many methods that turn regression into a probability distribution; this paper follows the same idea and is worth reading.