Based on the conclusions above, multi-scale training should avoid the adverse effects of objects that become extremely small or extremely large after rescaling, while still keeping enough diversity among the objects. Therefore, during multi-scale training, proposals and anchors that fall outside the valid size range for each input scale are ignored. The paper uses three scales, as shown in the figure, which is a wider scale span than in typical multi-scale training.
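The idea of ignoring out-of-range boxes at each input scale can be sketched as follows. This is a minimal illustration, not the paper's implementation: the scale names, the size ranges in `VALID_RANGES`, and the helper names are all hypothetical, chosen only to show the filtering step.

```python
import math

# Hypothetical valid object-size ranges (in pixels, measured after resizing)
# for three input scales. The numbers are illustrative assumptions.
VALID_RANGES = {
    "small_input": (120.0, math.inf),  # low resolution: keep only large objects
    "medium_input": (40.0, 160.0),
    "large_input": (0.0, 80.0),        # high resolution: keep only small objects
}


def box_size(box):
    """Return sqrt(width * height) of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return math.sqrt(max(x2 - x1, 0.0) * max(y2 - y1, 0.0))


def split_by_valid_range(boxes, resize_factor, scale_name):
    """Split boxes into (kept, ignored) for one input scale.

    Boxes are given in original-image coordinates; their size is multiplied
    by resize_factor before being compared with the valid range.
    """
    low, high = VALID_RANGES[scale_name]
    kept, ignored = [], []
    for box in boxes:
        size = box_size(box) * resize_factor
        if low <= size <= high:
            kept.append(box)
        else:
            ignored.append(box)
    return kept, ignored


boxes = [(0, 0, 10, 10), (0, 0, 100, 100), (0, 0, 300, 300)]
kept, ignored = split_by_valid_range(boxes, resize_factor=1.0, scale_name="medium_input")
print(kept, ignored)
```

Ignored boxes would contribute no loss at that scale, so each scale only learns from objects it can represent well, while the three scales together still cover the full size range.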
Part II: Neck (improvement schemes for the pyramid structure)
The standard FPN structure is the rightmost one in the figure, and the structure adopted in this paper is
First, this approach undoubtedly increases the amount of computation. Its advantage is that the features of each layer in the final output are not a linear transformation of one another (more precisely, the features of one layer are not passed directly into those of another layer); instead, multi-layer features shared by * * * are used. In the end it improves on RetinaNet by only about one point, so the gain is modest. The champion team of the VisDrone2020 detection challenge adopted this structure.
This paper builds a feature pyramid network from multiple TUM modules. The front TUMs provide shallow features, the middle TUMs provide medium-level features, and the rear TUMs provide deep features. In this way the deep features are fused many times, at the cost of more parameters. Compared with RetinaNet at 512 input without multi-scale inference, mAP improves from 33 to 37.6, and the accuracy on small objects also improves slightly. However, raising accuracy by stacking parameters and computation is not a good approach.
This paper argues that the importance of each pyramid layer should depend on the absolute scale distribution of the objects, so a scale factor is added during the top-down FPN merge to balance the importance of the different layers. In my view this matters little, and the actual improvement is not obvious.
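A top-down merge with a per-level scale factor might look like the sketch below. The exact form used in the paper is not given here, so the weighting `lateral + factor * upsampled` and the names `top_down_merge` and `scale_factors` are assumptions for illustration; with every factor set to 1.0 it reduces to the usual FPN addition.

```python
def upsample_nearest_2x(feature):
    """Nearest-neighbour 2x upsampling of a 2D feature map (list of rows)."""
    out = []
    for row in feature:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def top_down_merge(laterals, scale_factors):
    """Merge pyramid levels from the coarsest level down to the finest.

    laterals: 2D maps ordered fine -> coarse, each half the size of the
        previous one.
    scale_factors: hypothetical weights, one per merge step; scale_factors[i]
        multiplies the upsampled coarser map that is added to level i.
    """
    merged = [None] * len(laterals)
    merged[-1] = laterals[-1]
    for i in range(len(laterals) - 2, -1, -1):
        upsampled = upsample_nearest_2x(merged[i + 1])
        merged[i] = [
            [a + scale_factors[i] * b for a, b in zip(row_lat, row_up)]
            for row_lat, row_up in zip(laterals[i], upsampled)
        ]
    return merged


laterals = [
    [[1.0] * 4 for _ in range(4)],  # finest level, 4x4
    [[2.0] * 2 for _ in range(2)],  # middle level, 2x2
    [[4.0]],                        # coarsest level, 1x1
]
merged = top_down_merge(laterals, scale_factors=[0.5, 0.25])
print(merged[0][0][0], merged[1][0][0], merged[2][0][0])
```

A factor below 1.0 reduces how much the deeper level contributes to the shallower one, which is one simple way to shift importance toward the layers that match the dominant object scale.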
Part III: Head (improvement schemes)
This "double-head" scheme was adopted by the champion solution and several other solutions on VisDrone. Soft-NMS also appears to bring an improvement of a few points.
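For reference, Soft-NMS lowers the scores of overlapping boxes instead of discarding them outright. The sketch below uses the Gaussian decay variant; the text does not say which variant or parameters the VisDrone solutions used, so `sigma` and `score_threshold` here are assumed values.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS.

    Returns a list of (box_index, decayed_score) in selection order.
    sigma and score_threshold are assumed defaults, not values from the text.
    """
    candidates = [[i, s] for i, s in enumerate(scores)]
    keep = []
    while candidates:
        # Pick the remaining box with the highest (possibly decayed) score.
        best_pos = max(range(len(candidates)), key=lambda k: candidates[k][1])
        best_idx, best_score = candidates.pop(best_pos)
        if best_score < score_threshold:
            break  # everything left scores even lower
        keep.append((best_idx, best_score))
        # Decay the scores of the rest according to their overlap with it.
        for cand in candidates:
            overlap = iou(boxes[best_idx], boxes[cand[0]])
            cand[1] *= math.exp(-(overlap ** 2) / sigma)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
for index, score in soft_nms(boxes, scores):
    print(index, round(score, 3))
```

Because heavily overlapping boxes are only down-weighted, densely packed small objects that hard NMS would suppress can still survive with a reduced score, which is a plausible reason for the gain in crowded aerial scenes.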
Fourth, small object detection currently performs poorly. The main reason is not that the objects are small, but that they are both small and close to the background, with low contrast. We can therefore borrow ideas from camouflaged object detection.
Teaching plan for cleaning the table 1
1. Don't open the napkin after taking a seat.
2. The napkin should be placed flat