[Feature Pyramid in CV] Feature Pyramid Network
Before the rise of deep learning, many traditional methods used image pyramids. An image pyramid, as shown above, resizes the input image to a series of different sizes, extracts features at each size separately, and then makes predictions at each scale. Although this method can solve the multi-scale problem to a certain extent, it obviously requires a large amount of computation.
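As a minimal sketch of this idea (my own illustration, not the pipeline of any specific detector; `detect_single_scale` is a hypothetical single-scale detector), the code below builds an image pyramid with OpenCV and re-runs the detector on every level. The repeated feature extraction per level is exactly where the heavy computation comes from:

```python
import cv2

def image_pyramid_detect(image, detect_single_scale, min_size=64, scale=0.5):
    """Run a (hypothetical) single-scale detector on every pyramid level."""
    results, level = [], 0
    while min(image.shape[:2]) >= min_size:
        # Features are extracted from scratch at every level: costly.
        boxes = detect_single_scale(image)
        # Map boxes back to the coordinates of the original image.
        factor = (1.0 / scale) ** level
        results += [[c * factor for c in box] for box in boxes]
        image = cv2.resize(image, None, fx=scale, fy=scale)
        level += 1
    return results
```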

The above figure uses a single feature map for detection. This structure was used by many detectors around 2017, such as YOLOv1, YOLOv2 and Faster R-CNN. Using this architecture directly means the prediction layer has only a single feature scale, so detection of small objects is poor. (ps: the multi-scale training in YOLOv2 can alleviate the single-scale problem to some extent and lets the model adapt to more input sizes.)

The above picture makes predictions on feature maps of different sizes, which gives the ability of multi-scale prediction, but there is no fusion between the features. The classic object detection architecture that follows this design is SSD, which predicts from a large number of scales.
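A minimal PyTorch sketch of this structure (channel counts and names are assumptions, not the exact SSD configuration): each scale gets its own prediction head, and no information flows between the levels:

```python
import torch.nn as nn

class PyramidalHierarchyHead(nn.Module):
    """One independent prediction head per backbone stage, with no
    fusion between levels (an SSD-style sketch, not the exact SSD)."""

    def __init__(self, in_channels=(256, 512, 1024), num_outputs=24):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Conv2d(c, num_outputs, kernel_size=3, padding=1)
            for c in in_channels
        )

    def forward(self, features):
        # `features`: one map per scale (e.g. strides 8/16/32);
        # each is predicted from independently, with no feature fusion.
        return [head(f) for head, f in zip(self.heads, features)]
```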

Then there is the very classic FPN architecture. FPN can be easily applied to two-stage networks such as Faster R-CNN, or to one-stage networks such as YOLO and SSD. By constructing its own feature pyramid, FPN avoids the heavy computation of image pyramids while handling multi-scale variation in object detection better; its results reached the SOTA of the time. DSSD, an improved version of SSD, adopts an FPN-style structure and achieves better results than SSD.

The following is a detailed description of FPN:

For the convenience of explanation, the following notation is used: C_i denotes the backbone feature of the i-th stage, and P_i denotes the fused pyramid feature at the same level.

Assume the current layer is the third layer C3 and it needs to be fused with the higher-level pyramid feature P4. First, a 1×1 convolution constrains the number of channels of C3 so that the channel counts of the two layers are consistent; then P4 is up-sampled by a factor of 2 so that the sizes of the two feature maps are consistent; finally, the up-sampled result and the lateral result are merged by element-wise addition to obtain P3.
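A minimal PyTorch sketch of this fusion step (the 256 output channels and the 3×3 smoothing conv follow the FPN paper; the module name is mine):

```python
import torch.nn as nn
import torch.nn.functional as F

class FPNFuse(nn.Module):
    """One top-down fusion step: 1x1 lateral conv + 2x upsample + add."""

    def __init__(self, c_in, out_channels=256):
        super().__init__()
        # 1x1 conv constrains the channel count of C_i to match P_{i+1}.
        self.lateral = nn.Conv2d(c_in, out_channels, kernel_size=1)
        # 3x3 conv after the addition reduces the aliasing effect of
        # upsampling (as in the FPN paper).
        self.smooth = nn.Conv2d(out_channels, out_channels, 3, padding=1)

    def forward(self, c_i, p_higher):
        # 2x nearest-neighbor upsampling makes the spatial sizes match.
        top_down = F.interpolate(p_higher, scale_factor=2, mode="nearest")
        return self.smooth(self.lateral(c_i) + top_down)
```

For example, with a ResNet-50-like backbone, `p3 = FPNFuse(c_in=512)(c3, p4)` would fuse the stride-8 feature C3 with the stride-16 pyramid feature P4.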

So why does FPN, with its fusion, work better than the plain pyramidal feature hierarchy? There are several reasons:

The black box is the theoretical receptive field, and the bright Gaussian-distributed highlight at the center is the effective (actual) receptive field. The top-down path of FPN fuses features with different receptive fields, combining two Gaussian-distributed effective receptive fields, so that the high-level layer strengthens the receptive field corresponding to the low-level layer. (ps: this part is the author's own understanding; if you have a different opinion, you are welcome to discuss and exchange.)
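To make the receptive-field argument concrete, here is a small helper (my own illustration, not from the paper) that computes the theoretical receptive field of a layer stack using the standard recurrence r_l = r_{l-1} + (k_l - 1) * j_{l-1}, where j_{l-1} is the product of all earlier strides. Deeper layers grow much larger receptive fields, which is what the top-down path feeds back into the shallow levels:

```python
def theoretical_receptive_field(layers):
    """Theoretical receptive field of a stack of (kernel, stride) layers."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the field by (k-1)*jump
        jump *= stride              # stride compounds the per-pixel step size
    return rf

# Example: three 3x3 stride-1 convs with stride-2 pooling between stages.
print(theoretical_receptive_field([(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]))  # -> 18
```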

Regarding the idea of receptive fields, the FPN paper has a figure that is very similar to one in a previously published article, as shown below:

This figure shows the application of FPN to DeepMask-style segmentation, in which a 5×5 multilayer perceptron is used to generate a 14×14 segmentation mask. The light orange region represents the corresponding region on the original image (similar to the theoretical receptive field), and the dark orange region corresponds to a typical object region (similar to the effective receptive field). From this figure, several conclusions can be drawn:

Many articles that explain the FPN network ignore this point. If you are interested in this part, you can find a detailed explanation in the appendix of the FPN paper.
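A rough fully convolutional sketch of the head described above (the 5×5 window and 14×14 mask size come from the text; the hidden width and module name are assumptions):

```python
import torch.nn as nn

class MaskMLPHead(nn.Module):
    """DeepMask-style head: a 5x5 MLP over the feature map predicts a
    14x14 mask at every position, implemented fully convolutionally."""

    def __init__(self, in_channels=256, hidden=512, mask_size=14):
        super().__init__()
        self.mask_size = mask_size
        self.mlp = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=5),  # the 5x5 window
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, mask_size * mask_size, kernel_size=1),
        )

    def forward(self, feature):
        out = self.mlp(feature)              # (N, 14*14, H', W')
        n, _, h, w = out.shape
        return out.view(n, self.mask_size, self.mask_size, h, w)
```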

Ablation experiment

The above table lists the FPN ablation variants by name. The authors compare the benefits of the top-down pathway and the lateral connections in detail.

Several conclusions can be drawn from the experiments in the above table:

In the paper, many experiments were carried out by adding FPN to the RPN network and to the DeepMask structure, and all of them achieved good results. FPN is indeed a good feature-fusion method. Since then, many networks with different FPN-style architectures have been proposed, each bringing some improvement.

Summary

Originally, this article was meant to cover the whole FPN family of architectures in one piece, but that would have been too long, so it has been split into a small series; this installment is dedicated to the design of FPN itself.

I discussed the design of FPN with a group of friends some time ago, and that discussion inspired the following ideas:

In fact, readers who have read more papers may think of ASFF, BiFPN, BiSeNet, ThunderNet, etc. These are all better ways of addressing this problem. In later posts I will continue to interpret the FPN design in these networks; everyone is welcome to exchange ideas.

Overview of feature fusion methods

https://medium.com/@jonathan_hui/understanding-feature-pyramid-networks-for-object-detection-fpn-45b227b9106c

/question/306213462

https://arxiv.org/abs/1612.03144