Overview of receptive field

Receptive field is one of the most important concepts in convolutional neural networks. In order to better understand the structure of convolutional neural networks, it is necessary to design convolutional neural networks.

I. Definition

Perceptual field is defined as the area where the input image can be seen according to the characteristics of convolutional neural network. In other words, the output of features is influenced by pixels in the receptive field.

For example, as shown in the figure below (two dimensions are simplified to one dimension for convenience), the convolution kernel of each layer of this three-layer neural convolution neural network is _ = 3. , = 1, the receptive field corresponding to the topmost feature is 7x7 as shown in the figure.

Second, the calculation method

The receptive field of 1 layer [1]

The second layer is characterized by a receptive field of 5.

Second receptive field [1]

The third layer is characterized by a receptive field of 7.

The third receptive field [1]

If there is an extended conv, the calculation formula is

Third, walk up a flight of stairs.

What is described above is the theoretical receptive field, and the effective receptive field (actual effective receptive field) of the feature is actually much smaller than the theoretical receptive field, as shown in the following figure. The specific mathematical analysis is more complicated, so I won't go into details. If you are interested, please refer to paper [2].

Examples of effective receptive fields [2]

Two-layer 3x3 conv calculation flow chart

Fourth, application

classify

Cao Xudong wrote a technical report entitled "Practical Theory of Designing Extremely Deep Convolutional Neural Networks", which mentioned that when designing an image classifier based on deep convolution neural networks, two conditions need to be met to ensure good results:

First of all, for each convolution layer, its ability to learn more complex patterns should be guaranteed; Secondly, the top receptive field should not be larger than the image area.

The second condition is the limitation of the size of the characteristic receptive field of the highest level network of convolutional neural network.

object detection

Nowadays, most popular target detection networks are based on anchor, such as SSD series, yolo after v2 and faster rcnn series.

The target detection network based on anchor points will preset a set of anchor points with different sizes, such as 32x32, 64x64, 128x 128, 256x256. So many anchors, which floor should I put them on? At this time, the size of the site is an important consideration.

The characteristic receptive field of the anchor layer should match the size of the anchor. It is not good to feel that the wild is too big than the anchor, and it is not good to feel too small. If you think the wild one is much smaller than the anchor, it's like giving you only one foot to tell you what kind of bird it is. If you think the wild one is much bigger than the anchor, it's like giving you a map of the world to show you where the Forbidden City is.

The paper "S3FD: One-shot Scale Invariant Face Detector" is an example of designing anchor point size according to receptive field. The original words in the article are as follows

We design the anchor scale according to the effective receptive field.

Face box: the high-precision CPU real-time face detector is based on the same receptive field when designing multi-scale anchors. One contribution of this paper is that

We introduced the multi-scale convolution layer.

(MSCL) deals with various facial proportions through abundance.

Discrete anchors on receptive fields and layers

Quote:

[1] Convolutional neural network

[2] Understand the effective receptive field in deep convolutional neural networks.