Mean subtraction: the mean vector (3 × 1, one value per color channel) is computed by averaging the pixel values over all training images. The same vector is subtracted from the input image in the testing stage as well.
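A minimal sketch of the mean-subtraction step, assuming images are stored as H × W × 3 NumPy arrays. The function name `subtract_mean` and the small random "training set" are hypothetical and only there to make the example runnable; they are not taken from any particular library.

```python
import numpy as np

def subtract_mean(image, channel_mean):
    """Subtract a per-channel mean (3 values) from an H x W x 3 image."""
    image = np.asarray(image, dtype=np.float32)
    # Reshape to (1, 1, 3) so the mean broadcasts over height and width.
    mean = np.asarray(channel_mean, dtype=np.float32).reshape(1, 1, 3)
    return image - mean

# Hypothetical training set: 4 random 8 x 8 RGB images.
rng = np.random.default_rng(0)
train_images = rng.integers(0, 256, size=(4, 8, 8, 3)).astype(np.float32)

# One mean per color channel, averaged over all images and all pixels.
channel_mean = train_images.mean(axis=(0, 1, 2))  # shape (3,)

# The same vector is reused unchanged for a test image.
test_image = rng.integers(0, 256, size=(8, 8, 3))
normalized = subtract_mean(test_image, channel_mean)
```

The key point is that `channel_mean` is computed once from the training data and then reused as-is at test time.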
Rescaling: Two parameters are considered here, the target size and the maximum size. The short side of the image (width or height) is resized to the target size, and the long side is scaled by the same factor so that the aspect ratio is unchanged. However, if the resized long side would exceed the maximum size, the long side is set to the maximum size instead, and the short side is scaled by that factor so that the aspect ratio is still preserved. The default values of the target size and maximum size are 800 and 1333, respectively.
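The rescaling rule can be sketched as a small size calculation. This is an illustrative helper (the name `compute_resized_shape` is hypothetical), and rounding to the nearest integer is an assumption; actual implementations may round differently.

```python
def compute_resized_shape(height, width, target_size=800, max_size=1333):
    """Return (new_height, new_width, scale) for the rescaling rule."""
    short_side = min(height, width)
    long_side = max(height, width)

    # First try to bring the short side to target_size.
    scale = target_size / short_side

    # If the long side would then exceed max_size, cap it instead.
    if scale * long_side > max_size:
        scale = max_size / long_side

    # Assumption: sizes are rounded to the nearest integer.
    return round(height * scale), round(width * scale), scale

# 600 wide, 900 high: short side -> 800, long side -> 1200 (below 1333).
print(compute_resized_shape(900, 600)[:2])

# 2000 wide, 500 high: the long side would be 3200, so it is capped at 1333.
print(compute_resized_shape(500, 2000)[:2])
```

In both branches a single scale factor is applied to both sides, which is what keeps the aspect ratio unchanged.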
Edge padding: Because FPN is used, edge padding is necessary. All padding is added only on the right and bottom edges, so the target coordinates are not affected, since the coordinate system starts from the upper-left corner. If FPN is not used, this step can be skipped.
For example, suppose the width of the image is the short side (600). After it is resized to 800, the height is resized by the same ratio to give a new height of 1200. However, 1200 is not a multiple of 32, so the image must be padded until its height is a multiple of 32 (1216/32 = 38).
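The padding step and the 800 × 1200 example above can be sketched with NumPy as follows. The helper name `pad_to_multiple` is hypothetical, and filling the padded region with zeros is an assumption; the text only specifies that padding goes on the right and bottom.

```python
import math
import numpy as np

def pad_to_multiple(image, divisor=32):
    """Pad an H x W (x C) image on the bottom and right to a multiple of divisor."""
    height, width = image.shape[:2]
    padded_h = math.ceil(height / divisor) * divisor
    padded_w = math.ceil(width / divisor) * divisor

    # (before, after) per axis: nothing before, so the top-left origin and
    # all existing pixel coordinates stay where they are.
    pad_spec = [(0, padded_h - height), (0, padded_w - width)]
    pad_spec += [(0, 0)] * (image.ndim - 2)  # leave channel axes untouched

    # Assumption: the padded area is filled with zeros.
    return np.pad(image, pad_spec, mode="constant", constant_values=0)

# Resized image from the example: width 800, height 1200.
resized = np.ones((1200, 800, 3), dtype=np.float32)
padded = pad_to_multiple(resized)
print(padded.shape)  # (1216, 800, 3)
```

The width of 800 is already a multiple of 32 and is left alone; only 16 rows are appended at the bottom.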
Note: The image height and width used in the anchor generation and convolution steps are those of the resized image, not the padded one.