ResNet: Application in Visual Tracking

The SiamFC tracker has achieved great success and pushed deep learning forward in the tracking field. SiamFC uses AlexNet as its backbone network to extract image features. AlexNet was originally proposed for image recognition. It was the first network to show how effective convolutional networks are in computer vision, and it won first place in the 2012 ImageNet competition. Since then, many deep convolutional networks have been proposed, such as VGG, GoogLeNet, and ResNet. From AlexNet to ResNet, networks have gained more and more layers, that is, they have become deeper and deeper, and their performance has improved accordingly. This naturally raises the question of whether Siamese trackers can also be improved by using a stronger backbone. This article focuses on the ResNet backbone used in SiamRPN++.

A residual block contains two kinds of mappings. One is the identity mapping, shown as the curved shortcut line in the figure above; the other is the residual mapping, which is the part of the block outside that shortcut. The final output is therefore y = F(x) + x. As the name suggests, the identity mapping passes the input through unchanged, which is the x term in the formula. The residual mapping refers to the "difference" y − x, so the residual is the F(x) part. In other words, what the network has to learn is F(x) = y − x, the difference between the target output and the input, which is why it is called a residual network.
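The relation y = F(x) + x can be illustrated with a minimal pure-Python sketch. The residual branch below is a hypothetical toy (one fully connected layer followed by ReLU) standing in for the stacked convolutions of a real ResNet block; `residual_block` and `make_linear_relu` are illustrative names, not part of any library.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, f: Callable[[Vector], Vector]) -> Vector:
    """Return y = F(x) + x, where f plays the role of the residual mapping F."""
    fx = f(x)
    if len(fx) != len(x):
        # The identity shortcut needs matching shapes; real ResNets switch to a
        # 1x1 projection shortcut when the shapes differ.
        raise ValueError("F(x) and x must have the same shape")
    return [a + b for a, b in zip(fx, x)]


def make_linear_relu(weights: List[Vector], bias: Vector) -> Callable[[Vector], Vector]:
    """Hypothetical toy residual branch: a fully connected layer followed by ReLU."""
    def f(x: Vector) -> Vector:
        out = []
        for row, b in zip(weights, bias):
            s = sum(w * xi for w, xi in zip(row, x)) + b
            out.append(max(0.0, s))
        return out
    return f


# A branch that outputs zero turns the block into a pure identity mapping.
zero_branch = make_linear_relu([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
print(residual_block([1.5, -2.0], zero_branch))  # [1.5, -2.0]

# An identity-weight branch: F(x) = ReLU(x) = [1.5, 0.0], so y = [3.0, -2.0].
ident_branch = make_linear_relu([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(residual_block([1.5, -2.0], ident_branch))  # [3.0, -2.0]
```

Because the shortcut adds x back unchanged, the branch only has to model the difference y − x; if no change is needed, it can simply output zero.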

The original ResNet was designed mainly for image classification and recognition, and its features are largely insensitive to spatial position. In tracking, however, spatial information is essential for locating the target accurately, so the network has to be modified before it can be used for tracking.

The figure above shows the network structure of SiamRPN++, whose backbone is a modified ResNet-50. The original ResNet-50 has a total stride of 32, which is too coarse for tracking. The authors changed the strides of the last two blocks, reducing the total stride to 8, and used dilated (atrous) convolution to enlarge the receptive field. As the figure also shows, features from convolutional layers at different depths of ResNet are used, and an extra 1×1 convolution layer is appended to the output of each block to reduce the number of feature channels to 256. The paper keeps the padding of all layers.
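The stride, dilation, and channel-reduction arithmetic above can be checked with a small pure-Python sketch. The per-stage strides of the standard ResNet-50 (conv1, max-pool, conv2_x to conv5_x) are the usual ones; the dilation rates of 2 and 4 used in the example are an assumption borrowed from common reimplementations, since the text only says that dilated convolution is used. The helper names (`total_stride`, `cumulative_strides`, `effective_kernel`, `dilated_conv1d`, `conv1x1`) are hypothetical, and the 1-D convolution and tiny 1×1 convolution are toy illustrations, not the SiamRPN++ code.

```python
from itertools import accumulate
from math import prod
from operator import mul
from typing import Dict, List

# Stride of each stage in a standard ResNet-50.
ORIGINAL: Dict[str, int] = {
    "conv1": 2, "maxpool": 2, "conv2_x": 1,
    "conv3_x": 2, "conv4_x": 2, "conv5_x": 2,
}
# SiamRPN++-style change: the last two blocks no longer downsample.
MODIFIED: Dict[str, int] = {**ORIGINAL, "conv4_x": 1, "conv5_x": 1}


def total_stride(strides: Dict[str, int]) -> int:
    """Overall downsampling factor = product of the per-stage strides."""
    return prod(strides.values())


def cumulative_strides(strides: Dict[str, int]) -> Dict[str, int]:
    """Stride of the feature map at the output of each stage."""
    return dict(zip(strides, accumulate(strides.values(), mul)))


def effective_kernel(k: int, dilation: int) -> int:
    """Input span covered by a k-tap kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Stride-1 dilated convolution with zero 'same' padding (odd kernels only)."""
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("this toy version only supports odd kernel sizes")
    pad = dilation * (k - 1) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    # Taps are spaced `dilation` apart, so the receptive field grows while the
    # output keeps the input length (no loss of spatial resolution).
    return [sum(kernel[j] * padded[i + j * dilation] for j in range(k))
            for i in range(len(x))]


def conv1x1(feature: List[List[List[float]]],
            weights: List[List[float]]) -> List[List[List[float]]]:
    """1x1 convolution: feature is [C_in][H][W], weights is [C_out][C_in]."""
    c_in = len(feature)
    h, w = len(feature[0]), len(feature[0][0])
    # Each output pixel is a weighted sum over input channels at the same
    # location, so only the channel count changes, not the spatial size.
    return [[[sum(weights[o][c] * feature[c][i][j] for c in range(c_in))
              for j in range(w)] for i in range(h)]
            for o in range(len(weights))]


print(total_stride(ORIGINAL), total_stride(MODIFIED))  # 32 8
print(cumulative_strides(MODIFIED))
print(effective_kernel(3, 2), effective_kernel(3, 4))  # 5 9
print(dilated_conv1d([1, 2, 3, 4, 5], [1, 1, 1], dilation=2))  # [4, 6, 9, 6, 8]
```

With the modified strides, the outputs of conv3_x, conv4_x, and conv5_x all sit at stride 8, so feature maps taken from different depths share the same resolution. In the real network the 1×1 layer maps each block's channels down to 256; the toy `conv1x1` does the same thing on a much smaller scale (for example, 4 channels down to 2).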
