Summary of Classic Papers on Object Detection Algorithms (1)
Title: Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.

Date of submission: 2014.

Address: blogs.com/zjutzz/p/8232740.html.

Title: AttentionNet: Aggregating Weak Directions for Accurate Object Detection.

Date of submission: 2015, ICCV.

Paper address: /content/pdf/10.1007/978-3-319-10578-9_23.pdf.

Problem addressed:

R-CNN warps every candidate image patch to the same fixed size before feeding it to the CNN, and this warping loses or distorts information in the patch. In real scenes, the objects sent to the network rarely share a single size, yet the final fully connected layers require a feature vector of fixed dimension. The authors therefore look for a way to produce features of the same dimension from CNN inputs of different sizes.

Innovation:

In the authors' SPPnet, spatial pyramid pooling converts the output of the final convolutional layer to the fixed size that the fully connected layers require. During training, the pooling is still implemented as sliding-window pooling, with the window width, height, and stride computed from the width and height of the current feature map. The original paper illustrates the spatial pyramid pooling operation with a diagram.
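The window and stride computation can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the rule window = ceil(a / n) and stride = floor(a / n) for a feature map of side a and a pyramid level with n bins per side, pools a single channel stored as a list of rows, and uses the pyramid levels (1, 2, 4) purely as an example. The function names are hypothetical.

```python
import math


def spp_window_and_stride(size, bins):
    """Pooling window and stride along one axis for a level with `bins` bins."""
    window = math.ceil(size / bins)
    stride = size // bins
    return window, stride


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool one 2-D channel (a list of rows) into sum(n * n) values.

    Assumes the map is at least as large as the finest level on each axis.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for bins in levels:
        win_h, stride_h = spp_window_and_stride(height, bins)
        win_w, stride_w = spp_window_and_stride(width, bins)
        for i in range(bins):
            rows = feature_map[i * stride_h:i * stride_h + win_h]
            for j in range(bins):
                left = j * stride_w
                pooled.append(max(max(row[left:left + win_w]) for row in rows))
    return pooled


# Two maps of different sizes produce vectors of the same length.
square_map = [[r * 13 + c for c in range(13)] for r in range(13)]
wide_map = [[r * 17 + c for c in range(17)] for r in range(10)]
print(len(spatial_pyramid_pool(square_map)), len(spatial_pyramid_pool(wide_map)))
```

With these levels every channel yields 1 + 4 + 16 = 21 values regardless of the input size, which is what lets the fully connected layer accept inputs of varying size.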

Paper address: /content_iccv_2015/papers/Gidaris_Object_Detection_via_ICCV_2015_paper.pdf.

Problem addressed:

MultiBox (the third paper in this series) showed that a CNN can localize the objects to be detected in an input image. Building on that, the authors add training methods and techniques that improve the final localization accuracy of the CNN.

Innovation:

The accuracy of the regressed bounding box is improved by varying the region fed to the network, a form of data augmentation that lets the network use the context around the object to predict a more accurate box. The specific operations include enlarging the ground-truth bounding box of the object and taking only part of that bounding box, then regressing from each of these regions separately, which makes the network more sensitive to object boundaries. These operations increase the diversity of the inputs and thereby improve the accuracy of the regressed box.
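As a rough illustration of the region variants described above, the sketch below derives an enlarged box and four half boxes from one labeled box. The function names, the region names, and the enlargement factor of 1.5 are assumptions made for this example, not values taken from the paper, and the boxes are not clipped to the image bounds.

```python
def scale_box(box, factor):
    """Scale an (x1, y1, x2, y2) box about its center by `factor`."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) * factor / 2
    half_h = (y2 - y1) * factor / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def region_variants(box, enlarge=1.5):
    """Return the original box plus enlarged and partial versions of it.

    `enlarge` is an illustrative value, not one taken from the paper.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return {
        "original": box,
        "enlarged": scale_box(box, enlarge),   # adds surrounding context
        "left_half": (x1, y1, cx, y2),
        "right_half": (cx, y1, x2, y2),
        "upper_half": (x1, y1, x2, cy),
        "lower_half": (x1, cy, x2, y2),
    }


regions = region_variants((10, 20, 50, 60))
for name, region in regions.items():
    print(name, region)
```

Each region would be cropped and passed to the network separately, so the same object contributes several differently framed training inputs.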

Paper address: /content_iccv_2015/papers/Girshick_Fast_R-CNN_ICCV_2015_paper.pdf.

Problem addressed:

In R-CNN, the CNN runs a separate forward pass for every candidate image patch, which is very time-consuming. How can this step be optimized?

Innovation:

Following SPPnet (the sixth paper in this series), the authors implement RoI pooling in the network, so the input image patches no longer need to be cropped or warped to a uniform size, which avoids losing input information. In addition, the whole image is passed through the network once to obtain a feature map, and the candidate boxes that selective search produces on the original image are then mapped onto that feature map, which avoids repeated feature extraction.
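The two steps above, projecting a box onto the shared feature map and pooling it to a fixed grid, can be sketched as follows. This is a simplified single-channel illustration rather than the paper's implementation: the feature stride of 16, the 2 x 2 output grid, and the function names are assumptions, and the box is assumed to lie inside the image.

```python
import math


def project_roi(box, stride=16):
    """Map an image-space (x1, y1, x2, y2) box to feature-map cell bounds.

    Returns (left, top, right, bottom) with exclusive right/bottom.
    `stride` is the assumed total downsampling factor of the network.
    """
    x1, y1, x2, y2 = box
    left, top = x1 // stride, y1 // stride
    right = max(left + 1, math.ceil(x2 / stride))
    bottom = max(top + 1, math.ceil(y2 / stride))
    return (left, top, right, bottom)


def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool the window `roi` of a 2-D map into an out_size x out_size grid."""
    fx1, fy1, fx2, fy2 = roi
    roi_w, roi_h = fx2 - fx1, fy2 - fy1
    pooled = []
    for i in range(out_size):
        # Floor the bin start and ceil the bin end so no bin is empty.
        top = fy1 + (i * roi_h) // out_size
        bottom = fy1 + math.ceil((i + 1) * roi_h / out_size)
        row = []
        for j in range(out_size):
            left = fx1 + (j * roi_w) // out_size
            right = fx1 + math.ceil((j + 1) * roi_w / out_size)
            row.append(max(max(r[left:right]) for r in feature_map[top:bottom]))
        pooled.append(row)
    return pooled


# A 6 x 8 feature map, as if computed once from a 96 x 128 image.
feature_map = [[r * 8 + c for c in range(8)] for r in range(6)]
roi = project_roi((32, 16, 111, 79))
print(roi, roi_max_pool(feature_map, roi))
```

Because every candidate box reuses the same feature map, the expensive convolutional layers run once per image instead of once per box.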

Paper address: /content_iccv_2015/papers/Ghodrati_DeepProposal_Hunting_Objects_ICCV_2015_paper.pdf.

Problem addressed:

The authors observe that a CNN extracts excellent features to represent the input image, and they use experiments to analyze the role and behavior of the features produced by different layers of the network.

Innovation:

The authors generate object hypotheses on the activations of different layers using sliding windows. The results show that the final convolutional layer finds the objects of interest with high recall, but localizes them poorly because its feature map is coarse. Conversely, the first layers of the network localize the objects of interest better, but with lower recall.

Title: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.

Date of submission: 2015, NIPS.

File address: /p/31426458