Paper: LiftPool: Bidirectional ConvNet Pooling


Spatial pooling is an essential operation in convolutional networks. Its purpose is to retain the most important feature values while reducing resolution, making subsequent recognition easier. Simple pooling operations such as max pooling and average pooling not only discard local detail during pooling, but also cannot recover the lost information in reverse. To address this, the paper proposes a bidirectional pooling layer, LiftPool, which consists of a down-sampling operation, LiftDownPool, that preserves detailed features, and an up-sampling operation, LiftUpPool, that generates refined feature maps.

[Figure 1]

LiftPool is inspired by the lifting scheme from signal processing, which decomposes the input into several subbands when down-sampling and can perfectly reconstruct it when up-sampling. As shown in Figure 1, LiftDownPool produces four subbands: the LL subband is an approximation of the input with details removed, while LH, HL and HH contain the details in the horizontal, vertical and diagonal directions. Users can select one or more subbands as the output and keep the remaining subbands for recovery. LiftUpPool reconstructs the up-sampled input from the subbands, producing far more detailed output than MaxUpPool.

When down-sampling a feature map, the core goal of pooling is to minimize the information lost by down-sampling, and the lifting scheme from signal processing meets this requirement exactly. The lifting scheme exploits the correlation structure of a signal to construct, in the spatial domain, a down-sampled approximation together with several subbands of detail, from which the input can be perfectly reconstructed by the inverse transform. Based on the lifting scheme, the paper proposes the bidirectional pooling layer LiftPool.

Taking a one-dimensional signal $x$ as an example, LiftDownPool decomposes it into a down-sampled approximation $s$ and a detail (difference) signal $d$:

$$F(x) = (s, d)$$

It consists of three functions, split, predict and update, with $F$ denoting their composition:

$$F = \mathrm{update} \circ \mathrm{predict} \circ \mathrm{split}$$

The whole process of LiftDownPool-1D is shown in Figure 2 and consists of the following steps:

- split: the signal $x$ is divided into an even-indexed part $x_e$ and an odd-indexed part $x_o$;
- predict: since neighboring samples are correlated, a predictor $P(\cdot)$ predicts $x_o$ from $x_e$; the prediction residual $d = x_o - P(x_e)$ is the detail signal;
- update: an updater $U(\cdot)$ adjusts the even part, $s = x_e + U(d)$, giving a smoother down-sampled approximation.
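To make the steps concrete, here is a minimal NumPy sketch of LiftDownPool-1D. The Haar-style predictor and updater are illustrative placeholders; LiftPool learns these functions instead, as discussed next.

```python
import numpy as np

def lift_down_pool_1d(x, predict, update):
    """One level of 1D lifting: split -> predict -> update."""
    x_even, x_odd = x[0::2], x[1::2]  # split into even/odd samples
    d = x_odd - predict(x_even)       # predict: detail = prediction residual
    s = x_even + update(d)            # update: smoothed approximation
    return s, d

# Haar-style placeholder filters: predict an odd sample by its even
# neighbour, update with half the detail.
x = np.arange(8, dtype=float)
s, d = lift_down_pool_1d(x, predict=lambda e: e, update=lambda dd: 0.5 * dd)
print(s)  # [0.5 2.5 4.5 6.5] -- down-sampled approximation of x
print(d)  # [1. 1. 1. 1.]     -- detail signal
```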

In fact, the classical lifting scheme is realized with low-pass and high-pass filtering, decomposing the image into four subbands with preset filters. In general, however, it is difficult to define $P(\cdot)$ and $U(\cdot)$ as preset filters. Zheng et al. therefore proposed optimizing such filters via back-propagation through the network. Borrowing this idea, the paper implements the $P(\cdot)$ and $U(\cdot)$ functions of LiftDownPool with 1D convolutions and nonlinear activations:

$$P(\cdot) = \sigma\big(\mathrm{Conv1D}(\cdot)\big), \qquad U(\cdot) = \sigma\big(\mathrm{Conv1D}(\cdot)\big)$$
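A rough PyTorch sketch of the learned version follows. The article does not specify the exact filter architecture, so a single Conv1d followed by Tanh for each of P and U is an assumption here:

```python
import torch
import torch.nn as nn

class LiftDownPool1D(nn.Module):
    """1D lifting with learned predictor P and updater U.

    Each function is a Conv1d plus a Tanh; this architecture is
    illustrative, not necessarily the paper's exact choice.
    """
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.P = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.Tanh())
        self.U = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.Tanh())

    def forward(self, x):                           # x: (B, C, L), L even
        x_even, x_odd = x[..., 0::2], x[..., 1::2]  # split
        d = x_odd - self.P(x_even)                  # predict
        s = x_even + self.U(d)                      # update
        return s, d

s, d = LiftDownPool1D(channels=16)(torch.randn(2, 16, 32))
print(s.shape, d.shape)  # both torch.Size([2, 16, 16])
```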

For better end-to-end training, two constraints are added to the final loss function. First, since $s$ is obtained from $x_e$ by the update, it should remain close to $x_e$; a regularization term minimizes the L2 distance between $s$ and $x_e$:

$$c_u = \lVert s - x_e \rVert_2^2$$

In addition, the predictor $P(\cdot)$ aims to map $x_e$ to $x_o$, so a regularization term minimizes the magnitude of the details:

$$c_p = \lVert d \rVert_2^2 = \lVert x_o - P(x_e) \rVert_2^2$$

The complete loss function is:

$$\mathcal{L} = \mathcal{L}_{\mathrm{task}} + \lambda_u\, c_u + \lambda_p\, c_p$$

where $\mathcal{L}_{\mathrm{task}}$ is the loss of the specific task, such as classification or semantic segmentation. Setting the weights $\lambda_u$ and $\lambda_p$ appropriately provides a good regularization effect for the model.
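As a sketch, the constraints can be computed as below; the default λ values are placeholder assumptions, not the paper's settings:

```python
import torch

def lift_loss(task_loss, s, d, x_even, lambda_u=0.01, lambda_p=0.01):
    """Task loss plus the two lifting constraints.

    c_u keeps the approximation s close to the even samples x_e;
    c_p keeps the details small, i.e. P should predict x_o well.
    lambda_u / lambda_p are illustrative defaults.
    """
    c_u = ((s - x_even) ** 2).mean()  # update constraint
    c_p = (d ** 2).mean()             # predict constraint
    return task_loss + lambda_u * c_u + lambda_p * c_p

# e.g.: loss = lift_loss(cross_entropy(logits, labels), s, d, x_even)
```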

LiftDownPool-2D can be decomposed into several LiftDownPool-1D operations. Following the standard lifting scheme, LiftDownPool-1D is first applied in the horizontal direction, yielding L (horizontal low frequency) and H (horizontal high frequency). Both parts are then passed through LiftDownPool-1D in the vertical direction: L is further decomposed into LL (low frequency in both directions) and LH (horizontal details), while H is decomposed into HL (vertical details) and HH (diagonal details).

Users can flexibly select one or more subbands as the result and keep the other subbands for recovery. More generally, LiftDownPool-1D extends to n-dimensional signals in the same way.
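A minimal sketch of the 2D decomposition, again with Haar-style placeholder filters standing in for the learned ones:

```python
import torch

def lift_axis(x, predict, update, dim):
    """One lifting step along one spatial axis of a (B, C, H, W) tensor."""
    idx_even = torch.arange(0, x.size(dim), 2)
    idx_odd = torch.arange(1, x.size(dim), 2)
    even, odd = x.index_select(dim, idx_even), x.index_select(dim, idx_odd)
    d = odd - predict(even)  # detail subband along this axis
    s = even + update(d)     # approximation subband along this axis
    return s, d

def lift_down_pool_2d(x, predict, update):
    """LiftDownPool-2D: horizontal LiftDownPool-1D, then vertical."""
    L, H = lift_axis(x, predict, update, dim=3)    # along the width
    LL, LH = lift_axis(L, predict, update, dim=2)  # along the height
    HL, HH = lift_axis(H, predict, update, dim=2)
    return LL, LH, HL, HH

x = torch.randn(1, 3, 8, 8)
LL, LH, HL, HH = lift_down_pool_2d(x, lambda e: e, lambda dd: 0.5 * dd)
print(LL.shape)  # torch.Size([1, 3, 4, 4])
```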

[Figure 3]

Figure 3 shows several feature outputs of the first LiftDownPool layer of VGG13. The LL features are smoother with fewer details, while LH, HL and HH capture details in the horizontal, vertical and diagonal directions, respectively.

LiftUpPool inherits the invertibility of the lifting scheme. Taking the 1D signal as an example, LiftUpPool recovers the up-sampled signal as:

$$\mathrm{LiftUpPool}: (s, d) \mapsto x$$

It applies the inverse update, the inverse predict, and a merge function, namely:

$$x_e = s - U(d), \qquad x_o = d + P(x_e), \qquad x = \mathrm{merge}(x_e, x_o)$$

Through the formulas above, $x_e$ and $x_o$ are recovered and then merged, yielding an up-sampled feature map that retains rich information.
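Continuing the earlier NumPy sketch, the inverse runs the same (placeholder Haar-style) steps backwards and reconstructs the signal exactly:

```python
import numpy as np

def lift_up_pool_1d(s, d, predict, update):
    """Invert LiftDownPool-1D: undo the update, undo the predict, merge."""
    x_even = s - update(d)          # inverse update
    x_odd = d + predict(x_even)     # inverse predict
    x = np.empty(s.size + d.size)   # merge: interleave even and odd samples
    x[0::2], x[1::2] = x_even, x_odd
    return x

predict, update = lambda e: e, lambda dd: 0.5 * dd
x = np.arange(8, dtype=float)
x_even, x_odd = x[0::2], x[1::2]
d = x_odd - predict(x_even)         # forward pass (LiftDownPool-1D)
s = x_even + update(d)
print(lift_up_pool_1d(s, d, predict, update))  # [0. 1. ... 7.] -- exact
```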

Up-sampling is widely used in image-to-image translation tasks such as semantic segmentation, super-resolution and image colorization. Most pooling operations, however, are irreversible; for example, the up-sampled output of MaxUpPool is sparse, and most structural information is lost. LiftUpPool inversely transforms the output of LiftDownPool and, with the help of the stored subbands, produces much better output.

[Figure 6]

Take pooling with kernel size 2 and stride 2 as an example; Figure 6 compares the logic of LiftPool and MaxPool.

MaxPool keeps only the local maximum as output and thus discards 75% of the values, which may well contain information important for image recognition.

LiftDownPool decomposes the feature map into the subbands LL, LH, HL and HH, where LL is an approximation of the input and the others hold details in different directions. LiftDownPool takes all subbands as output, so both the approximate and the detailed information are preserved, which serves image classification more effectively.

MaxPool is irreversible; MaxUpPool is only possible by recording the indices of the maxima. MaxUpPool writes each output feature value back to its recorded index and fills everything else with zeros, so the recovered feature map is very sparse.

LiftDownPool is invertible: by the properties of the lifting scheme, LiftUpPool reverses it exactly, using the recorded detail subbands to produce high-quality, detailed results.
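The sparsity claim is easy to check with PyTorch's built-in pooling and unpooling; in this small example, 75% of the recovered map is zero-filled:

```python
import torch
import torch.nn.functional as F

x = torch.arange(16.0).reshape(1, 1, 4, 4)

# MaxPool with kernel 2, stride 2 keeps one value per 2x2 block and must
# record its index; MaxUnpool writes it back and zero-fills the rest.
y, idx = F.max_pool2d(x, kernel_size=2, stride=2, return_indices=True)
recovered = F.max_unpool2d(y, idx, kernel_size=2, stride=2)
print((recovered == 0).float().mean().item())  # 0.75 -- three quarters lost

# LiftDownPool instead keeps the LL/LH/HL/HH subbands, so LiftUpPool can
# reconstruct the input exactly, as in the 1D round trip above.
```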

[Figure: comparison of the effects of subbands and regularization terms on CIFAR-100]

[Figure: comparison of different backbone networks on ImageNet]

[Figure: robustness comparison on corrupted test sets]

[Figure: semantic segmentation performance on different datasets]

[Figure: comparison of up-sampled results in semantic segmentation]

Drawing on the lifting scheme, the paper proposes the bidirectional pooling operation LiftPool, which preserves as much detail as possible during down-sampling and restores more detail during up-sampling. The experiments show that LiftPool improves both the accuracy and the robustness of image classification, and brings a large gain in semantic segmentation accuracy. However, the code is still being prepared for open-sourcing; a reproduction after release will be worth watching, especially regarding speed and memory.
