Recent research shows that over-parameterization is important for successfully training deep neural networks, but it also introduces substantial redundancy.
This paper proposes parameterizing an entire convolutional neural network with a single high-order low-rank tensor, so as to capture the complete structure of the network. The modes of this high-order tensor represent the architectural design parameters of the network (for example, number of blocks, depth, number of stacks, input features, etc.). This parameterization implicitly regularizes the whole network and greatly reduces the number of parameters.
The paper studies a richly structured network, the fully convolutional network (FCN), and proposes to parameterize it with a single 8th-order tensor.
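As a minimal sketch of what "one 8th-order tensor for the whole network" means, the snippet below builds such a tensor in NumPy. The mode names and sizes are hypothetical, chosen only for illustration; they are not the exact dimensions used in the paper.

```python
import numpy as np

# Hypothetical mode sizes for an 8th-order weight tensor parameterizing a
# stacked FCN. Each mode corresponds to one architectural design parameter
# (names and sizes are illustrative, not the paper's exact ones):
# (stacks, depth, subnets, blocks, out feats, in feats, kernel h, kernel w)
modes = (2, 4, 3, 2, 64, 64, 3, 3)
W = np.zeros(modes)

print(W.ndim)  # order of the tensor: 8
print(W.size)  # total number of (uncompressed) parameters
```

Slicing the first four modes of such a tensor would recover the weights of one individual convolutional layer, which is what makes the joint parameterization possible.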
There is evidence that a key factor behind the success of these deep models is over-parameterization, which helps the optimizer find good local minima.
At the same time, over-parameterization leads to considerable redundancy: it hurts statistical generalization (because of the huge number of parameters) and increases storage and computation costs, making deployment on devices with limited computing resources difficult.
This paper aims to eliminate the redundancy of CNN parameters by jointly parameterizing the whole network with a single tensor.
Recently, much work has sought to reduce redundancy and improve the efficiency of CNNs, mostly by reparameterizing each layer individually.
Unlike previous work, this paper parameterizes the entire CNN with a single high-order tensor rather than parameterizing each layer with a separate tensor. Each mode of this tensor represents a different architectural design parameter of the network.
By modeling the whole FCN with a single tensor, this method can learn correlations between the different tensor modes and thus fully capture the structure of the network.
In addition, by imposing a low-rank structure on the tensor, this parameterization implicitly regularizes the whole network and significantly reduces the number of parameters.
The contributions of this paper are:
More relevant to this work are handcrafted decomposition methods such as MobileNet [15] and Xception [8], which decompose 3×3 convolutions into efficient depthwise and pointwise convolutions.
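To make the savings of that decomposition concrete, the snippet below compares parameter counts for a full 3×3 convolution versus a depthwise-plus-pointwise factorization. The channel sizes are illustrative, not taken from either paper.

```python
# A 3x3 convolution with c_in input and c_out output channels can be
# approximated by a depthwise 3x3 conv followed by a pointwise 1x1 conv,
# as in MobileNet / Xception. Channel sizes here are illustrative.
c_in, c_out, k = 64, 128, 3

standard  = k * k * c_in * c_out   # full 3x3 convolution
depthwise = k * k * c_in           # one 3x3 filter per input channel
pointwise = c_in * c_out           # 1x1 conv that mixes channels
separable = depthwise + pointwise

print(standard, separable)  # 73728 vs 8768
```

The separable form pays the expensive c_in × c_out channel mixing only at kernel size 1, which is where most of the reduction comes from.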
P.S. Suddenly glad the earlier MobileNet reading wasn't for nothing, haha...
The paper selects the stacked hourglass (HG) network mainly because of its rich structure, which makes it well suited to modeling with a high-order tensor. The goal of this work is not to produce state-of-the-art results on human pose estimation, but to show the benefits of modeling a state-of-the-art architecture with a single high-order tensor.
Although a second-order tensor is easily depicted as a rectangle and a third-order tensor as a cube, it is impractical to represent higher-order tensors this way.
Instead, tensor diagrams are used: undirected graphs in which each vertex represents a tensor. The degree of each vertex (the number of edges leaving it) gives the order of the corresponding tensor, and a contraction between two modes is drawn simply by linking the two corresponding edges together.
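The "link two edges" operation is exactly what `np.tensordot` computes. A minimal sketch with made-up shapes:

```python
import numpy as np

# Tensor contraction as "linking two edges" in a tensor diagram:
# contracting mode 2 of A with mode 0 of B joins those two edges.
A = np.random.rand(4, 5, 6)   # 3rd-order tensor: a vertex of degree 3
B = np.random.rand(6, 7)      # 2nd-order tensor: a vertex of degree 2

C = np.tensordot(A, B, axes=([2], [0]))  # contract the shared size-6 mode
print(C.shape)  # (4, 5, 7): the remaining free edges
```

The result's order is the total number of free (unlinked) edges left in the diagram.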
Fig. 2 uses a tensor diagram to depict the Tucker decomposition of an 8th-order tensor (i.e., the contraction of the core tensor with a factor matrix along each mode).
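That contraction of a core with one factor matrix per mode can be sketched directly in NumPy. For readability the example below uses a 4th-order tensor (the paper's is 8th-order), and all sizes are hypothetical.

```python
import numpy as np

def mode_product(core, factor, mode):
    """Contract `factor` (shape I_k x R_k) with mode `mode` of `core`."""
    out = np.tensordot(factor, core, axes=([1], [mode]))
    # tensordot puts the new axis first; move it back to position `mode`.
    return np.moveaxis(out, 0, mode)

# Tucker reconstruction for a small 4th-order example
# (the paper uses order 8; sizes here are illustrative).
ranks = (2, 3, 2, 2)                 # core (Tucker) ranks R_k
dims  = (4, 5, 4, 3)                 # full mode sizes I_k
core = np.random.rand(*ranks)
factors = [np.random.rand(i, r) for i, r in zip(dims, ranks)]

W = core
for k, U in enumerate(factors):      # contract one factor along each mode
    W = mode_product(W, U, k)
print(W.shape)  # (4, 5, 4, 3)
```

Each `mode_product` call corresponds to absorbing one factor-matrix vertex of the tensor diagram into the core.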
The high-order weight tensor of the proposed T-Net is obtained in this way: a low-rank core tensor contracted with a factor matrix along each of its eight modes.
Different low-rank constraints are imposed on the above parameterization to obtain the different variants of this method.
Consider a Tucker rank of (R_1, …, R_8) for the full 8th-order weight tensor with mode sizes (I_1, …, I_8); the number of parameters is then

∏_k R_k + Σ_k I_k R_k

(the core plus one I_k × R_k factor matrix per mode). If instead each convolutional layer is compressed separately, with ranks r_in and r_out on the input and output feature modes, the per-layer cost is paid once for every layer and nothing is shared across layers, so the total number of parameters is correspondingly larger. Under comparable ranks, the joint parameterization of this method therefore requires fewer parameters.
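As a rough numerical illustration of this comparison (all sizes and ranks below are hypothetical, not the paper's):

```python
import numpy as np

# Parameter counts under illustrative sizes: a full 8th-order tensor,
# a joint Tucker factorization of it, and per-layer compression.
dims  = (2, 4, 3, 2, 64, 64, 3, 3)   # hypothetical mode sizes I_k
ranks = (2, 4, 3, 2, 16, 16, 3, 3)   # hypothetical Tucker ranks R_k

full = int(np.prod(dims))
# Joint Tucker: core of size prod(R_k) plus one I_k x R_k factor per mode.
joint = int(np.prod(ranks)) + sum(i * r for i, r in zip(dims, ranks))

# Per-layer alternative: treat the first four modes as indexing individual
# 3x3 conv layers (64 -> 64 channels) and compress each layer separately
# with ranks (r_in, r_out) on its feature modes.
layers = int(np.prod(dims[:4]))
r_in = r_out = 16
per_layer = r_in * r_out * 3 * 3 + 64 * r_in + 64 * r_out
separate = layers * per_layer

print(full, joint, separate)
```

With these made-up numbers the joint factorization is cheaper than compressing each layer separately at the same feature ranks, which matches the qualitative argument above.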
Reference:
T-Net: Parametrizing Fully Convolutional Nets with a Single High-Order Tensor (CVPR 2019)