GANs have always suffered from problems such as training difficulty, poor stability, and mode collapse. The fundamental reason is that the theoretical mechanism behind GANs has not yet been fully understood.
Two technologies from 2017 deserve attention for popularizing GAN applications. One of them is CycleGAN, whose essence is to combine dual learning with the GAN mechanism to improve the quality of generated images. DualGAN, DiscoGAN, and many later improved models such as StarGAN adopt similar ideas. The importance of CycleGAN lies mainly in freeing GAN-family models from the constraint of supervised learning: it introduces unsupervised learning, which only requires two sets of images from different domains rather than the one-to-one correspondence between images of the two domains that earlier training required. This greatly expands its range of applications and lowers the barrier to adoption.
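As a rough sketch of this dual-learning idea (assuming PyTorch, and that the two generators G: X→Y and F: Y→X are defined elsewhere; the weight `lam` is an illustrative choice), the cycle-consistency loss is what lets training work with unpaired images:

```python
import torch.nn as nn

l1 = nn.L1Loss()

def cycle_consistency_loss(G, F, real_x, real_y, lam=10.0):
    # Translate each image to the other domain and back; the reconstruction
    # should match the original even though no paired examples exist.
    rec_x = F(G(real_x))   # X -> Y -> X
    rec_y = G(F(real_y))   # Y -> X -> Y
    return lam * (l1(rec_x, real_x) + l1(rec_y, real_y))
```

This term is added to the usual adversarial losses of the two discriminators, so neither domain ever needs paired labels.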
Another noteworthy technology is NVIDIA's progressive growing of GANs. The attraction of this approach is that it enables the computer to generate high-definition images of size 1024×1024. At present, it achieves the best results in terms of image resolution and generation quality, and the generated celebrity portraits can almost pass for real photographs (see Figure 3). NVIDIA's coarse-to-fine idea of first creating a rough outline of an image and then gradually adding detail is not particularly novel; many earlier schemes, such as StackGAN, adopted a similar approach. Its uniqueness lies in the fact that this coarse-to-fine network structure is grown dynamically during training rather than fixed in advance, and, more importantly, that the generated images are particularly good.
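A minimal sketch of that dynamic coarse-to-fine growth, assuming PyTorch (the channel counts, fade-in handling, and many details of NVIDIA's actual implementation are illustrative assumptions here):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProgressiveGenerator(nn.Module):
    """Resolution blocks are appended as training advances; the newest
    block is faded in with a weight alpha that is ramped from 0 to 1."""
    def __init__(self, latent_dim=512):
        super().__init__()
        self.latent_dim = latent_dim
        # Base block: 1x1 latent -> 4x4 feature map
        self.blocks = nn.ModuleList([nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 512, 4),
            nn.LeakyReLU(0.2),
        )])
        self.to_rgb = nn.ModuleList([nn.Conv2d(512, 3, 1)])
        self.alpha = 1.0  # fade-in weight for the newest block

    def grow(self, in_ch=512, out_ch=512):
        # Append one block that doubles the resolution and start fading it in.
        self.blocks.append(nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(in_ch, out_ch, 3, padding=1),
            nn.LeakyReLU(0.2),
        ))
        self.to_rgb.append(nn.Conv2d(out_ch, 3, 1))
        self.alpha = 0.0  # ramped up externally during training

    def forward(self, z):
        x = z.view(-1, self.latent_dim, 1, 1)
        for block in self.blocks[:-1]:
            x = block(x)
        if len(self.blocks) == 1:
            return torch.tanh(self.to_rgb[-1](self.blocks[-1](x)))
        # Blend the upsampled RGB output of the previous stage with the
        # newest block's RGB output, weighted by alpha in [0, 1].
        old_rgb = F.interpolate(self.to_rgb[-2](x), scale_factor=2)
        new_rgb = self.to_rgb[-1](self.blocks[-1](x))
        return torch.tanh((1 - self.alpha) * old_rgb + self.alpha * new_rgb)
```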
First, a first-generation generator is trained, which produces rather poor images, and then a first-generation discriminator is trained, which can accurately distinguish the generated images from real ones. In short, this discriminator is a binary classifier that outputs 0 for a generated image and 1 for a real image.
Then we train a second-generation generator, which produces slightly better images that can convince the first-generation discriminator that they are real. A second-generation discriminator is then trained to accurately tell real images apart from those produced by the second-generation generator. And so on through the third and fourth generations, up to an N-th generation generator and discriminator, until finally the discriminator can no longer distinguish generated images from real ones and the network converges.
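A minimal sketch of this alternating game, assuming PyTorch and that a generator G, a discriminator D with a sigmoid output, and their optimizers are defined elsewhere:

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()

def train_step(G, D, opt_g, opt_d, real_images, latent_dim=100):
    batch = real_images.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # 1) Train D: output 1 for real images, 0 for generated ones.
    z = torch.randn(batch, latent_dim)
    fake = G(z).detach()
    loss_d = bce(D(real_images), ones) + bce(D(fake), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Train G: try to make D label the generated images as real.
    z = torch.randn(batch, latent_dim)
    loss_g = bce(D(G(z)), ones)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

Repeating this step is what the "generations" above describe: each round the generator improves against the current discriminator, and the discriminator is then retrained against the improved generator.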
Adversarial examples are inputs to machine learning models that an attacker has deliberately designed to cause the model to make a mistake; they are like optical illusions for machines.
An adversarial example is an input sample that, after a slight perturbation, causes a machine learning algorithm to output the wrong result. In image recognition, it can be understood as a picture that a convolutional neural network (CNN) originally classified into one category (such as "panda") but, after a very subtle or even imperceptible change, suddenly misclassifies into another category (such as "gibbon").
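A minimal sketch of how such a perturbation can be computed with the Fast Gradient Sign Method (FGSM, discussed further below), assuming PyTorch, a trained classifier `model`, and inputs scaled to [0, 1]:

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=0.007):
    # Nudge the input in the direction that increases the loss,
    # by a tiny amount eps that is visually imperceptible.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = (x + eps * x.grad.sign()).clamp(0, 1).detach()
    return x_adv
```

An image correctly classified as "panda" can come back with a different label after this perturbation, exactly as described above.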
Adversarial training
Adversarial training is a method of defending against adversarial example attacks. Training on adversarial examples together with normal examples is an effective regularization method: it can improve the accuracy of the model and effectively reduce the success rate of adversarial attacks. However, this defense only works against the same attack method that was used to generate the adversarial examples in the training set.
Indeed, as can be seen in the figure below, when the training set contains both normal and adversarial examples, the error rate on the test set of normal examples (the red line) is lower than when training on normal examples alone, which shows that adversarial training has a regularizing effect.
Figure 9
Generating adversarial examples directly during training is inefficient; FGSM, mentioned above, makes adversarial training efficient. By simply changing the objective function, each normal example and its corresponding adversarial example can be trained on at the same time, and the model is trained to assign the adversarial example the same class as the original normal example.
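A sketch of such a combined objective, reusing the `fgsm_example` helper from the earlier sketch; the weighting `alpha` is an illustrative assumption:

```python
import torch.nn.functional as F

def adversarial_training_loss(model, x, y, eps=0.007, alpha=0.5):
    # Weight the loss on clean samples and on FGSM-perturbed samples, so that
    # adversarial examples keep the same label as the clean samples they come from.
    clean_loss = F.cross_entropy(model(x), y)
    x_adv = fgsm_example(model, x, y, eps)
    adv_loss = F.cross_entropy(model(x_adv), y)
    return alpha * clean_loss + (1 - alpha) * adv_loss
```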
A network trained with FGSM can effectively defend against adversarial examples generated by FGSM, but it can still be broken if other attack methods are used.
The idea behind adversarial examples may carry the following two implications:
Conclusion
Because of the gradient-based method, it is harder to create adversarial examples for an ensemble of models, but the generation algorithm remains feasible and effective.
Blind spots in a single model can be compensated for by other models in the ensemble, and the output of the model with the best result is used.
We find that when we train the algorithm with dynamically created adversarial examples, the problem posed by these adversarial examples can be solved, because the model develops higher "immunity" to the low-probability regions where problems may occur. This also supports the low-probability-region argument, since that is where adversarial examples are hardest to deal with.
DCGAN is one of the better improvements on GAN, mainly in its network structure. Even today, the network structure of DCGAN is still widely used; it greatly improves the stability of GAN training and the quality of the generated results.
The main contributions of this paper are:
◆ It provides a good network topology for GAN training.
◆ It shows that the learned representations have vector arithmetic properties.
D(x) represents the probability that the discriminator D judges a real image to be real (since x is real, the closer this value is to 1, the better for D), and D(G(z)) is the probability that D judges an image generated by G to be real.
G's goal: G wants the images it generates to be "as real as possible", that is, G wants D(G(z)) to be as large as possible, which makes V(D, G) smaller.
D's goal: the stronger D is, the larger D(x) and the smaller D(G(z)) should be. D's goal is therefore the opposite of G's: D wants V(D, G) to be as large as possible.
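For reference, the value function V(D, G) that both objectives refer to is the minimax objective of the original GAN paper:

```latex
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```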
DCGAN makes several changes to the convolutional neural network structure to improve sample quality and convergence speed. These changes are:
◆ Remove all pooling layers. The G network uses transposed convolutions for upsampling, and the D network uses strided convolutions instead of pooling.
◆ Use batch normalization in both D and G.
◆ Remove the FC (fully connected) layers, making the network fully convolutional.
◆ Use ReLU as the activation function in the G network, with tanh in the last layer.
◆ Use LeakyReLU as the activation function in the D network.
Schematic diagram of the generator network G in DCGAN:
The generator network structure of DCGAN is shown in the figure above. Compared with the original GAN, DCGAN replaces almost all fully connected layers with convolutional layers, and the discriminator is almost symmetric with the generator. As can be seen from the figure, the whole network has no pooling layers and no upsampling layers; instead, fractionally-strided (transposed) convolutions are used in place of upsampling to increase training stability.
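A minimal PyTorch sketch of a DCGAN-style generator following these rules (the channel widths and the 64×64 output resolution are illustrative assumptions):

```python
import torch.nn as nn

class DCGANGenerator(nn.Module):
    """Transposed convolutions instead of pooling/FC layers, BatchNorm on
    every layer except the output, ReLU inside, and tanh at the end."""
    def __init__(self, latent_dim=100, ngf=64, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, ngf * 8, 4, 1, 0, bias=False),  # 1x1 -> 4x4
            nn.BatchNorm2d(ngf * 8), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),     # 4x4 -> 8x8
            nn.BatchNorm2d(ngf * 4), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),     # 8x8 -> 16x16
            nn.BatchNorm2d(ngf * 2), nn.ReLU(True),
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),         # 16x16 -> 32x32
            nn.BatchNorm2d(ngf), nn.ReLU(True),
            nn.ConvTranspose2d(ngf, channels, 4, 2, 1, bias=False),        # 32x32 -> 64x64
            nn.Tanh(),  # no BatchNorm on the output layer
        )

    def forward(self, z):
        # z: (batch, latent_dim) noise vector, reshaped to a 1x1 feature map
        return self.net(z.view(z.size(0), -1, 1, 1))
```

The discriminator mirrors this structure with strided convolutions, LeakyReLU, and a sigmoid output, as the list above describes.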
The main reasons why DCGAN improves the stability of GAN training are:
◆ Strided (and transposed) convolutions are used in place of pooling and upsampling layers. Convolution is very good at extracting image features, and convolutions also replace the fully connected layers.
◆ Almost every layer of the generator G and the discriminator D uses a batchnorm layer to normalize the feature outputs, which speeds up training and improves its stability. (Batchnorm is not added to the last layer of the generator or the first layer of the discriminator.)
◆ LeakyReLU is used instead of ReLU in the discriminator to prevent gradient sparsity; ReLU is still used in the generator, with tanh in the output layer.
Training uses the Adam optimizer, and the best learning rate is 0.0002 (I have tried other learning rates and have to say that 0.0002 gives the best performance).
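A minimal optimizer setup matching that learning rate, assuming PyTorch and the generator/discriminator modules sketched above; the beta values are the ones recommended in the DCGAN paper rather than something stated here:

```python
import torch

# Separate Adam optimizers for G and D, both with lr = 0.0002.
opt_g = torch.optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))
```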
BigGAN uses a very large batch size in training, reaching 2048 (we usually train with a batch size of 64); the convolutional channels are also wider and the network has more parameters. With a batch size of 2048, the whole network has close to 1.6 billion parameters.