A better model than WGAN (WGAN-GP)

WGAN-GP is an improved version of WGAN that mainly changes how the Lipschitz continuity constraint is enforced. The authors found that after clipping the weights to a fixed range such as [-0.01, +0.01], most of the weight values collapsed onto the two boundary values -0.01 and +0.01. They also found that forced weight clipping easily causes vanishing or exploding gradients. Vanishing gradients are easy to understand: the weights can no longer be updated. With exploding gradients, the weights change drastically on every update, which easily destabilizes training. The cause of both problems lies in the choice of the clipping range: if it is set too small, the gradient shrinks as it passes through each layer and eventually vanishes; if it is set larger, the gradient grows at every layer and explodes after many layers. To solve this problem and find a suitable way to satisfy the Lipschitz continuity condition, the authors propose a gradient penalty term.
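For reference, this is the weight clipping that the original WGAN applies after every discriminator update and that WGAN-GP replaces; a minimal PyTorch sketch, with the bound c = 0.01 taken from the clipping range mentioned above:

```python
import torch

def clip_discriminator_weights(discriminator: torch.nn.Module, c: float = 0.01) -> None:
    """Original WGAN: enforce the Lipschitz constraint by hard weight clipping.
    After each discriminator update, clamp every parameter into [-c, +c] in place."""
    for p in discriminator.parameters():
        p.data.clamp_(-c, c)
```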

Gradient penalty means the following: the Lipschitz constraint requires that the norm of the discriminator's gradient not exceed K, and this requirement can be enforced through a loss term. First compute the gradient of the discriminator with respect to its input, ∇D(x), then penalize the distance between its norm and K; this yields a simple loss-function design. Note, however, that the constraint ranges over the discriminator's whole input space. For data such as images (both the real data set and the generated image set), this space is extremely high-dimensional, so it is clearly unsuitable to evaluate the constraint everywhere. The authors point out that it is not necessary to sample the whole space (real and generated); sampling within each batch is enough: draw a random number and interpolate between a generated sample and a real sample, which sidesteps the trouble of sampling the entire sample space.
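Concretely, the discriminator objective in the WGAN-GP paper ("Improved Training of Wasserstein GANs") is

$$
L = \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}[D(\tilde{x})] - \mathbb{E}_{x \sim \mathbb{P}_r}[D(x)] + \lambda\, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}\big[(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2\big]
$$

where $\hat{x} = \epsilon x + (1-\epsilon)\tilde{x}$ with $\epsilon \sim U[0,1]$, and the paper uses $\lambda = 10$. A minimal PyTorch sketch of the penalty term follows; it assumes image-shaped tensors (N, C, H, W), and the function name is illustrative:

```python
import torch

def gradient_penalty(discriminator, real, fake, lambda_gp=10.0):
    """WGAN-GP gradient penalty on random interpolates between
    real and generated samples (one epsilon per sample)."""
    batch_size = real.size(0)
    # One interpolation coefficient per sample; shape broadcasts over (N, C, H, W).
    eps = torch.rand(batch_size, 1, 1, 1, device=real.device)
    interpolates = eps * real.detach() + (1.0 - eps) * fake.detach()
    interpolates.requires_grad_(True)

    d_scores = discriminator(interpolates)
    grads = torch.autograd.grad(
        outputs=d_scores,
        inputs=interpolates,
        grad_outputs=torch.ones_like(d_scores),
        create_graph=True,  # keep the graph so the penalty itself is differentiable
    )[0]
    grads = grads.view(batch_size, -1)
    # Penalize deviation of each sample's gradient norm from 1.
    return lambda_gp * ((grads.norm(2, dim=1) - 1.0) ** 2).mean()

# Usage sketch inside the discriminator step:
# d_loss = d_fake.mean() - d_real.mean() + gradient_penalty(D, x_real, x_fake)
```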

So the contributions of WGAN-GP are:

◆ It proposes a new Lipschitz continuity enforcement technique, the gradient penalty, which solves the vanishing- and exploding-gradient problems caused by weight clipping during training.

◆ Compared with standard WGAN, it converges faster and generates higher-quality samples.

◆ It provides a stable training method for GANs that needs almost no hyperparameter tuning, and it successfully trains a variety of GAN architectures for image generation and language modeling.

However, the paper points out that because the gradient penalty is applied to each sample in the batch individually (the interpolation coefficient has shape (batch_size, 1)), batch normalization cannot be used in the discriminator: it introduces dependence between the samples in a batch, which breaks the per-sample penalty. Other normalization methods still work. The paper mentions layer normalization, weight normalization, and instance normalization, and weight normalization also works acceptably.
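As an illustration, here is a hedged sketch of a DCGAN-style discriminator block with nn.InstanceNorm2d substituted for nn.BatchNorm2d; the helper name and channel arguments are placeholders, not code from the paper:

```python
import torch.nn as nn

def d_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Discriminator block without batch norm: instance norm normalizes each
    sample independently, so the per-sample gradient penalty stays valid."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.InstanceNorm2d(out_ch, affine=True),  # instead of nn.BatchNorm2d
        nn.LeakyReLU(0.2, inplace=True),
    )
```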

This article is reproduced from CSDN.

Editor: Lornatang

Proofreader: Lornatang