LSGAN: Least Squares Generative Adversarial Networks
Problem addressed: the low quality of generated samples and the unstable training of the traditional GAN.

Approach: replace the traditional GAN's cross-entropy loss with a least squares loss.

This article studies LSGAN mainly by comparing it with the standard GAN.

For example:

When the generator is updated with fake samples that lie on the correct (real) side of the decision boundary but are still far from the real data, the cross-entropy loss suffers from vanishing gradients.

As shown in figure (b), when fake samples (magenta) are used to update the generator by convincing the discriminator that they come from the real data, they incur almost no loss, because they already lie on the correct side of the decision boundary, i.e., the real-data side.

However, these samples are still far from the real data, and we would like to pull them closer to it.

Problem summary: with the cross-entropy loss, fake samples that the discriminator already places on the real side receive almost no gradient, even when they are still far from the real data, so the generator cannot improve them further.

Based on this observation, the paper proposes the Least Squares Generative Adversarial Network (LSGAN), which uses a least squares loss for the discriminator.

The least squares loss moves these fake samples toward the decision boundary, because it penalizes samples that lie far from the boundary even when they are on the correct side.

As shown in figure (c), the least squares loss penalizes the fake samples (magenta) and pulls them toward the decision boundary, even though the discriminator already classifies them correctly.

Because of this property, LSGAN is able to generate samples that are closer to the real data.

Summary

Least squares loss: compared with the cross-entropy loss, its advantage is that while the generated samples still fool the discriminator, the generator is also pushed to pull generated images that lie far from the decision boundary toward it, which yields higher-quality samples.

Cross-entropy loss: with this loss, once the discriminator classifies a generated image as real, the generator stops optimizing it, even if that image is still far from the decision boundary, i.e., far from the real data, because the cross-entropy loss is already close to zero and the generator has met its objective for that sample.

A limitation of LSGAN is that it still does not solve the generator's vanishing gradient problem when the discriminator becomes too strong.

Vanishing gradients: during backpropagation, the gradient magnitude shrinks sharply as it propagates back through more layers, so the weights of the shallow layers update very slowly and fail to learn effectively.

The deep model then degenerates into what is effectively a shallow one: the first layers stay nearly fixed and only the last few layers keep changing.

Loss function of GANs:
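In the notation defined in the formula notes below, the standard GAN minimax objective (Equation 1) is

$$\min_G \max_D V_{\mathrm{GAN}}(D,G) = \mathbb{E}_{x \sim p_{data}(x)}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]$$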

Loss function of LSGANs:
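With the a-b-c coding explained in the notes below, the least squares objectives (Equation 2) are

$$\min_D V_{\mathrm{LSGAN}}(D) = \frac{1}{2}\,\mathbb{E}_{x \sim p_{data}(x)}\big[(D(x)-b)^2\big] + \frac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\big[(D(G(z))-a)^2\big]$$

$$\min_G V_{\mathrm{LSGAN}}(G) = \frac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\big[(D(G(z))-c)^2\big]$$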


Formula notes:

Discriminator D

Generator G

The goal of G is to learn the distribution p_g over the data x.

G samples an input variable z from a uniform or Gaussian distribution p_z(z) and maps it to the data space as G(z; θ_g).

D is a classifier D(x; θ_d) whose purpose is to decide whether an image comes from the training data or from G.

z is the noise input, which can follow a uniform or Gaussian distribution p_z(z). The expectations in the objectives are taken over the distribution p_data(x) of the real data x and over p_z(z).

Suppose we use an a-b coding scheme for the discriminator, where a and b are the labels for fake data and real data respectively.

c denotes the value that G wants D to assign to its fake data.

Specific advantages of the least squares loss:

1. With the decision boundary fixed (i.e., with the discriminator parameters fixed), the least squares loss pushes the generated samples toward the decision boundary, and therefore closer to the real data.

2. Penalizing samples that lie far from the decision boundary produces larger gradients when updating the generator, which alleviates vanishing gradients (vanishing gradients: earlier hidden layers learn more slowly than later ones, so as the number of hidden layers grows, accuracy can even drop), as illustrated by the sketch below.
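The following minimal NumPy sketch (not from the paper) illustrates this gradient argument: it compares the generator's gradient with respect to the discriminator's raw score under the sigmoid cross-entropy loss and under a least squares loss. The score range and the least squares target value of 1 are illustrative assumptions.

```python
import numpy as np

# Raw discriminator scores for fake samples; large positive scores mean D is
# already confident the samples are "real" (the situation in figure (b)).
scores = np.linspace(-4.0, 8.0, 7)

# Non-saturating cross-entropy generator loss: -log(sigmoid(s)).
sigmoid = 1.0 / (1.0 + np.exp(-scores))
ce_grad = sigmoid - 1.0          # d/ds of -log(sigmoid(s)); goes to 0 as s grows

# Least squares generator loss with illustrative target 1: 0.5 * (s - 1)^2.
ls_grad = scores - 1.0           # grows with the distance from the target

for s, cg, lg in zip(scores, ce_grad, ls_grad):
    print(f"score {s:5.1f} | cross-entropy grad {cg:8.4f} | least squares grad {lg:8.4f}")
```

For scores far on the "real" side the cross-entropy gradient is essentially zero, while the least squares gradient keeps growing with the distance from the target, which is exactly the behaviour described in points 1 and 2 above.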

For GANs, minimizing Equation 1 (given the optimal discriminator) amounts to minimizing the Jensen-Shannon divergence:
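Substituting the optimal discriminator $D^*(x) = \frac{p_{data}(x)}{p_{data}(x)+p_g(x)}$ into Equation 1 gives

$$C(G) = \mathbb{E}_{x \sim p_{data}}\!\left[\log \frac{p_{data}(x)}{p_{data}(x)+p_g(x)}\right] + \mathbb{E}_{x \sim p_g}\!\left[\log \frac{p_g(x)}{p_{data}(x)+p_g(x)}\right] = -\log 4 + 2\,\mathrm{JSD}\!\left(p_{data} \,\|\, p_g\right)$$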

LSGAN: we now discuss the relationship between LSGAN and the f-divergence.

Explanation of the formulas (deriving the conditions on a, b, and c under the a-b coding):

Following the paper, the generator objective is first extended by adding the term $\frac{1}{2}\,\mathbb{E}_{x \sim p_{data}(x)}\big[(D(x)-c)^2\big]$ to $V_{\mathrm{LSGAN}}(G)$ (Equation 3). This does not change the optimal solution, because the added term contains no parameters of G.

With G fixed, we can then derive the optimal discriminator:
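$$D^*(x) = \frac{b\,p_{data}(x) + a\,p_g(x)}{p_{data}(x) + p_g(x)}$$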

Writing p_data as p_d for brevity, and plugging this optimal discriminator into the extended generator objective, Equation 4 can be re-expressed as:
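$$2C(G) = \mathbb{E}_{x \sim p_d}\big[(D^*(x)-c)^2\big] + \mathbb{E}_{x \sim p_g}\big[(D^*(x)-c)^2\big]$$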

I won't prove it in detail here.

Expanding D* and simplifying gives:
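$$2C(G) = \int_{\mathcal{X}} \frac{\big((b-c)\,(p_d(x)+p_g(x)) - (b-a)\,p_g(x)\big)^2}{p_d(x)+p_g(x)}\,dx$$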

If b-c = 1 and b-a = 2, then
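$$2C(G) = \int_{\mathcal{X}} \frac{\big(2p_g(x) - (p_d(x)+p_g(x))\big)^2}{p_d(x)+p_g(x)}\,dx = \chi^2_{\mathrm{Pearson}}\!\left(p_d + p_g \,\|\, 2p_g\right)$$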

This is the Pearson χ² divergence. In short, if a, b, and c satisfy b − c = 1 and b − a = 2, then minimizing Equation 4 minimizes the Pearson χ² divergence between p_d + p_g and 2p_g.

Using the a-b coding scheme:

From the above proof, we can set a = −1, b = 1, and c = 0 (so that b − c = 1 and b − a = 2), which gives the following objectives:
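$$\min_D V_{\mathrm{LSGAN}}(D) = \frac{1}{2}\,\mathbb{E}_{x \sim p_{data}(x)}\big[(D(x)-1)^2\big] + \frac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\big[(D(G(z))+1)^2\big]$$

$$\min_G V_{\mathrm{LSGAN}}(G) = \frac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\big[(D(G(z)))^2\big]$$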

Using the 0-1 binary coding scheme (a = 0, b = 1, c = 1, i.e., setting c = b so that G tries to make its samples look as real as possible), the objectives become:
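$$\min_D V_{\mathrm{LSGAN}}(D) = \frac{1}{2}\,\mathbb{E}_{x \sim p_{data}(x)}\big[(D(x)-1)^2\big] + \frac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\big[(D(G(z)))^2\big]$$

$$\min_G V_{\mathrm{LSGAN}}(G) = \frac{1}{2}\,\mathbb{E}_{z \sim p_z(z)}\big[(D(G(z))-1)^2\big]$$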

The two schemes behave similarly in practice; here the author implements the experiments with the a-b coding scheme.
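As a minimal sketch (not from the paper), the a-b coded objectives above translate into a PyTorch-style training step like the following; the model names netD and netG, the optimizers, and the latent dimension are hypothetical placeholders.

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()

def lsgan_step(netD, netG, opt_d, opt_g, real, z_dim=100):
    """One LSGAN update with labels a = -1 (fake), b = 1 (real), c = 0."""
    batch = real.size(0)

    # Discriminator update: push D(real) toward b = 1 and D(fake) toward a = -1.
    z = torch.randn(batch, z_dim)
    fake = netG(z).detach()
    d_real, d_fake = netD(real), netD(fake)
    d_loss = 0.5 * (mse(d_real, torch.ones_like(d_real)) +
                    mse(d_fake, torch.full_like(d_fake, -1.0)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator update: push D(G(z)) toward c = 0.
    z = torch.randn(batch, z_dim)
    d_out = netD(netG(z))
    g_loss = 0.5 * mse(d_out, torch.zeros_like(d_out))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

    return d_loss.item(), g_loss.item()
```

The discriminator here outputs raw scores (no sigmoid at the output), since the least squares loss regresses the output directly toward the label values.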

One of the experiments from the paper (result figure not reproduced here).

References: Mao X, Li Q, Xie H, et al. Least Squares Generative Adversarial Networks [C]// Proceedings of the 2017 IEEE International Conference on Computer Vision, Venice, October 22-29, 2017. Washington: IEEE Computer Society, 2017: 2813-2821.