The principle of the BP neural network algorithm, explained in simple terms

I believe everyone runs into trouble with the BP algorithm when first encountering neural networks. Figuring out how to understand a BP neural network vividly and quickly is one of the higher pleasures of learning (voiceover: pleasure? Are you seriously talking to me about pleasure?)

This post takes a simple, even crude, shortcut to get you started with BP neural networks quickly.

What is the definition of a BP neural network? Look at this sentence: a multi-layer feedforward network trained by "error back propagation".

The idea of BP is to use the output error to estimate the error of the layer directly before the output layer, then use that layer's error to estimate the error of the layer before it, and so on, obtaining an error estimate for every layer. The error estimate here can be understood as a partial derivative; we adjust each layer's connection weights accordingly, then recompute the output error with the adjusted weights, repeating until the output error meets the requirement or the number of iterations exceeds the set limit.
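In symbols (notation assumed here, and used again below): if $u^{l}$ is the net input of layer $l$, the "error estimate" of that layer is $\delta^{l} = \partial E / \partial u^{l}$, and the chain rule carries it backward one layer at a time:

$$\delta^{l} = \left(\frac{\partial u^{l+1}}{\partial u^{l}}\right)^{\!\top} \delta^{l+1}$$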

By this point the word "error" has come up many times, which suggests this algorithm has everything to do with the error?

Yes: what BP propagates is the "error", and the purpose of the propagation is to obtain an error estimate for every layer.

The learning rule is: using the method of steepest descent, the weights and thresholds of the network are continually adjusted through back propagation (that is, layer by layer), until the global error is minimized.
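In symbols, steepest descent nudges every weight against its error gradient, with a learning rate $\eta$:

$$w_{ij} \leftarrow w_{ij} - \eta\,\frac{\partial E}{\partial w_{ij}}$$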

The essence of its learning is: the dynamic adjustment of each connection weight.

The topology is as shown above: input layer, hidden layer and output layer.

The advantage of a BP network is that it can learn and store a large number of input-output mappings without the mathematical relationship being specified in advance. So how does it learn?

BP describes each layer's input-output relationship with an activation function that is differentiable everywhere, and an S-shaped (sigmoid) function is the common choice.
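As a concrete example, here is the logistic sigmoid and its derivative in NumPy (a minimal sketch; the post does not commit to one particular S-shaped function):

```python
import numpy as np

def sigmoid(u):
    # Logistic S-shaped function: differentiable everywhere,
    # which is exactly what BP needs.
    return 1.0 / (1.0 + np.exp(-u))

def sigmoid_prime(u):
    # The derivative can be expressed through the output itself,
    # s * (1 - s), which keeps the backward pass cheap.
    s = sigmoid(u)
    return s * (1.0 - s)
```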

We now walk through the supervised learning algorithm of the BP neural network:

1. Obtain the output-layer error e by forward propagation (see the sketch after this list).

=> Input layer: feed in a sample => through each hidden layer => to the output layer

2. Decide whether to back-propagate.

=> If the output-layer error does not match the expected value => back propagation

3. Error back propagation

=> Apportion the error to every layer => correct each layer's weights, until the error is reduced to an acceptable level.
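To make phase 1 concrete, here is a minimal NumPy forward pass with one hidden layer (the function names and the sigmoid activation are assumptions for illustration):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(x, w_ih, b_h, w_ho, b_o):
    """Input layer => hidden layer => output layer."""
    ho = sigmoid(x @ w_ih + b_h)   # hidden-layer output
    yo = sigmoid(ho @ w_ho + b_o)  # output-layer output
    return ho, yo

def output_error(yo, d):
    # Squared error for one sample: this is the "e" that decides
    # whether back propagation is triggered.
    return 0.5 * np.sum((d - yo) ** 2)
```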

Explained this way the algorithm seems simple enough; next, the mathematical formulas reveal BP's true face.

Suppose our network structure consists of an input layer with n neurons, a hidden layer with p neurons, and an output layer with q neurons.

These variables are as follows:
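(One consistent choice of notation, assumed throughout the sketches below.)

- input vector: $x = (x_1, x_2, \dots, x_n)$
- hidden-layer input vector: $hi = (hi_1, \dots, hi_p)$; hidden-layer output vector: $ho = (ho_1, \dots, ho_p)$
- output-layer input vector: $yi = (yi_1, \dots, yi_q)$; output-layer output vector: $yo = (yo_1, \dots, yo_q)$
- expected output vector: $d = (d_1, \dots, d_q)$
- connection weights: $w_{ih}$ (input to hidden) and $w_{ho}$ (hidden to output); thresholds: $b_h$, $b_o$
- number of training samples: $k = 1, 2, \dots, m$; activation function: $f(\cdot)$
- per-sample error function: $e = \frac{1}{2}\sum_{o=1}^{q}\bigl(d_o(k) - yo_o(k)\bigr)^2$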

Knowing the above variables, we can start calculating:

1. Network initialization: assign each connection weight a random number in (-1, 1), set the error function, and fix the computation precision ε and the maximum number of iterations M.

2. Randomly select the k-th input sample and the corresponding expected output.

Repeat the following steps until the error meets the requirements:

3. Calculate the input and output of each neuron in the hidden layer.

4. Calculate the partial derivative of the error function with respect to each neuron of the output layer, using the output layer's expected output, its actual output, the output layer's input, and related quantities.

5. Calculate the partial derivative of the error function with respect to each neuron of the hidden layer, using the sensitivity δo(k) of the following layer (here, the output layer; sensitivity is introduced later), the connection weights w of the following layer, and the input of this layer.

6. Use the partial derivatives from step 4 to correct the connection weights of the output layer.

7. Use the partial derivatives from step 5 to correct the connection weights of the hidden layer.

8. Calculate the global error (over the m samples and q outputs); the formula and a full sketch of steps 1-8 follow this list.
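The global error of step 8 is usually taken over all $m$ samples and $q$ outputs:

$$E = \frac{1}{2m}\sum_{k=1}^{m}\sum_{o=1}^{q}\bigl(d_o(k) - yo_o(k)\bigr)^2$$

Putting steps 1 through 8 together, here is a minimal NumPy sketch of the whole loop (the single hidden layer, the sigmoid activation, the name train_bp and the toy XOR data are all assumptions for illustration, not the post's reference implementation):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def train_bp(X, D, p=4, eta=0.5, eps=1e-4, M=20000, seed=0):
    """Steps 1-8: train a single-hidden-layer BP network.

    X: (m, n) input samples; D: (m, q) expected outputs;
    p: hidden neurons; eta: learning rate;
    eps: precision; M: maximum number of iterations.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    q = D.shape[1]

    # Step 1: weights and thresholds start as random numbers in (-1, 1).
    w_ih = rng.uniform(-1, 1, (n, p))
    w_ho = rng.uniform(-1, 1, (p, q))
    b_h = rng.uniform(-1, 1, p)
    b_o = rng.uniform(-1, 1, q)

    for _ in range(M):
        for k in rng.permutation(m):       # Step 2: pick the k-th sample.
            x, d = X[k], D[k]

            # Step 3: forward pass through hidden and output layers.
            ho = sigmoid(x @ w_ih + b_h)   # hidden-layer output
            yo = sigmoid(ho @ w_ho + b_o)  # output-layer output

            # Step 4: output-layer sensitivity dE/d(yi), for squared
            # error and a sigmoid (whose derivative is yo * (1 - yo)).
            delta_o = (yo - d) * yo * (1.0 - yo)

            # Step 5: hidden-layer sensitivity from the following layer's
            # sensitivities and connection weights.
            delta_h = (delta_o @ w_ho.T) * ho * (1.0 - ho)

            # Steps 6-7: correct weights with sensitivity times layer input.
            w_ho -= eta * np.outer(ho, delta_o)
            b_o -= eta * delta_o
            w_ih -= eta * np.outer(x, delta_h)
            b_h -= eta * delta_h

        # Step 8: global error over all m samples and q outputs.
        YO = sigmoid(sigmoid(X @ w_ih + b_h) @ w_ho + b_o)
        E = np.sum((D - YO) ** 2) / (2 * m)
        if E < eps:                         # stop when precision is reached
            break
    return w_ih, b_h, w_ho, b_o, E

# Usage on a toy XOR problem (illustrative data):
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)
w_ih, b_h, w_ho, b_o, E = train_bp(X, D)
print("global error:", E)
```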

That covers the concrete calculation method; next the process is summarized with a few simple mathematical formulas. Having read the detailed steps above, you should already have gained some understanding and feel for it.

Suppose our neural network looks like this, with two hidden layers this time.

Let's first understand what sensitivity is.

Please look at the following formula:

$$\delta = \frac{\partial E}{\partial b}$$

This formula is the partial derivative of the error with respect to b. What is this b? It is the bias, and the sensitivity δ is the rate of change of the error with respect to the bias, that is, the derivative.

Since $\partial u/\partial b = 1$, we have $\partial E/\partial b = \partial E/\partial u = \delta$. In other words, the sensitivity of the bias, $\partial E/\partial b = \delta$, is equal to the derivative of the error E with respect to the node's total input u, $\partial E/\partial u$.

It can also be said that the sensitivity here equals the derivative of the error e with respect to this layer's input. Note that the input meant here is u in the figure above, that is, the input obtained after the weighted sum over the previous layer has been computed.

The sensitivity of hidden layer l is:
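In the standard form (using the notation above, with $u^{l}$ the layer's input before activation), this works out to:

$$\delta^{l} = f'(u^{l}) \circ \bigl((W^{l+1})^{\top}\,\delta^{l+1}\bigr)$$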

"?" The multiplication of each element here can be compared with the detailed formula above if you don't understand it.

The sensitivity calculation method of the output layer is different, as shown below:
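For a squared-error loss, the usual form is (again assuming the notation above):

$$\delta^{L} = f'(u^{L}) \circ (yo - d)$$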

The final weight correction is the sensitivity multiplied by this layer's input value. Note that the input here is the one not yet multiplied by the weights, that is, the $x_i$ level in the figure above.
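In symbols, the correction to the weights feeding a layer is the outer product of that layer's sensitivity with the previous layer's output $x$, scaled by the learning rate:

$$\Delta W = -\eta\,\delta\,x^{\top}$$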

For each weight $w_{ij}$ there is a specific learning rate $\eta_{ij}$, which the algorithm adjusts as it learns.