A feedforward (fully connected) neural network, shown in the following figure, consists of an input layer, hidden layers, and an output layer. If you want to learn feedforward neural networks from scratch, references 1 and 2 are excellent tutorials. Assuming you have already mastered the basic feedforward network, the figure below may look slightly different from what you see in most textbooks: it is the usual drawing rotated 90 degrees counterclockwise. The main purpose of this is to connect seamlessly with the RNN diagrams that follow.
It works much like the human brain: when we see (receive) an image of a bear, the signal passes through a series of hidden layers (with their extremely complicated calculations), and finally the output layer produces the result "bear".
But the mapping computed by a feedforward neural network is essentially static. The human brain, by contrast, can also process time-series data. In other words, a result produced by the brain is related not only to the current input but also to the output of the previous moment.
For example, suppose we read an article containing the sentence: "The wolf quietly hid in front of Pleasant Goat's house, waiting for its prey to take the bait." When we reach the word "prey", we immediately understand that it refers to Pleasant Goat, and that the wolf is the one waiting. Clearly, this level of understanding cannot be achieved from the current input alone (the current sentence or word); the output of the previous stage (that is, our understanding of the preceding text) is also essential.
Therefore, from one moment to the next, our brain operates as a function: it accepts inputs from our senses (external) and thoughts (internal) and produces outputs in the form of actions (external) and new ideas (internal).
Our brain runs repeatedly over time. We see a bear, then think "bear", then think "run". Importantly, the same function that transforms the image of a bear into the thought "bear" also transforms the thought "bear" into the thought "run". This is a recursive function, and we can model it with a recurrent neural network (RNN).
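To make this "recursive function" idea concrete, here is a minimal Python sketch (the function and names are purely illustrative, not from any of the figures): the same function f is applied at every moment, and the state it returns is fed back in at the next moment.

```python
def f(state, x):
    """One moment: combine the previous state with the current input."""
    new_state = state + [x]            # toy "memory": accumulate what was seen
    output = "thought of " + x         # toy "action" derived from the input
    return new_state, output

state = []                             # initial internal state
for x in ["bear image", "bear", "run"]:
    state, output = f(state, x)
    print(output)
```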
An RNN is built from copies of the same feedforward neural network, one for each moment (or time step); each copy is called an "RNN cell".
In the figure above, a block of neural network A takes an input x_t and outputs a value h_t. The loop allows information to pass from one step of the network to the next.
The large NN block here is actually equivalent to the hidden layer of a feedforward neural network, that is, a multi-layer structure. The looped output is in fact the input to the next state, so the figure implicitly contains a one-step delay.
If the "input → output" structure of this loop still puzzles you, simply "unroll" it and everything becomes clear at a glance. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to its successor. Consider what happens if we unroll the loop:
or
Another common diagram is shown below (the computation of an RNN unrolled in time, together with its forward pass), but it follows exactly the same pattern.
It can also be drawn as the following recurrent neural network unrolled over time:
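As a concrete sketch of what the unrolled diagram computes, here is a minimal NumPy implementation of a vanilla RNN cell applied step by step over a toy sequence; the dimensions, weight names, and tanh nonlinearity are my own assumptions rather than anything specified in the figures.

```python
import numpy as np

rng = np.random.default_rng(0)

D_in, D_h = 4, 3                                # assumed input / hidden sizes
W_x = rng.normal(scale=0.1, size=(D_h, D_in))   # input-to-hidden weights
W_h = rng.normal(scale=0.1, size=(D_h, D_h))    # hidden-to-hidden weights
b = np.zeros(D_h)

def rnn_cell(x_t, h_prev):
    """One time step: h_t = tanh(W_x x_t + W_h h_{t-1} + b).
    The same weights are reused at every step -- these are the
    'copies of the same network' in the unrolled diagram."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

X = rng.normal(size=(5, D_in))   # a toy input sequence of length T = 5
h = np.zeros(D_h)                # initial hidden state h_0
for x_t in X:                    # unrolling the loop over time
    h = rnn_cell(x_t, h)         # h is passed from one step to the next
    print(h)
```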
Recurrent neural networks can be applied to many different types of machine learning tasks. According to the characteristics of these tasks, they can be divided into the following modes:
Sequence-to-category mode: this mode covers classification problems on sequence data. The input is a sequence and the output is a category. For example, in text classification the input is a sequence of words and the output is the category of the text.
Sequence-to-sequence mode: both the input and the output of this task are sequences. Specifically, there are two cases:
Synchronous sequence-to-sequence mode: this is the sequence labeling task in machine learning; there is an input and an output at every moment, and the input and output sequences have the same length. For example, in part-of-speech tagging, each word is labeled with its corresponding part-of-speech tag.
Asynchronous sequence-to-sequence mode: this mode is also called the encoder-decoder mode; the input and output need not correspond one-to-one or have the same length. For example, in machine translation, the input is a word sequence in the source language and the output is a word sequence in the target language.
First, let's look at the sequence-to-category mode. Suppose a sample X = {x_1, …, x_T} is a sequence of length T, and the output is a category y ∈ {1, …, C}. We feed the sample X into the recurrent neural network at successive moments and obtain the hidden states h_1, …, h_T. We can take h_T as the final representation (or feature) of the whole sequence and feed it into a classifier g(·):

ŷ = g(h_T)
Here g(·) can be a simple linear classifier (such as logistic regression) or a more complex one (such as a multilayer feedforward neural network).
Besides using the hidden state at the last moment as the representation of the sequence (as shown in Figure a), we can also average the hidden states over the whole sequence and take this average as the representation of the entire sequence (as shown in Figure b).
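Below is a minimal, self-contained NumPy sketch of the sequence-to-category mode; the sizes are toy values, and taking g(·) to be a linear layer followed by a softmax is my own simple choice. It shows both variants: classifying the last hidden state h_T (Figure a) and classifying the average of all hidden states (Figure b).

```python
import numpy as np

rng = np.random.default_rng(0)
D_in, D_h, C = 4, 3, 2                          # assumed input/hidden/class sizes
W_x = rng.normal(scale=0.1, size=(D_h, D_in))
W_h = rng.normal(scale=0.1, size=(D_h, D_h))
W_g = rng.normal(scale=0.1, size=(C, D_h))      # classifier g: linear + softmax

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def run_rnn(X):
    """Return all hidden states h_1, ..., h_T for a sequence X."""
    h, hs = np.zeros(D_h), []
    for x_t in X:
        h = np.tanh(W_x @ x_t + W_h @ h)
        hs.append(h)
    return np.array(hs)

X = rng.normal(size=(5, D_in))                  # toy sequence, T = 5
hs = run_rnn(X)

y_hat_last = softmax(W_g @ hs[-1])              # Figure a: y = g(h_T)
y_hat_mean = softmax(W_g @ hs.mean(axis=0))     # Figure b: g of the average state
print(y_hat_last, y_hat_mean)
```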
In the synchronous sequence-to-sequence mode (shown in the figure below), the input is a sequence X = {x_1, …, x_T} of length T, and the output is a sequence Y = {y_1, …, y_T} of the same length. Feeding the sample X into the recurrent neural network at successive moments yields the hidden states h_1, …, h_T. The hidden state h_t at each moment represents the information of the current moment and of the history; feeding it into the classifier g(·) yields the label for the current moment.
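A minimal sketch of the synchronous mode follows the same pattern as the previous block, except that the classifier g(·) is applied at every step, producing one label per input (all sizes and the linear-softmax g are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
D_in, D_h, C = 4, 3, 2
W_x = rng.normal(scale=0.1, size=(D_h, D_in))
W_h = rng.normal(scale=0.1, size=(D_h, D_h))
W_g = rng.normal(scale=0.1, size=(C, D_h))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

X = rng.normal(size=(5, D_in))        # input sequence of length T = 5
h = np.zeros(D_h)
labels = []
for x_t in X:                         # one output at every moment
    h = np.tanh(W_x @ x_t + W_h @ h)
    labels.append(int(softmax(W_g @ h).argmax()))  # y_t = g(h_t)
print(labels)                         # T labels: same length as the input
```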
In the asynchronous sequence-to-sequence mode (shown in the figure below), the input is a sequence X = {x_1, …, x_T} of length T, and the output is a sequence Y = {y_1, …, y_M} of length M. Feeding the sample X into the recurrent neural network at successive moments yields the hidden states h_1, …, h_T. The RNN cell is then run M more times; during these steps, the input to each RNN cell is only the hidden state of the previous moment. Each hidden state h_t with t ∈ [T+1, T+M] is fed into the classifier g(·) to obtain the label for that moment.
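Finally, a minimal sketch of the asynchronous (encoder-decoder) mode: the cell first consumes the whole input sequence, then runs M more steps in which its only input is the hidden state from the previous moment (again, the sizes and the linear-softmax g are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
D_in, D_h, C = 4, 3, 2
W_x = rng.normal(scale=0.1, size=(D_h, D_in))
W_h = rng.normal(scale=0.1, size=(D_h, D_h))
W_g = rng.normal(scale=0.1, size=(C, D_h))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Encoder: consume the input sequence x_1, ..., x_T.
X = rng.normal(size=(5, D_in))                 # T = 5
h = np.zeros(D_h)
for x_t in X:
    h = np.tanh(W_x @ x_t + W_h @ h)

# Decoder: run the cell M more times; each step's only input
# is the hidden state from the previous moment.
M = 3
labels = []
for _ in range(M):
    h = np.tanh(W_h @ h)
    labels.append(int(softmax(W_g @ h).argmax()))
print(labels)                                  # M labels, independent of T
```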
References
1. Zuo Fei, R Language in Practice: Machine Learning and Data Analysis, Publishing House of Electronics Industry, Chapter 15.
2. Michael Nielsen, Neural Networks and Deep Learning.
3. Qiu Xipeng, "Recurrent Neural Networks," in Neural Networks and Deep Learning.
4. Denny Britz, Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs.
5. Christopher Olah, Understanding LSTM Networks (/baimafujinji/article/details/78279746).