Memory Networks
Memory networks are a small branch of deep learning. Since they were proposed in 2014, several mature models have gradually been developed. I am mainly interested in their application and development in the QA field, so this column will select several papers to systematically introduce memory-network models and their applications. Some of the papers will also be accompanied by TensorFlow implementations.

The first paper to be introduced is Memory Networks, published by Facebook in 2014; the Neural Turing Machine proposed in the same period adopts a similar idea (external memory). Traditional deep learning models (RNN, LSTM, GRU, etc.) use hidden states or attention mechanisms as their memory, but the memory produced this way is too small to accurately record everything expressed in a passage; that is, a lot of information is lost when the input is encoded into dense vectors. This paper therefore proposes a readable and writable external memory module, trains it jointly with inference components, and finally obtains a flexible memory module. Next, let's take a look at its framework.

First of all, the model mainly consists of an array of memory cells (each cell holding the memory of one sentence) and four modules: I, G, O, and R. The structure diagram is as follows:

Simply put, the input text is encoded into a vector by the input module and passed to the generalization module, which reads and writes the memory according to the input vector, i.e. updates the memory. The output module then weights the contents of memory according to the question (which is also encoded by the input module), combining memories by their degree of relevance to the question to produce an output vector. Finally, the response module decodes the output vector into a natural-language answer. The functions of the four modules are:

I (input feature map): converts the raw input into an internal feature representation;
G (generalization): updates the stored memories given the new input;
O (output feature map): produces an output feature given the question and the current memory state;
R (response): converts the output feature into the desired response format, e.g. a word or a sentence.
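To make the pipeline concrete, here is a minimal Python sketch of the four modules. This is my own toy illustration, not the paper's code: relevance here is a plain bag-of-words dot product rather than a learned scoring function, and all class and method names are invented.

```python
import numpy as np

class MemoryNetwork:
    """Toy skeleton of the I/G/O/R pipeline from the Memory Networks paper."""

    def __init__(self, vocab):
        self.vocab = sorted(vocab)
        self.index = {w: i for i, w in enumerate(self.vocab)}
        self.memory = []  # one feature vector per stored sentence

    def I(self, text):
        """Input: encode raw text as a bag-of-words feature vector."""
        v = np.zeros(len(self.vocab))
        for w in text.lower().split():
            w = w.strip(".?,")
            if w in self.index:
                v[self.index[w]] += 1.0
        return v

    def G(self, feature):
        """Generalization: write the new memory into the next free slot,
        leaving old memories untouched (as in the basic model)."""
        self.memory.append(feature)

    def O(self, question_feature):
        """Output: return the memory most relevant to the question
        (relevance simplified to a dot product here)."""
        scores = [float(question_feature @ m) for m in self.memory]
        return self.memory[int(np.argmax(scores))]

    def R(self, output_feature):
        """Response: emit the single highest-weighted word as the answer."""
        return self.vocab[int(np.argmax(output_feature))]
```

A real implementation would replace the dot product in `O` with the learned scoring function described below, but the division of labor between the four modules is the same.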

Next, let's take a look at the implementation details of the basic model:

According to the paper, in the basic model I is a simple embedding lookup that converts the raw text into word-vector form, while G simply stores the input vector in the next free slot of the memory array, doing nothing else — new memories are written directly and old memories are never modified. The main work is carried out in the O and R modules. Given the question vector, the O module selects the top-k most relevant memories from all the memories. Specifically, it first selects the single memory most relevant to the question:

o1 = O1(x, m) = argmax_{i=1..N} s_O(x, m_i)

Next, given the selected m_{o1} and the input x, it selects the memory o2 most relevant to both of them:

o2 = O2(x, m) = argmax_{i=1..N} s_O([x, m_{o1}], m_i)

For the above formula, if both x and m_{o1} are represented by linear feature vectors (e.g. bag-of-words), the score can be decomposed into the sum s_O(x, m_i) + s_O(m_{o1}, m_i); otherwise it cannot.
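The two-hop selection can be sketched as follows. This is a toy illustration with my own names: `score` stands in for s_O, with U taken as the identity so the score reduces to a plain dot product, which also makes the sum decomposition above hold exactly.

```python
import numpy as np

def score(x, y):
    # stand-in for s_O(x, y); with U = identity this is just a dot product
    return float(x @ y)

def select_supporting_memories(x, memories):
    """Two-hop memory selection (k = 2) from the basic model: first pick
    o1 = argmax_i s(x, m_i); then, because bag-of-words features are linear,
    s([x, m_o1], m_i) decomposes into s(x, m_i) + s(m_o1, m_i)."""
    first = [score(x, m) for m in memories]
    o1 = int(np.argmax(first))
    second = [score(x, m) + score(memories[o1], m) if i != o1 else -np.inf
              for i, m in enumerate(memories)]
    o2 = int(np.argmax(second))
    return o1, o2
```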

In this way the top-k memory slots most relevant to the question are selected and passed, together with the question, as the input of the R module, which generates the final answer. In the basic model this step is also very simple: using the same scoring function as above, R scores every candidate word against its input and outputs the highest-scoring word as the answer:

r = argmax_{w in W} s_R([x, m_{o1}, m_{o2}], w)

The scoring function used repeatedly above takes the following form, where U is a learned embedding matrix and Φ maps the raw text to a (bag-of-words) feature vector:

s(x, y) = Φ_x(x)^T U^T U Φ_y(y)
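That form can be written literally in a few lines of numpy. The dimensions and the random U below are made up for illustration; in the model, U is learned.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 8  # raw bag-of-words feature dimension (vocabulary size), illustrative
N = 3  # embedding dimension, illustrative

U = rng.normal(size=(N, D))  # embedding matrix (learned in the real model)

def s(phi_x, phi_y, U):
    """s(x, y) = phi_x^T U^T U phi_y: embed both sides with U,
    then take the inner product in the embedding space."""
    return float((U @ phi_x) @ (U @ phi_y))
```

Note that the score is symmetric in its two arguments and that scoring an item against itself gives a squared norm, hence a non-negative value.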

For the final model, the loss function is a margin ranking loss, i.e. the score of the correct answer is required to be at least a margin γ higher than the score of any wrong answer. Summed over the two memory hops and the response step, the loss is:

  Σ_{f ≠ m_{o1}} max(0, γ − s_O(x, m_{o1}) + s_O(x, f))
+ Σ_{f' ≠ m_{o2}} max(0, γ − s_O([x, m_{o1}], m_{o2}) + s_O([x, m_{o1}], f'))
+ Σ_{r' ≠ r} max(0, γ − s_R([x, m_{o1}, m_{o2}], r) + s_R([x, m_{o1}, m_{o2}], r'))

In practice, the wrong answers here are randomly sampled rather than summing over all negative samples. Let's walk through a simple example to illustrate the whole process:
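A sketch of the sampled-negative margin loss, simplified to a single hinge term (the full loss repeats this term for each hop and for the response step; function names are mine):

```python
import numpy as np

def margin_ranking_loss(pos_score, neg_scores, gamma=0.1):
    """Hinge loss: penalize any sampled wrong answer whose score comes
    within the margin gamma of the correct answer's score."""
    return float(sum(max(0.0, gamma - pos_score + n) for n in neg_scores))

def sample_negatives(rng, scores, correct_idx, k):
    """Randomly sample k wrong-answer scores instead of summing over all."""
    candidates = [s for i, s in enumerate(scores) if i != correct_idx]
    return rng.choice(candidates, size=min(k, len(candidates)), replace=False)
```

Only negatives that score close to (or above) the correct answer contribute to the loss, so easy negatives are ignored and sampling a handful per example is usually enough.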

For the question "Where is the milk now?", the output module scores all the memories (i.e. the input sentences) against the question and finds that "Joe left the milk." scores highest, i.e. is most relevant to the question. It then scores the remaining memories against both the question and "Joe left the milk.", and finds the next most relevant supporting memory: "Joe travelled to the office.". With the relevant memories found, the R module scores all candidate words and outputs the highest-scoring word, "office", as the answer.

The key contribution of this paper is the proposal of a general model framework (the memory network), but many parts of it are not yet perfected. Problems such as word-level input, very large memories, and unseen words are also discussed later in the paper, but I think it is more useful to read the follow-up papers published on this basis, so I won't introduce those parts in detail here.