Recommendation System Paper Reading (10) —— Sequence Recommendation Algorithm Based on Graph Neural Networks
Paper: Session-based Recommendation with Graph Neural Networks (SR-GNN)

Address: https://arxiv.org/abs/1811.00855

github: https://github.com/CRIPAC-DIG/SR-GNN

Session-based recommendation generally models the sequential session by encoding the whole session into a hidden vector and then using this hidden vector to predict the next click. However, this approach does not take the complex transitions among items into account: within a clicked session there are complex node-to-node relationships forming a directed graph, beyond the simple temporal order of the items, so previous methods are not sufficient to model the click sequence well.

Existing session-based recommendation methods mainly rely on recurrent neural networks and Markov chains. The paper points out two shortcomings of these methods:

1) When the number of user actions in a session is very limited, it is difficult for these methods to obtain an accurate representation of user behavior. For example, in an RNN model the user-behavior representation is simply the output of the last unit, which is not very accurate.

2) According to previous work, the transition patterns between items are a very important feature in session-based recommendation, but RNNs and Markov processes only model the one-way transitions between two adjacent items, ignoring the other items in the session.

To overcome these shortcomings, the paper proposes modeling user sessions with a graph neural network:

The following describes in detail how SR-GNN turns the session sequence into a graph and makes recommendations.

V = {v_1, v_2, ..., v_m} is the set of all items, and s = [v_{s,1}, v_{s,2}, ..., v_{s,n}] is the list of items clicked in one session, in chronological order. The goal of the paper is to predict the next item v_{s,n+1} that the user will click. The model outputs prediction probabilities for all items and selects the top-k for recommendation.

We construct a subgraph for each session and obtain its corresponding out-degree matrix and in-degree matrix.

Suppose a click sequence is v1 -> v2 -> v4 -> v3; the resulting subgraph (the red part of the figure in the paper) has the edges v1 -> v2, v2 -> v4, and v4 -> v3.

For another example, with the click sequence v1 -> v2 -> v3 -> v2 -> v4, the subgraph has the edges v1 -> v2, v2 -> v3, v3 -> v2, and v2 -> v4 (note that v2 -> v3 -> v2 forms a cycle).

At the same time, we construct an out-degree matrix and an in-degree matrix for each subgraph, and normalize each row of both matrices. For the sequence v1 -> v2 -> v3 -> v2 -> v4 the corresponding matrices are as follows.
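Reconstructing them from the example (rows and columns ordered v1, v2, v3, v4; out-degree matrix on the left, in-degree matrix on the right):

```latex
A_s^{out} =
\begin{pmatrix}
0 & 1   & 0   & 0   \\
0 & 0   & 1/2 & 1/2 \\
0 & 1   & 0   & 0   \\
0 & 0   & 0   & 0
\end{pmatrix},
\qquad
A_s^{in} =
\begin{pmatrix}
0   & 0 & 0   & 0 \\
1/2 & 0 & 1/2 & 0 \\
0   & 1 & 0   & 0 \\
0   & 1 & 0   & 0
\end{pmatrix}
```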

How are the values in these matrices calculated?

Look at the out-degree matrix on the left. Its first row is [0, 1, 0, 0], which represents v1 -> v2: since v1 points to only one item, the value is 1. The second row is [0, 0, 1/2, 1/2]: v2 has edges pointing to both v3 and v4, so after row normalization each value becomes 1/2. The in-degree matrix is calculated the same way, so we won't repeat it.
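As a concrete sketch of this construction (my own helper, not the authors' code; it assumes a repeated edge is counted once, as in the example above):

```python
import numpy as np

def build_session_matrices(session):
    """Build row-normalized out-degree and in-degree matrices for a session.

    session: list of item ids in click order, e.g. [1, 2, 3, 2, 4]
    returns: (unique items, A_out, A_in), both matrices of shape n x n
    """
    items = list(dict.fromkeys(session))      # unique items, in order of first click
    idx = {item: i for i, item in enumerate(items)}
    n = len(items)

    A_out = np.zeros((n, n))
    for a, b in zip(session, session[1:]):    # one edge per consecutive click pair
        A_out[idx[a], idx[b]] = 1             # a repeated edge still counts once

    A_in = A_out.T.copy()                     # in-edges are the reversed out-edges

    # normalize each row by its number of outgoing (resp. incoming) edges
    for A in (A_out, A_in):
        row_sums = A.sum(axis=1, keepdims=True)
        np.divide(A, row_sums, out=A, where=row_sums > 0)

    return items, A_out, A_in

items, A_out, A_in = build_session_matrices([1, 2, 3, 2, 4])
print(items)   # [1, 2, 3, 4]
print(A_out)   # row of v2 is [0, 0, 0.5, 0.5]
print(A_in)    # row of v2 is [0.5, 0, 0.5, 0]
```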

The paper uses a GRU unit for sequence modeling and embeds the graph information into the neural network, so that the GRU can fully learn the relationships between items. A traditional GRU can only learn the relationship between two adjacent items; after adding the graph information, it can learn the information of the whole session subgraph.

The calculation formula is as follows:
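Reconstructed from the paper, equations (1)-(5) read:

```latex
\begin{aligned}
a_{s,i}^{t} &= A_{s,i:}\,[v_1^{t-1}, \ldots, v_n^{t-1}]^{\top} H + b && (1) \\
z_{s,i}^{t} &= \sigma\left(W_z a_{s,i}^{t} + U_z v_i^{t-1}\right) && (2) \\
r_{s,i}^{t} &= \sigma\left(W_r a_{s,i}^{t} + U_r v_i^{t-1}\right) && (3) \\
\widetilde{v}_i^{t} &= \tanh\left(W_o a_{s,i}^{t} + U_o (r_{s,i}^{t} \odot v_i^{t-1})\right) && (4) \\
v_i^{t} &= (1 - z_{s,i}^{t}) \odot v_i^{t-1} + z_{s,i}^{t} \odot \widetilde{v}_i^{t} && (5)
\end{aligned}
```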

To understand this calculation process, we again use the earlier example, v1 -> v2 -> v3 -> v2 -> v4, and analyze the input and output step by step.

(1) A_{s,i:} is, at time step t, the input corresponding to the i-th click of session s. A_s is the complete n x 2n matrix of the session subgraph (the out-degree and in-degree matrices spliced together), and A_{s,i:} is the row corresponding to item v_i, of size 1 x 2n, where n is the number of distinct items in the sequence.

Using the example: for i = 2, the row is [0, 0, 1/2, 1/2, 1/2, 0, 1/2, 0].

In addition, A_{s,i:} can be decomposed into [A_{s,i:}^{out}, A_{s,i:}^{in}].

(2) v_i^{t-1} can be understood as the embedding vector of the i-th item in the sequence during training. This vector changes constantly as the model trains; it can be understood as the hidden-layer state and is a d-dimensional vector.

(3) H is a d x 2d weight matrix. It can be regarded as a block matrix, H = [H_in | H_out], where each block is a d x d matrix.

Let's look at the calculation process:

1) [v_1^{t-1}, ..., v_n^{t-1}] is a d x n matrix, and its transpose is an n x d matrix.

2) A_{s,i:} [v_1^{t-1}, ..., v_n^{t-1}]^T H: H is split into its two blocks, and A_{s,i:} is likewise split into its out-degree and in-degree halves; each half is multiplied through separately and the two results are spliced together, so the result a_{s,i}^t is a 1 x 2d vector.

The above is the complete calculation process for the input of one click. It can be seen that before entering the GRU calculation, the graph information is embedded into the neural network by multiplying with the A_{s,i:} matrix, which deepens the item-to-item interaction information the network can learn.
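To make the splicing concrete, here is a minimal numpy sketch (my own illustration under the block reading above, not the paper's code) that computes the graph-embedded inputs a_{s,i}^t for all items of the example at once:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 5                        # n distinct items, embedding size d
V = rng.normal(size=(n, d))        # rows are the current vectors v_i^{t-1}

# row-normalized matrices of the example v1 -> v2 -> v3 -> v2 -> v4
A_out = np.array([[0, 1, 0, 0], [0, 0, .5, .5], [0, 1, 0, 0], [0, 0, 0, 0]])
A_in  = np.array([[0, 0, 0, 0], [.5, 0, .5, 0], [0, 1, 0, 0], [0, 1, 0, 0]])

# the two d x d blocks of H, one per edge direction, plus a bias
H_out, H_in = rng.normal(size=(d, d)), rng.normal(size=(d, d))
b = np.zeros(2 * d)

# multiply each half separately, then splice: a_s[i] is the 1 x 2d input
# a_{s,i}^t that item v_i feeds into its GRU unit
a_s = np.concatenate([A_out @ V @ H_out, A_in @ V @ H_in], axis=1) + b
print(a_s.shape)                   # (4, 10), i.e. n rows of size 2d
```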

The other part is the calculation process of the GRU itself, which differs from the original GRU in that the input changes from x_t to a_{s,i}^t, which has the graph information embedded in it.

As usual there are an update gate and a reset gate, calculated in exactly the same way as in the original GRU.

In fact, a_{s,i}^t here plays the same role as x_t in the original GRU, except that in SR-GNN the matrix A_{s,i:} does not change during one round of computation; each item enters the GRU separately and obtains its own vector v_i^t, and it is this vector that keeps changing during the GRU computation. With this in mind, the source code is easier to understand:

The hidden states v_i^t in the formula are updated at every step of the GRU computation. I had a question here at first: if the hidden states of all items are updated, then all items of the whole sequence must enter the GRU in parallel, each step producing a new vector for every item; once every item's vector has been updated, the next step is computed from the new vectors, and so on.

The calculation process is roughly as follows:

There are four GRUs computing in parallel, each updating its own hidden state at every step; all of the hidden states together with the graph information are taken as input.
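A rough sketch of this loop, in the same spirit as above (again my own illustration with stand-in random weights; the real implementation lives in the repository linked at the top):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, T = 4, 5, 1                  # n items, embedding size d, T propagation steps
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# stand-in weights: H blocks build the 2d graph input, W/U are the gate weights
H_out, H_in = rng.normal(size=(d, d)), rng.normal(size=(d, d))
W_z, W_r, W_o = (rng.normal(size=(2 * d, d)) for _ in range(3))
U_z, U_r, U_o = (rng.normal(size=(d, d)) for _ in range(3))

A_out = np.array([[0, 1, 0, 0], [0, 0, .5, .5], [0, 1, 0, 0], [0, 0, 0, 0]])
A_in  = np.array([[0, 0, 0, 0], [.5, 0, .5, 0], [0, 1, 0, 0], [0, 1, 0, 0]])

def propagation_step(V):
    """One step t: the hidden vectors of all n items are updated in parallel."""
    # graph-embedded input a_s, recomputed from the current hidden vectors
    a = np.concatenate([A_out @ V @ H_out, A_in @ V @ H_in], axis=1)
    z = sigmoid(a @ W_z + V @ U_z)            # update gate, eq (2)
    r = sigmoid(a @ W_r + V @ U_r)            # reset gate, eq (3)
    cand = np.tanh(a @ W_o + (r * V) @ U_o)   # candidate state, eq (4)
    return (1 - z) * V + z * cand             # new v_i^t for every item, eq (5)

V = rng.normal(size=(n, d))                   # initial item embeddings
for _ in range(T):
    V = propagation_step(V)                   # after T steps: the item vectors
```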

As can be seen from the figure above, each item goes through T steps to obtain its own item vector, so after T steps we have the vectors of all items in the sequence, namely:

the vectors drawn in the blue box in the figure. With these vectors, how do we get the prediction results? This leads to the next question.

Observing the model structure above, we can see an attention mechanism. Indeed, the paper assumes that the item vectors in the session do not all influence the prediction result equally: some items have a large influence on the result and others only a small one, so a weighted sum is taken. At the same time, the paper considers the last item vector of the session especially important, so s_l = v_n is taken out separately:

Formula (6) is a simple attention operation. From the formula's point of view, it computes a weight α_i between each v_i and the last vector v_n, and then takes the weighted sum.
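In symbols (equations (6) and (7) from the paper, with q, c, W_1, W_2, W_3 learnable parameters; (7) splices the local and global session vectors):

```latex
\begin{aligned}
\alpha_i &= \mathbf{q}^{\top} \sigma\left(W_1 v_n + W_2 v_i + c\right), \qquad
s_g = \sum_{i=1}^{n} \alpha_i v_i && (6) \\
s_h &= W_3 \, [\, s_l ;\, s_g \,] && (7)
\end{aligned}
```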

In the final output layer, the inner product of s_h and the embedding of each item is computed. Note that v_i here should be the vector from the item embedding layer, not the hidden vector:
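In symbols (equation (8)):

```latex
\hat{z}_i = s_h^{\top} v_i \qquad (8)
```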

Finally, the click probability of each item is obtained through a softmax:
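That is (equation (9)):

```latex
\hat{y} = \operatorname{softmax}(\hat{z}) \qquad (9)
```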

The loss function is the cross-entropy loss:
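As written in the paper, with y the one-hot vector of the ground-truth item:

```latex
L(\hat{y}) = -\sum_{i=1}^{m} y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \qquad (10)
```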

Judging from the experimental data, SR-GNN surpasses the classic GRU4Rec, which also shows that embedding graph information can bring a better recommendation effect.

This paper cleverly embeds graph information into the neural network, so that the GRU can learn the relationships between items more effectively and is no longer limited to learning between adjacent items. In recent years, the ideas and methods of graph neural networks have been used again and again in recommender systems; learning graph neural networks well should be the next hot direction for recommender systems.