Paper title: Neural Graph Collaborative Filtering.
Link: https://arxiv.org/pdf/1905.08108.pdf
This paper is about a collaborative filtering algorithm built on a graph structure. In earlier methods based on matrix factorization and deep learning, users (or items) are usually embedded by mapping their existing features (such as IDs and attributes), and the resulting user and item embeddings are then used for collaborative recall. The authors argue that the inherent disadvantage of this approach is that the collaborative signal hidden in the user-item interaction data is never encoded during the embedding process, so the resulting embeddings may be insufficient to capture the collaborative filtering effect.
Let's take a look at how this paper exploits the latent collaborative signal in the data.
Recommendation algorithms are widely used and play a vital role in e-commerce, social media, advertising, and other fields. The core task of a recommender system is to estimate how much a user will like an item based on their past purchase and click behavior, so as to make personalized recommendations for each user. Collaborative filtering assumes that users with similar historical behavior share similar interests: UserCF recommends items liked by similar users, while ItemCF recommends items similar to those the user has interacted with in the past.
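To make UserCF concrete, here is a minimal sketch on a made-up 3-user, 5-item interaction matrix; the data and the cosine-similarity weighting are illustrative assumptions, not the paper's method:

```python
import numpy as np

# Toy interaction matrix: rows = users, cols = items (1 = interacted).
# The data is made up for illustration.
R = np.array([
    [1, 1, 1, 0, 0],   # u1
    [0, 1, 0, 1, 1],   # u2
    [0, 0, 1, 1, 0],   # u3
], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# UserCF: score unseen items for u1 by similarity-weighted votes of other users.
u = 0
scores = np.zeros(R.shape[1])
for v in range(R.shape[0]):
    if v != u:
        scores += cosine(R[u], R[v]) * R[v]
scores[R[u] > 0] = 0          # mask items u1 already interacted with
best = int(np.argmax(scores)) # index of the top recommendation
```

With this toy data, both other users have interacted with item index 3, so it collects the most similarity-weighted votes.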
Traditional collaborative filtering methods are based on either matrix factorization or deep learning, and both ignore a critical piece of information: the collaborative signal hidden in the interactions between users and items. Because the original collaborative filtering methods ignore this signal, embedding only the user and item representations is not enough.
This paper develops a new recommendation framework, Neural Graph Collaborative Filtering (NGCF), which integrates the user-item interactions (more specifically, the bipartite graph structure) into the embedding process. The framework exploits this structure by propagating embeddings over the user-item graph. This models high-order connectivity in the user-item graph and thereby injects the collaborative signal into the embedding process in an explicit way.
Before introducing the model, let's explain what a user-item interaction is and what a high-order user-item interaction is.
Look at the picture on the left first: it shows the user-item interactions. u1, drawn with a double circle, is the user we want to make recommendations for; the items he has interacted with are i1, i2, and i3. Now look at the tree structure on the right, which is the high-order connectivity graph of u1. Note that only paths with l > 1 count as high-order connections of u1. Observe the path u1 ← i2 ← u2: it expresses the behavioral similarity between u1 and u2, because both users have interacted with i2. The longer path u1 ← i2 ← u2 ← i4 suggests that u1 is likely to click i4, because his similar user u2 has already consumed i4. Furthermore, at level l = 3, user u1 should prefer i4 over i5, because there are two paths connecting i4 to u1 but only one connecting i5.
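The path counts above can be checked with a tiny script. The edge sets for u2 and u3 below are an assumption reconstructed to be consistent with the description of the figure:

```python
# Toy bipartite graph: u1's edges come from the text; u2's and u3's edges
# are assumptions chosen to match the stated path counts.
edges = {
    "u1": {"i1", "i2", "i3"},
    "u2": {"i2", "i4", "i5"},
    "u3": {"i3", "i4"},
}

def paths_len3(user, target_item):
    """Count length-3 paths user <- item <- other_user <- target_item."""
    count = 0
    for item in edges[user]:
        for other_user, other_items in edges.items():
            if other_user != user and item in other_items and target_item in other_items:
                count += 1
    return count
```

Under these assumed edges, `paths_len3("u1", "i4")` finds two paths (via i2-u2 and via i3-u3), while `paths_len3("u1", "i5")` finds only one.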
Of course, it is impractical to materialize this tree by building actual tree nodes: the tree would be huge and complex, and building one per user would be far too much work. So how do we design the model to achieve this high-order connectivity effect? This is where the neural network comes in: an embedding propagation layer is designed to represent the transfer of information between layers.
Taking the figure above as an example, stacking two layers captures the behavioral similarity of u1 ← i2 ← u2, and stacking three layers captures the potential recommendation u1 ← i2 ← u2 ← i4; the strength of the information flow (estimated through trainable weights between layers) determines the ranking of i4 and i5.
The embedding layer is the same as in traditional methods: it embeds the raw user ID and item ID. The difference is that in the NGCF framework, the embeddings are refined by propagating them over the user-item interaction graph. Because this refinement step explicitly injects the collaborative signal into the embeddings, it yields embeddings that are more effective for recommendation.
The embedding propagation layer is the core contribution of this paper, so let's read it in detail.
Intuitively, the items a user has interacted with provide the most direct evidence of the user's preferences. Likewise, the users who have interacted with an item can be treated as features of that item and used to measure the collaborative similarity between two items. On this basis, embedding propagation between connected users and items is realized through two main operations: message construction and message aggregation.
Message construction
For a connected user-item pair (u, i), the message from i to u is defined as:

m_{u←i} = f(e_i, e_u, p_ui)

where e_i is the embedding of item i, e_u is the embedding of user u, p_ui is the decay factor that controls each propagation, and the message encoding function f is defined as:

m_{u←i} = p_ui * (W1·e_i + W2·(e_i ⊙ e_u))
Here W1 and W2 are trainable weight matrices that distill useful information from the embeddings. Note the element-wise interaction between e_i and e_u controlled by W2: it makes the message depend on the affinity between e_i and e_u, so that, for example, more message is passed from items similar to the user's preferences.
Another important piece is the decay factor, set to p_ui = 1/sqrt(|N_u|·|N_i|), where N_u and N_i are the first-hop neighbors of user u and item i. From the perspective of representation learning, p_ui reflects how much a historical item contributes to the user's preference. From the perspective of message passing, p_ui can be interpreted as a discount factor, since the message being propagated should decay with the length of the path.
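The message construction above can be sketched in a few lines of NumPy; the embeddings, weights, and neighborhood sizes below are random toy values, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                               # embedding size (toy choice)
W1 = rng.normal(size=(d, d)) * 0.1  # trainable weight matrices
W2 = rng.normal(size=(d, d)) * 0.1
e_u = rng.normal(size=d)            # embedding of user u
e_i = rng.normal(size=d)            # embedding of item i
N_u, N_i = 3, 2                     # assumed first-hop neighbor counts

# p_ui = 1 / sqrt(|N_u| * |N_i|): graph-Laplacian style discount factor
p_ui = 1.0 / np.sqrt(N_u * N_i)

# m_{u<-i} = p_ui * (W1 @ e_i + W2 @ (e_i ⊙ e_u))
m_ui = p_ui * (W1 @ e_i + W2 @ (e_i * e_u))
```

The `e_i * e_u` term is the element-wise product that makes the message depend on the affinity between the two embeddings.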
Message aggregation
The aggregation rule is as follows:

e_u^(1) = LeakyReLU(m_{u←u} + Σ_{i∈N_u} m_{u←i})

where e_u^(1) denotes the representation of user u obtained after the first embedding propagation layer. The activation function is LeakyReLU, which can encode both positive and small negative signals.
The other important message is the self-connection m_{u←u}, defined as:

m_{u←u} = W1·e_u

Its main function is to retain the original feature information of u.
So far we have obtained e_u^(1), and by the same method we can obtain e_i^(1). Together these capture all the first-order connectivity information.
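Putting message construction and aggregation together for one user gives a minimal sketch of a first-order propagation step; the neighborhood sizes and embeddings are again made-up toy values:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """LeakyReLU: passes positive values, scales negative ones by alpha."""
    return np.where(x > 0, x, alpha * x)

rng = np.random.default_rng(1)
d = 4
W1 = rng.normal(size=(d, d)) * 0.1
W2 = rng.normal(size=(d, d)) * 0.1
e_u = rng.normal(size=d)
items = [rng.normal(size=d) for _ in range(3)]  # embeddings of N_u = {i1, i2, i3}
N_i = [2, 1, 2]                                 # assumed neighbor counts per item

# Sum of neighbor messages m_{u<-i} with p_ui = 1/sqrt(|N_u|*|N_i|)
msg_sum = sum(
    (1.0 / np.sqrt(len(items) * n)) * (W1 @ e_i + W2 @ (e_i * e_u))
    for e_i, n in zip(items, N_i)
)
# Self-connection message m_{u<-u} = W1 @ e_u retains the original features
e_u_1 = leaky_relu(W1 @ e_u + msg_sum)
```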
Following the same construction, if we stack multiple embedding propagation layers we can gather high-order connectivity information:

e_u^(l) = LeakyReLU(m_{u←u}^(l) + Σ_{i∈N_u} m_{u←i}^(l))

where the messages at layer l are computed as:

m_{u←i}^(l) = p_ui * (W1^(l)·e_i^(l-1) + W2^(l)·(e_i^(l-1) ⊙ e_u^(l-1)))
m_{u←u}^(l) = W1^(l)·e_u^(l-1)
Seeing this, I had a serious doubt: computing e_u and e_i at layer l requires the representations from layer l-1, so how do we know that both e_i and e_u exist at layer l? In other words, if the number of layers on the u side is greater than on the i side, how can the layer-l e_u be computed from the layer-(l-1) e_i? After some thought, I guessed that a training sample should be a path, e.g. u1 ← i2 ← u2 ← i4, which would guarantee that u1 and i4 have the same number of layers, so the layer counts could never mismatch.
PS: Only after seeing the experimental results did I realize that L is fixed for all nodes, so every node has a representation at every layer and nothing is missing.
Also note that W differs between layers: each layer has its own parameters. This is visible in the formula, since different W matrices are needed to extract the information of different layers.
Another question is whether p_ui is the same at every layer l. The formula seems to reuse the values of N_u and N_i computed from the first-hop neighborhoods, which would keep p_ui fixed across layers.
This part is the matrix form of the propagation rule used during batch training. Mathematically it is exactly the same as the per-node computation described above; just imagine how hard it would be to carry out such complicated interactions during training without matrix operations.
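The matrix form can be sketched as follows. It applies the layer-wise rule E^(l) = LeakyReLU((L̂ + I)·E^(l-1)·W1^(l) + (L̂·E^(l-1) ⊙ E^(l-1))·W2^(l)), where L̂ = D^(-1/2)·A·D^(-1/2) is the symmetrically normalized adjacency of the bipartite graph; the interaction matrix and weights below are random toy values:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

rng = np.random.default_rng(2)
n_users, n_items, d, L = 3, 5, 4, 2

# Toy user-item interaction matrix R (1 = interaction), an assumption
R = rng.integers(0, 2, size=(n_users, n_items)).astype(float)

# Adjacency A of the bipartite graph and its symmetric normalization L_hat
A = np.block([[np.zeros((n_users, n_users)), R],
              [R.T, np.zeros((n_items, n_items))]])
deg = A.sum(axis=1)
d_inv_sqrt = np.zeros_like(deg)
nz = deg > 0
d_inv_sqrt[nz] = deg[nz] ** -0.5
L_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]   # D^-1/2 A D^-1/2

E = rng.normal(size=(n_users + n_items, d))  # E^(0): initial ID embeddings
embeddings = [E]
I = np.eye(n_users + n_items)
for _ in range(L):                           # one fresh W1, W2 per layer
    W1 = rng.normal(size=(d, d)) * 0.1
    W2 = rng.normal(size=(d, d)) * 0.1
    E = leaky_relu((L_hat + I) @ E @ W1 + ((L_hat @ E) * E) @ W2)
    embeddings.append(E)

# Final representation: concatenation e* = e^(0) || ... || e^(L)
E_final = np.concatenate(embeddings, axis=1)
```

The (L̂ + I) term folds the self-connection message into the same matrix product, so all users and items are updated in one shot per layer.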
After L layers of embedding propagation we have the representations {e_u^(0), …, e_u^(L)} and {e_i^(0), …, e_i^(L)}, and we concatenate them:

e_u* = e_u^(0) || … || e_u^(L),  e_i* = e_i^(0) || … || e_i^(L)
This not only enriches the initial embeddings through the propagation layers, but also lets us control the propagation range by adjusting L.
Finally, the inner product is computed to estimate the user's preference for the target item:

ŷ(u, i) = e_u*ᵀ e_i*
The model is optimized with the pairwise BPR loss:

Loss = Σ_{(u,i,j)∈O} −ln σ(ŷ_ui − ŷ_uj) + λ‖Θ‖²

where O contains triples (u, i, j) with an observed interaction (u, i) and an unobserved one (u, j), σ is the sigmoid function, and λ controls the L2 regularization.
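A toy computation of the prediction score and the BPR loss for a single (u, i, j) triple; the embeddings and the regularization weight λ are made-up values:

```python
import numpy as np

rng = np.random.default_rng(3)
d = 8                          # size of the concatenated embedding e*
e_u = rng.normal(size=d)       # final user embedding e_u*
e_i = rng.normal(size=d)       # positive item: observed interaction
e_j = rng.normal(size=d)       # negative item: sampled non-interaction

# Preference scores: inner product y_hat(u, i) = e_u* . e_i*
y_ui = e_u @ e_i
y_uj = e_u @ e_j

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Pairwise BPR loss for one triple; lam (λ) is an assumed L2 weight
lam = 1e-4
reg = lam * (np.sum(e_u**2) + np.sum(e_i**2) + np.sum(e_j**2))
loss = -np.log(sigmoid(y_ui - y_uj)) + reg
```

The loss pushes the observed item's score above the sampled negative's, which is exactly the ranking behavior a recommender needs.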