Neural Collaborative Filtering (NCF)
This paper focuses on collaborative filtering with implicit feedback. It first defines two concepts, explicit feedback and implicit feedback:

Explicit feedback: behaviors through which users clearly express their preference for items.

Implicit feedback: behaviors that do not clearly reflect user preferences.

For example, ratings and reviews are explicit feedback, while clicks, views, and purchases are implicit feedback.

In many application scenarios, there is no explicit feedback at all, because most users are silent users: they will not clearly tell the system "this is my preference value for this item". The recommendation system therefore has to infer user preferences from large amounts of implicit feedback.

From the obtained implicit feedback data, the user-item interaction matrix Y is defined as:
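Following the original paper, the entries of Y are defined as:

$$y_{ui} = \begin{cases} 1, & \text{if the interaction (user } u\text{, item } i\text{) is observed;} \\ 0, & \text{otherwise.} \end{cases}$$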

However, y_{ui} = 1 only means that there is an interaction record between user u and item i; it does not mean that u actually likes i. Likewise, the absence of an interaction record between u and i does not mean that u dislikes i. This is what makes learning from implicit feedback challenging: it provides only noisy signals about user preference. While observed entries at least reflect a user's interest in an item, unobserved entries may simply be missing data, and natural negative feedback is scarce.

The recommendation problem with implicit feedback can be formulated as estimating the scores of the unobserved entries in matrix Y (the scores are then used to rank the items). Formally, it can be abstracted as learning a function:
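Reconstructed from the paper, the function takes the form

$$\hat{y}_{ui} = f(u, i \mid \Theta),$$

where \hat{y}_{ui} is the predicted score of interaction y_{ui} and \Theta denotes the model parameters.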

To deal with the missing data, two approaches are common: either treat all unobserved entries as negative feedback, or sample negative instances from the unobserved entries.

The traditional solution is Matrix Factorization (MF), which learns a latent vector for each user and each item. The problem then becomes:
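The MF prediction is the inner product of the two latent vectors (the standard formulation, as in the paper):

$$\hat{y}_{ui} = f(u, i \mid \mathbf{p}_u, \mathbf{q}_i) = \mathbf{p}_u^T \mathbf{q}_i = \sum_{k=1}^{K} p_{uk} q_{ik}$$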

Here K denotes the dimension of the latent space. As can be seen, the MF model is a two-way interaction between user latent factors and item latent factors. It assumes that the dimensions of the latent space are independent of each other and combines them linearly with equal weights. MF can therefore be regarded as a linear model of latent factors.

The paper gives an example to illustrate the limitation of this approach:

Figure 1(a) is the user-item interaction matrix, and Figure 1(b) is the user latent space. The paper emphasizes two points for understanding this figure:

1) MF maps users and items into the same latent space, so the similarity between two users can be measured by the angle between their latent vectors.

2) The Jaccard coefficient is used as the ground-truth similarity between users.

Comparing the similarities computed by MF against the Jaccard similarities gives a way to judge how well MF performs. Let's look at the Jaccard coefficient first.
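As a minimal sketch (not from the paper), the Jaccard similarity between two users is the size of the intersection of their interacted-item sets divided by the size of their union:

```python
def jaccard(items_u, items_v):
    """Jaccard similarity between two users' interacted-item sets."""
    items_u, items_v = set(items_u), set(items_v)
    if not items_u and not items_v:
        return 0.0
    return len(items_u & items_v) / len(items_u | items_v)

# Hypothetical interaction sets for two users
print(jaccard({1, 2, 3}, {1, 3, 5}))  # 2 shared / 4 total = 0.5
```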

The example above illustrates a possible limitation of MF: it estimates the complex user-item interactions with a simple, fixed inner product in a low-dimensional latent space. One way to resolve this is to use a large number of latent factors K (i.e., a higher-dimensional latent space). However, this may hurt the generalization of the model (e.g., overfitting the data), especially on sparse data. The paper overcomes this limitation by using DNNs to learn the interaction function from data.

The paper first proposes a general framework:
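In the paper, this framework scores an interaction as

$$\hat{y}_{ui} = f(P^T \mathbf{v}_u^U, Q^T \mathbf{v}_i^I \mid P, Q, \Theta_f),$$

where \mathbf{v}_u^U and \mathbf{v}_i^I are the (one-hot) feature vectors of user u and item i, P and Q are the embedding matrices, and \Theta_f denotes the parameters of the interaction function f, which is implemented by a multi-layer neural network.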

Based on this general framework, the paper proposes three instantiations, which can be illustrated with a single figure:

GMF:

Using only the GMF layer in the figure above yields the first instantiation, GMF (Generalized Matrix Factorization). Its output layer is computed as:
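Reconstructed from the paper, the GMF output is

$$\hat{y}_{ui} = a_{\text{out}}\left(\mathbf{h}^T (\mathbf{p}_u \odot \mathbf{q}_i)\right),$$

where \odot is the element-wise product, a_{\text{out}} is the output activation, and \mathbf{h} contains the output-layer weights. If a_{\text{out}} is the identity and \mathbf{h} is a vector of ones, this reduces exactly to standard MF.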

MLP:

Using only the MLP layer on the right of the figure yields the second instantiation: the latent vectors of users and items are learned through a multi-layer neural network. The output layer is computed as:
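Reconstructed from the paper, the MLP branch concatenates the two latent vectors and passes them through L hidden layers:

$$\mathbf{z}_1 = \begin{bmatrix} \mathbf{p}_u \\ \mathbf{q}_i \end{bmatrix}, \qquad \phi_l(\mathbf{z}_{l-1}) = a_l(W_l^T \mathbf{z}_{l-1} + \mathbf{b}_l),\; l = 2, \dots, L, \qquad \hat{y}_{ui} = \sigma(\mathbf{h}^T \phi_L(\mathbf{z}_{L-1})),$$

where the paper uses ReLU as the activation a_l.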

NeuMF:

Combining GMF and MLP yields the third instantiation, NeuMF. The figure above shows the complete model, and its output layer is computed as:
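Reconstructed from the paper, NeuMF uses separate embeddings for the two branches and concatenates their outputs before the final layer:

$$\phi^{GMF} = \mathbf{p}_u^G \odot \mathbf{q}_i^G, \qquad \phi^{MLP} = a_L\left(W_L^T \left(\cdots a_2\left(W_2^T \begin{bmatrix} \mathbf{p}_u^M \\ \mathbf{q}_i^M \end{bmatrix} + \mathbf{b}_2\right) \cdots\right) + \mathbf{b}_L\right),$$

$$\hat{y}_{ui} = \sigma\left(\mathbf{h}^T \begin{bmatrix} \phi^{GMF} \\ \phi^{MLP} \end{bmatrix}\right)$$

As a minimal sketch (in PyTorch; layer sizes and names are illustrative, not from the paper's code):

```python
import torch
import torch.nn as nn

class NeuMF(nn.Module):
    """NeuMF sketch: a GMF branch and an MLP branch with separate
    embeddings, concatenated before the final prediction layer."""
    def __init__(self, n_users, n_items, gmf_dim=8, mlp_layers=(64, 32, 16)):
        super().__init__()
        # GMF branch embeddings
        self.user_gmf = nn.Embedding(n_users, gmf_dim)
        self.item_gmf = nn.Embedding(n_items, gmf_dim)
        # MLP branch embeddings; the tower input is their concatenation
        self.user_mlp = nn.Embedding(n_users, mlp_layers[0] // 2)
        self.item_mlp = nn.Embedding(n_items, mlp_layers[0] // 2)
        tower = []
        for d_in, d_out in zip(mlp_layers[:-1], mlp_layers[1:]):
            tower += [nn.Linear(d_in, d_out), nn.ReLU()]
        self.mlp = nn.Sequential(*tower)
        # Final layer over [GMF output; MLP output]
        self.predict = nn.Linear(gmf_dim + mlp_layers[-1], 1)

    def forward(self, u, i):
        gmf = self.user_gmf(u) * self.item_gmf(i)  # element-wise product
        mlp = self.mlp(torch.cat([self.user_mlp(u), self.item_mlp(i)], dim=-1))
        return torch.sigmoid(self.predict(torch.cat([gmf, mlp], dim=-1)))
```

In the paper, the GMF and MLP branches are additionally pre-trained separately and their weights are used to initialize NeuMF; the sketch above omits that step.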

The experiments in the paper are designed to answer the following research questions:

RQ1: Do our proposed NCF methods outperform the state-of-the-art implicit collaborative filtering methods?

RQ2: How does our optimization framework (log loss with negative sampling) work for the recommendation task?

RQ3: Are deeper layers of hidden units helpful for learning from user-item interaction data?

The experimental setup is introduced first, and then the three questions above are answered.

The experiments use two public datasets: MovieLens and Pinterest. Table 1 summarizes their characteristics.

1. MovieLens: This movie rating dataset has been widely used to evaluate collaborative filtering algorithms. The paper uses the 1M version, in which each user has at least 20 ratings. Although this is an explicit feedback dataset, the paper deliberately chooses it to study learning implicit signals from explicit feedback. To this end, it is transformed into implicit data, where each entry is marked 0 or 1 to indicate whether the user has rated the item.

2. Pinterest: This implicit feedback dataset was constructed for evaluating content-based image recommendation. The raw data is very large but highly sparse: for example, more than 20% of users have only one pin (a pin is similar to a like), which makes it difficult to evaluate collaborative filtering algorithms. The paper therefore filters the dataset in the same way as MovieLens: only users with at least 20 pins are retained. This yields a subset containing 55,187 users and 1,580,809 interactions. Each interaction indicates whether the user has pinned the image to their own board.

Evaluation protocol: To evaluate the performance of item recommendation, the paper adopts the leave-one-out evaluation widely used in the literature: for each user, the latest interaction is held out as the test item (the datasets carry timestamps), and the remaining data is used for training. Since ranking all items for every user during evaluation is too time-consuming, the paper follows the common strategy of randomly sampling 100 items that the user has not interacted with, and ranking the test item among these 100 items. The quality of the ranked list is measured by Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG). Unless otherwise specified, the ranked list is truncated at 10 for both metrics. Thus HR intuitively measures whether the test item is present in the top-10 list, while NDCG accounts for the position of the hit by assigning higher scores to hits at higher ranks. Both metrics are computed for each test user and averaged.
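A minimal sketch of this protocol (function and variable names are assumptions, not from the paper's code): for each user, rank the held-out item among its 100 sampled negatives and compute HR@10 and NDCG@10.

```python
import math

def evaluate_leave_one_out(score_fn, test_items, negatives, k=10):
    """score_fn(u, item_ids) -> list of predicted scores;
    test_items[u] is u's held-out test item;
    negatives[u] is a list of 100 items u never interacted with."""
    hits, ndcgs = [], []
    for u, pos in test_items.items():
        candidates = negatives[u] + [pos]          # 100 negatives + 1 positive
        scores = score_fn(u, candidates)
        ranked = sorted(zip(candidates, scores), key=lambda x: -x[1])
        top_k = [item for item, _ in ranked[:k]]
        hits.append(1.0 if pos in top_k else 0.0)  # HR@k: is the test item in the top k?
        # NDCG@k with a single relevant item: 1 / log2(rank + 2) for 0-indexed rank
        ndcgs.append(math.log(2) / math.log(top_k.index(pos) + 2)
                     if pos in top_k else 0.0)
    return sum(hits) / len(hits), sum(ndcgs) / len(ndcgs)
```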

- ItemPop. Items are ranked by their popularity, measured by the number of interactions. This is a non-personalized method used to benchmark recommendation performance.

- ItemKNN. This is the standard item-based collaborative filtering method.

- BPR (Bayesian Personalized Ranking). This method optimizes the MF model with a pairwise ranking loss, which is tailored to learning from implicit feedback. It is a highly competitive baseline for item recommendation. The paper uses a fixed learning rate, varying it and reporting the best performance.

- eALS. This is a state-of-the-art MF method for item recommendation.

On the two metrics, HR (Hit Ratio) and NDCG (Normalized Discounted Cumulative Gain): HR intuitively measures whether the test item is present in the top-10 list, while NDCG accounts for the position of the hit by assigning higher scores to higher ranks. The following focuses on NDCG.

The name NDCG may look scary, but the idea behind it is simple. The recommendation system returns a list of items, and we need to measure how good this list is. Each item has an associated score, usually non-negative, called its gain. For items without user feedback, the gain is usually set to 0.

Cumulative Gain (CG)

Adding up these gains gives the Cumulative Gain (CG): the score of the whole recommendation list, obtained by accumulating the relevance scores of the individual results.
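$$CG_k = \sum_{i=1}^{k} rel_i$$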

rel_i denotes the relevance of the recommendation result at position i, and k is the size of the recommendation list under consideration.

Discounted Cumulative Gain (DCG)

One disadvantage of CG is that it ignores the position of each result, so it cannot capture the effect of ordering on the overall recommendation quality. For example, we always want highly relevant results to be ranked first; if results with low relevance appear at the top, the user experience suffers. DCG (Discounted Cumulative Gain) therefore adds a position-based discount on top of CG, "discounting" the contribution of results ranked lower. The assumption is that the later a result appears, the less it is worth: at position i its weight is 1/log_2(i+1), so the gain contributed by the i-th result is rel_i / log_2(i+1). Thus:
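$$DCG_k = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i+1)}$$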

Two conclusions can be drawn from the above formula:

1. The greater the relevance of the recommendation results, the greater the DCG.

2. The earlier the highly relevant results are ranked, the greater the DCG.

Normalized Discounted Cumulative Gain (NDCG)

DCG still has a shortcoming: it is hard to compare across different recommendation lists. We cannot evaluate a recommendation system by the recommendation list of a single user and its score; we must evaluate all users and their recommendation lists over the whole test set. This requires normalizing the scores of different users' recommendation lists, which gives NDCG (Normalized Discounted Cumulative Gain).

Before introducing NDCG, we need one more concept, IDCG (Ideal DCG): the DCG of the best possible recommendation list the system could return for a user, i.e., the list obtained by sorting the results in descending order of relevance, with the most relevant result first. The DCG value therefore lies in (0, IDCG], and NDCG lies in (0, 1]. The NDCG formula is:
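$$NDCG_k = \frac{DCG_k}{IDCG_k}$$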

IDCG is the maximum DCG value under ideal conditions.
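Consistent with the linear-gain DCG above, the IDCG is the DCG of the optimally sorted list:

$$IDCG_k = \sum_{i=1}^{|REL|} \frac{rel_i}{\log_2(i+1)}$$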

Here |REL| denotes the set formed by the top k results after sorting all results in descending order of relevance, i.e., the results ordered in the optimal way.

Practical example

Suppose the recommendation system returns five results, with model scores of 1.2, 0.7, 0.1, 0.2, and 4.0 respectively.

The DCG of this list is first calculated to be 2.39278, and the IDCG to be 3.6309.

Finally, the NDCG is 2.39278 / 3.6309, about 65%.
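As a minimal sketch (not from the paper or the referenced blog), the linear-gain formulas above translate directly into code; note that gain conventions vary (the exponential gain 2^{rel} - 1 is also common), so the exact values depend on the convention used:

```python
import math

def dcg(relevances):
    """DCG with the linear gain used above: rel_i / log2(i + 1),
    positions numbered from 1."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """NDCG: DCG of the list as ranked, divided by the DCG of the
    ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg([1.2, 0.7, 0.1, 0.2, 4.0]))  # about 0.636 under the linear gain
```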

For more evaluation methods for recommendation models, see https://statusrank.coding.me/articles/639f7364.html.

Figure 4 shows the performance of HR@10 and NDCG@10 with respect to the number of predictive factors.

Figure 5 shows the performance of Top-K recommendation lists, with the ranking position K ranging from 1 to 10.

Generally speaking, the NeuMF model proposed in this paper (combining GMF and MLP) is better than other methods.

Figure 6 shows the training behavior when the model is treated as a binary classification task and log loss is used as the loss function.
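The log loss referred to here is the binary cross-entropy from the paper, summed over the observed interactions \mathcal{Y} and the sampled negative instances \mathcal{Y}^-:

$$L = -\sum_{(u,i) \in \mathcal{Y} \cup \mathcal{Y}^-} \left[ y_{ui} \log \hat{y}_{ui} + (1 - y_{ui}) \log(1 - \hat{y}_{ui}) \right]$$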

Figure 7 shows the influence of the negative sampling ratio on model performance (the horizontal axis is the sampling ratio, i.e., the number of negative samples per positive sample).

The table above varies two factors, the embedding size K and the number of neural network layers, and reports results on both datasets in a grid-search fashion. Increasing the embedding size and the number of layers improves performance.