Recommendation System Paper Reading (35): Amazon's Recall Algorithm for Diversified Complementary Products
Paper:

Title: P-Companion: A Principled Framework for Diversified Complementary Product Recommendation

Link: https://dl.acm.org/doi/pdf/10.1145/3340531.3412732

In the previous article I mentioned the relationship between substitutability and complementarity of products. This Amazon paper focuses on how to make good use of complementary relationships between products while maintaining diversity.

Complementary product recommendation (CPR) is an important part of e-commerce services: it aims to recommend products that are frequently bought together, so as to meet customers' needs. However, existing methods are far from optimal. For a given product, how to recommend different types of complementary products is the key problem addressed in this work.

As mentioned in the previous article (34), we assumed that co-purchased products are complementary, but this paper points out that co-purchased products are not necessarily complementary. A simple example: if a user buys two lipsticks of different brands, the two lipsticks are of course not complementary; they are better described as similar (substitutable) products. If a user buys a phone and a phone case, the case is a complementary product of the phone, but not the other way around, because a user does not go looking for a matching phone just because they bought a phone case first.

More specifically, look at the following example:

Figure 1 shows a comparative example that illustrates what it takes to generate high-quality "bought together" recommendations. Taking a tennis racket as the query product, three recommendation lists are compared. List 1 contains three other similar tennis rackets; List 2 contains three tennis balls; List 3 contains a tennis ball, a racket cover, and a headband. List 1 clearly leans toward substitute products, which are unlikely to be bought together with the query racket. Although both List 2 and List 3 can be regarded as reasonable recommendations, List 3 is the better choice, because it offers three different types of products and better covers the customer's overall needs for playing tennis. The example shows that an ideal complementary recommendation should consider both relevance and diversity.

Previous work tends to model the similarity between products, for example with collaborative filtering or item2vec-style methods, but modeling complementary relationships through similarity runs into the following challenges:

C1: The complementary relationship is asymmetric, so complementary recommendation cannot be based on similarity measures alone. For example, a tennis racket and a headband are not similar at all in their text or image features. Moreover, an SD card can be a complementary product of a camera, but not the other way around. These facts rule out most similarity-based methods, and a different mechanism is needed to model complementary relationships.

C2: Complementary recommendation needs to consider diversity. The recommendations are usually a group of products with different categories and functions that together meet the customer's needs. As shown in Figure 1, a diversified list covering three types of tennis-related products is better than a list containing only one type.

C3: Complementary recommendation struggles with cold-start items. In other words, in e-commerce, recommendations based purely on behavior similarity have difficulty handling the cold-start problem.

With these challenges in mind, let's see how Amazon addresses them.

I: the set of items.

B ⊆ I × I: three types of relations between item pairs collected from customers' historical behavior (co-purchase, co-view, and purchase-after-view).

c_i: the catalog features of item i (e.g., product category, type, title, and description).

w_i: the product type of item i, i.e., a feature describing the function of the product itself.

The problem of recommending complementary products is formulated as follows:

Given the catalog features c (including title, product type, etc.) and the user behavior data B as input, we want to learn a recommendation model M such that, for a query item i and a required degree of diversity K, M first predicts K different complementary product types, and then generates a set of recommended products for each predicted complementary type.

Comparing different combinations of co-view, co-purchase, and purchase-after-view signals, the authors observed that item pairs appearing only in the co-purchase records scored highest on MTurk, about 30% higher than the raw co-purchase pairs, so Amazon uses this distilled data for training.
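As a concrete illustration, here is a minimal Python sketch of how such a distilled training set could be built from behavior logs. It assumes, following the sentence above, that we keep only the co-purchase pairs that appear in neither the co-view nor the purchase-after-view sets; the function and variable names are placeholders, not the paper's.

```python
# Minimal sketch: distill complementary training pairs from behavior logs.
# Assumption (based on the text above): keep co-purchase pairs that were
# neither co-viewed nor purchased-after-view, since those signals tend to
# indicate substitutes rather than complements.

def distill_pairs(co_purchase, co_view, purchase_after_view):
    """Each argument is a set of (query_item, candidate_item) pairs."""
    return co_purchase - co_view - purchase_after_view

# Toy usage:
B_CP = {("racket", "tennis_balls"), ("racket", "racket_2")}
B_CV = {("racket", "racket_2")}   # co-viewed -> likely substitutes
B_PV = set()
print(distill_pairs(B_CP, B_CV, B_PV))   # {('racket', 'tennis_balls')}
```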

Let's look at the definitions of some symbols in the model:

Model diagram:

A GAT-style (graph attention) method is adopted here; see the GAT paper for details, so I won't describe it at length.

where FFN is a feed-forward neural network, z denotes the attention scores, N_i is the set of neighbor items of i, (i, j) ranges over the positive and negative sample pairs during training, and y is defined as follows:

For a positive sample, i.e., an item pair from the co-purchase training data, the label y is defined as +1; for a negative sample, i.e., a randomly sampled item pair, y is defined as -1.

The pairwise distance is computed with a metric-learning function f(·) over the aggregated item embeddings, and the margin hyperparameters are used to separate the distances of positive and negative pairs. The purpose of the optimization is to force the distance between a positive pair to be less than ε, while at the same time the distance between a negative pair must be at least ε + λ.

In other words, the loss function is essentially a hinge loss.
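Here is a rough PyTorch sketch of the idea described above: encode catalog features with an FFN, aggregate neighbor embeddings with attention scores, and train with a hinge loss that pulls positive (co-purchased) pairs within a margin ε and pushes negative pairs beyond ε + λ. The layer sizes, the exact attention form, and the default margins are assumptions for illustration, not the paper's specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Product2VecSketch(nn.Module):
    """Rough sketch: FFN over catalog features + GAT-style attention over neighbors."""

    def __init__(self, feat_dim: int = 128, emb_dim: int = 64):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(feat_dim, emb_dim), nn.ReLU(),
                                 nn.Linear(emb_dim, emb_dim))
        self.att = nn.Linear(2 * emb_dim, 1)  # scores a (center, neighbor) pair

    def forward(self, x_i, x_neighbors):
        """x_i: (feat_dim,) catalog features of item i;
        x_neighbors: (num_neighbors, feat_dim) features of the items in N_i."""
        theta_i = self.ffn(x_i)                  # item embedding
        theta_n = self.ffn(x_neighbors)          # neighbor embeddings
        pair = torch.cat([theta_i.expand_as(theta_n), theta_n], dim=-1)
        z = F.softmax(self.att(pair).squeeze(-1), dim=0)   # attention scores z_ij
        theta_tilde = theta_i + (z.unsqueeze(-1) * theta_n).sum(dim=0)
        return theta_tilde                       # aggregated embedding of item i


def hinge_metric_loss(dist, y, eps=1.0, lam=0.5):
    """dist: squared distances for item pairs; y: +1 for positive (co-purchased)
    pairs, -1 for negative pairs. Positives are pulled below eps, negatives
    are pushed beyond eps + lam (illustrative default margins)."""
    return torch.clamp(y * (dist - eps) + lam * (y < 0).float(), min=0.0).mean()
```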

This is where the paper addresses the diversity problem: based on the query item and its product type, it generates multiple complementary types related to that item.

Given a query item i and a candidate item j, we have a type pair (w_i, w_j) and a label between them. The paper models this with an encoder-decoder:

First, the product type w of item i is mapped to a type embedding, which is then trained through metric learning.

For the other symbol definitions, please refer to the table above; they are not repeated here. The main goal of this optimization is to make the type embeddings of co-purchased products more similar.
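Below is a hedged PyTorch sketch of what such a type-transition module could look like: an encoder-decoder over product-type embeddings, trained so that the decoded embedding of a query type lands close to the type embeddings of complementary types observed in co-purchases. The class name, layer sizes, and the use of an embedding table are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TypeTransitionSketch(nn.Module):
    """Sketch: map a product-type embedding to a 'complementary type' embedding."""

    def __init__(self, num_types: int, type_dim: int = 64, hidden: int = 64):
        super().__init__()
        self.type_emb = nn.Embedding(num_types, type_dim)   # embedding of type w
        self.encoder = nn.Sequential(nn.Linear(type_dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, type_dim)

    def forward(self, type_ids):
        """type_ids: (batch,) integer ids of query product types."""
        e_w = self.type_emb(type_ids)            # type embedding of w_i
        return self.decoder(self.encoder(e_w))   # predicted complementary-type embedding

# Training pairs (w_i, w_j) come from co-purchased items; the same hinge-style
# metric loss pulls the decoded embedding of w_i toward type_emb(w_j) for
# positive pairs and pushes it away for negative pairs.
```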

This is the most critical part of the paper: it defines not only how to recommend complementary items through metric learning, but also how that learning is conditioned on the predicted complementary types.

First, several complementary type embeddings closest to the query item's type are selected, and then the item's own embedding is projected into the subspace corresponding to each selected type.

This is the same metric-learning approach as before, except that what we optimize here is the distance between the type-conditioned item embedding and the embedding of the candidate item j.
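Here is a minimal sketch of the type-conditioned item prediction just described: project the query item's embedding, conditioned on a predicted complementary type, and compare it to candidate item embeddings with the same hinge-style metric loss. The concatenation-based projection is an assumption; the paper's exact conditioning may differ.

```python
import torch
import torch.nn as nn

class ItemPredictionSketch(nn.Module):
    """Sketch: type-conditioned projection of the query item embedding."""

    def __init__(self, emb_dim: int = 64, type_dim: int = 64):
        super().__init__()
        # Assumption: concatenate item and type embeddings, then project back
        # to the item-embedding space; the paper's exact projection may differ.
        self.proj = nn.Sequential(nn.Linear(emb_dim + type_dim, emb_dim), nn.ReLU(),
                                  nn.Linear(emb_dim, emb_dim))

    def forward(self, theta_i, gamma_w):
        """theta_i: (batch, emb_dim) query item embeddings;
        gamma_w: (batch, type_dim) predicted complementary-type embeddings."""
        theta_iw = self.proj(torch.cat([theta_i, gamma_w], dim=-1))
        return theta_iw  # compared to candidate embeddings theta_j via distance

# Training: for a positive pair (i, j) with j of type w, minimize the hinge
# metric loss on the distance between theta_iw and theta_j, as in the
# Product2Vec sketch above.
```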

This part weights the two metric-learning objectives from Sections 3.2 and 3.3; the weighting coefficient is a trade-off hyperparameter.
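For illustration only, the combination could take the form below; the exact symbol and value of the trade-off weight are not reproduced here, so treat it as a tunable hyperparameter.

```python
import torch

# Assumed form of the joint objective: a weighted combination of the
# type-transition loss (Sec. 3.2) and the item-prediction loss (Sec. 3.3).
loss_type = torch.tensor(0.8)    # placeholder value for illustration
loss_item = torch.tensor(1.2)    # placeholder value for illustration
alpha = 0.5                      # trade-off hyperparameter (assumed)
total_loss = alpha * loss_type + (1 - alpha) * loss_item
```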

This part describes the settings of several hyperparameters, such as L = 64, d = 128, and so on.

Finally, we need to know how the model makes predictions after end-to-end training, that is, how candidates are generated, which was already outlined when the problem was defined. Intuitively, the model first generates the top-K closest complementary types, and then uses the method in Section 3.3 to generate candidates within each type. Specifically:

Now that we have the type-conditioned embedding of the query item, we can recommend by retrieving the items whose embeddings are closest to this vector.

The same procedure is applied to each predicted type, so we obtain a diversified recommendation list that spans multiple complementary types.
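Putting the inference procedure together, here is a hedged NumPy sketch: pick the top-K complementary types closest to the predicted type embedding, then retrieve the nearest candidate items within each type by nearest-neighbor search. All function and variable names are placeholders, and the projection function is assumed to come from the item-prediction module sketched earlier.

```python
import numpy as np

def recommend(query_type_pred, type_embs, item_embs_by_type, project_item, k=3, n=2):
    """query_type_pred: decoded complementary-type embedding of the query item, shape (d,)
    type_embs: dict type_name -> type embedding, shape (d,)
    item_embs_by_type: dict type_name -> (list of item ids, matrix of item embeddings)
    project_item: function(type_name) -> type-conditioned query item embedding, shape (d,)
    """
    # Step 1: top-k complementary types closest to the predicted type embedding.
    names = list(type_embs)
    dists = [np.linalg.norm(query_type_pred - type_embs[t]) for t in names]
    top_types = [names[i] for i in np.argsort(dists)[:k]]

    # Step 2: within each predicted type, retrieve the n nearest candidate items.
    recs = {}
    for t in top_types:
        ids, embs = item_embs_by_type[t]
        q = project_item(t)                                   # type-conditioned query embedding
        order = np.argsort(np.linalg.norm(embs - q, axis=1))[:n]
        recs[t] = [ids[j] for j in order]
    return recs
```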