Automatic feature-crossing modules. It is worth mentioning that in these models the feature crossing happens at the bit level, i.e. bit-wise feature crossing. What is bit-wise feature crossing?
For example, feature 1 = (a, b, c) and feature 2 = (d, e, f).
Then the bit-wise feature crossing between them is f(w1*a*d, w2*b*e, w3*c*f). For example, in DCN:
It can be seen that in the cross network, the crossing of feature vectors is bit-wise: at each layer, x and x0 are crossed bit by bit.
The other mode is vector-wise crossing, which can be written as f(w(a*d, b*e, c*f)). Here the two feature vectors are first combined by an element-wise product, and the weight then acts on the whole resulting vector, so the crossing happens at the vector level.
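A minimal NumPy sketch of the two crossing modes on the toy vectors above; the numeric values and the weights (w_bit, w_vec) are made-up placeholders for illustration, not anything from the paper:

```python
import numpy as np

# Toy embeddings: feature 1 = (a, b, c), feature 2 = (d, e, f)
feat1 = np.array([0.1, 0.2, 0.3])   # (a, b, c)
feat2 = np.array([0.4, 0.5, 0.6])   # (d, e, f)

# Bit-wise crossing: each bit (dimension) gets its own weight,
# i.e. f(w1*a*d, w2*b*e, w3*c*f)
w_bit = np.array([1.0, 2.0, 3.0])            # hypothetical per-bit weights
bitwise_cross = w_bit * feat1 * feat2        # shape (3,)

# Vector-wise crossing: the element-wise product (a*d, b*e, c*f) is kept as a
# whole vector and a weight then acts on that vector, i.e. f(w(a*d, b*e, c*f))
w_vec = np.random.randn(3, 3)                # hypothetical weight matrix
vectorwise_cross = w_vec @ (feat1 * feat2)   # shape (3,)

print(bitwise_cross, vectorwise_cross)
```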
Two definitions are also mentioned in the paper:
Explicit and implicit
Explicit vs. implicit feature interaction: taking xi and xj as an example, if after a series of transformations the result can be written in the form wij * (xi * xj), the interaction is considered explicit; otherwise it is implicit.
As usual, let's look at the model structure first:
Look at figure (c) first. xDeepFM passes the embedding vectors through CIN and a DNN, and the resulting vectors are concatenated for CTR estimation. The core of the paper is CIN, whose full name is Compressed Interaction Network. Let's go through how CIN works in detail.
To learn explicit high-order feature interactions automatically, and to make the interactions happen at the vector level, the paper proposes a new neural module, the Compressed Interaction Network (CIN). In CIN the hidden (embedding) vector is the unit of computation, so the original input features and the hidden layers of the network are organized into matrices, denoted X0 and Xk respectively. Each Xk is derived from the previous Xk-1:
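Written out, this is the recurrence from the paper (X0 ∈ R^(m×D) holds the m field embeddings of dimension D, Xk-1 ∈ R^(H_(k-1)×D) is the previous hidden layer, ∘ is the element-wise product, and W^(k,h) ∈ R^(H_(k-1)×m) is the weight matrix of the h-th feature map of layer k):

X^k_{h,*} = \sum_{i=1}^{H_{k-1}} \sum_{j=1}^{m} W^{k,h}_{ij} \, ( X^{k-1}_{i,*} \circ X^{0}_{j,*} )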
PS: I couldn't derive this formula cleanly during an interview, so let me first walk through how it is actually computed.
The CIN computation is divided into two steps. The first step uses Xk and X0 to compute an intermediate tensor Z.
Let's take a look at this picture:
To understand the calculation process, we must first know several concepts.
"Inner product" here actually means the element-wise (Hadamard) product: (a, b, c) ∘ (1, 2, 3) = (1*a, 2*b, 3*c)
Outer product: (a, b, c) ⊗ (1, 2, 3) = [[a*1, b*1, c*1], [a*2, b*2, c*2], [a*3, b*3, c*3]], a 3×3 matrix.
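A quick NumPy check of the two operations as they are used here; the numeric values stand in for the symbolic entries:

```python
import numpy as np

u = np.array([0.1, 0.2, 0.3])   # stands for (a, b, c)
v = np.array([1.0, 2.0, 3.0])   # stands for (1, 2, 3)

# "Inner product" as used in this post: the element-wise (Hadamard) product
hadamard = u * v                 # (1*a, 2*b, 3*c)

# Outer product: every entry of v times every entry of u, giving a matrix;
# row i of the result is v[i] * u, matching the slice layout used below
outer = np.outer(v, u)           # shape (3, 3)
print(hadamard)
print(outer)
```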
Then the calculation of Z goes like this (ignore W for now; how W is used is introduced later):
We need to compute three slices along the embedding dimension D.
Slice 1: compute the outer product of (a, 1, d) and (x, 4):
The result is [[a*x, 1*x, d*x], [a*4, 1*4, d*4]], with shape 2×3.
Slice 2: compute the outer product of (b, 2, e) and (y, 5).
Slice 3 is computed in the same way, so its result is not listed.
Stacking the three slices gives a tensor of shape 3×2×3.
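A small NumPy sketch of this first step; the numeric values are placeholders for the symbolic entries a, b, c, d, e, f and x, y, z:

```python
import numpy as np

# X0: the m=3 original field embeddings, each of dimension D=3
X0 = np.array([[0.1, 0.2, 0.3],    # (a, b, c)
               [1.0, 2.0, 3.0],    # (1, 2, 3)
               [0.4, 0.5, 0.6]])   # (d, e, f)

# Xk: the H_k=2 feature maps of the current CIN layer
Xk = np.array([[0.7, 0.8, 0.9],    # (x, y, z)
               [4.0, 5.0, 6.0]])   # (4, 5, 6)

D = X0.shape[1]

# Step 1: for every embedding dimension d, take the d-th column of Xk and of X0
# and form their outer product -> one slice of shape (H_k, m)
Z = np.stack([np.outer(Xk[:, d], X0[:, d]) for d in range(D)])
print(Z.shape)   # (3, 2, 3) = (D, H_k, m)

# The same thing in one call:
Z_einsum = np.einsum('hd,md->dhm', Xk, X0)
assert np.allclose(Z, Z_einsum)
```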
This calculation process can also be expressed intuitively with a picture:
The calculation here is the same as above; equivalently, you can take the element-wise product of every pair of D-dimensional row vectors and stack the results. Now for the second step: the weight matrix W plays the role of a convolution kernel with the same shape as a slice. Each slice is multiplied element-wise with W and summed, giving one value per slice. Since there are D slices, we get D values, and in this way the three-dimensional tensor is compressed into a one-dimensional vector:
From the figure above we can see that each convolution kernel W yields one such vector; with H(k+1) kernels we obtain the next layer X(k+1), whose shape is H(k+1) × D.
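A sketch of this second (compression) step with random placeholder values; H_next, the number of kernels, is chosen arbitrarily here:

```python
import numpy as np

D, H_k, m, H_next = 3, 2, 3, 4          # toy sizes for illustration

Z = np.random.randn(D, H_k, m)          # the intermediate tensor from step 1
W = np.random.randn(H_next, H_k, m)     # one (H_k x m) kernel per output feature map

# Each kernel is multiplied element-wise with every slice of Z and summed,
# giving one value per slice, i.e. one D-dimensional vector per kernel.
X_next = np.einsum('dhm,ohm->od', Z, W) # shape (H_next, D)
print(X_next.shape)                     # (4, 3)
```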
The macro framework of CIN can be summarized as follows:
We compute k CIN layers in total and obtain k output matrices, each of which is pooled into a vector. The order of the feature interactions that are finally learned is therefore determined by the number of layers in the network, and every hidden layer is connected to the output unit through a pooling operation, which ensures that the output unit sees feature interaction patterns of different orders. It is also easy to see that the structure of CIN is very similar to an RNN: the state of each layer is computed from the previous hidden layer plus an extra input. The differences are that the parameters of each CIN layer are different while an RNN shares them across steps, and that the extra input of an RNN changes at every step while the extra input of CIN is fixed and is always X0.
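Putting the two steps together, here is a minimal NumPy sketch of the whole CIN forward pass with randomly initialized kernels; the field count, embedding dimension, and layer sizes are made up for the demo:

```python
import numpy as np

def cin_forward(X0, layer_sizes):
    """Minimal CIN sketch. X0 has shape (m, D); layer_sizes gives H_1, ..., H_k.
    Returns the concatenation of the sum-pooled vector of every layer."""
    rng = np.random.default_rng(0)
    m, D = X0.shape
    Xk = X0
    pooled = []
    for H_next in layer_sizes:
        # Step 1: outer products along the embedding dimension -> (D, H_k, m)
        Z = np.einsum('hd,md->dhm', Xk, X0)
        # Step 2: compress every slice with H_next kernels -> (H_next, D)
        W = rng.standard_normal((H_next, Xk.shape[0], m))
        Xk = np.einsum('dhm,ohm->od', Z, W)
        # Sum pooling over the embedding dimension: one value per feature map
        pooled.append(Xk.sum(axis=1))
    return np.concatenate(pooled)       # length H_1 + ... + H_k

# Toy run: 3 fields, embedding dimension 4, two CIN layers of size 5
p_plus = cin_forward(np.random.randn(3, 4), layer_sizes=[5, 5])
print(p_plus.shape)                     # (10,)
```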
CIN computes cross features in a vector-wise manner.
X^1_h is calculated as follows:
X^2_h is calculated as follows:
X^k_h is calculated as follows:
So this is vector-wise crossing.
This overall structure is essentially the same as DeepFM, with the cross-feature computation on the left and the DNN part on the right. The CTR is computed as follows:
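From the paper, with a the raw feature vector, x_dnn^k the last hidden layer of the plain DNN, and p^+ the pooled CIN output vector:

\hat{y} = \sigma\left( w_{linear}^{T} a + w_{dnn}^{T} x_{dnn}^{k} + w_{cin}^{T} p^{+} + b \right)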
The loss function is:
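In the paper this is the standard log loss over the N training instances:

\mathcal{L} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]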
The xDeepFM model we introduced today belongs to the same family as the DeepFM and DCN models we studied before: it is composed of cross features + DNN. In xDeepFM, the derivation of CIN is the key part. Once you master the computation process of CIN, you understand the core of this paper: vector-wise feature crossing.