The most common embedding method is word2vec, which computes an embedding vector for each word according to the co-occurrence relationships of words in the corpus. There are two commonly used word2vec models: CBOW and skip-gram. CBOW predicts the center word from its context, while skip-gram predicts the context from the center word (see the mathematical derivation of word2vec for details). So, since words in natural language can be embedded through their co-occurrence relationships, can we also embed the nodes of a graph through their co-occurrence relationships, by treating the graph as the corpus and the nodes in the graph as words? For word2vec, every sentence in the corpus describes a co-occurrence relationship between words. How do we describe such a relationship for a graph? Below, we walk through several graph embedding methods.
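For readers who want to see the two word2vec variants side by side, here is a minimal sketch using gensim; gensim and the toy corpus are my own assumptions for illustration, the article does not prescribe a library.

```python
# Minimal sketch: skip-gram vs. CBOW with gensim (illustrative only).
from gensim.models import Word2Vec

corpus = [
    ["graph", "embedding", "maps", "nodes", "to", "vectors"],
    ["word2vec", "learns", "embeddings", "from", "co", "occurrence"],
]

# sg=1 selects skip-gram (predict context from the center word);
# sg=0 selects CBOW (predict the center word from its context).
skipgram = Word2Vec(corpus, vector_size=16, window=2, min_count=1, sg=1)
cbow = Word2Vec(corpus, vector_size=16, window=2, min_count=1, sg=0)

print(skipgram.wv["graph"].shape)  # (16,)
```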
The central idea of graph embedding is to find a mapping function that transforms each node in the network into a low-dimensional latent representation. Such representations are easy to compute and store, and require no manual feature engineering (they are learned adaptively). The figure below shows several common categories of graph embedding methods:
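Formally, the idea above can be written as the following mapping (the symbols d and |V| are my own notation, not from the article):

$$f : V \rightarrow \mathbb{R}^{d}, \qquad d \ll |V|$$

Each node $v \in V$ is mapped to a dense vector $f(v)$ of dimension $d$, with $d$ much smaller than the number of nodes.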
DeepWalk bridges the gap between network embedding and word embedding by treating nodes as words and generating short random walks as sentences. A neural language model (such as Skip-gram) can then be applied to these random walks to obtain the network embedding. Its advantages are: first, random walks can be generated on demand. Because the Skip-gram model is also optimized per sample, the combination of random walks and Skip-gram makes DeepWalk an online algorithm. Second, DeepWalk is scalable: generating random walks and optimizing the Skip-gram model are both efficient and trivially parallelizable. Most importantly, DeepWalk introduced the deep learning paradigm to graphs.
DeepWalk uses the SkipGram method to learn node representations in the network. Following SkipGram's idea, the most important step is to define the context, that is, the neighborhood. In NLP, the neighborhood consists of the words surrounding the current word. In this paper, random walks are used to obtain the neighborhood of a node in a graph or network.
(1) Random walk: a starting node is chosen uniformly at random and a random walk sequence of fixed length is generated. This sequence is treated like a sentence in natural language (node sequence = sentence, nodes in the sequence = words in the sentence), and the skip-gram model is used to learn the distributed representation of the nodes.
(2) Premise: if the node degrees of a network follow a power-law distribution, then the frequency with which nodes appear in the random walk sequences also follows a power-law distribution; word frequencies in NLP likewise follow a power-law distribution.
(Figure: Skip-gram applied to a node sequence — output: the predicted context nodes.)
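To make the DeepWalk pipeline concrete, here is a minimal sketch that generates uniform random walks and feeds them to Skip-gram as sentences. networkx, gensim, the karate-club toy graph, and all hyperparameters are illustrative assumptions, not the reference implementation.

```python
# Minimal DeepWalk sketch: uniform random walks + Skip-gram (illustrative).
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walk(graph, start, walk_length):
    """Generate one uniform random walk of fixed length from `start`."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = list(graph.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(node) for node in walk]

graph = nx.karate_club_graph()
walks = [random_walk(graph, node, walk_length=10)
         for _ in range(20)              # walks per node
         for node in graph.nodes()]

# Treat each walk as a sentence and learn node embeddings with Skip-gram.
model = Word2Vec(walks, vector_size=64, window=5, min_count=1, sg=1)
embedding_of_node_0 = model.wv["0"]
```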
node2vec defines a biased random walk strategy to generate node sequences on top of DeepWalk, and still uses skip-gram for training.
The paper analyzes BFS and DFS, which preserve different kinds of network structure information. In DeepWalk, the random walk transitions according to edge weights, while node2vec adds a weight-adjustment factor α: t is the previous node, v is the current node, and x is a candidate next node. d(t, x) is the minimum number of hops from t to the candidate node x, so α = 1/p when d(t, x) = 0, α = 1 when d(t, x) = 1, and α = 1/q when d(t, x) = 2. By setting different p and q parameters, different structural information can be preserved. When both p and q are 1.0, node2vec is equivalent to DeepWalk.
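For concreteness, here is a minimal sketch of the transition bias α described above; the networkx toy graph and the p, q values are assumptions for illustration.

```python
# Minimal sketch of node2vec's transition bias alpha (illustrative only).
import networkx as nx

def node2vec_alpha(graph, t, x, p, q):
    """Bias applied to the edge (v, x) when the walk arrived at v from t."""
    if x == t:                      # d(t, x) == 0: return to the previous node
        return 1.0 / p
    if graph.has_edge(t, x):        # d(t, x) == 1: stay close to t (BFS-like)
        return 1.0
    return 1.0 / q                  # d(t, x) == 2: move away from t (DFS-like)

graph = nx.karate_club_graph()
print(node2vec_alpha(graph, t=0, x=2, p=1.0, q=2.0))  # 1.0: x is a neighbor of t
```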
LINE also uses the SkipGram method to learn node representations of the network. Again, following SkipGram's idea, the key is to define the context, or neighborhood. For text this neighborhood is naturally the words surrounding the current word, but for graphs and networks the definition is not so straightforward.
(1) First, distinguish two concepts:
First-order similarity: between directly connected nodes, such as nodes 6 and 7.
The joint probability between nodes $v_i$ and $v_j$ is defined as

$$p_1(v_i, v_j) = \frac{1}{1 + \exp(-\vec{u}_i^{\top} \vec{u}_j)}$$
Here $v$ denotes a node and $\vec{u}$ denotes the node's embedding vector. The formula means that the more similar two nodes are, the larger their inner product, and the larger the value after the sigmoid mapping; that is, the greater the weight of the edge between the two nodes, and the greater the probability that they co-occur.
Second-order similarity: between nodes connected through intermediate nodes (i.e., nodes sharing neighbors), such as nodes 5 and 6.
For second-order similarity, the conditional probability is used:

$$p_2(v_j \mid v_i) = \frac{\exp(\vec{u}_j'^{\top} \vec{u}_i)}{\sum_{k=1}^{|V|} \exp(\vec{u}_k'^{\top} \vec{u}_i)}$$

where $\vec{u}_k'$ is the "context" representation of node $v_k$.
(2) The goal is to keep node similarity unchanged before and after network representation learning; that is, if two nodes are similar before embedding, their representation vectors after embedding should also be similar. The paper uses KL divergence to measure the distance between the two probability distributions.
Take preserving first-order similarity as an example:
Before embedding, the empirical joint probability between nodes $v_i$ and $v_j$ is

$$\hat{p}_1(v_i, v_j) = \frac{w_{ij}}{W}, \qquad W = \sum_{(m,n) \in E} w_{mn}$$

Therefore, after dropping constants from the KL divergence, minimize:

$$O_1 = -\sum_{(i,j) \in E} w_{ij} \log p_1(v_i, v_j)$$
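To tie the formulas together, here is a minimal sketch that evaluates the first-order objective O_1 on a toy weighted graph. numpy, the toy edges, and the embedding dimension are illustrative assumptions; a real LINE implementation optimizes this with SGD and negative sampling.

```python
# Minimal sketch of LINE's first-order objective O_1 (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim = 4, 8
emb = rng.normal(size=(num_nodes, dim))          # u_i for each node

# (i, j, weight) edges of a toy weighted graph
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0)]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def first_order_loss(emb, edges):
    # O_1 = - sum over edges (i, j) of w_ij * log p_1(v_i, v_j)
    loss = 0.0
    for i, j, w in edges:
        p1 = sigmoid(emb[i] @ emb[j])
        loss -= w * np.log(p1)
    return loss

print(first_order_loss(emb, edges))
```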
Many GE (graph embedding) models have been covered above, from classical methods, to models that only consider structure, to models that consider extra information on nodes and edges, to deep models. You may be dazzled and unsure which model to actually use.
One point worth stressing: the model you choose must fit your practical problem:
1. For example, if your problem cares more about content similarity (local neighborhood similarity), then you can choose node2vec, LINE, GraRep, etc.
2. If your problem cares more about structural similarity, you can choose struc2vec. Here we can briefly explain why struc2vec brought a qualitative improvement over node2vec in Ant Financial's risk-control model: in risk control, your credibility does not imply that your neighbors are credible (some "big V" nodes have many neighbors). The intuition is that if two people occupy similar structural positions in the graph (for example, two "big V" accounts), they should receive similar representations even if they are far apart in the network.
3. Furthermore, if your model needs to consider the additional information on nodes and edges, then you can choose CANE, CENE, TransNet, etc.
4. If you need to handle large-scale, evolving graphs, you can use GraphSAGE, or use another GE method first and then apply GraphSAGE for inductive learning;
5. If you want to further refine the embeddings, you can choose GraphGAN;
6. You can even combine several GE methods and aggregate their embedding vectors, for example by concatenation, as sketched below.
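As a trivial illustration of the last point, here is a sketch of concatenating embedding vectors produced by two different GE methods; numpy and the toy vectors are assumptions for illustration.

```python
# Minimal sketch: aggregate embeddings from two GE methods by concatenation.
import numpy as np

emb_node2vec = {"A": np.array([0.1, 0.2]), "B": np.array([0.3, 0.4])}
emb_struc2vec = {"A": np.array([0.9, 0.8]), "B": np.array([0.7, 0.6])}

combined = {node: np.concatenate([emb_node2vec[node], emb_struc2vec[node]])
            for node in emb_node2vec}
print(combined["A"])  # [0.1 0.2 0.9 0.8]
```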
As one of the classical techniques behind knowledge graphs, graph embedding is widely used. Nowadays, Internet search engine companies at home and abroad have recognized the strategic significance of knowledge graphs and have built their own, such as the Google Knowledge Graph, Baidu Zhixin, and Sogou Zhilifang (Knowledge Cube), in order to improve search quality. Knowledge graphs have an increasingly important influence on the form of search engines.
To learn more practical material on knowledge graphs and graph embedding, join the open class on Thursday, August 20th, at 14:00!