Principle analysis of graph convolutional networks (GCN)
Graph convolutional networks involve two very important concepts: graphs and convolution. The traditional convolution operation shows great power in Euclidean data spaces, but it is helpless in non-Euclidean data spaces. One of the most important reasons is that traditional convolution cannot maintain "translation invariance" in non-Euclidean data spaces. In order to extend convolution to topological graphs with non-Euclidean data structure, GCN was born. Before diving into GCN itself, I think it is necessary for us to understand the following concepts:

Paper link: Semi-Supervised Classification with Graph Convolutional Networks

1. Laplacian matrix and its variants

Given a simple graph with $n$ nodes, with degree matrix $D$ and adjacency matrix $A$, the Laplacian matrix of the graph can be expressed as $L = D - A$. Its elements are:

$$L_{ij} = \begin{cases} \deg(v_i) & \text{if } i = j \\ -1 & \text{if } i \neq j \text{ and } v_i \text{ is adjacent to } v_j \\ 0 & \text{otherwise} \end{cases}$$
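As a quick illustration, here is a minimal NumPy sketch that builds $L = D - A$ and one commonly used variant, the symmetric normalized Laplacian $L_{sym} = I - D^{-1/2} A D^{-1/2}$; the 4-node graph is made up purely for demonstration:

```python
import numpy as np

# A small made-up undirected graph with 4 nodes and edges (0-1, 0-2, 1-2, 2-3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

# Adjacency matrix A and degree matrix D.
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
D = np.diag(A.sum(axis=1))

# Combinatorial (unnormalized) Laplacian: L = D - A.
L = D - A

# A common variant: the symmetric normalized Laplacian
# L_sym = I - D^{-1/2} A D^{-1/2}.
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
L_sym = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt

print(L)
print(L_sym)
```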

2. Traditional Fourier transform

The traditional Fourier transform is

$$F(\omega) = \int f(t)\, e^{-i\omega t}\, dt .$$

When the transformed object is a discrete variable, the integral becomes an inner product, that is

$$F(\omega) = \sum_{t} f(t)\, e^{-i\omega t} = \langle f(t),\ e^{-i\omega t} \rangle .$$

The $e^{-i\omega t}$ here is the eigenfunction of the legendary Laplace operator (the Laplace operator is the second-order differential operator in Euclidean space; stripped of its disguise it is simply $\Delta f = \sum_i \frac{\partial^{2} f}{\partial x_i^{2}}$).

Why do we say that? From the generalized definition of the eigen-equation $A V = \lambda V$, $A$ is a transformation, $V$ is an eigenvector or eigenfunction, and $\lambda$ is an eigenvalue. Taking the second derivative of the basis function $e^{-i\omega t}$ gives

$$\Delta e^{-i\omega t} = \frac{\partial^{2}}{\partial t^{2}} e^{-i\omega t} = -\omega^{2} e^{-i\omega t},$$

so $e^{-i\omega t}$ can be regarded as an eigenfunction of the transformation $\Delta$.

On a graph, the Laplacian matrix $L$ can be spectrally decomposed (eigendecomposition). Let the matrix formed by its eigenvectors be $U = (u_1, u_2, \dots, u_n)$; by the definition of the eigen-equation we get $L u_l = \lambda_l u_l$. By comparison, we find that $L$ plays the role of $\Delta$ and the eigenvector $u_l$ plays the role of $e^{-i\omega t}$. Therefore, the Fourier transform on the graph can be written as

$$\hat{f}(\lambda_l) = \sum_{i=1}^{n} f(i)\, u_l(i).$$

From its basic idea, the essence of the Fourier transform is to represent a signal as coordinates on a set of orthogonal bases; those coordinates are the result of the Fourier transform. (The figure in the original post illustrates the size of the projection component of $f$ on the first basis vector.)

We extend the Fourier transform on the graph to matrix form by matrix multiplication:

$$\begin{pmatrix} \hat{f}(\lambda_1) \\ \hat{f}(\lambda_2) \\ \vdots \\ \hat{f}(\lambda_n) \end{pmatrix} = \begin{pmatrix} u_1(1) & u_1(2) & \cdots & u_1(n) \\ u_2(1) & u_2(2) & \cdots & u_2(n) \\ \vdots & \vdots & \ddots & \vdots \\ u_n(1) & u_n(2) & \cdots & u_n(n) \end{pmatrix} \begin{pmatrix} f(1) \\ f(2) \\ \vdots \\ f(n) \end{pmatrix}$$

Here $f(i)$ is the feature of the $i$-th node on the graph, and the Fourier transform on the graph can be written compactly as

$$\hat{f} = U^{T} f .$$

Here $U^{T}$ is the transpose of the matrix of eigenvectors of the graph's Laplacian matrix. From the nice properties of the Laplacian matrix, we know that the matrix formed by its eigenvectors is orthogonal, i.e. it satisfies $U U^{T} = I$, so the inverse Fourier transform on the graph is

$$f = U \hat{f} .$$
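To make the spectral decomposition and the graph Fourier transform concrete, here is a small NumPy sketch; the 4-node graph and the signal $f$ are made up for illustration:

```python
import numpy as np

# The 4-node example graph from before (edges 0-1, 0-2, 1-2, 2-3).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

# Spectral (eigen) decomposition: L = U diag(lambda) U^T.
# eigh is used because L is symmetric; the columns of U are the eigenvectors u_l.
lam, U = np.linalg.eigh(L)

# An arbitrary signal on the graph (one scalar value per node).
f = np.array([1.0, 2.0, 3.0, 4.0])

# Graph Fourier transform and its inverse.
f_hat = U.T @ f          # \hat{f} = U^T f
f_rec = U @ f_hat        # f = U \hat{f}

print(np.allclose(f_rec, f))   # True: U is orthogonal, so U U^T = I
```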

So far, we have extended the traditional Fourier transform to the Fourier transform on graphs by analogy. Next, we will use the Fourier transform as a bridge to bring convolution and graphs together.

From the preceding discussion we know the famous convolution theorem: the Fourier transform of the convolution of two functions is the product of their Fourier transforms. That is, for $f$ and $g$, their convolution is the inverse Fourier transform of the product of their Fourier transforms:

$$f * g = \mathcal{F}^{-1}\big\{ \mathcal{F}(f) \cdot \mathcal{F}(g) \big\} = \mathcal{F}^{-1}\big\{ \hat{f} \cdot \hat{g} \big\}.$$

We substitute the graph Fourier transform formulas obtained in the previous section and get:

$$(f * g)_G = U\big( (U^{T} f) \odot (U^{T} g) \big)$$

Here $\odot$ is the Hadamard product, which means element-wise (point-by-point) multiplication.

Generally, we treat the node features $f$ of the input graph as the input and regard $g$ as a trainable convolution kernel with parameters $\theta$, used to extract the spatial features of the topological graph. To see the convolution kernel more clearly, we rewrite the above formula as

$$(f * g)_\theta = U\, g_\theta\, U^{T} f, \qquad g_\theta = \mathrm{diag}\big(U^{T} g\big).$$

Some readers may have doubts about this rewriting; the proof is actually very simple. Those who are interested can look at the answer "proof of the GCN equality" on Zhihu.
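For readers who prefer to check the equivalence numerically rather than algebraically, here is a small NumPy sketch verifying that the Hadamard-product form and the diagonal-kernel form give the same result; the graph and the random signals are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# The 4-node example graph again; any symmetric Laplacian works for this check.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
_, U = np.linalg.eigh(L)

f = rng.normal(size=n)   # input signal
g = rng.normal(size=n)   # convolution kernel, viewed as a graph signal

# Form 1: graph convolution via the Hadamard product in the spectral domain.
conv_hadamard = U @ ((U.T @ f) * (U.T @ g))

# Form 2: the rewritten form with g_theta = diag(U^T g).
g_theta = np.diag(U.T @ g)
conv_diag = U @ g_theta @ U.T @ f

print(np.allclose(conv_hadamard, conv_diag))  # True: the two forms are identical
```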

So far, we have got the prototype of GCN.

1. First generation GCN

The core of the convolution operation is a trainable, parameter-sharing convolution kernel, so the first generation GCN directly replaces the diagonal elements of $g_\theta$ in the above formula with learnable parameters $\theta_1, \dots, \theta_n$, i.e. $g_\theta(\Lambda) = \mathrm{diag}(\theta_1, \dots, \theta_n)$. The parameters are first initialized randomly and then adjusted by back-propagating the error.

So the first generation of GCN looks like this:

$$y = \sigma\big( U\, g_\theta(\Lambda)\, U^{T} x \big), \qquad g_\theta(\Lambda) = \mathrm{diag}(\theta_1, \dots, \theta_n).$$

Here $x$ is the vector of node features in the graph, and $y$ is the output of each node after the GCN convolution. Every node in the graph is convolved with the kernel to extract its corresponding topological-space information, and the result is passed to the next layer through the activation function $\sigma$.
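Below is a minimal sketch of one first-generation GCN layer in the form $y = \sigma\big(U\, g_\theta(\Lambda)\, U^{T} x\big)$, assuming scalar (one-dimensional) node features; the example graph and the random $\theta$ are made up for illustration, and in practice $\theta$ would be learned by backpropagation:

```python
import numpy as np

def first_gen_gcn_layer(x, U, theta, activation=np.tanh):
    """y = sigma(U diag(theta) U^T x): one first-generation GCN layer.

    x     : (n,) node features
    U     : (n, n) eigenvectors of the graph Laplacian
    theta : (n,) spectral filter parameters (one per eigenvalue)
    """
    g_theta = np.diag(theta)
    return activation(U @ g_theta @ U.T @ x)

# Example usage with the 4-node graph from before.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
_, U = np.linalg.eigh(L)

x = np.array([1.0, 0.0, -1.0, 2.0])
theta = np.random.default_rng(0).normal(size=4)  # would be learned by backprop
print(first_gen_gcn_layer(x, U, theta))
```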

The shortcomings of the first generation of GCN are also obvious, mainly the following:

1. Every forward pass depends on the eigendecomposition of the Laplacian matrix, which is very expensive for large graphs (on the order of $O(n^{3})$).

2. The convolution kernel has $n$ parameters (one per eigenvalue), so the number of parameters grows with the size of the graph.

3. The kernel is not spatially localized: each convolution mixes information from all nodes rather than from a local neighbourhood.

2. Second generation GCN

To address the shortcomings of the first generation GCN, such as its excessive number of parameters, the second generation GCN makes an improvement. Because the Fourier transform on the graph is a function of the eigenvalues, the kernel can also be written as $g_\theta(\Lambda)$; it is then improved to a $K$-order polynomial:

$$g_\theta(\Lambda) = \sum_{j=0}^{K} \theta_j \Lambda^{j}.$$

Substituting this into the convolution formula gives:

$$y = \sigma\Big( U\, g_\theta(\Lambda)\, U^{T} x \Big) = \sigma\Big( U \sum_{j=0}^{K} \theta_j \Lambda^{j}\, U^{T} x \Big).$$

Noting that $U \Lambda^{j} U^{T} = L^{j}$, the second generation GCN takes the form:

$$y = \sigma\Big( \sum_{j=0}^{K} \theta_j L^{j} x \Big).$$

It can be seen that the final simplified form of the second generation GCN no longer needs the eigendecomposition; the Laplacian matrix is used directly. The number of parameters $K+1$ is generally much smaller than the number of nodes $n$ in the graph, so compared with the first generation GCN, the second generation has far fewer parameters, which reduces the complexity of the model. The parameters $\theta_j$ are first initialized and then updated according to the back-propagated error. However, the powers $L^{j}$ still need to be computed, which remains expensive (on the order of $O(n^{3})$ for dense matrix multiplication).
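Here is a corresponding sketch of one second-generation layer, $y = \sigma\big(\sum_{j=0}^{K}\theta_j L^{j} x\big)$, again assuming scalar node features; the graph, $K$, and $\theta$ are made up for illustration:

```python
import numpy as np

def second_gen_gcn_layer(x, L, theta, activation=np.tanh):
    """y = sigma(sum_j theta_j L^j x): one second-generation (polynomial) GCN layer.

    x     : (n,) node features
    L     : (n, n) graph Laplacian
    theta : (K+1,) polynomial coefficients; no eigendecomposition is needed
    """
    out = np.zeros_like(x)
    Ljx = x.copy()                 # L^0 x
    for theta_j in theta:
        out = out + theta_j * Ljx
        Ljx = L @ Ljx              # next power applied to x: L^{j+1} x
    return activation(out)

# Example usage on the 4-node graph, with K = 2 (three coefficients).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A
x = np.array([1.0, 0.0, -1.0, 2.0])
theta = np.array([0.5, 0.3, 0.1])  # would be learned by backprop
print(second_gen_gcn_layer(x, L, theta))
```

Note that this sketch applies the powers to $x$ recursively ($L(L(\cdots x))$) rather than forming $L^{j}$ explicitly, which is a common implementation choice.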

In addition, we know that the $k$-th power of the matrix connects a central node with its $k$-hop neighbours: whether the element $(L^{k})_{ij}$ is non-zero indicates whether node $j$ can be reached from node $i$ within $k$ hops. Here $K$ actually represents the size of the receptive field of the convolution kernel: the feature of each central node is updated by aggregating the neighbouring nodes within its $K$ hops, and the parameter $\theta_k$ is the weight given to the neighbours at hop $k$.
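To see the $k$-hop interpretation numerically, this short sketch checks which entries of $L^{k}$ are non-zero for the made-up example graph:

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A

# Non-zero pattern of L^k: node 0 and node 3 are not directly connected,
# so (L^1)[0, 3] == 0, but they are reachable in two hops via node 2,
# so (L^2)[0, 3] != 0 -- the receptive field grows with k.
print(np.linalg.matrix_power(L, 1)[0, 3] != 0)   # False
print(np.linalg.matrix_power(L, 2)[0, 3] != 0)   # True
```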

To be continued.

1. In spectral graph convolution, we eigendecompose the Laplacian matrix of the graph. The eigendecomposition in Fourier space helps us understand the underlying subgraph structure. ChebyNet is a typical deep learning architecture that uses spectral-domain convolution.

2. Spatial convolution acts on node neighbourhoods: we obtain a node's feature representation from its $K$-hop neighbours. Spatial convolution is simpler and more effective than spectral convolution. GraphSAGE and GAT are typical representatives of spatial convolution; a small sketch of neighbour-mean aggregation is given below.
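As a purely illustrative contrast with the spectral formulas above, here is a minimal sketch of one step of spatial aggregation in the GraphSAGE style: each node's new feature is the mean of its 1-hop neighbours' features. The graph and features are made up for demonstration, and this is not the full GraphSAGE algorithm (which also uses the node's own feature and a learned weight matrix):

```python
import numpy as np

# Made-up 4-node graph (same edges as earlier examples) and scalar node features.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 0.0, -1.0, 2.0])

# One step of mean aggregation over 1-hop neighbours:
# h_i = mean_{j in N(i)} x_j.  Stacking K such steps reaches K-hop neighbours.
deg = A.sum(axis=1)
h = (A @ x) / deg
print(h)
```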
