Paper reading (42): The More You Know: Using Knowledge Graphs for Image Classification
One thing that distinguishes humans from learning-based computer vision algorithms is the ability to acquire knowledge about the world and use that knowledge to reason about the visual world. Humans can learn the characteristics of objects and the relationships between them, and can thereby pick up a wide variety of visual concepts, often from only a few examples. This paper studies the use of structured prior knowledge in the form of knowledge graphs and shows that exploiting this knowledge improves image classification performance. Building on recent work on end-to-end learning on graphs, the authors introduce the Graph Search Neural Network as an efficient way to incorporate large knowledge graphs into a visual classification pipeline. In a number of experiments, they show that the method outperforms standard neural network baselines for multi-label classification.

(a) the Graph Search Neural Network (GSNN), a method for incorporating a potentially large knowledge graph into an end-to-end learning system in a way that is computationally feasible for large graphs;

(b) a framework for image classification using noisy knowledge graphs;

(c) the ability to explain the image classifications by using the propagation model. The method significantly outperforms multi-label classification baselines.

The biggest problem in applying GGNNs to image tasks is computational scalability. For example, NEIL [4] has more than 2,000 concepts, and NELL [3] has more than 2 million confident beliefs. Even after pruning to our task, these graphs are still huge. The forward pass on a standard GGNN is O(N^2), where N is the number of nodes, and the backward pass is O(N^T), where T is the number of propagation steps. We ran simple experiments with GGNNs on synthetic graphs and found that beyond roughly 500 nodes, a single forward and backward pass takes over 1 second per example, even under generous assumptions about the parameters. At 2,000 nodes, a single image takes well over a minute. Using a GGNN out of the box is therefore infeasible.
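
To make the scaling issue concrete, here is a toy sketch (mine, not the paper's code) that times a dense propagation step in which every node aggregates messages from every other node through an N x N adjacency matrix. The matrix product alone costs O(N^2 * D) per step; the real GGNN additionally runs GRU-style updates and a backward pass through all T steps, which this sketch does not model.

```python
# Toy illustration (not the paper's code): per-step cost of dense
# GGNN-style message passing grows quadratically in the number of nodes.
import time
import numpy as np

def propagation_step(adjacency, hidden):
    """One message-passing step: aggregate neighbor states (O(N^2 * D)),
    then a simple nonlinearity standing in for the GRU-style update."""
    messages = adjacency @ hidden
    return np.tanh(messages)

rng = np.random.default_rng(0)
for n_nodes in (500, 2000):
    # Sparse random graph stored densely, as a dense GGNN would process it.
    adjacency = (rng.random((n_nodes, n_nodes)) < 0.01).astype(np.float32)
    hidden = rng.standard_normal((n_nodes, 10)).astype(np.float32)
    start = time.perf_counter()
    for _ in range(3):  # T = 3 propagation steps
        hidden = propagation_step(adjacency, hidden)
    print(f"N={n_nodes}: {time.perf_counter() - start:.3f}s for 3 steps")
```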

Our solution to this problem is the Graph Search Neural Network (GSNN). As the name implies, rather than recurrently updating all nodes of the graph at once, the idea is to start from a few initial nodes chosen based on the input and to expand only those nodes that are likely to be useful for the final output. Thus, we only compute the update steps over a subset of the graph. So how do we choose the subset of nodes with which to initialize the graph? During both training and testing, we determine the initial nodes in the graph based on the likelihood of a concept being present, as estimated by an object detector or classifier. In our experiments, we use Faster R-CNN detectors for each of the 80 COCO categories. For detection scores above a chosen threshold, we select the corresponding graph nodes as the initial set of active nodes.
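
A minimal sketch of this initialization, under the assumption that detections arrive as a category-to-confidence dictionary and that `concept_to_node` maps COCO categories to graph node indices (both names are mine, not from the released code):

```python
# Hypothetical sketch of the GSNN initialization: detections whose confidence
# clears a threshold seed the set of initially active graph nodes.
# `detector_scores` maps a COCO category to a Faster R-CNN confidence and
# `concept_to_node` maps a category to its node index in the knowledge graph.
def initial_nodes(detector_scores, concept_to_node, threshold=0.5):
    active = set()
    for category, score in detector_scores.items():
        if score > threshold and category in concept_to_node:
            active.add(concept_to_node[category])
    return active

# Example usage with made-up numbers:
scores = {"dog": 0.92, "frisbee": 0.61, "car": 0.12}
mapping = {"dog": 3, "frisbee": 17, "car": 42}
print(initial_nodes(scores, mapping))  # nodes for "dog" and "frisbee"
```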

Once we have the initial nodes, we also add the nodes adjacent to them to the active set. Given the initial nodes, we first want to propagate the beliefs about the initial nodes to all of their neighbors. After the first time step, however, we need a way to decide which nodes to expand next. We therefore learn a per-node scoring function that estimates how important each node is. After each propagation step, we predict an importance score for every node in the current graph:

i_v^(t) = g_i(h_v^(t), x_v), where g_i is a learned network, the importance network.

Once we have these importance values, we add the top-P highest-scoring nodes that have never been expanded to the expanded set, and add all nodes adjacent to them to the active set. Figure 2 illustrates this expansion. At t = 1, only the detected nodes are expanded. At t = 2, we expand the nodes selected by their importance values and add their neighbors to the graph. At the final time step, we compute the output for each node, reorder the outputs into a fixed order, zero-pad the unexpanded nodes, and feed the result to the final classification network.
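
The sketch below illustrates one such expansion step under stated assumptions: `ImportanceNet` is a stand-in MLP for the learned importance network, `adjacency` is an adjacency-list dictionary, and `top_p` is the number of nodes expanded per step; none of these names or layer sizes come from the paper's code.

```python
# Hypothetical sketch of one GSNN expansion step: score the un-expanded active
# nodes with a learned importance network, expand the top-P of them, and add
# their neighbors to the active set.
import torch
import torch.nn as nn

class ImportanceNet(nn.Module):
    """Stand-in importance scorer: (hidden state, annotation) -> scalar score."""
    def __init__(self, hidden_dim, annotation_dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim + annotation_dim, 64), nn.ReLU(),
            nn.Linear(64, 1))

    def forward(self, hidden, annotation):
        return self.mlp(torch.cat([hidden, annotation], dim=-1)).squeeze(-1)

def expand_step(active, expanded, hidden, annotations, adjacency,
                importance_net, top_p=5):
    candidates = sorted(active - expanded)          # active but never expanded
    if not candidates:
        return active, expanded
    idx = torch.tensor(candidates)
    scores = importance_net(hidden[idx], annotations[idx])
    order = torch.argsort(scores, descending=True)[:top_p]
    newly_expanded = {candidates[i] for i in order.tolist()}
    expanded = expanded | newly_expanded
    for node in newly_expanded:                     # neighbors join the active set
        active = active | set(adjacency.get(node, ()))
    return active, expanded
```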

To train the importance network, we assign target importance values to each node in the graph for a given image. Nodes corresponding to ground-truth concepts in the image are given an importance value of 1. Their neighbors are assigned a value of gamma, nodes two hops away a value of gamma squared, and so on. The idea is that the nodes closest to the final output are the most important ones to expand.
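
A small sketch of how such targets could be computed; the breadth-first traversal and the default discount value are my reading of the description, not the released implementation.

```python
# Hypothetical sketch: assign each node a target importance of gamma**hops,
# where hops is the breadth-first distance to the nearest ground-truth
# concept node. The default gamma below is illustrative, not the paper's value.
from collections import deque

def importance_targets(ground_truth_nodes, adjacency, num_nodes, gamma=0.3):
    targets = [0.0] * num_nodes
    queue = deque((node, 0) for node in ground_truth_nodes)
    visited = set(ground_truth_nodes)
    while queue:
        node, hops = queue.popleft()
        targets[node] = gamma ** hops  # 1 for ground truth, gamma, gamma**2, ...
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, hops + 1))
    return targets

# Example: chain graph 0 - 1 - 2 with ground truth at node 0.
print(importance_targets([0], {0: [1], 1: [0, 2], 2: [1]}, 3))
# targets decay as 1, gamma, gamma**2 along the chain
```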

We now have an end-to-end network which takes as input a set of initial nodes and their annotations and produces an output for each active node in the graph. It consists of three sets of networks: the propagation net, the importance net, and the output net. The final loss from the image problem can be backpropagated from the final output of the pipeline through the output net, while the importance losses are backpropagated through each of the importance outputs. See Figure 3 for the GSNN architecture. First, the hidden states of the initially detected nodes are initialized with the detection confidences, and the hidden states of their adjacent nodes are initialized with 0. We then update the hidden states using the propagation net. The updated hidden states are used to predict importance scores, which select the next nodes to add; these new nodes are initialized with hidden state 0, and the hidden states are updated again through the propagation net. After T steps, we use all of the accumulated hidden states to predict the GSNN outputs for all active nodes. During backpropagation, the binary cross-entropy (BCE) loss is fed back through the output layer, and the importance losses are fed back through the importance networks, to update the network parameters.
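
As a rough sketch of the training objective described above (the MSE form of the importance loss and the weighting factor are assumptions on my part, not details confirmed by the paper):

```python
# Rough sketch of the combined training objective: BCE on the final
# multi-label outputs plus a penalty on the per-step importance predictions.
import torch.nn.functional as F

def gsnn_training_loss(label_logits, label_targets,
                       importance_preds, importance_targets,
                       importance_weight=1.0):
    output_loss = F.binary_cross_entropy_with_logits(label_logits, label_targets)
    # One importance prediction/target pair per expansion step.
    importance_loss = sum(
        F.mse_loss(pred, target)
        for pred, target in zip(importance_preds, importance_targets))
    return output_loss + importance_weight * importance_loss
```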

The last detail is the addition of a node bias to the GSNN. In GGNN, the per-node output function takes the hidden state and the initial annotation of a node and computes its output; in that sense it is agnostic to the meaning of the node. That is, at training or test time, GSNN takes a graph it may never have seen before, together with some initial annotations for each node, propagates these annotations through the network using the structure of the graph, and then computes an output. The nodes of the graph could represent anything from human relationships to a computer program. In our graph network, however, the fact that a particular node represents "horse" or "cat" is likely to be relevant, and we can also restrict ourselves to a static graph over image concepts. We therefore introduce node bias terms: every node in the graph has some learned values. Our output equation becomes g(h_v^(T), x_v, n_v), where n_v is a bias term tied to the particular node v in the overall graph. These values are stored in a table and updated by backpropagation.
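
A minimal sketch of an output net with per-node bias terms, assuming the bias for each node is stored as a row of a learned embedding table indexed by node id; the layer sizes and dimensions are placeholders.

```python
# Hypothetical sketch of an output network whose prediction for node v also
# depends on a learned per-node bias n_v, stored in an embedding table and
# updated by backpropagation like any other parameter.
import torch
import torch.nn as nn

class OutputNetWithNodeBias(nn.Module):
    def __init__(self, num_graph_nodes, hidden_dim, annotation_dim,
                 bias_dim=8, out_dim=1):
        super().__init__()
        self.node_bias = nn.Embedding(num_graph_nodes, bias_dim)  # one row per node
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim + annotation_dim + bias_dim, 64), nn.ReLU(),
            nn.Linear(64, out_dim))

    def forward(self, hidden, annotation, node_ids):
        bias = self.node_bias(node_ids)  # n_v, looked up by node index
        return self.mlp(torch.cat([hidden, annotation, bias], dim=-1))
```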

3.3. Image pipeline and baseline.

Another problem in adapting graph networks to vision tasks is how to incorporate the graph network into an image pipeline. For classification, this is fairly straightforward: we take the outputs of the graph network, reorder them so that nodes always appear in the same order in the final network, and zero-pad any nodes that were not expanded. Thus, if we have a graph with 316 nodes and each node predicts a 5-dimensional output, we create a 1,580-dimensional feature vector from the graph. We concatenate this feature vector with the FC7 layer (4,096-dim) of a fine-tuned VGG-16 network [35] and with the top score for each COCO category (80-dim) predicted by Faster R-CNN. This 5,756-dimensional feature vector is fed into a final classification network, which is trained with dropout.
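
A sketch of this feature assembly; the function and argument names are illustrative, and only the 1,580 + 4,096 + 80 = 5,756 dimensionality comes from the text.

```python
# Illustrative sketch of the feature assembly: scatter per-node GSNN outputs
# into a fixed node order (zeros for unexpanded nodes), then concatenate with
# VGG-16 FC7 features and the top Faster R-CNN score per COCO category.
import torch

def build_feature_vector(gsnn_outputs, active_node_ids, num_graph_nodes,
                         node_out_dim, fc7_features, coco_scores):
    graph_features = torch.zeros(num_graph_nodes, node_out_dim)
    graph_features[active_node_ids] = gsnn_outputs  # same node order every image
    return torch.cat([graph_features.flatten(),     # 316 * 5 = 1580 dims
                      fc7_features,                 # 4096 dims
                      coco_scores])                 # 80 dims -> 5756 total

features = build_feature_vector(
    gsnn_outputs=torch.randn(10, 5),                # 10 active nodes, 5 dims each
    active_node_ids=torch.arange(10),
    num_graph_nodes=316, node_out_dim=5,
    fc7_features=torch.randn(4096), coco_scores=torch.rand(80))
print(features.shape)                               # torch.Size([5756])
```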

For baselines, we compare against: (1) a VGG-only baseline, where only the FC7 features are fed into the final classification network; and (2) a detection baseline, where the FC7 features and the top COCO scores are fed into the final classification network.
