Image data can be corrupted by noise; where, then, does noise in graph data come from?
For example, in online social networks, spam ("water army") accounts follow normal accounts and post everyday content to lower their suspiciousness, so as to avoid being detected and blocked. This example of spam accounts evading detection by generating noise shows that noise interference in graph data is quite common.
Adversarial attacks on graph data are defined as follows: given a graph (nodes and edges), the attacker modifies it so as to degrade the performance of algorithms (such as node classification and link prediction) on the graph data, without the modification being detected.
The change caused by modifying the graph is defined as a perturbation; in other words, the perturbation resulting from an attack must satisfy certain constraints.
Perturbation types
1. Structure-preserving perturbations: adding or removing nodes and edges changes some structural properties of the graph, such as the degree distribution and node centrality.
In essence this is a change of links, so newly generated adversarial examples should keep the changes in these structural properties within a certain range. At present, most papers study this type of attack;
2. Attribute-preserving perturbations: this perturbation is realized by modifying node attributes, so the attacker must ensure these attributes do not change noticeably. Feature stability can be maintained by measuring the similarity between the feature vectors of nodes (or edges); both constraints are sketched right after this list.
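As a concrete illustration, here is a minimal NumPy sketch of how the two constraints might be checked before accepting a perturbed graph. The helper functions, thresholds `eps_deg` and `eps_feat`, and the toy data are all hypothetical, not taken from any particular paper.

```python
import numpy as np

def degree_change_ok(A_orig, A_pert, eps_deg=0.05):
    """Structure-preserving check (assumed form): the mean absolute
    change in node degrees must stay within a small fraction of the
    original average degree."""
    d_orig = A_orig.sum(axis=1)
    d_pert = A_pert.sum(axis=1)
    return bool(np.abs(d_pert - d_orig).mean() <= eps_deg * d_orig.mean())

def features_ok(X_orig, X_pert, eps_feat=0.95):
    """Attribute-preserving check (assumed form): each node's perturbed
    feature vector must stay cosine-similar to its original one."""
    num = (X_orig * X_pert).sum(axis=1)
    den = (np.linalg.norm(X_orig, axis=1)
           * np.linalg.norm(X_pert, axis=1) + 1e-12)
    return bool((num / den >= eps_feat).all())

# Toy usage: flip one edge and nudge the features slightly.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_pert = A.copy(); A_pert[0, 2] = A_pert[2, 0] = 1.0
X = np.arange(12, dtype=float).reshape(3, 4) + 1.0
X_pert = X + 0.01

# On this tiny graph a single edge flip is a large relative change,
# so the structural check fails while the attribute check passes.
print(degree_change_ok(A, A_pert), features_ok(X, X_pert))
```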
Attack methods and categories
1. Poisoning attack: the newly generated adversarial examples are fed into the training of a new model. Figuratively speaking, the attacker poisons the algorithm's training set, thereby degrading the trained algorithm's performance on the unpolluted test set;
2. Evasion attack: the newly generated adversarial examples exist only in the test set, and the algorithm is trained on an unpolluted training set. The attacker's goal is to have the adversarial examples degrade the performance of the already trained algorithm on the test set. Both timings are sketched below.
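To make the difference in timing concrete, here is a toy sketch contrasting the two pipelines. `train`, `evaluate`, and `perturb` are hypothetical stand-ins for a real graph-learning workflow, and the "graphs" here are just strings.

```python
def train(graph):
    """Stand-in: fit a model on the given (possibly poisoned) graph."""
    return {"trained_on": graph}

def evaluate(model, graph):
    """Stand-in: score the model on the given (possibly perturbed) graph."""
    return f"model({model['trained_on']}) evaluated on {graph}"

def perturb(graph):
    """Stand-in: apply adversarial edge/feature changes."""
    return graph + "+perturbation"

clean_train, clean_test = "G_train", "G_test"

# Poisoning: the perturbation enters the TRAINING graph; the poisoned
# model is then evaluated on clean test data.
poisoned_model = train(perturb(clean_train))
print(evaluate(poisoned_model, clean_test))

# Evasion: the model is trained on clean data; the perturbation only
# appears at TEST time.
clean_model = train(clean_train)
print(evaluate(clean_model, perturb(clean_test)))
```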
Attack tasks:
1. Node-level tasks: attacks on node classification and attacks on node embeddings both belong to node-level attacks; their purpose is to make the classifier err and reduce its accuracy or recall (a toy sketch follows this list).
Since node classification is currently the main task on graph data, most papers on graph adversarial attacks study this task;
2. Link-level tasks: link prediction is another major task on graph data, used in recommender systems, knowledge graphs, and social networks.
For link-level attacks, the main goal is to make the algorithm predict wrong links;
3. Graph-level tasks: these mainly involve whole-graph classification, which is common for classifying biological structures.
Typically, a low-dimensional embedding of the whole graph is learned and then classified. Adversarial attacks on this task have received little study.
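As a toy illustration of a node-level attack on classification, the sketch below shows how two adversarial edge insertions flip the majority-vote (1-hop label propagation) prediction of a target node. The graph, labels, and the `predict` helper are made up for illustration and stand in for a real classifier.

```python
import numpy as np

# Two 3-node communities (labels 0 and 1) joined by a single edge.
A = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
labels = np.array([0, 0, 0, 1, 1, 1])

def predict(A, node):
    """Toy classifier: majority vote over the neighbours' labels."""
    neigh = np.flatnonzero(A[node])
    return np.bincount(labels[neigh]).argmax()

target = 3
print("before attack:", predict(A, target))  # neighbours 2,4,5 -> class 1

# Adversarial perturbation: connect the target to two class-0 nodes.
A[3, 0] = A[0, 3] = 1
A[3, 1] = A[1, 3] = 1
print("after attack:", predict(A, target))   # neighbours 0,1,2,4,5 -> class 0
```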
Another classification, by the attacker's knowledge:
1. White-box attack: the attacker has all the information about the target system, including which method it uses, the algorithm's outputs, the gradients computed, and so on. This scenario corresponds to the attacker having fully compromised the target system;
2. Grey-box attack: only part of the information is needed to launch the attack, which makes it more harmful than a white-box attack, since the attacker does not need to fully compromise the target system. In research, grey-box attacks can be further subdivided according to the specific task and scenario;
3. Black-box attack: the attacker can only query a limited number of attack results and knows nothing about the target system's internal mechanism. This kind of attack is the hardest to mount and the most harmful to the defender (a query-budget sketch follows this list).
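For the black-box case, here is a hedged sketch of what a query-only attack loop might look like: the attacker greedily probes random edge flips under a fixed query budget and keeps the flip that most lowers the score returned by the victim. `query_model` is a stand-in for the victim system's API, and its degree-based "score" is a toy surrogate, not a real model.

```python
import numpy as np

rng = np.random.default_rng(0)
A = (rng.random((8, 8)) < 0.3).astype(int)
A = np.triu(A, 1); A = A + A.T        # random undirected toy graph
target = 0

def query_model(A, node):
    """Stand-in for the victim's API: the attacker sees only this
    score, never the model's internals or gradients."""
    return A[node].sum() / (len(A) - 1)

best_flip, best_score = None, query_model(A, target)
for _ in range(20):                    # limited query budget
    i, j = rng.choice(len(A), size=2, replace=False)
    A[i, j] ^= 1; A[j, i] ^= 1         # tentatively flip one edge
    score = query_model(A, target)
    if score < best_score:             # remember the most damaging flip
        best_flip, best_score = (i, j), score
    A[i, j] ^= 1; A[j, i] ^= 1         # undo before the next query
print("best flip found:", best_flip, "score:", best_score)
```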
Classification by the attacker's goal:
1. Availability attack: the attacker's goal is to degrade the performance of the whole system,
such as its overall accuracy or recall;
2. Integrity attack: the attacker's goal is to degrade performance on specific target tasks or objects, without requiring a drop in overall performance.
For example, in friend recommendation (link prediction), the attacker can prevent the algorithm from predicting the friendship between two specific people; see the sketch below.
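A minimal sketch of such a targeted (integrity) attack on link prediction: using a common-neighbours scorer as a stand-in predictor, deleting a single well-chosen edge suppresses the score for one specific pair while the rest of the graph is untouched. The scorer, the toy graph, and the one-edge budget are illustrative assumptions.

```python
import numpy as np

def common_neighbors_score(A, u, v):
    """Stand-in link predictor: number of neighbours shared by u and v."""
    return int((A[u] * A[v]).sum())

# Toy graph: users 0 and 1 share neighbours 2 and 3.
A = np.zeros((5, 5), dtype=int)
for i, j in [(0, 2), (1, 2), (0, 3), (1, 3), (0, 4)]:
    A[i, j] = A[j, i] = 1

u, v = 0, 1
print("score before:", common_neighbors_score(A, u, v))  # 2 shared neighbours

# Targeted perturbation (budget: one edge): cut v's link to a shared
# neighbour so the pair (u, v) is no longer strongly predicted.
A[1, 2] = A[2, 1] = 0
print("score after:", common_neighbors_score(A, u, v))   # 1 shared neighbour
```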