Paper notes
These notes restate the main content of the paper "Event Extraction via Dynamic Multi-Pooling Convolutional Neural Networks" for self-review; hopefully they are helpful to others as well.

Traditional ACE event extraction methods rely mainly on elaborately designed features and complex NLP tools. These methods lack generality, require a great deal of manual effort, and are prone to error propagation and data sparsity. The paper proposes a new event extraction method that automatically extracts lexical-level and sentence-level features without using complex NLP tools. It introduces a word-representation model to capture the semantics of words, and adopts a framework based on a convolutional neural network (CNN) to capture sentence-level clues. However, a standard CNN captures only the most important information in a sentence and may miss valuable facts when a sentence contains multiple events. The paper therefore proposes a dynamic multi-pooling convolutional neural network (DMCNN), which uses a dynamic multi-pooling layer to retain more crucial information according to the event triggers and arguments.

At present, state-of-the-art event extraction methods use a set of elaborately designed features extracted from text analysis and linguistic knowledge. These features usually fall into two categories: lexical features and contextual features.

-Lexical features

Lexical features include part-of-speech tags, entity information, and morphological features (such as tokens, lemmas, etc.) that capture semantic or word-level background knowledge. Because these clues are limited in predicting semantic background, and because one-hot encoding causes data sparsity, they cannot fully capture the semantics of words.

-Contextual features

Contextual features, such as syntactic features, capture the relationship between arguments and trigger words from dependency parses. We call this information sentence-level clues. However, the target argument role cannot always be reached through such traditional dependency features, and in addition they may introduce error propagation from the feature-extraction tools.

-Convolutional neural network

The paper notes that recently improved convolutional neural networks (CNNs) have proved effective at capturing the syntax and semantics between words in a sentence. A CNN typically uses a max-pooling layer, applying the max operation over the representation of the whole sentence to capture the most useful information. However, in event extraction a sentence may contain two or more events, and these events may share arguments playing different roles. For example, S3 ("In Baghdad, a cameraman died when an American tank fired on the Palestine Hotel.") contains two events, a Die event and an Attack event. If we use the traditional max-pooling layer and keep only the most important information, we obtain the information describing the cameraman's death but miss the information about the American tank, which is crucial for predicting the Attack event and valuable for assigning the cameraman the Target argument role of the Attack. In the experiments, such multi-event sentences account for 27.3% of the dataset, so this phenomenon cannot be ignored.
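To make the contrast with ordinary max-pooling concrete, here is a minimal NumPy sketch of the dynamic multi-pooling idea for the argument-classification stage: the sentence's convolutional feature map is split into three segments at the candidate trigger and candidate argument positions, and each segment is max-pooled separately. The function name and array layout are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def dynamic_multi_pooling(feature_map, trigger_idx, arg_idx):
    """Split each feature map into three segments at the candidate
    trigger and candidate argument positions, then max-pool each
    segment separately (illustrative sketch).

    feature_map: (n_positions, n_filters) conv outputs for one sentence.
    Returns a (3 * n_filters,) vector, instead of the (n_filters,)
    vector a single global max-pool would give.
    """
    lo, hi = sorted((trigger_idx, arg_idx))
    segments = [feature_map[:lo + 1],        # up to the first split word
                feature_map[lo + 1:hi + 1],  # between the two split words
                feature_map[hi + 1:]]        # after the second split word
    pooled = [seg.max(axis=0) if len(seg) else np.zeros(feature_map.shape[1])
              for seg in segments]
    return np.concatenate(pooled)
```

Because each of the three segments contributes its own maximum, information about a second event (e.g. the tank attack in S3) survives even when another event dominates the global maximum.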

Event extraction task: for each sentence, predict the event triggers with their specific subtypes and their arguments.

The method runs in two stages: 1. Trigger classification: a DMCNN classifies each word in the sentence to identify trigger words. 2. If the sentence contains a trigger, a similar DMCNN is applied in the second stage to assign arguments to the trigger and align the argument roles.

1. Word embedding

The paper uses three types of input features:

-Context-word feature (CWF): the vector of each word token, obtained by looking up its word embedding.

-Position feature (PF): the relative distance between the current word and the predicted trigger or the candidate argument. Each distance value is also represented by an embedding vector, randomly initialized and optimized by back-propagation.

-Event-type feature (EF): the event type predicted in the trigger-classification stage is encoded as a feature, which, like the PF, is an important clue for the DMCNN in the argument-classification stage.

Let dw be the dimension of the CWF, dp the dimension of the PF, and de the dimension of the EF. The word feature vector formed by concatenating the CWF, the two PFs, and the EF has length d = dw + dp × 2 + de, giving the input matrix X ∈ R^(n×d), which is fed into the convolution layer.
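The concatenation above can be sketched in a few lines of NumPy. The dimensions below are small illustrative values (not the paper's hyperparameters), and the random matrices stand in for the learned embedding lookups.

```python
import numpy as np

# Illustrative dimensions, not the paper's actual hyperparameters:
# dw = word-embedding size, dp = position-embedding size,
# de = event-type-embedding size, n = sentence length in tokens.
dw, dp, de, n = 6, 2, 2, 5

rng = np.random.default_rng(0)
word_emb  = rng.normal(size=(n, dw))   # CWF: word-embedding lookup
pos_trig  = rng.normal(size=(n, dp))   # PF: distance to candidate trigger
pos_arg   = rng.normal(size=(n, dp))   # PF: distance to candidate argument
event_emb = rng.normal(size=(n, de))   # EF: event-type embedding

# Concatenate per token: X in R^(n x d), with d = dw + 2*dp + de.
X = np.concatenate([word_emb, pos_trig, pos_arg, event_emb], axis=1)
assert X.shape == (n, dw + 2 * dp + de)
```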

Let h be the size of the word window and w ∈ R^(h×d) the filter. A new feature c_i is generated by the operation c_i = f(w · x_{i:i+h−1} + b) (Eq. 4 in the paper), where b ∈ R is a bias term and f is a nonlinear function such as the hyperbolic tangent. The filter is applied to every window of the sentence {x_{1:h}, x_{2:h+1}, ..., x_{n−h+1:n}} to produce a feature map c, where the index i ranges from 1 to n − h + 1.
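The feature-map computation above can be sketched as a direct NumPy loop (the function name is illustrative; a real implementation would use a framework's batched convolution):

```python
import numpy as np

def conv_feature_map(X, w, b):
    """Apply one filter w in R^(h x d) over every length-h window of
    the input matrix X in R^(n x d), producing the feature map
    c_i = f(w . x_{i:i+h-1} + b) for i = 1 .. n-h+1, with f = tanh."""
    n, d = X.shape
    h = w.shape[0]
    return np.array([np.tanh(np.sum(w * X[i:i + h]) + b)
                     for i in range(n - h + 1)])
```

With multiple filters, each produces its own feature map; the dynamic multi-pooling layer then pools over each map segment-wise.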

3. Output

5. Trigger classification model

In trigger classification, only the candidate trigger and its left and right tokens are used for the lexical-level feature representation. For the sentence-level feature representation, the same CWF is used as in argument classification, but only the position of the candidate trigger is used for the position-feature embedding. In addition, instead of splitting the sentence into three parts, the sentence is split into two parts by the candidate trigger. Apart from these feature and model changes, the trigger-classification model is simpler than the argument-classification model. These two stages together form the event extraction framework.
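The two-part split in the trigger stage can be sketched the same way as the three-part version, just with a single split point at the candidate trigger (illustrative NumPy, assumed names):

```python
import numpy as np

def trigger_stage_pooling(feature_map, trigger_idx):
    """Trigger-classification variant of dynamic multi-pooling:
    split the feature map into two parts at the candidate trigger
    and max-pool each part separately (illustrative sketch).

    feature_map: (n_positions, n_filters) conv outputs.
    Returns a (2 * n_filters,) vector.
    """
    left = feature_map[:trigger_idx + 1]
    right = feature_map[trigger_idx + 1:]
    pooled = [seg.max(axis=0) if len(seg) else np.zeros(feature_map.shape[1])
              for seg in (left, right)]
    return np.concatenate(pooled)
```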

Criteria for judging the correctness of predicted events:

1. A trigger is correct if its event subtype and offset match those of a reference trigger.

2. An argument is correctly identified if its event subtype and offset match those of any reference argument mention.

3. An argument is correctly classified if its event subtype, offset, and argument role match those of any reference argument mention.
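The three criteria can be expressed as simple matching predicates. The dictionary keys below ("subtype", "offset", "role") are an assumed representation for illustration, not a format defined in the paper:

```python
def trigger_correct(pred, gold_triggers):
    """Criterion 1: a predicted trigger is correct when its event
    subtype and offset both match some reference trigger."""
    return any(pred["subtype"] == g["subtype"] and pred["offset"] == g["offset"]
               for g in gold_triggers)

def argument_identified(pred, gold_args):
    """Criterion 2: an argument is correctly identified when its
    event subtype and offset match some reference argument mention."""
    return any(pred["subtype"] == g["subtype"] and pred["offset"] == g["offset"]
               for g in gold_args)

def argument_classified(pred, gold_args):
    """Criterion 3: correct classification additionally requires the
    argument role to match."""
    return any(pred["subtype"] == g["subtype"]
               and pred["offset"] == g["offset"]
               and pred["role"] == g["role"]
               for g in gold_args)
```

Note that criterion 3 is strictly stronger than criterion 2: every correctly classified argument is also correctly identified.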