Pre-training Weekly, Issue 33: Efficient Hierarchical Adaptation of Pre-trained Language Models
About this weekly

This issue selects 9 papers related to pre-training, covering vocabulary transfer, commonsense question answering, multimodal training, hierarchical training, contrastive learning, image segmentation, image-text models, protein function, and the exploration of immune feature expression. In research trends, we picked two pre-training news items, introducing recent large-scale model competitions and an annual review of vision algorithms. Finally, for resource recommendations, we selected one pre-training resource, introducing recent work on cross-lingual summarization.

Contributors to this issue: Shen Dezhou, Zhai Ke and Wu Xingang.

Paper recommendations

Title: Yandex (Russia), Facebook, et al. | Fine-Tuning Transformers: Vocabulary Transfer

Introduction: This paper explores one question raised by fine-tuning huge pre-trained models on downstream tasks via transfer learning: vocabulary transfer. Transformers have become the absolute mainstream of recent progress in natural language processing, and most practical NLP applications of these models are built through transfer learning. The authors study whether corpus-specific tokenization improves the final performance of the model. Through a series of vocabulary optimization and transfer experiments, they show that this vocabulary optimization and transfer strategy can improve model performance, and they claim to be the first to explore this direction, which they call vocabulary transfer, in the field of transfer learning (a toy sketch of the idea follows the paper link below).

Address of the paper: "Link"
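A minimal, hypothetical sketch of one common vocabulary-transfer heuristic, purely to illustrate the idea summarized above: tokens of a new, corpus-specific vocabulary are initialized from the old tokenizer's embeddings before further training. The function name, interfaces, and the mean-of-subtoken initialization are my assumptions, not necessarily the paper's exact procedure.

```python
# Hedged sketch: initialize a new vocabulary's embedding matrix from an old one.
import numpy as np

def transfer_embeddings(new_vocab, old_vocab, old_emb, old_tokenize):
    """new_vocab: list[str]; old_vocab: dict token -> row index;
    old_emb: (|V_old|, d) array; old_tokenize: str -> list[str] of old tokens."""
    d = old_emb.shape[1]
    new_emb = np.empty((len(new_vocab), d), dtype=old_emb.dtype)
    for i, tok in enumerate(new_vocab):
        if tok in old_vocab:                       # token already known: copy as-is
            new_emb[i] = old_emb[old_vocab[tok]]
            continue
        pieces = [p for p in old_tokenize(tok) if p in old_vocab]
        if pieces:                                 # unseen token: mean of its old sub-pieces
            new_emb[i] = old_emb[[old_vocab[p] for p in pieces]].mean(axis=0)
        else:                                      # nothing matched: small random init
            new_emb[i] = np.random.normal(0.0, 0.02, size=d)
    return new_emb

# Toy usage with a character-level "old tokenizer" (list splits a string into chars).
old_vocab = {c: i for i, c in enumerate("abcdefgh")}
old_emb = np.random.normal(0.0, 0.02, size=(len(old_vocab), 4))
new_emb = transfer_embeddings(["ab", "c", "xyz"], old_vocab, old_emb, list)
print(new_emb.shape)  # (3, 4)
```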

Title: University of California | Commonsense Question Answering via Cloze Translation and Consistency Optimization

Introduction: This paper studies knowledge extraction from pre-trained language models for commonsense question answering (CQA). The authors focus on making better use of the knowledge stored in pre-trained language models. Although prior work has shown that the knowledge embedded in pre-trained language models can be extracted by filling in the blanks of carefully designed prompts for relation extraction and text classification, it is unclear whether this paradigm can be adopted for CQA, whose input and output forms are more flexible. The authors therefore investigate four translation methods that convert natural questions into cloze-style sentences in order to better elicit commonsense knowledge from language models, including a syntax-based model, an unsupervised neural model, and two supervised neural models. In addition, to combine the different translation methods, they propose using unlabeled data to encourage consistent predictions across different translations of the same question (a toy sketch of this objective follows the paper link below). Experiments on three CQA datasets demonstrate the effectiveness of the method.

Address of the paper: "Link"
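As a hedged illustration of the consistency idea described above (my reading of the summary, not the paper's exact objective): each cloze translation of a question yields a distribution over answer candidates, and on unlabeled data disagreement between those distributions is penalized, e.g. by the mean KL divergence to their average.

```python
# Hedged sketch of a consistency penalty across cloze translations of one question.
import numpy as np

def consistency_loss(dists, eps=1e-9):
    """dists: (num_translations, num_candidates) rows summing to 1."""
    dists = np.asarray(dists, dtype=float)
    mean = dists.mean(axis=0, keepdims=True)
    # Mean KL divergence from each translation's distribution to the average.
    kl = (dists * (np.log(dists + eps) - np.log(mean + eps))).sum(axis=1)
    return kl.mean()

# Two hypothetical cloze translations of one question, scored over 3 candidates.
p_syntactic = [0.70, 0.20, 0.10]   # e.g. "Birds can [MASK]."
p_neural    = [0.55, 0.35, 0.10]   # e.g. "A bird is able to [MASK]."
print(consistency_loss([p_syntactic, p_neural]))      # small but nonzero disagreement
print(consistency_loss([p_syntactic, p_syntactic]))   # 0.0: perfect agreement
```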

Title: University of Wisconsin, Microsoft, et al. | RegionCLIP: Region-based Language-Image Pre-training

Introduction: This paper studies language-image pre-training based on image-region recognition. Contrastive Language-Image Pre-training (CLIP) on image-text pairs has achieved remarkable results in zero-shot image classification and transfer learning. However, the authors show that directly applying such models to recognize image regions for object detection leads to poor performance because of a domain shift: CLIP is trained to match whole images with text descriptions, without capturing the fine-grained alignment between image regions and text spans. To alleviate this problem, the authors propose a new method, RegionCLIP, which significantly extends CLIP to learn region-level visual representations, enabling fine-grained alignment between image regions and textual concepts. The method uses a CLIP model to match image regions with template captions, and then pre-trains the model to align these region-text pairs in feature space (a toy sketch of this alignment objective follows below). When the pre-trained model is transferred to open-vocabulary object detection, the method outperforms the state of the art by 3.8 AP50 on COCO and 2.2 AP on LVIS, respectively.

Address of the paper: "Link"

Code address:
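
Since no code link is given here, the following is only a minimal CLIP-style sketch of the region-text contrastive alignment described in the introduction; the shapes, temperature, and symmetric InfoNCE form are assumptions rather than the released RegionCLIP implementation.

```python
# Hedged sketch: symmetric contrastive loss between region and caption features.
import torch
import torch.nn.functional as F

def region_text_contrastive_loss(region_feats, text_feats, temperature=0.07):
    """region_feats, text_feats: (N, d) tensors for N matched region-text pairs."""
    region_feats = F.normalize(region_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = region_feats @ text_feats.t() / temperature   # (N, N) cosine similarities
    targets = torch.arange(logits.size(0))                  # i-th region <-> i-th caption
    # Symmetric InfoNCE: region-to-text and text-to-region cross-entropy.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage: 8 pseudo-labelled region-caption pairs with 512-d features.
regions = torch.randn(8, 512)
texts = torch.randn(8, 512)
print(region_text_contrastive_loss(regions, texts).item())
```

The symmetric form simply mirrors CLIP's original image-text objective, with region features standing in for whole-image features.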