Research on Visual Search Application and Organization Mode Based on Big Data

Visual search has become a frontier topic in information science. It is mainly used to analyze the laws governing the relationships between the attributes, behaviors and events of real-world entities and visual big data resources. Addressing the acquisition, organization, description and utilization of visual big data resources, this paper studies the internal mechanisms of value discovery and resource integration between visual resources and their spatio-temporal information, tackles the problems of multi-dimensional association and collaborative integration, and thereby aims at the effective integration, knowledge discovery and real-time interaction of visual big data resources.

On this basis, the study traces the origins of visual search research from the perspective of information science, describes its development, concepts and characteristics, discusses several key issues in its theory and application, and briefly reviews recent research progress and applications.

1. Development and characteristics of visual search in the big data environment

1.1 Statement of the problem

Visual search is not a new term. It first appeared in psychology and physiology, where it describes the behavior of using the visual channel to detect whether a specific target is present in a given area and then to determine its position. Examples include finding a university on a map, choosing dishes in a canteen, looking for a book on a shelf, or finding a person in a library. In the real world, people often rely on visual search in complex physical environments to obtain the information that guides what they say and do next. Visual search theory has therefore attracted wide attention from psychologists and anthropologists. Much research has focused on understanding and expressing human visual cognition and physiological feedback mechanisms, and has accumulated a large body of theoretical and applied knowledge. Because visual search is so readily available and effective, many jobs, industries and fields depend on it.

With the continuous development of the underlying theories and key technologies, traditional visual search applications are becoming information-based, technology-driven and networked. The question of how to turn traditional visual search behavior into a "what you see is what you know" search mode has gradually come to the fore. At the same time, rapid improvements in network environments, information technology, computing performance, storage capacity, data scale, and software and hardware facilities have tied the physical world closely to virtual cyberspace, making visual search technology feasible: people can conveniently and quickly capture visual objects in the physical world and obtain related information from the Internet.

1.2 Development history and trends of visual search

In recent years, as the big data environment has matured and big data technology has developed rapidly, calls for research on visual resource integration and visual search have grown. Nature and Science published special issues on big data in 2008 and 2011 respectively, proposing that images, videos and user interaction information would be important components of big data in the future. In 2009, Girod, Chandrasekhar and other scholars at Stanford University introduced visual search theory into the field of information retrieval, put forward concepts such as visual search and mobile visual search, held the first workshop on mobile visual search, and discussed its architecture, applications and service modes. In 2010, Norvig, the former head of Google's technology research department, pointed out in "2020 Visions", published in Nature, that "the organic integration of visual resources such as text, images and videos, user interaction information and sensory information will bring great challenges to search engines, and how to deeply integrate visual search results will become Google's future 10 ...". In the same year, Gao Wen, Huang Tiejun and Duan of Peking University introduced the concept to China and held the second workshop on mobile visual search, which discussed key technologies, architecture, methods for organizing and describing visual resources, the standardization of visual resources, and the construction of visual knowledge bases. In 2012, the theory and technology were quickly taken up by the China Computer Federation, which held that an information retrieval mode combining visual search with augmented reality would be the next Internet service paradigm after the search engine. Subsequently, Zhang Xingwang, Zhu Qinghua and others tried to introduce visual search into the digital library field and studied the related theories and application modes.

Judging from its development track, domestic research on visual search is still exploratory: it has largely passed the early stage of theoretical trials and is entering a middle stage of technology and application exploration. In particular, domestic research entered a period of rapid development after China's Ministry of Science and Technology launched, in 2011, the National Key Basic Research and Development Program ("973 Program") project "Theory and Method of Cross-media Computing for Public Security", which studied key scientific issues such as unified representation and modeling of cross-media visual resources, relational reasoning and deep mining, and comprehensive search and content synthesis. Since 2015, the importance and necessity of research on visual search theory and applications have become more prominent. The Action Plan for Promoting the Development of Big Data, released by the State Council in September 2015, proposes making full use of big data, improving the ability to obtain and use domain data resources, and promoting the integration of all kinds of data and resources. The Guiding Opinions on Actively Promoting the "Internet Plus" Action, issued by the State Council in July 2015, proposes that "a massive training resource base including voice, image, video, maps and other data should be built, and the construction of innovative platforms such as artificial intelligence basic resources and public services should be strengthened". The National Natural Science Major Research Program "Management and Decision Research Driven by Big Data" holds that "the generation mechanism and transformation law of big data value are highly dependent on the application field".
In the "Key Projects of Cloud Computing and Big Data" section of the Notice on Printing and Distributing the Guidance for the Application of National Key R&D Plan and Precision Medical Research in 2016, issued by the Ministry of Science and Technology, visual search is explicitly listed as one of the key research topics, with research required on visual semantic modeling, spatio-temporal positioning and search of visual objects, and cross-scene data association technology.

1.3 Research object of visual search and characteristics of visual big data resources

Visual search has gradually become a main research trend in information retrieval. No unified definition has yet emerged, but from the perspective of information retrieval it is generally understood as a retrieval method that takes visual resources in the physical world as the query object and obtains related information through the Internet. It is a comprehensive, applied frontier field. Its research object is visual big data resources and their related information. Its main research content is the methods for acquiring, analyzing, organizing, understanding and expressing those resources. Its main research means are information technologies and methods, and its main objectives are to discover the knowledge value contained in visual big data resources and to expand the ability to use them. This paper focuses on the analysis and utilization of the massive, heterogeneous, dynamic, disordered and rapidly evolving visual resources of the current big data environment: how to use rapidly developing information technology to understand and express visual big data resources, how to realize visual search effectively, and how to use visual search technology to discover new knowledge from massive visual big data resources.

The future is widely expected to be an era of intelligence (or "Internet Plus"). The rapid development of theories and applications such as the smart earth, the smart city and the smart library provides fertile soil for research on visual search theory and applications. As the data generated in the Internet Plus era grows rapidly, text, images, audio and video, user interaction information and various kinds of sensory information will become the mainstream of the "data ocean", and more than 80% of these data sources come from the human visual channel. Visual search may therefore be the most important means of shaping the future development of information retrieval and knowledge services in the Internet Plus era.

Visual big data resources contain complex, disordered and dynamic spatio-temporal information such as text, images, audio and video, and users' viewing records. This makes them the richest information carrier in digital libraries, and they will become the most important medium for expressing and disseminating information in the Internet Plus era. Visual search takes visual big data resources as its research object. Because the knowledge entities and knowledge values in the knowledge space of these resources each have their own characteristics in time, space and attributes, visual search is likewise complex, disordered, dynamic and spatio-temporally and semantically correlated. It is therefore necessary to study methods for the formal expression, systematic organization, structured description and spatio-temporal correlation analysis of visual big data resources. Visual big data resources mainly have the following characteristics:

Visual big data resources include text, images, videos, user viewing information, user interaction information and other spatio-temporal information. The visual objects, contents and event processes they contain are correlated in time, space and semantics.

Visual big data resources are characterized by spatio-temporal semantic association and dynamic change. The dynamic changes of visual objects, contents and event processes can be expressed and described through spatio-temporal semantic association, and the processes of acquiring, organizing and describing them can be expressed in machine language. The spatio-temporal semantic association of visual big data resources can be established through semantic association mappings among visual objects, contents and event processes.

Visual big data resources are characterized by large data scale, complex structure, diverse types, multi-dimensional scale correlation and high dimensionality. A corresponding scale association mechanism can be established from their spatio-temporal semantic relationships. For resources of different scales, depths and dimensions, multi-dimensional scale conversion and resetting among visual objects, contents and event processes can be realized, which in turn supports semantic correlation analysis of visual big data resources.

Visual big data resources support the understanding and prediction of behavior. Once the resources are effectively organized, understood and described, a development trend model can be built from the spatio-temporal semantic relationships of visual objects and used to predict the likely behavior of a specific object at a given stage.

Acquiring, organizing, understanding and describing visual big data resources enables real-time interaction and feedback between users and the resources, as well as the construction of a visual object knowledge base. Based on the similar behavior characteristics, spatio-temporal correlations and real-time interaction results of visual objects, this helps people create, produce, operate and consume new visual resources, meeting the diverse knowledge service needs of digital library users.
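
The spatio-temporal semantic association among visual resources described in the characteristics above can be illustrated with a small sketch. It is only an illustration under stated assumptions, not a method taken from the studies cited here: the `VisualResource` record, the `associated` rule, and the time and distance thresholds are all hypothetical. Two resources are linked when they are close in time, close in space, and share at least one semantic tag.

```python
from dataclasses import dataclass
from itertools import combinations
import math


@dataclass(frozen=True)
class VisualResource:
    """Hypothetical record for one visual resource (image, video, text)."""
    resource_id: str
    timestamp: float      # seconds, relative to an arbitrary origin
    x: float              # position in arbitrary map units
    y: float
    tags: frozenset       # semantic labels attached to the resource


def associated(a, b, max_seconds=3600.0, max_distance=1.0):
    """Assumed rule: link two resources that are near in time and space
    and share at least one semantic tag."""
    close_in_time = abs(a.timestamp - b.timestamp) <= max_seconds
    close_in_space = math.hypot(a.x - b.x, a.y - b.y) <= max_distance
    shares_semantics = bool(a.tags & b.tags)
    return close_in_time and close_in_space and shares_semantics


def build_associations(resources, **limits):
    """Return the ID pairs of all resources that satisfy the rule."""
    return [
        (a.resource_id, b.resource_id)
        for a, b in combinations(resources, 2)
        if associated(a, b, **limits)
    ]


resources = [
    VisualResource("img1", 0.0, 0.0, 0.0, frozenset({"library", "entrance"})),
    VisualResource("vid1", 600.0, 0.3, 0.4, frozenset({"library"})),
    VisualResource("img2", 90000.0, 0.0, 0.0, frozenset({"library"})),
    VisualResource("txt1", 100.0, 0.1, 0.1, frozenset({"canteen"})),
]
print(build_associations(resources))  # [('img1', 'vid1')]
```

Only `img1` and `vid1` are linked: `img2` is too far away in time, and `txt1` shares no tag with the others. A real system would replace the hand-set thresholds and tags with learned or curated ones.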

2. Application and organization mode of visual search in the big data environment

Big data knowledge services can be provided to users only by organizing, analyzing, processing and integrating visual big data resources and by building domain-specific visual search platforms for digital libraries. Visual search modes differ across disciplines and fields, and so do the ways in which visual big data resources are acquired, organized, processed and integrated. For this reason, most current applications build a domain-oriented integration platform for visual big data resources from the perspective of knowledge service and information retrieval, manage and use the resources through visual search, and tailor their services to the knowledge service needs of specific disciplines, specialties and fields.

2.1 Industrial application mode of visual search based on deep learning

Traditional visual search research mainly relies on manually labeling the low-level features of visual resources and then uses machine learning to address the semantic gap, the heterogeneity gap and the semantic correlation between visual resources. Integrating and using visual big data resources on the basis of manual annotation requires annotators with rich professional knowledge and industry experience, consumes a great deal of time and labor, and yields low accuracy. In contrast, deep learning generally trains a multi-layer neural network on visual resources and learns the visual features directly, yielding a more reasonable and more discriminative understanding and description of those features. A large number of studies have shown that visual features extracted by deep learning are successful in image classification and recognition, visual scene recognition, intelligent monitoring, speech recognition, knowledge graph construction and other applications. Salient feature extraction and segmentation methods extract the salient regions of visual resources by simulating the human visual system and physiological cognitive system. At present, the best-performing feature extraction methods reach about 95% accuracy in salient feature detection and nearly 92% accuracy in foreground segmentation on open visual big data sets, and results in large-scale international visual analysis and recognition competitions have continued to improve in recent years. For example, in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), the Google research team used an improved deep convolutional network, GoogLeNet, to raise image recognition accuracy to 93%. A Google team also won first prize in the Microsoft COCO Image Captioning Challenge (MS COCO ICC) using an image feature extraction method based on deep learning. University of Technology Sydney, Carnegie Mellon University, Microsoft Research Asia and Zhejiang University all combined deep learning with the motion characteristics of visual objects to recognize actions in visual resources, and placed in the top three.

The theoretical results of traditional academic research often take a long time to mature and reach industrial application. Deep learning and visual search, however, both rest on strongly engineering-oriented models. On the one hand, they are studied by academia and also followed and tried out by industry. On the other hand, industrial players (such as Google, Baidu and Microsoft) have long owned large-scale visual big data resources and have been active at the research frontier of many areas of information science, which gives them advantages over academia in many fields. For example, Google's Knowledge Graph, Google Now and Google Street View, Microsoft's voice assistant Cortana, iQiyi Brain, and Facebook's Graph Search are all classic industrial applications of visual search. Major foreign companies such as Google, Facebook and Microsoft have not only done a great deal of research on visual search but have also set up dedicated internal research institutions, and Baidu, Huawei, Tencent and Alibaba in China are no exception.

2.2 Visual search knowledge service mode based on knowledge computing

An important purpose of studying the theory and application of visual search in the digital library field is to provide embedded, collaborative knowledge services for researchers in universities and scientific research institutions. The visual search platform of a digital library embeds massive visual big data resources, together with the functions the platform provides for organizing, analyzing and processing them, into the knowledge service process.

The integration and utilization of visual big data resources is a research hotspot in artificial intelligence and information retrieval at home and abroad, with very broad application and research prospects. As a research branch of visual search, it has in recent years been studied by many individuals (such as a senior engineer at the Chinese Academy of Sciences, Gao Wen and Huang Tiejun of Peking University, and Zhu Qinghua of Nanjing University), institutions (such as Zhejiang University, Tsinghua University, Peking University and the Institute of Computing Technology of the Chinese Academy of Sciences) and enterprises (such as iQiyi, Baidu, Tencent, 360 and Sogou). Massachusetts Institute of Technology, the University of California, Berkeley, the University of Illinois and Oxford University in the UK started earlier and developed content-based image search systems.

All of the research mentioned above shares a typical feature: its purpose is to solve the application problems of visual search, and the corresponding visual search modes are mostly based on knowledge computing. Because the objects that visual search must organize, analyze and process are mainly text, images, videos and other visual resources that carry a great deal of value, obtaining valuable knowledge from visual big data resources has become a research hotspot in academia and industry abroad. A knowledge base whose purpose is to explore the rich and complex knowledge contained in visual big data resources is called a visual object knowledge base. At present there are no fewer than 60 knowledge bases built on visual resources such as text, image, audio and video, and hundreds of application cases and system platforms are based on them. Typical examples are DBpedia, derived from Wikipedia (the 2014 version contains 87,000 movies, 123,000 records, 450,000 objects, etc.), Google's Knowledge Graph (including 500 million search result entities such as landmarks, cities, people's names, buildings, movies and artworks, and 3.5 billion related knowledge items), and Facebook's Graph Search (including 1 billion users, 240 billion pictures, 1 billion page views, etc.).

Research on massive, heterogeneous and diverse visual big data resources that draws on the theories and technologies of visual search can enrich the scope and content of information retrieval. It can also help relieve the bottleneck of "big data, little knowledge, little service" that digital libraries currently face, which gives it both application value and practical significance.

2.3 Visual content association organization model based on semantic analysis

In existing research, the objects of visual search are mostly text and images, and image search has received the most attention from scholars. Image search research can be divided into three stages. The first is image search based on text or metadata, which began in the late 1970s. This method describes images through manually tagged metadata and retrieves them on that basis. Its disadvantages are that metadata tagging is time-consuming and laborious, that description standards and feedback content are incomplete, and that the tags tend to be subjective. The second is image search based on visual content, proposed in the 1990s. In essence, this method compares the similarity of images using hand-constructed low-level visual features and retrieves images accordingly. Its disadvantage is that the semantic gap between the low-level visual features and the high-level semantics of an image has not been bridged well. The third is image search based on deep learning, proposed in the early 21st century. Social networks and user-generated content have become the main sources of network data, using user tags to organize, express and understand image semantics has become the research mainstream, and deep learning methods have been integrated into the related fields.
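
The second stage, content-based image search, can be made concrete with a minimal sketch. The example assumes a grayscale intensity histogram as the hand-constructed low-level feature and cosine similarity as the comparison measure. Both are common textbook choices rather than the specific methods of any system mentioned here, and the function names and toy pixel lists are hypothetical.

```python
import math


def gray_histogram(pixels, bins=4, max_value=256):
    """Normalized intensity histogram: a simple hand-built low-level feature."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // max_value] += 1
    total = len(pixels)
    return [c / total for c in counts]


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_pixels, library, top_k=2):
    """Rank library images by histogram similarity to the query image."""
    query_feature = gray_histogram(query_pixels)
    scored = [
        (cosine_similarity(query_feature, gray_histogram(pixels)), name)
        for name, pixels in library.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Toy "images" given as flat lists of 8-bit gray values.
library = {
    "dark": [10, 20, 30, 40, 50, 60],
    "bright": [200, 210, 220, 230, 240, 250],
    "mixed": [10, 20, 30, 200, 210, 220],
}
print(search([15, 25, 35, 45, 55, 61], library))  # ['dark', 'mixed']
```

The sketch also shows the semantic gap: two images with the same histogram are judged identical regardless of what they depict, which is the limitation that motivated the move to learned features.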

Compared with image search, video expression and analysis is a relatively new research area in visual search. A video consists of a large number of image frames with close temporal and semantic correlation between them, which places high demands on visual search technology. Given the success of deep learning in text and image search, scholars have begun to organize, understand and describe videos within deep learning frameworks, especially in the key step of video feature extraction, using the following methods. The first is feature description of static key frames. Because a video is a temporally and semantically linked sequence of image frames, the features of static frames (that is, key frames) can be learned by deep learning. In practice, once a reasonable method for extracting and encoding static key frames is chosen, a good video description can be obtained. The second is description of the dynamic temporal characteristics of video. Some scholars proposed the dense trajectory method for video analysis and achieved good results. The third is a combination of the first two. Simonyan and others at Oxford University proposed analyzing video with a two-stream spatio-temporal deep neural network: the spatial stream takes the original video frames as input to identify the visual objects in the video, and the temporal stream takes motion fields computed across consecutive frames as input to identify the movement and trajectories of those objects.
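
The first method depends on choosing key frames before any features are learned. As an illustration only, the sketch below assumes one simple selection rule: a frame becomes a key frame when its mean absolute pixel difference from the previous key frame exceeds a threshold. The text does not prescribe a particular extraction method, so the rule, the function names and the threshold value are hypothetical.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute difference between two equally sized flat frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def select_key_frames(frames, threshold=30.0):
    """Return indices of frames that differ strongly from the last key frame.

    The first frame is always kept. This threshold rule is an assumed
    heuristic used here for illustration.
    """
    if not frames:
        return []
    key_indices = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[key_indices[-1]]) > threshold:
            key_indices.append(i)
    return key_indices


# Toy video: each frame is a flat list of four gray values.
frames = [
    [10, 10, 10, 10],
    [12, 11, 10, 9],        # nearly identical to frame 0
    [200, 200, 200, 200],   # scene change
    [198, 201, 200, 199],   # nearly identical to frame 2
    [90, 90, 90, 90],       # another scene change
]
print(select_key_frames(frames))  # [0, 2, 4]
```

The selected frames could then be passed to an image feature extractor, while the discarded frames carry mostly redundant static content. Motion information, which this rule ignores, is what the second and third methods are designed to capture.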

Many competitions at home and abroad now address the analysis and expression of visual content. For example, the THUMOS competition, organized by the University of Central Florida in the United States in 2013, analyzed and interpreted heterogeneous and disordered visual resources in massive visual data sets, and has continued the related research every year since. Tsinghua University, Zhejiang University, the Chinese University of Hong Kong, Carnegie Mellon University, University of Technology Sydney and many other universities and research institutions at home and abroad have actively taken part. The 2011 TRECVID competition, organized by the US National Institute of Standards and Technology, studied event detection in complex visual resources within large-scale visual data sets. The competition has continued to pursue this theme in recent years, and many domestic universities, such as Fudan University, Zhejiang University, Beijing Institute of Technology and Tongji University, have achieved notable results in it.

Research on the organization, analysis, understanding and utilization of visual big data resources has produced many results, and the ultimate goal of these results is their application in visual search. Recent studies have helped advance visual search and promote its application across industries and fields, which is a positive signal for digital libraries.

3. Five core issues of visual search research in the big data environment

Although visual search has attracted great attention from industry and academia (including digital libraries), it has not yet been widely applied in China. The main reasons are that the related technologies and products are not fully mature and that problems remain, such as unsatisfactory or unstable search performance, poor user experience and narrow applicability. These problems need to be addressed at the level of the basic theory and technology of visual search. Viewed from the process of constructing a visual search mode for digital libraries [1], visual search research involves five core issues, described below.

Acquisition and organization methods of visual big data resources. In the Internet environment, visual big data resources are dynamic, disordered, heterogeneous and scattered, and they are produced and released continuously. The content of visual resources covers many heterogeneous and complex topics that are related in semantics, space and time. Traditional annotation methods based on manual labeling, however, are often not accurate enough. How to obtain the required visual resources quickly is therefore a key issue in visual search applications. Cleaning and filtering out visual resources unrelated to the object being searched, and organizing visual big data resources effectively, are core issues of visual search applications.

Understanding and expression methods of visual big data resources. To find the visual resources that match the object being searched among massive visual big data resources, it is necessary to analyze and understand the characteristics of the query resource and to understand and express its visual content in a diversified, structured and multi-level way.

Integration and interaction methods of visual big data resources. As a form of information retrieval, visual search serves users. The purpose of acquiring, organizing, understanding and expressing visual big data resources is to provide users with intelligent, user-friendly knowledge services. How to carry out multi-dimensional analysis across the whole life cycle of resource integration, so as to meet users' diverse knowledge service needs, is therefore also a core issue that determines whether visual search research can be put into practice.

Construction and standardization of visual object knowledge base. Visual search depends on the construction of a visual object knowledge base. With a high-quality knowledge base, users can quickly and effectively associate the visual object being searched with the visual big data resources in virtual information space, and thus benefit from the visual search knowledge services provided by the digital library. Standardization is likewise key to the smooth application and promotion of visual search.

Security and reliability theory of visual search system. Network security and system reliability are unavoidable issues at all times, and visual search is no exception. In a visual search system, data security, intellectual property rights, user privacy, and system availability and reliability are also core issues for the effective popularization and application of visual search.

4. Summary and prospects

In the Internet Plus era, information services increasingly have to meet users' needs for intelligent, personalized and embedded knowledge services, and the digital library field has begun to call for a new "killer" information retrieval model. Visual search is an important frontier and an innovative breakthrough in information retrieval. Building on advanced research results in information science at home and abroad, basic and applied research on visual search in digital libraries will enrich, in theory, the research ideas and future development framework of digital library knowledge services, and will also help reveal the generation mechanism and transformation law of the value of visual big data resources in digital libraries.

Society is moving toward the Internet Plus era. As a technical and conceptual innovation, visual search must follow the basic laws by which information technologies in general emerge, develop and mature, passing through six stages: an embryonic stage in which the technology is born, a stage of rapid development, a peak of rapid expansion, a trough as the bubble deflates, a recovery stage of steady development, and a plateau of practical application. Existing visual search research at home and abroad is currently in the development stage, and the intersection of theory and technology has left the contributing disciplines unevenly developed. Research on the theory, methods and technology of visual search currently concentrates on commercial applications and pays less attention to the academic fields that produce visual big data resources. Yet the visual big data resources of academic fields such as scientific research and subject services have rich content and characteristics that differ from those of commercial applications. Only by mastering the relevant research in both commercial applications and academic fields can a more scientific, systematic and reasonable theoretical system and application framework for visual search be established.