Abstract:
This paper explains what data mining is and how it differs from traditional data analysis, and discusses its applications in information science research.
Keywords: data mining; information science; information retrieval; intelligence department
Chinese Library Classification Number: G350.7; Document Code: A; Article Number: 1007-6921(2009)07-0303-02
1 Problems in the field of information science
1.1 Globalization of resources and the massive growth of information
The Internet can fairly be called the largest information base in the world. It holds resources of every kind, including educational websites, virtual libraries and virtual software libraries, which makes collecting the information one needs easier and more feasible. At the same time, the disorder of network information keeps its utilization rate relatively low, and the sheer volume of data generated online makes useful information hard to extract.
1.2 Unstructured data
Large volumes of unstructured data, such as video, audio and animation, are difficult to locate with existing retrieval methods. Data mining technology is what makes it possible to retrieve, process and analyze such massive structured and unstructured data efficiently.
1.3 Personalized intelligence requirements
As demand becomes increasingly individualized, the traditional one-to-many information service model fits the needs of the times less and less. Different enterprises need different competitive intelligence services, and scientific research institutions need sci-tech novelty search services in their own fields. Meeting such personalized requirements calls for one-to-one service platforms built on data mining technology.
In short, as information expands rapidly, the means and channels for obtaining it multiply and people can obtain ever more of it, yet the proportion of useful information keeps shrinking. How to find useful information in this vast ocean has therefore attracted growing attention, and data mining technology emerged against this background.
2 Introduction to data mining technology
2.1 What is data mining?
Put simply, data mining is the process of using various analysis tools to build data analysis models and extract knowledge of interest from large databases (or data warehouses). The extracted knowledge can generally be expressed as concepts, rules, regularities, models and similar forms. Data mining, also known as knowledge discovery in databases (KDD), emerged in the late 1980s as the product of combining artificial intelligence, machine learning and database technology.
It is the process of extracting implicit, previously unknown but potentially useful information from large amounts of incomplete, noisy, fuzzy and random raw data. Data mining is application-oriented: beyond simple retrieval and querying of a specific database, it performs in-depth statistics, analysis and reasoning on the data, explores the relationships among data items, and turns business data into decision information. In this way data mining raises the use of data from low-level end-user queries to decision support for decision makers.
2.2 How does data mining differ from traditional data analysis?
Unlike traditional data analysis, data mining explores information without explicit prior hypotheses. The knowledge it discovers is usually unknown and unpredictable beforehand, yet it can be very useful.
Traditional data analysis examines data on the premise of hypotheses that people have already proposed, so its results are often predictable. Traditional analysis therefore stays at the surface of the data, whereas data mining digs into it in depth.
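The contrast between hypothesis-driven analysis and hypothesis-free mining can be illustrated with a minimal Python sketch. It rests on simplifying assumptions: Pearson correlation stands in for "a relationship in the data", and the attribute names and values are invented. The hypothetical function `check_hypothesis` plays the role of traditional analysis, where the analyst names the pair to examine in advance; the hypothetical `mine_correlations` scans every attribute pair without any prior hypothesis and reports the strong ones.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def check_hypothesis(data, a, b):
    """Traditional analysis: the analyst chooses the pair (a, b) beforehand."""
    return pearson(data[a], data[b])


def mine_correlations(data, threshold=0.8):
    """Hypothesis-free scan: test every attribute pair and keep strong ones."""
    found = []
    for a, b in combinations(sorted(data), 2):
        r = pearson(data[a], data[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found


# Hypothetical toy table: four attributes observed over six records.
data = {
    "downloads": [10, 20, 30, 40, 50, 60],
    "citations": [1, 2, 3, 4, 5, 6],
    "page_views": [5, 3, 8, 1, 9, 2],
    "shares": [12, 19, 33, 41, 48, 62],
}

print(check_hypothesis(data, "downloads", "page_views"))
print(mine_correlations(data))
```

In this toy table the analyst's single hypothesis (downloads versus page views) finds almost nothing, while the exhaustive scan surfaces the strong links among `citations`, `downloads` and `shares`. Real data mining searches far richer pattern spaces than pairwise correlation; the sketch only shows the difference in starting point.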
3 Applications of data mining in information science
3.1 Intelligence gathering
Data mining extends information collection from manual methods (searching, purchasing, exchanging and so on) to automatic capture by machine. Search engine technology provides a very effective tool for collecting online information resources, and Web mining can not only gather the required information but also reveal how different information resources are used and which topics are currently popular. Data mining can also automatically clean collected data and remove redundancy, which reduces workload and shortens the path from raw information to information products.
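The automatic cleaning and de-duplication of collected data mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: "cleaning" here means only Unicode normalization, whitespace collapsing and case folding, and two records count as duplicates when their cleaned text is identical. The function names are hypothetical, and detecting near-duplicates (for example with shingling) would need more than this.

```python
import hashlib
import re
import unicodedata


def normalize(text):
    """Basic cleaning: Unicode normalization, whitespace collapse, case folding."""
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip().casefold()


def fingerprint(text):
    """Stable hash of the cleaned text, used as the duplicate key."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def clean_and_deduplicate(records):
    """Keep the cleaned form of the first occurrence of each distinct record."""
    seen = set()
    kept = []
    for record in records:
        if not record.strip():
            continue  # drop empty captures
        key = fingerprint(record)
        if key not in seen:
            seen.add(key)
            kept.append(normalize(record))
    return kept


# Hypothetical crawl results, including a near-verbatim repeat and an empty capture.
crawled = [
    "Data Mining in  Information Retrieval",
    "data mining in information retrieval ",
    "   ",
    "Competitive intelligence systems",
]
print(clean_and_deduplicate(crawled))
```

Hashing the cleaned text keeps the `seen` set small even when the collected records are long.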
3.2 Intelligence processing
3.2.1 Extending the objects of information processing. Data mining technology frees information processing from being limited to structured data and single-character information, extending it to visual information such as audio-visual materials and video, and from structured information alone to heterogeneous, semi-structured and even unstructured text.
3.2.2 Innovation in information processing techniques. Data mining provides more scientific and more varied means of analysis and processing. In information classification, for example, it offers decision tree induction, Bayesian classification, classification by backpropagation (neural networks), association-based classification and other methods. These break with the earlier approach of classifying against a fixed classification table, allow different information to be classified by different methods, and make the results more targeted and scientific.
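Of the classification methods listed above, Bayesian classification is the easiest to show compactly. The Python sketch below is a minimal multinomial naive Bayes text classifier with add-one (Laplace) smoothing; it is one common variant of Bayesian classification, not necessarily the one the author had in mind. The tokenizer, class labels and training documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Hypothetical tokenizer: lowercase, split on whitespace."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes text classifier with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # number of documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(docs)
        return self

    def predict(self, doc):
        """Return the class with the highest log posterior (up to a constant)."""
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / self.total_docs)  # log prior
            total = sum(self.word_counts[label].values())
            for token in tokenize(doc):
                count = self.word_counts[label][token]
                # Smoothed log likelihood of the token given the class.
                score += math.log((count + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training set: two intelligence categories.
docs = [
    "patent filing competitor technology",
    "market share competitor sales",
    "gene sequencing laboratory study",
    "protein structure laboratory experiment",
]
labels = ["competitive", "competitive", "scientific", "scientific"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("competitor market technology"))
print(model.predict("laboratory experiment"))
```

Unlike a fixed classification table, the model learns its categories from labelled examples, so retraining on different data adapts it to a different kind of information.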
For information clustering, methods suited to different data types (partitioning, hierarchical, density-based, grid-based and model-based clustering, among others) gather the same or similar information together more reliably. More importantly, data mining techniques for complex data let intelligence processing handle diverse kinds of information, including geospatial data, time series, multimedia data, text and Web information. In the future, intelligence processing will no longer be restricted by the medium.
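To make the first clustering family above concrete, here is a minimal Python sketch of k-means (Lloyd's algorithm), the classic partitioning method. It assumes each document has already been reduced to a numeric feature vector; the six 2-D vectors and the function name are hypothetical, and the initial centroids are drawn from the input with a fixed seed so that runs are repeatable.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D feature vectors for six documents forming two groups.
docs = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(docs, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

k-means needs the number of clusters in advance and favours roughly spherical groups; the density-based and hierarchical families listed above exist partly to relax those assumptions.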
3.3 Intelligence services
3.3.1 Broadening the scope of information services and adding service items. The traditional topic-specific (selective dissemination of information) service based on manual retrieval will give way to a mode that automatically mines extensive online resources and databases and actively pushes information or knowledge to users over the Internet.
Sci-tech novelty search is no longer limited to large databases but extends to resources across the whole network, mining enterprise portals and other sources to produce comprehensive analyses and novelty search reports.
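The active push service described above can be reduced to a matching loop between user interest profiles and newly collected documents. The Python sketch below is only an assumption-laden illustration: each profile is a hypothetical set of keywords, a document is pushed to a user when it shares at least `min_overlap` keywords with that profile, and the user names and documents are invented. Production systems would more likely rank documents with weighted similarity than count raw keyword overlap.

```python
def match_profiles(profiles, documents, min_overlap=2):
    """Push each new document to every user whose profile shares enough keywords."""
    pushes = {user: [] for user in profiles}
    for doc_id, text in documents.items():
        terms = set(text.lower().split())
        for user, keywords in profiles.items():
            overlap = terms & keywords
            if len(overlap) >= min_overlap:
                # Record which keywords triggered the push, for transparency.
                pushes[user].append((doc_id, sorted(overlap)))
    return pushes


# Hypothetical interest profiles and newly collected documents.
profiles = {
    "enterprise_a": {"battery", "lithium", "patent"},
    "institute_b": {"genome", "sequencing", "crispr"},
}
documents = {
    "d1": "New lithium battery patent filed",
    "d2": "Genome sequencing cost falls",
    "d3": "Battery recycling market report",
}
print(match_profiles(profiles, documents))
```

Here `d3` shares only one keyword with `enterprise_a`, so the threshold keeps it from being pushed; tuning `min_overlap` trades recall against the noise users receive.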
3.3.2 Raising the service concept and greatly improving the initiative and quality of service. With data mining technology, the focus of information services will shift toward decision support at all levels, and in science and technology services, more researchers will use mining tools to serve themselves.
3.3.3 Improving the content and form of information services.
Because the purpose of data mining is to discover knowledge in massive information, the intelligence department can provide users not only with information but also with knowledge that helps solve problems. Mined results can also be compiled into reports or drawn as intuitive charts, making them easier for users to analyze and act on.
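Turning mined results into a readable report can be as simple as formatting counts into a table with proportional bars. The following Python sketch assumes the mining step has already produced a hypothetical mapping from topic cluster to document count (with at least one positive count); the function name, topics and numbers are invented for illustration, and a real service would more likely use a charting library.

```python
def render_report(title, counts, width=30):
    """Render mined category counts as a plain-text table with proportional bars.

    Assumes at least one count is positive; the largest count gets `width` marks.
    """
    lines = [title, "=" * len(title)]
    peak = max(counts.values())
    label_width = max(len(label) for label in counts)
    for label, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        bar = "#" * round(width * n / peak)
        lines.append(f"{label:<{label_width}}  {n:>5}  {bar}")
    return "\n".join(lines)


# Hypothetical output of a clustering step: documents per topic cluster.
report = render_report(
    "Documents per topic cluster",
    {"batteries": 120, "genomics": 45, "materials": 80},
)
print(report)
```

Sorting by count puts the dominant topics first, which is usually what a decision maker reads.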
3.4 Intelligence analysis
Association rule analysis in data mining will complement traditional intelligence analysis. Correlation analysis can uncover related events hidden in the data that are hard to notice and may even run counter to intuition. For example, few people would expect "spoons" and "magazines" to be linked in a store's purchasing patterns, yet exactly such a relationship was found by association mining of the transaction records of a supermarket in the United States, something traditional intelligence analysis methods would struggle to find. Another widely used analysis technique in data mining is online analytical processing (OLAP), which analyzes multidimensional data, examines it from multiple angles, and handles many kinds of data at once. In short, the analysis techniques of data mining will greatly strengthen information analysis, support it in many ways, and make its toolkit more complete and varied.
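Association rule mining is commonly explained through the Apriori algorithm: find itemsets whose support (fraction of transactions containing them) reaches a threshold, then keep rules A → B whose confidence, support(A ∪ B) / support(A), reaches a second threshold. The Python sketch below is a minimal Apriori implementation on invented toy baskets; the baskets reuse the items from the example above purely for illustration and are not the supermarket's data, and the thresholds are arbitrary assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori: grow candidate itemsets level by level, keeping frequent ones."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


def association_rules(frequent, min_confidence):
    """Split each frequent itemset into antecedent -> consequent rules."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent), support, confidence))
    return rules


# Invented toy baskets (illustration only, not real supermarket data).
transactions = [
    {"spoon", "magazine", "milk"},
    {"spoon", "magazine"},
    {"spoon", "bread"},
    {"magazine", "spoon", "bread"},
    {"milk", "bread"},
]
frequent = frequent_itemsets(transactions, min_support=0.4)
for antecedent, consequent, support, confidence in association_rules(frequent, min_confidence=0.7):
    print(f"{sorted(antecedent)} -> {sorted(consequent)}  support={support:.2f}  confidence={confidence:.2f}")
```

On these baskets only two rules clear both thresholds: {magazine} → {spoon} with confidence 1.00 and {spoon} → {magazine} with confidence 0.75. The prune step is what keeps Apriori tractable: any itemset with an infrequent subset cannot be frequent, so it is never counted.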
3.5 Information retrieval technology
For structured databases or text data, traditional retrieval mostly relies on Boolean retrieval or full-text retrieval and lacks means for retrieving other media. Data mining techniques for complex data, such as image recognition, speech technology, similarity-based retrieval and retrieval techniques for time-series data, will greatly enrich the means of information retrieval. Multimedia retrieval techniques from data mining can certainly be applied to information retrieval, which will move toward cross-media retrieval and a comprehensive breakthrough.
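Similarity-based retrieval of time-series data can be illustrated by a brute-force subsequence search. In this minimal Python sketch, a query pattern is slid along a longer series and each window is ranked by Euclidean distance after z-normalization, so that patterns with the same shape match regardless of their level or scale. The function names and data are hypothetical, and practical systems add indexing or lower-bounding techniques to avoid scanning every window.

```python
import math


def z_normalize(series):
    """Rescale to zero mean and unit (population) standard deviation."""
    mean = sum(series) / len(series)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / len(series))
    return [0.0] * len(series) if std == 0 else [(x - mean) / std for x in series]


def best_matches(query, series, top_k=1):
    """Slide the query over the series; rank windows by z-normalized Euclidean distance."""
    q = z_normalize(query)
    m = len(query)
    scored = []
    for start in range(len(series) - m + 1):
        window = z_normalize(series[start:start + m])
        scored.append((math.dist(q, window), start))
    scored.sort()
    return scored[:top_k]  # list of (distance, start index) pairs


# Hypothetical data: the query's peak shape appears, scaled by 5, starting at index 3.
series = [0, 1, 0, 5, 10, 15, 10, 5, 0, 1, 0]
query = [1, 2, 3, 2, 1]
print(best_matches(query, series))
```

Without z-normalization the scaled peak would sit far from the query in raw Euclidean distance, which is why shape-based retrieval usually normalizes each window first.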
4 Impact of data mining on information science
Data mining not only advances information science as a technical means but also has a broad and profound impact on its concepts and research fields.
4.1 Refining the concept of intelligence
Applying data mining in information science leads the discipline to pay more attention to practicality and use value. The mission of information science should be grounded in information, centered on the dissemination, use and function of knowledge, and should realize its ultimate aim of serving people through mature technology. Competitive intelligence service is an example: to meet enterprises' need to win advantages in market competition, it collects information about competitors' technology, markets, customers and sales, and analyzes and processes that information into competitive intelligence.
4.2 Extension of the field of intelligence research
Data mining is an important technical means, and its application broadens both the research and development process and the application scenarios of information science.
Data mining is also a new interdisciplinary research field that draws together results from machine learning, pattern recognition, databases, statistics, artificial intelligence and management information systems. Investment from many directions has made the technology flourish and take shape.
4.3 Expansion of intelligence work
Information science originated from library science and documentation, and has since developed into an interdisciplinary subject spanning the natural sciences, technical sciences and social sciences. The close combination of data mining technology and information science meets academic needs and also has great commercial prospects. Accordingly, research in information science is oriented mainly toward production and management, with its focus still on applications that yield visible economic benefits.
5 New challenges brought by data mining technology
At present, applying data mining technology in information science has become one of the discipline's hot topics, but many problems still need urgent solutions, especially in practice: the complexity of data demands more domain expertise; huge databases place higher demands on algorithm efficiency; human-computer interaction in data mining needs to be strengthened; and internal and personal data must be protected. We are confident that, as database technology, artificial intelligence and related disciplines continue to progress, these problems will gradually be solved, and data mining will better serve information science research and society.