How to write the master's opening report
A master's opening report (thesis proposal) typically covers the source of the topic, the purpose and significance of the research, and the research status and development trends at home and abroad. The following example illustrates each part.

First, the topic source:

This topic is self-selected and arises from two facts the author learned through study and practice.

1. During a survey at XXX Company in July 2011, the author learned that companies across industries face a sharp increase in data volume, which slows business processing and makes data maintenance difficult. To meet this challenge, many enterprises have adopted a big data development strategy. These strategies fall into two categories: vertical expansion and horizontal expansion.

Vertical expansion means using devices with larger storage capacity and stronger processing power, which is very expensive. Many large companies have long dealt with big data this way. However, after Google published its technical papers on GFS, MapReduce and BigTable between 2003 and 2006, cloud computing began to rise, and the Apache Hadoop project was launched in 2006.

Since 2009, with the development of cloud computing and big data, Hadoop has attracted the attention of many IT companies as an excellent data analysis and processing solution. Given the high cost of vertical expansion, companies prefer horizontal expansion, which pools inexpensive computing resources. Many IT companies have therefore begun to build their own big data environments on the Hadoop framework.
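The idea behind horizontal expansion can be illustrated with a minimal MapReduce-style sketch in plain Python. It is not Hadoop code: the partitions stand in for hypothetical worker nodes, and the function names and sample log lines are assumptions made for illustration only.

```python
from collections import Counter
from functools import reduce


def map_partition(lines):
    """Map step: count words within one partition (one hypothetical worker)."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: merge the partial counts produced by two workers."""
    return left + right


def word_count(lines, workers=3):
    # Split the input round-robin into one partition per worker.
    partitions = [lines[i::workers] for i in range(workers)]
    # On a real cluster each partition would be processed on a separate machine.
    partials = [map_partition(part) for part in partitions]
    return reduce(merge_counts, partials, Counter())


logs = ["order created", "order paid", "order shipped", "refund created"]
totals = word_count(logs, workers=2)
print(totals["order"], totals["created"])
```

Because each partition is processed independently, adding workers changes only how the input is split, not the result. That property is what allows many inexpensive machines to replace one expensive one.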

2. During an internship at XXX in April 2013, the author further learned that most big data application environments currently use unstructured databases, such as HBase for column storage, MongoDB for document storage and Neo4j for graph data.

These unstructured databases are widely used in big data application environments because of their strong scalability, high resource utilization, high concurrency and fast response. However, they only address front-end business processing. To use big data for business intelligence, decision support systems and online analysis applications need a data environment of their own, namely a data warehouse. The author's supervisor therefore guided the author to choose this topic and study a data warehouse solution based on the Hadoop framework.

Second, the purpose and significance of the study:

Nowadays, data has penetrated every industry and become an important factor of production. In recent years, historical accumulation and accelerating data growth have confronted all industries with the problem of big data. Big data is both an opportunity and a challenge. Using it well means turning massive, fast-growing and diverse data into information assets that give enterprises stronger decision-making, insight, discovery and process optimization capabilities.

Therefore, many IT companies treat big data as a key development strategy. For example, Amazon and Facebook have invested in big data and achieved remarkable results. In fact, big data matters not only to large Internet companies such as Google, eBay or Amazon: enterprises of any size can gain advantages from big data, laying the foundation for their future business analysis and gaining a significant edge over their competitors.

Small and medium-sized enterprises follow different big data development strategies from large enterprises. Large companies can rely on abundant capital and technical strength to develop software platforms tailored to their own environment and business. Small and medium-sized enterprises lack such technical strength and capital, and prefer general-purpose, relatively inexpensive solutions.

This paper analyzes the characteristics of databases in the big data environment and, building on the popular Hadoop framework, proposes and implements a data warehouse solution suited to that environment. It offers a reference for small and medium-sized enterprises building data warehouses in the big data environment. Specifically, its significance is threefold:

First, mainstream databases such as Oracle and SQL Server already have complete data warehouse solutions for their own database platforms. Other relational databases such as MySQL have no data warehouse solution tied to the database platform, but many integrated data warehouse solutions are available for them.

Unstructured databases, however, use a data model different from that of relational databases, so a new solution is needed. The Hive/Pentaho-based data warehouse implementation scheme proposed in this paper can serve as a reference.

Secondly, integrating multi-source unstructured databases produces a subject-oriented, integrated data warehouse that provides a data environment for online analytical processing and decision support on the big data platform, so that data resources can be used effectively to assist management decisions.
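As a minimal sketch of what integrating multi-source data into one subject-oriented view might look like, the following Python example normalizes records from two differently shaped sources into a common fact format and aggregates them. The record layouts, field names and the "sales by region" subject are all hypothetical; this does not represent the Hive/Pentaho scheme proposed in the paper.

```python
from collections import defaultdict

# Hypothetical export from a document store: nested, with repeatable items.
document_orders = [
    {"order_id": "A1", "customer": {"region": "North"},
     "items": [{"sku": "pen", "qty": 2, "price": 1.5}]},
    {"order_id": "A2", "customer": {"region": "South"},
     "items": [{"sku": "ink", "qty": 1, "price": 4.0},
               {"sku": "pen", "qty": 4, "price": 1.5}]},
]

# Hypothetical rows from a column store: flat "family:qualifier" string cells.
column_orders = [
    {"row": "B7", "cust:region": "North",
     "item:sku": "ink", "item:qty": "3", "item:price": "4.0"},
]


def from_documents(docs):
    """Normalize nested documents into one fact per order item."""
    for doc in docs:
        for item in doc["items"]:
            yield {"region": doc["customer"]["region"],
                   "sku": item["sku"],
                   "amount": item["qty"] * item["price"]}


def from_columns(rows):
    """Normalize column-store rows, converting string cells to numbers."""
    for row in rows:
        yield {"region": row["cust:region"],
               "sku": row["item:sku"],
               "amount": int(row["item:qty"]) * float(row["item:price"])}


def sales_by_region(facts):
    """Aggregate the integrated facts along one subject dimension."""
    totals = defaultdict(float)
    for fact in facts:
        totals[fact["region"]] += fact["amount"]
    return dict(totals)


facts = list(from_documents(document_orders)) + list(from_columns(column_orders))
report = sales_by_region(facts)
print(report)
```

The point of the sketch is the separation of concerns: each source gets its own normalization step, while the analysis works only on the integrated facts.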

Thirdly, big data is a broad concept that spans technical details at every level, including big data storage, computing and analysis. The "data warehouse solution and implementation in big data environment" proposed in this paper enriches the ecosystem of big data application technology and supports data analysis and data mining in the big data environment.

Third, the research status and development trends at home and abroad:

The subject of this paper is the data warehouse. Unlike the traditional data warehouse built on relational databases, this paper studies the construction and implementation of a data warehouse based on unstructured databases in the big data environment. The research status is therefore reviewed from two aspects: data warehouses, and databases in the big data environment.

(1) Research status of data warehouses at home and abroad:

Since Bill Inmon put forward the concept of the "data warehouse" in 1990, data warehouse technology has risen steadily and become a major technical hotspot. At present, 30% to 40% of companies in the United States have established or are establishing data warehouses. With improvements in data model theory and continuous progress in database, application development and data mining technology, data warehouse technology has continued to develop and plays a major role in practical applications.

The decision support system based on data warehouse, online analytical processing and data mining tools is becoming more and more mature. At the same time, the huge benefits of using data warehouse stimulate the demand for data warehouse technology, and the data warehouse market is developing rapidly.

China's enterprise informatization started late, and the development of data warehouse technology in China is still in the stage of accumulating experience. In recent years, domestic large and medium-sized enterprises have gradually realized the importance of using data warehouse technology, and started to establish their own data warehouse systems, such as China Mobile, China Telecom, China Unicom, Shanghai Stock Exchange and China Petroleum.

Overall, however, China's data warehouse market still needs to be cultivated, and a large gap remains between domestic data warehouse technology and that of other countries. For this reason, many researchers and engineers in China have begun in-depth research on data warehouse technologies and, drawing on foreign technologies, have put forward technical solutions suited to domestic demand.

(2) Research status of unstructured databases at home and abroad:

As database technology has been applied more deeply in various fields, structured databases have gradually shown some disadvantages. For example, in biology, geography, climate science and other fields, the data that researchers work with does not follow the traditional relational structure. Storing and displaying it in a relational database requires forcibly converting it from its native structure into a relational one.

Handling non-relational data in this way makes it impossible to manage the data throughout its life cycle, and the relationships within the data cannot be fully expressed. Unstructured databases emerged in this context. Unlike in a relational database, field lengths in an unstructured database are variable, and each field can be composed of repeatable or non-repeatable subfields.

Such a database can handle not only structured data but also unstructured data such as text, images, sound, video and hypermedia. In recent years, with the rise of big data, unstructured databases have been widely used to support data of every kind of structure in big data processing.
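The contrast between a record with repeatable subfields and its forced relational form can be sketched in Python as follows. The specimen record, its field names and its values are hypothetical and serve only to illustrate the structure described above.

```python
# Hypothetical record with variable-length fields and repeatable subfields.
specimen = {
    "id": "S-001",
    "name": "sample specimen",
    "observations": [  # repeatable subfield: any number of entries
        {"site": "lake", "readings": [4.2, 4.5]},
        {"site": "river", "readings": [3.9]},
    ],
}


def flatten_to_rows(record):
    """Force the nested record into flat relational rows, one per reading."""
    rows = []
    for obs in record["observations"]:
        for value in obs["readings"]:
            # The record-level fields must be repeated on every row.
            rows.append((record["id"], record["name"], obs["site"], value))
    return rows


rows = flatten_to_rows(specimen)
for row in rows:
    print(row)
```

The nested form keeps one record per specimen, while the flattened form repeats the identifier and name on every row and needs extra keys or joins to recover which readings belong to which observation.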

At present, there are many kinds of unstructured databases, including in-memory databases, column storage databases, document databases and graph databases. Common in-memory databases include SQLite, Redis and Altibase. Column storage databases include HBase and Bigtable. Document databases include MongoDB, CouchDB and RavenDB. Graph databases include Neo4j.

In recent years, unstructured databases have also developed to some extent in China, the most representative being the iBASE database from Guoxin Beisi. It can be predicted that, as big data applications spread, unstructured databases will develop considerably and be widely used in the near future.