What are the basic steps of BI development?
It is designed specifically to let end users access raw data directly, and does not include the finished-report generation tools aimed at professionals.
2. OLAP tools. They provide a multidimensional data management environment; typical applications are business problem modeling and business data analysis. OLAP is also called multidimensional analysis.
3. Data mining software. It uses techniques such as neural networks and rule induction to discover relationships in the data and to draw inferences from them.
4. Data marts and data warehouse products. These include pre-configured software for data conversion, management and access, and usually some business models as well, such as financial analysis models.
5. Executive information systems (EIS).

The definition above is rather academic, and most customers do not understand it. In practice, BI means collecting and analyzing relevant information to help you make decisions. Most successful business intelligence systems use data warehouse technology.

So let's look at what a data warehouse is. A data warehouse (abbreviated DW) is a subject-oriented, integrated, time-variant and non-modifiable collection of data used in enterprise management and decision-making. The most widely accepted definition is the one put forward by Bill Inmon, the father of the data warehouse, in his book "Building the Data Warehouse", published in 1991: a data warehouse is a subject-oriented, integrated, relatively stable (non-volatile) collection of data used to support decisions.
◆ Subject-oriented: data in operational databases is organized around transactions, with each business system kept separate, whereas data in the data warehouse is organized by subject area.
◆ Integrated: the data in the data warehouse is obtained by extracting and cleaning the original scattered database data and then systematically processing, summarizing and organizing it.
Inconsistencies in the source data must be eliminated so that the information in the data warehouse is consistent, global information about the whole enterprise.
◆ Relatively stable: the data in the data warehouse is mainly used for enterprise decision analysis, and the main operation on it is querying. Once data enters the warehouse it is generally kept for a long time. In other words, the warehouse sees many queries but few modifications or deletions, and usually only needs to be loaded and refreshed periodically.
◆ Reflects historical changes: the data in the warehouse usually contains historical information, systematically recording the enterprise's information from some point in the past (for example, when the data warehouse came into use) up to the present. With this information, the enterprise's development history and future trends can be quantitatively analyzed and predicted.

Building a data warehouse is a process rather than a project. A data warehouse system is an information delivery platform: it obtains data from the business processing systems, organizes it mainly with star and snowflake models, and gives users various means of extracting information and knowledge from the data. Structurally, a data warehouse system should include at least three key parts: data acquisition, data storage and data access.

What is a data warehouse?

At present there is no unified definition of the term. W. H. Inmon, a well-known data warehouse expert, gave the following description in his book "Building the Data Warehouse": a data warehouse is a subject-oriented, integrated, non-volatile and time-variant collection of data used to support management decisions. The concept can be understood on two levels.
First, a data warehouse supports decision-making and is oriented toward analytical data processing, which distinguishes it from the enterprise's existing operational databases. Second, a data warehouse is an effective integration of multiple heterogeneous data sources; after integration the data is reorganized by subject, includes historical data, and is generally not modified once stored.

From this concept, a data warehouse has the following four characteristics:

1. Subject-oriented. Data in operational databases is organized around transactions, with each business system kept separate, whereas data in the data warehouse is organized by subject area. A subject is an abstract concept: it refers to the key aspects users care about when they use the data warehouse to make decisions. A subject usually relates to several operational information systems.
2. Integrated. Transaction-oriented operational databases are usually tied to specific applications; they are independent of one another and often heterogeneous. The data in the data warehouse is obtained by extracting and cleaning the original scattered database data and then systematically processing, summarizing and organizing it. Inconsistencies in the source data must be eliminated so that the information in the data warehouse is consistent, global information about the whole enterprise.
3. Relatively stable. Data in an operational database is usually updated in real time and changes as needed. Data in the data warehouse is mainly used for enterprise decision analysis, and the main operation on it is querying.
Once data enters the data warehouse it is generally kept for a long time. In other words, the warehouse sees many queries but few modifications or deletions, and usually only needs to be loaded and refreshed periodically.
4. Reflects historical changes. An operational database mainly holds the current data for a given period, whereas the data in the warehouse usually contains historical information, systematically recording the enterprise's information from some point in the past (for example, when the data warehouse came into use) through each stage up to the present. With this information, the enterprise's development history and future trends can be quantitatively analyzed and predicted.

The construction of an enterprise data warehouse builds on the existing business systems and the large volume of business data they have accumulated. A data warehouse is not a static concept. Information is only useful when it reaches the users who need it in time for them to make decisions that improve business operations. The fundamental task of the data warehouse is therefore to sort, summarize and reorganize information and deliver it promptly to the relevant management decision makers. From the industry's point of view, then, building a data warehouse is both a project and a process.

The whole data warehouse system has a four-tier architecture, as shown in the following figure.

[Figure: data warehouse system architecture]

Data sources: the foundation of the data warehouse system and the source of data for the whole system. They usually include internal and external information. Internal information includes the various business processing data and document data stored in RDBMSs. External information includes laws and regulations, market information and competitor information.
Data storage and management: the core of the whole data warehouse system, and the real key to a data warehouse. The way a data warehouse organizes and manages data is what distinguishes it from a traditional database, and it also determines how the data is presented externally. Deciding which products and technologies to use for the core of the warehouse requires analyzing the technical characteristics of data warehouses. Data from the existing business systems is extracted, cleaned and effectively integrated, then organized by subject. By data coverage, data warehouses can be divided into enterprise-level data warehouses and department-level data warehouses (usually called data marts).

OLAP server: effectively integrates the data needed for analysis and organizes it in a multidimensional model so that it can be analyzed from multiple angles and at multiple levels to find trends. Implementations fall into three types:
◆ ROLAP: both base data and aggregate data are stored in an RDBMS.
◆ MOLAP: both base data and aggregate data are stored in a multidimensional database.
◆ HOLAP: base data is stored in an RDBMS, and aggregate data is stored in a multidimensional database.

Front-end tools: mainly reporting tools, query tools, data analysis tools, data mining tools and application development tools built on the data warehouse or a data mart. Data analysis tools mainly work against the OLAP server, while reporting tools and data mining tools mainly work against the data warehouse.

The components of a data warehouse

Data warehouse database: the core of the whole data warehouse environment. It is where the data is stored, and it provides the support for data retrieval. Compared with an operational database, its outstanding features are support for massive data volumes and fast retrieval.

Data extraction tools: these take data out of its various storage formats, transform and tidy it, and store it in the data warehouse. The key capability of a data extraction tool is access to different kinds of data storage; it should be able to generate COBOL programs, MVS Job Control Language (JCL), UNIX scripts and SQL statements to reach the different data. Data conversion includes: deleting data segments that are meaningless for decision-making applications; converting to unified data names and definitions; calculating statistics and derived data; assigning default values to missing data; and unifying differing data definitions.

Metadata: data that describes the structure of the data in the warehouse and how it was built. By use, it falls into two categories: technical metadata and business metadata. Technical metadata is what the designers and administrators of a data warehouse use in their day-to-day development and management work.
It comprises: data source information; descriptions of data conversions; definitions of the objects and data structures in the data warehouse; the rules for data cleaning and data updating; the mapping from source data to target data; and user access rights, data backup history, data import history, information release history and so on. Business metadata describes the data in the warehouse from a business perspective. It includes descriptions of the business subjects and of the data, queries and reports they contain. Metadata provides an information directory for accessing the data warehouse: it comprehensively describes what data the warehouse holds, how the data was obtained and how to access it. It is the center of data warehouse operation and maintenance. The data warehouse server uses it to store and update data, and users rely on it to understand and access the data.

Access tools: these give users the means to access the data warehouse. They include data query and reporting tools; application development tools; executive information system tools; online analytical processing (OLAP) tools; and data mining tools.

Data mart: a portion of data separated out of the data warehouse for a specific application purpose or scope; it can also be called departmental data or subject-area data. When implementing a data warehouse, it is often possible to start with one department's data mart and later combine several data marts into a complete data warehouse. Note that when different data marts are implemented, fields with the same meaning must have compatible definitions; otherwise they will cause serious trouble when the data warehouse is implemented later.

Data warehouse management: security and permission management; tracking data updates; data quality inspection; managing and updating metadata; auditing and reporting on the use and status of the data warehouse; deleting data; copying, partitioning and distributing data; backup and recovery; and storage management.
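The requirement above that fields with the same meaning stay compatible across data marts can be checked mechanically. The following is only a minimal sketch: the `FieldDef` record, the `find_conflicts` helper and the two example marts are hypothetical names made up for illustration, and a real warehouse would read these definitions from its metadata store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    """Hypothetical technical-metadata record for one field in a data mart."""
    name: str
    dtype: str
    unit: str = ""


def find_conflicts(marts):
    """Return fields whose (dtype, unit) definition differs between marts."""
    seen = {}
    for fields in marts.values():
        for field in fields:
            seen.setdefault(field.name, set()).add((field.dtype, field.unit))
    # A field is in conflict when more than one distinct definition was seen.
    return {name: defs for name, defs in seen.items() if len(defs) > 1}


# Example marts (assumed data): customer_id is typed differently in each.
sales_fields = [FieldDef("customer_id", "int"), FieldDef("amount", "decimal", "USD")]
finance_fields = [FieldDef("customer_id", "str"), FieldDef("amount", "decimal", "USD")]

conflicts = find_conflicts({"sales": sales_fields, "finance": finance_fields})
print(sorted(conflicts))
```

Running a check like this before each new data mart goes live is one way to catch incompatible definitions while they are still cheap to fix.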
Information distribution system: sends data from the data warehouse, or other related data, to different locations or users. A web-based information publishing system is the most effective way to handle multi-user access.

Nine steps to design a data warehouse
1) Choose a suitable subject (the problem area to be addressed)
2) Clearly define the fact table
3) Identify and conform the dimensions
4) Choose the facts
5) Calculate derived data and store it in the fact table
6) Round out the dimension tables
7) Choose the duration of the database
8) Decide how to track slowly changing dimensions
9) Decide the query priorities and query modes

Technical considerations

Hardware platform: the disk capacity of a data warehouse is usually 2-3 times that of the operational database. Mainframes are generally more reliable in performance and stability and are easy to integrate with legacy systems; PC servers and UNIX servers are more flexible and easier to operate, and provide the ability to generate query requests dynamically. Questions to consider when choosing a hardware platform: does it provide parallel I/O throughput? How well does it support multiple CPUs?

Data warehouse DBMS: how well does it store large volumes of data, how good is its query performance, and how well does it support parallel processing?

Network structure: implementing a data warehouse will generate a great deal of data traffic on the affected part of the network. Does the network structure need to be improved?

Implementation steps
1) Collect and analyze the business requirements
2) Build the data model and the physical design of the data warehouse
3) Define the data sources
4) Choose the data warehouse technology and platform
5) Extract, clean and convert data from the operational databases into the data warehouse
6) Choose the access and reporting tools
7) Choose the database connectivity software
8) Choose the data analysis and data presentation software
9) Keep the data warehouse up to date by repeating the extraction, cleaning, conversion and loading of data
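Several of the steps above (defining a fact table and its dimensions, storing derived data in the fact table, and extracting, cleaning and converting operational data) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the table names, the sample rows, the rule that a missing quantity defaults to 1, and the use of an in-memory SQLite database as a stand-in for a real warehouse DBMS.

```python
import sqlite3

# Hypothetical operational rows:
# (order_id, customer, product, order_date, quantity, unit_price)
source_rows = [
    (1, " Alice ", "Widget", "2024-01-05", 2, 9.5),
    (2, "Bob", "widget", "2024-01-05", 1, 9.5),
    (3, "alice", "Gadget", "2024-02-10", None, 20.0),
]


def build_warehouse(rows):
    """Load operational rows into a tiny star schema (one fact, three dimensions)."""
    dw = sqlite3.connect(":memory:")
    dw.executescript("""
        CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE dim_date     (date_key TEXT PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE fact_sales (
            order_id     INTEGER PRIMARY KEY,
            customer_key INTEGER REFERENCES dim_customer(customer_key),
            product_key  INTEGER REFERENCES dim_product(product_key),
            date_key     TEXT    REFERENCES dim_date(date_key),
            quantity     INTEGER,
            amount       REAL    -- derived: quantity * unit_price
        );
    """)
    for order_id, customer, product, order_date, qty, price in rows:
        # Cleaning: unify names so the same customer or product maps to one row.
        customer = customer.strip().title()
        product = product.strip().title()
        # Conversion: assign a default to missing data (assumed rule: 1 unit).
        qty = 1 if qty is None else qty
        year, month = int(order_date[:4]), int(order_date[5:7])

        dw.execute("INSERT OR IGNORE INTO dim_customer(name) VALUES (?)", (customer,))
        dw.execute("INSERT OR IGNORE INTO dim_product(name) VALUES (?)", (product,))
        dw.execute("INSERT OR IGNORE INTO dim_date VALUES (?, ?, ?)",
                   (order_date, year, month))
        ckey = dw.execute("SELECT customer_key FROM dim_customer WHERE name = ?",
                          (customer,)).fetchone()[0]
        pkey = dw.execute("SELECT product_key FROM dim_product WHERE name = ?",
                          (product,)).fetchone()[0]
        # Derived data is calculated once and stored in the fact table.
        dw.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?)",
                   (order_id, ckey, pkey, order_date, qty, qty * price))
    dw.commit()
    return dw


dw = build_warehouse(source_rows)
totals = dw.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales AS f JOIN dim_product AS p USING (product_key)
    GROUP BY p.name ORDER BY p.name
""").fetchall()
print(totals)
```

The final query joins the fact table to one dimension and aggregates, which is the typical shape of a query against a star model.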
2) It supports flat files, indexed files and legacy DBMSs.
3) It can integrate data from different types of data sources as input.
4) It has a standardized data access interface.
5) Preferably, it can read data definitions from a data dictionary.
6) The code the tool generates must be maintainable in the development environment.
7) It can extract only the data that meets specified conditions, and only designated parts of the source data.
8) It can perform data type conversion and character set conversion during extraction.
9) It can calculate and generate derived fields during extraction.
10) It can be invoked automatically by the data warehouse management system to run extractions on a regular schedule, and it can also write the results to a flat file.
11) The vitality and product support ability of the software supplier must be carefully evaluated.

Major data extraction tool suppliers: Prism Solutions; Carleton's Passport; Information Builders' EDA/SQL.

What does a data warehouse bring?

Every company has its own data. Many companies store large amounts of it in computer systems, recording a great deal of information about purchasing, sales and production as well as customer information. Usually this data is stored in many different places. With a data warehouse, the enterprise stores all the information it collects in a single place: the data warehouse. The data in the warehouse is organized in a defined way that makes the information easy to access and valuable. Specialized software tools have now been developed that make the data warehouse process semi-automatic, helping enterprises load data into the warehouse and make use of the data already stored there. The data warehouse brings major changes to an organization. Building one introduces some new workflows, and other workflows change as a result.
The data warehouse brings the enterprise "data-based knowledge", which is mainly used to evaluate market strategies and discover new market opportunities. It is also used to control inventory, review production methods and define customer groups. Every company has its own data. The data warehouse organizes enterprise data in a specific way, generating new business knowledge and bringing a new perspective to how the enterprise operates.

Why build a data warehouse?

The idea of building a data warehouse was already being raised in the early days of computing. The term "data warehouse" was first proposed by Bill Inmon in 1990, who described it as follows: a data warehouse is a collection of data specially designed and built to support enterprise decision-making. Enterprises build data warehouses because existing forms of data storage can no longer meet the needs of information analysis. A core idea in data warehouse theory is that transactional data and decision support data have different processing characteristics.

Enterprises collect data in the course of their transactions. As orders and sales are recorded during day-to-day operation, transaction data is generated continuously. To take this data in, the transaction database must be optimized for it.

Decision support processing, by contrast, tends to ask questions such as: What kinds of customers buy what kinds of products? How much does sales volume change after a promotion? How much do sales change after a price change or a change of store address? In a given period, which products sell particularly well compared with others? Which customers have increased their purchases? Which customers have cut back? A transaction database can answer these questions, but the answers it gives are often not very satisfactory.
The two workloads often compete for limited computer resources. Adding new information requires the transaction database to be idle, while answering a series of specific analytical questions greatly reduces the system's ability to process new data. Another problem is that transaction data is always changing. Decision support processing needs relatively stable data, so that questions can be answered consistently over time.

The data warehouse solution is to separate decision support processing from transaction processing. Data is imported from the transactional database into the decision support database, that is, the "data warehouse", on a regular cycle (usually every night or every weekend). The data warehouse organizes data by "subject", according to the questions the enterprise needs answered; this is the most effective way to organize the data.

Data warehouses and data marts

A data mart is a decision support database that serves a particular department or project group within the enterprise. Some expert consultants describe building data marts as one step in the overall process of building a data warehouse. First, the data warehouse is created to store all of the enterprise's information, and the data in it has an organized, consistent and unchangeable format. Data marts are then built to give different departments the information they need. The data warehouse holds all the detailed information, and the data in each data mart is summarized according to its users' specific needs.

Other experts hold that a data mart can be built without building a data warehouse first. In this model, data is transferred directly from the transaction databases to the data mart. A company may build several data marts, but there is no connection between them.
Creating a data mart without first building a data warehouse is cheaper and faster, because the smaller scale is easier to manage. The drawback of this second view is that it cannot achieve the main purpose of creating a data warehouse: unifying all of the enterprise's data into a consistent format. The data in existing transaction processing systems is often inconsistent and redundant.
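The first approach described above, in which the data warehouse keeps the detailed records and each data mart holds a summary tailored to one department, can be sketched in a few lines. The row layout, the sample values and the `build_mart` helper are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical warehouse detail rows: (department, region, month, amount)
warehouse_detail = [
    ("sales", "north", "2024-01", 120.0),
    ("sales", "north", "2024-01", 80.0),
    ("sales", "south", "2024-01", 50.0),
    ("finance", "north", "2024-01", 10.0),
]


def build_mart(detail, department):
    """Summarize one department's detail rows by (region, month)."""
    mart = defaultdict(float)
    for dept, region, month, amount in detail:
        if dept == department:
            mart[(region, month)] += amount
    return dict(mart)


sales_mart = build_mart(warehouse_detail, "sales")
print(sales_mart)
```

Because every mart is derived from the same warehouse rows, the marts share one consistent format, which is exactly what marts loaded directly from separate transaction databases cannot guarantee.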