Why data mining?
Question 1: Why do enterprises do data mining and collect customer information? Typical applications of data mining technology in customer relationship management (CRM) include the following.

Customer acquisition

The traditional way to acquire customers is to attract them through large volumes of media advertising and leaflets. This approach covers too many fronts, is poorly targeted, and requires a large investment from the enterprise. Data mining can build a model (mainly a classification of how potential customers respond) from useful data collected during past marketing campaigns. With such a model, an enterprise can understand the characteristics of its genuinely likely customers and target future campaigns accordingly, instead of relying on traditional guesswork based on experience.
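To make "classifying potential customers' response patterns" concrete, here is a minimal sketch of one possible classifier: a categorical naive Bayes model trained on a hypothetical record of who responded to a past campaign. The records, the attribute names (`age_band`, `channel`) and the choice of naive Bayes are illustrative assumptions. Any classifier could play this role.

```python
from collections import Counter, defaultdict

# Hypothetical records from a past campaign: (attributes, responded?)
past_campaign = [
    ({"age_band": "18-30", "channel": "email"}, True),
    ({"age_band": "18-30", "channel": "flyer"}, False),
    ({"age_band": "31-50", "channel": "email"}, True),
    ({"age_band": "31-50", "channel": "flyer"}, False),
    ({"age_band": "51+", "channel": "flyer"}, False),
    ({"age_band": "51+", "channel": "email"}, False),
]

def train_naive_bayes(records):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(label for _, label in records)
    value_counts = defaultdict(Counter)  # (label, attribute) -> Counter of values
    values_seen = defaultdict(set)       # attribute -> set of observed values
    for attrs, label in records:
        for attr, value in attrs.items():
            value_counts[(label, attr)][value] += 1
            values_seen[attr].add(value)
    return class_counts, value_counts, values_seen

def response_probability(model, attrs):
    """Posterior P(responds | attrs), using Laplace (add-one) smoothing."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / total  # class prior
        for attr, value in attrs.items():
            count = value_counts[(label, attr)][value]
            score *= (count + 1) / (n_label + len(values_seen[attr]))
        scores[label] = score
    return scores.get(True, 0.0) / sum(scores.values())

# Rank new (hypothetical) prospects so the campaign targets the likeliest responders first
prospects = [
    {"age_band": "51+", "channel": "flyer"},
    {"age_band": "18-30", "channel": "email"},
]
model = train_naive_bayes(past_campaign)
ranked = sorted(prospects, key=lambda p: response_probability(model, p), reverse=True)
```

The ranking, rather than the raw probability, is what drives targeting. The campaign budget goes to the top of the list instead of being spread evenly.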

Customer segmentation

Segmentation means dividing a large consumer population into market segments. Consumers in the same segment are similar to one another, while consumers in different segments differ. For example, simply organizing the customer records in a database by age group is already a form of segmentation. Segmentation lets users view the data at a higher level, and it lets the enterprise treat customers in different segments in different ways. Classification, clustering and other data mining techniques allow users to segment the data by attributes the enterprise cares about, such as category, age, occupation, address and preferences. Customer segmentation is the basis on which an enterprise decides its products and services, and it is also the foundation for one-to-one marketing.
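The paragraph's simplest example, grouping customers by age, can be sketched as follows. The customer records and the age cut points are hypothetical. The same helper segments by any other attribute the enterprise is interested in, such as occupation.

```python
from collections import defaultdict

def age_band(age):
    """Map an age to a coarse band (the cut points are illustrative assumptions)."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50+"

def segment_by(customers, key):
    """Group customer ids into segments keyed by key(customer)."""
    segments = defaultdict(list)
    for customer in customers:
        segments[key(customer)].append(customer["id"])
    return dict(segments)

# Hypothetical customer records
customers = [
    {"id": 1, "age": 24, "occupation": "student"},
    {"id": 2, "age": 37, "occupation": "engineer"},
    {"id": 3, "age": 58, "occupation": "teacher"},
    {"id": 4, "age": 29, "occupation": "engineer"},
]

by_age = segment_by(customers, lambda c: age_band(c["age"]))
by_occupation = segment_by(customers, lambda c: c["occupation"])
```

Rule-based grouping like this only works when the segmenting attribute is chosen in advance. Clustering is the tool to reach for when the segments themselves have to be discovered from the data.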

Analysis of customer profitability

For most enterprises, the bulk of the profit comes from a small number of customers, yet it is difficult to tell which customers bring high returns and which bring low or even negative returns. Data mining helps the enterprise distinguish customers by profitability, so it can allocate more resources to high-return customers to generate greater profit, while reducing its investment in customers with low or negative returns. Before mining, therefore, the enterprise should define an objective method for calculating profit. This can be a simple calculation, such as the income a customer generates minus all associated costs, or a more complex formula. Data mining tools are then used to extract the corresponding knowledge from the transaction records.
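The "simple calculation" mentioned above, income minus all corresponding costs, can be sketched as below. The transactions, costs and the `high_threshold` cut-off for tiers are hypothetical values chosen for illustration.

```python
def customer_profit(transactions, costs):
    """Profit per customer: total income minus all attributable costs."""
    profit = {}
    for customer_id, amount in transactions:
        profit[customer_id] = profit.get(customer_id, 0.0) + amount
    for customer_id, cost in costs:
        profit[customer_id] = profit.get(customer_id, 0.0) - cost
    return profit

def profit_tier(value, high_threshold=1000.0):
    """Label a customer's profit as high, low or negative (threshold is an assumption)."""
    if value < 0:
        return "negative"
    if value >= high_threshold:
        return "high"
    return "low"

# Hypothetical income records and per-customer costs (service, discounts, returns, ...)
transactions = [("A", 1200.0), ("B", 300.0), ("A", 800.0), ("C", 150.0), ("B", 100.0)]
costs = [("A", 500.0), ("B", 350.0), ("C", 400.0)]

profit = customer_profit(transactions, costs)
ranked = sorted(profit.items(), key=lambda kv: kv[1], reverse=True)
tiers = {cid: profit_tier(p) for cid, p in profit.items()}
```

Once each customer has a profit label, that label can serve as the target that a mining model learns to predict from the transaction records.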

Customer retention

As competition within industries intensifies, it is widely recognized that acquiring a new customer costs far more than retaining an existing one. How to retain existing customers and prevent them from churning has therefore become an important topic in CRM. In practice, data mining tools are used to build models from customers who have already churned. These models then predict which current customers are likely to churn, so the enterprise can study those customers' needs and take measures to keep them.
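One common way to build such a churn model is logistic regression. Below is a minimal, standard-library sketch that fits one by batch gradient descent on z-scored features. The two features (months since last purchase, complaints in the last year), the history and the learning settings are all hypothetical. Real projects would normally use a library implementation.

```python
import math
import statistics

# Hypothetical history: ((months since last purchase, complaints last year), churned?)
history = [
    ((1, 0), 0), ((2, 0), 0), ((1, 1), 0), ((3, 0), 0),
    ((8, 2), 1), ((10, 1), 1), ((7, 3), 1), ((12, 2), 1),
]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_churn_model(data, lr=0.5, epochs=1000):
    """Logistic regression fitted by batch gradient descent on z-scored features."""
    columns = list(zip(*(x for x, _ in data)))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]  # avoid division by zero
    scaled = [([(xi - m) / s for xi, m, s in zip(x, means, stds)], y) for x, y in data]
    w, b = [0.0] * len(means), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in scaled:
            error = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        w = [wi - lr * g / len(scaled) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(scaled)
    return w, b, means, stds

def churn_probability(model, x):
    """Predicted probability that a customer with features x will churn."""
    w, b, means, stds = model
    z = b + sum(wi * (xi - m) / s for wi, xi, m, s in zip(w, x, means, stds))
    return sigmoid(z)

model = train_churn_model(history)
at_risk = churn_probability(model, (9, 2))  # long inactive, some complaints
loyal = churn_probability(model, (1, 0))    # recent buyer, no complaints
```

Customers whose predicted probability exceeds a chosen cut-off would be the ones to study and target with retention measures.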

Question 2: Why does data mining classify data? I am not sure what you mean by classification here. Is it part of the data preprocessing stage, or is it the goal of the mining?

If it is in the preprocessing stage, we may mine only the data from one particular domain, so that the conclusions are more reliable.

If it is the goal of the mining, that is, the output of the model, it is easier to understand: the model assigns records to classes.

Question 3: What does data mining involve? Data mining is a broad field. If you know Java, that helps: you can learn from Weka, a data mining toolkit written in Java. For concrete problems, such as how to obtain test data and how to preprocess it, Weka provides ready-made interfaces.

As for the modeling you mentioned, it cannot be explained in one sentence. First, investigate which methods work well in your field. Then choose several of them, implement them, measure their results on your data set, summarize, and select the best. Your data set should also be representative, ideally an internationally recognized one; as for how to process such data, you can generally follow the well-known papers that use it. Many tools are available, and you should not restrict yourself to one method or one tool: use different tools in different situations and choose according to your actual needs. For example, for clustering you might choose Weka, while for neural networks you might prefer MATLAB. The actual situation determines the tool.
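The advice to implement several methods, measure them on your data set and keep the best can be sketched as a small comparison harness. The two candidate methods (a majority-class baseline and a 1-nearest-neighbour classifier), the toy train/test split and the use of accuracy as the score are all illustrative assumptions.

```python
import math
from collections import Counter

def majority_classifier(train):
    """Baseline: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def nearest_neighbour_classifier(train):
    """Predict the label of the closest training point (Euclidean distance)."""
    def predict(x):
        _, label = min(train, key=lambda row: math.dist(row[0], x))
        return label
    return predict

def accuracy(predict, test):
    return sum(predict(x) == y for x, y in test) / len(test)

def compare_methods(methods, train, test):
    """Fit every candidate on train, score it on test, return scores and the best name."""
    scores = {name: accuracy(build(train), test) for name, build in methods.items()}
    best = max(scores, key=scores.get)
    return scores, best

# Hypothetical toy data set split into training and held-out test records
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((1.1, 1.3), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.8), "b")]
test = [((0.9, 1.1), "a"), ((4.9, 5.1), "b"), ((5.3, 5.2), "b")]

methods = {
    "majority baseline": majority_classifier,
    "1-nearest neighbour": nearest_neighbour_classifier,
}
scores, best = compare_methods(methods, train, test)
```

Keeping a trivial baseline in the comparison is a cheap check that a more elaborate method is actually learning something from the data.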

The rough process is: data acquisition, data preprocessing, then completing the intended task. Data collection can be done with Weka. In data mining, the data matters far more than the algorithm, roughly 80% data and 20% algorithm; the algorithm is really just something you test on the data set. That is my opinion; I hope it helps.

Question 4: Why preprocess the raw data before data mining? The data contains a lot of noise, so irrelevant data, such as fields that have nothing to do with the analysis, must be removed.

Preprocessing also reveals data quality. Some data is not good enough to use directly, for example because it contains too many missing values that need to be handled.

Some fields cannot be used directly, and new fields have to be derived from them before further mining.

The data is scattered and needs to be integrated, for example by appending tables (adding rows) or merging tables (adding columns).

Through data preprocessing, we also gain a first understanding of the data.
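The preprocessing steps listed above can be sketched with plain Python records: append rows, drop an irrelevant field, fill missing values, merge extra columns and derive a new field. The tables, the field names (`fax`, `age`, `spend`, `visits`) and the choice of median imputation are hypothetical illustrations. Tools such as pandas offer equivalent operations.

```python
import statistics

# Two hypothetical branch extracts with the same schema (appended as rows)
branch_a = [
    {"id": 1, "age": 34, "spend": 1200.0, "fax": "n/a"},
    {"id": 2, "age": None, "spend": 300.0, "fax": "n/a"},
]
branch_b = [
    {"id": 3, "age": 52, "spend": None, "fax": "n/a"},
]
# A hypothetical second table keyed by id (merged in as extra columns)
visits = {1: {"visits": 12}, 2: {"visits": 3}, 3: {"visits": 5}}

IRRELEVANT = {"fax"}  # fields judged irrelevant to the analysis

def preprocess(tables, extra_columns):
    rows = [dict(r) for table in tables for r in table]  # append rows (copies)
    for r in rows:  # drop fields irrelevant to the analysis
        for field in IRRELEVANT:
            r.pop(field, None)
    for field in ("age", "spend"):  # fill missing values with the column median
        median = statistics.median(r[field] for r in rows if r[field] is not None)
        for r in rows:
            if r[field] is None:
                r[field] = median
    for r in rows:  # merge columns from the second table, then derive a new field
        r.update(extra_columns.get(r["id"], {}))
        r["spend_per_visit"] = r["spend"] / r["visits"] if r.get("visits") else None
    return rows

cleaned = preprocess([branch_a, branch_b], visits)
```

Median imputation is only one option. Whether to fill, flag or drop missing values depends on how many are missing and why.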

For data preprocessing, I recommend SmartMining Desktop Edition, a data mining tool whose panel-based operation is similar to SPSS Modeler, with good preprocessing capabilities and computing performance.

Question 5: Why sample data? Data mining is a fast-developing field whose purpose is to extract valid patterns or useful rules from data. Its tasks are generally divided into association rules, classification and clustering. These tasks usually involve large data sets that contain useful knowledge. A data set is large when it has many records, many attributes, or both. Many records make fitting a model take longer, while many attributes make the model take up more space. Large data sets are the main obstacle for data mining algorithms: searching for patterns and fitting models often requires many passes over the data, and it is hard to load the whole data set into physical memory. As data sets keep growing, the field of data mining needs algorithms suited to large data. A simple and effective approach is to reduce the size of the data by sampling, that is, by taking a subset of the large data set. Sampling works in data mining for two reasons: some algorithms do not use all the data in the data set during execution anyway, and for others, running the algorithm on part of the data gives the same result as running it on the whole data set. These correspond to the two basic ways sampling is used in data mining: sampling embedded inside the mining algorithm, and sampling run separately from the mining algorithm. Sampling does carry a risk: with high probability the results are very close to those on the full data, but with small probability they are inaccurate. The reason is that running on a subset of the data may break the intrinsic associations between attributes, and in high-dimensional problems these associations are complex and hard to understand.
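For the case where sampling runs separately from the mining algorithm and the data is too large to load into memory, one standard technique is reservoir sampling (Algorithm R). It is not named in the original answer and is shown here as one possible choice. It draws a uniform random sample of `k` records in a single pass while holding only `k` records in memory.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Uniform random sample of k items from an iterable of unknown length (Algorithm R)."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)   # uniform over 0..i, both ends inclusive
            if j < k:
                reservoir[j] = item  # keep item i with probability k / (i + 1)
    return reservoir

# Sample 1,000 of 100,000 records; a seeded generator makes the run repeatable
sample = reservoir_sample(range(100_000), 1_000, random.Random(42))
```

Because it only iterates once, the same function accepts an open file object (one record per line) without reading the whole file into memory. The mining algorithm then runs on `sample` instead of the full data, with the accuracy caveat described above.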

Question 6: Why is it convenient to use Java or Python for data mining? Python has a rich set of third-party modules, concise syntax and a great deal of flexibility. Its NumPy, SciPy and matplotlib modules can cover most of what SPSS does, and you can clean and reduce the data exactly as you need. If necessary, you can also connect to SQL databases and do machine learning. Data is often collected from the Internet with web crawlers, and Python's urllib module makes this easy. When a crawler runs into the CAPTCHAs on some websites, the PIL (Pillow) module can help process the images. SciPy can also be used when neural networks or genetic algorithms are needed, and decision trees can be written directly as if-then code. Clustering need not be limited to a fixed method; it can be adapted to the actual situation, for example k-means or DBSCAN clustering, and sometimes the two have to be combined to cluster and analyze large-scale data, which means writing your own code. There are also many distance measures to choose from, such as Euclidean, cosine, Minkowski and city block (Manhattan) distance. None of them is complicated, and they are easy to program in Python. For content-based classification, Python's powerful NLTK natural language processing module can segment, collect, classify and count words and phrases.
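The distance measures named above are short enough to write by hand, as this sketch shows. It assumes plain tuples of numbers as input. SciPy's `scipy.spatial.distance` module also provides `euclidean`, `cosine`, `minkowski` and `cityblock` if you prefer a library version.

```python
import math

def minkowski(p, q, r):
    """Minkowski distance of order r (r >= 1)."""
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1.0 / r)

def euclidean(p, q):
    """Straight-line distance: Minkowski with r = 2."""
    return minkowski(p, q, 2)

def city_block(p, q):
    """Manhattan / city block distance: Minkowski with r = 1."""
    return minkowski(p, q, 1)

def cosine_distance(p, q):
    """1 minus the cosine of the angle between p and q (ignores vector length)."""
    norms = math.hypot(*p) * math.hypot(*q)
    if norms == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(a * b for a, b in zip(p, q))
    return 1.0 - dot / norms

p, q = (1, 2), (4, 6)
```

The choice matters for clustering: Euclidean and city block distance compare magnitudes, while cosine distance treats (1, 1) and (2, 2) as identical. That makes cosine distance popular for text vectors.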

In short, it is very convenient. Once you know Python well enough, you will find that you can implement your ideas quickly with it.

Question 7: Why is it important to study data analysis and data mining in depth?

1. Big data:

Big data refers to data that cannot be captured, managed and processed with conventional tools within an acceptable time. It is a massive, fast-growing and diverse information asset that requires new processing models to deliver stronger decision-making, insight and discovery, and process optimization.

In Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schönberger and Kenneth Cukier, big data means analyzing and processing all of the data rather than taking the shortcut of random analysis (sample surveys). The 5V characteristics of big data (proposed by IBM) are volume, velocity, variety, value and veracity.

2. Data analysis:

Data analysis is the process of analyzing large amounts of collected data with appropriate statistical methods, extracting useful information, drawing conclusions, and studying and summarizing the data in detail. This process also supports quality management systems. In practice, data analysis helps people make judgments so they can take appropriate action.

The mathematical foundations of data analysis were laid in the early 20th century, but only with the arrival of computers did it become practical and widespread. Data analysis is the product of combining mathematics and computer science.

3. Data mining:

Data mining is a step in knowledge discovery in databases (KDD). It generally refers to the process of using algorithms to find hidden information in large amounts of data. Data mining is usually associated with computer science, and it achieves these goals through statistics, online analytical processing, information retrieval, machine learning, expert systems (which rely on rules from past experience), pattern recognition and other methods.

Question 8: What is the difference between data analysis and data mining, and how do you do data mining well? The difference between big data, data analysis and data mining is that big data means mining massive data from the Internet, while data mining focuses more on the smaller, specialized data within an enterprise. Data analysis makes targeted analyses and diagnoses. Big data is used to analyze trends and development, while data mining is mainly used to find and diagnose problems.

Question 9: Why do data mining in CRM? Big data mining and analysis in the CRM process let CRM play its full role and help the enterprise manage customer relationships well.