Papers on Clustering Analysis Algorithm

Cluster analysis, also known as grouping analysis, is a statistical method for classifying samples or indicators and an important technique in data mining.

I. Introduction

Given n vectors in an M-dimensional space, a cluster analysis algorithm assigns each vector to one of k clusters so as to minimize the distance between each vector and its cluster center. Put another way, similarity within a class should be as large as possible and similarity between classes as small as possible. As an unsupervised learning problem, clustering aims to uncover inherent regularities in the data by dividing the original set of objects into groups, or clusters, of similar members.

The basic idea of cluster analysis is to use multivariate statistics to determine quantitatively how closely the objects are related, taking into account the relationships among, and the leading role of, the multiple factors that describe each object, and then to divide the objects into categories according to their proximity. The resulting classification is more objective and practical, and it reflects the inherent relationships among the objects. In other words, cluster analysis treats the objects under study as points in a multidimensional space and divides them reasonably into several categories. It is a method of step-by-step grouping according to the similarity between variables or regions, and it objectively reflects the internal structure among them.

The salt mining area system is a complex, multi-level, large-scale system that involves many fuzzy and uncertain factors. The economic classification of Pingdingshan's salt mining areas takes all the salt mining areas in Pingdingshan as the research object and each salt mining area as the basic unit. It is centered on the economy and divides the areas into economic types with development strategy and rational layout as the goal. Its basic principles are: relative consistency in the development and utilization of salt mine resources in Pingdingshan City; consistency of natural, economic, and social conditions; and relative stability of the existing administrative units.

The current administrative divisions of the Pingdingshan salt mining area cannot reflect the similarity between the individual salt mining areas. It is therefore necessary to group the salt mining areas with similar economic conditions through fuzzy cluster analysis, identify the differences among the mining areas, tailor measures to each, and provide a basis for formulating development countermeasures.
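
The definition at the start of this section (assign each vector to one of k clusters so that the distance to its cluster center is small) can be illustrated with a minimal Python sketch. The data, the two centers, and the function names `assign_to_nearest` and `total_distance` are hypothetical and are not taken from the paper. The sketch only shows the objective being minimized; it is not the hierarchical procedure applied to the salt mining areas later on.

```python
import math

def assign_to_nearest(vectors, centers):
    """Return, for each vector, the index of its nearest center (Euclidean)."""
    labels = []
    for v in vectors:
        distances = [math.dist(v, c) for c in centers]
        labels.append(distances.index(min(distances)))
    return labels

def total_distance(vectors, centers, labels):
    """Sum of the distances between each vector and its assigned center."""
    return sum(math.dist(v, centers[k]) for v, k in zip(vectors, labels))

# Hypothetical 2-D data and centers, chosen only for illustration.
vectors = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
centers = [(0.0, 0.5), (5.5, 5.0)]

labels = assign_to_nearest(vectors, centers)
cost = total_distance(vectors, centers, labels)
print(labels, cost)
```

With these made-up points, the first two vectors fall into the first cluster and the last two into the second. A clustering algorithm would search for the centers and assignments that make `cost` as small as possible.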

II. Establishing an Index System

1. Determining the classification indicators. Various factors should be considered when choosing the indicators used to divide the economic zones. Priority must be given to the reserves of rock salt resources, with due consideration of the quality, exploration stage, and development and utilization of the rock salt. Both direct and indirect indicators are needed, and the present state of mining area development must be considered together with its development history and future direction. The economic zoning indicators for the Pingdingshan salt mining area were determined by consulting the relevant literature and expert opinion, as shown in Table 1. The table lists the specific indicators and the original data for each indicator (the data come from the summary table of mineral resources reserves in Henan Province in 2006).

Table 1. Indicator system and indicator data for the economic regionalization of salt mining areas

Note: N in the table indicates missing data. Exploration stages 1, 2 and 3 respectively indicate preliminary exploration, detailed investigation and detailed investigation. Utilization status 1~7 respectively indicates: not suitable for further work in the near future, can be used for further work, difficult to use in the near future, recommended for use in the near future, planned for use in the near future, and mining area under construction.

2. Transforming the indicator data. Because the variables have different dimensions and different orders of magnitude, the data must be transformed to make the variables comparable. Three transformations are in common use: standardization, range standardization, and normalization. To compare the values of the same index across mining areas more intuitively, we adopt normalization. For convenience of description, let X_i (i = 1, 2, 3, …, 21) be the value of the i-th evaluation index in the specific index layer, and let P_i (i = 1, 2, 3, …, 21) be its normalized value.

(1) For an index where a higher value is better:

\[
P_i =
\begin{cases}
1, & X_i \ge X_{\max} \\
\dfrac{X_i - X_{\min}}{X_s}, & X_{\min} < X_i < X_{\max} \\
0, & X_i \le X_{\min}
\end{cases}
\]

(2) For an index where a lower value is better:

\[
P_i =
\begin{cases}
1, & X_i \le X_{\min} \\
\dfrac{X_{\max} - X_i}{X_s}, & X_{\min} < X_i < X_{\max} \\
0, & X_i \ge X_{\max}
\end{cases}
\]

Here X_s is taken to be the range X_max - X_min, so that P_i runs continuously from 0 to 1. See Table 2 for all index data involved in the cluster analysis.
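
The piecewise normalization above can be sketched in Python as follows. The function name `normalize` and the sample bounds are hypothetical. The code also assumes that X_s means the range X_max - X_min, which the text does not state explicitly but which makes the three cases agree at the endpoints.

```python
def normalize(x, x_min, x_max, higher_is_better=True):
    """Map one indicator value into [0, 1] using the piecewise rule above."""
    x_s = x_max - x_min  # assumption: Xs is the range of the indicator
    if x_s <= 0:
        raise ValueError("x_max must be greater than x_min")
    if higher_is_better:
        if x >= x_max:
            return 1.0
        if x <= x_min:
            return 0.0
        return (x - x_min) / x_s
    # Lower-is-better indicators are scored in the opposite direction.
    if x <= x_min:
        return 1.0
    if x >= x_max:
        return 0.0
    return (x_max - x) / x_s

# Hypothetical bounds, not taken from Table 1 or Table 2.
print(normalize(15, 10, 20))                          # midpoint of the range
print(normalize(12, 10, 20, higher_is_better=False))  # close to the best (lowest) value
```

Clamping to 0 and 1 outside the bounds means that X_min and X_max can be reference thresholds rather than the observed extremes, if that is how the indicator is defined.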

III. Cluster Analysis

1. Clustering step (Stage). The clustering steps are numbered in order, 1~3.

2. Cluster combination (Cluster Combined). This column shows which items are merged at each step; for example, the first step merges the Tianzhuang salt section and the Mazhuang salt section in Ye County. The case number of the first item is used to denote the new class generated by the merger.

3. Coefficient (Coefficients). According to the basic principle of cluster analysis, the cases that are closest to each other, that is, the cases whose similarity coefficient is closest to 1, are merged first. The coefficients in this column therefore correspond to the clustering steps in the first column, and their values are arranged from small to large.

4. First appearance of a new class (Stage Cluster First Appears). If one of the two items merged at a clustering step is a newly generated class (that is, a class formed by merging two or more cases), the corresponding column shows the step at which that new class was first generated. For example, if the first column under this heading shows the value 1 for the third step, the first of the two items being merged is the new class that was generated in the first step. A value of 0 means that the corresponding item is still a single case, not a new class.

5. Next stage of the new class (Next Stage). This column gives the step at which the new class generated by the current step will next be merged with another case or new class. If the value in the first row is 11, the new class generated by the first clustering step will be merged with another case or new class in step 11.

6. Dendrogram (Dendrogram using Average Linkage (Between Groups)). The hierarchical clustering tree diagram (method: average linkage between groups) clearly shows the whole clustering process. It rescales the actual distances proportionally to the range 0~25 and joins similar cases or new classes step by step with connecting lines until all of them belong to a single class. To read off a classification, choose a distance value on the scale at the top of the figure as needed (a larger value for a coarse division, a smaller value for a fine one) and draw a vertical line through it. The vertical line intersects the horizontal lines of the tree; the number of intersections is the number of classes, and the cases attached to each intersected horizontal line form one class. For example, at a scale value of 5 there are three classes: the Tianzhuang salt section and the Mazhuang salt section in Ye County form one class, the Louzhuang salt section and the Wulibao salt section in Ye County form another, and the Yaozhai salt mine in Ye County forms a class of its own. At a scale value of 10 there are two classes: the Tianzhuang and Mazhuang salt sections in Ye County form one class, and the Louzhuang, Wulibao, and Yaozhai salt sections in Ye County form the other.
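
The columns described in items 1 to 5 and the dendrogram cut described in item 6 can be reproduced with a small pure-Python sketch of agglomerative clustering using between-groups average linkage. The five 2-D points are hypothetical and are not the salt mine data, the function names are illustrative, and the proportional rescaling to 0~25 is an assumption based on the description above that may differ from the exact formula used by the statistical package.

```python
import math
from itertools import combinations

def average_linkage_schedule(points):
    """Build an agglomeration schedule with between-groups average linkage.

    Each row is [stage, item_1, item_2, coefficient,
                 first_appears_1, first_appears_2, next_stage],
    using 1-based case numbers; the merged class keeps the number of item_1.
    """
    clusters = {i + 1: [i] for i in range(len(points))}  # label -> member indices
    born = {label: 0 for label in clusters}  # stage that created the class, 0 = single case
    rows = []
    stage = 0
    while len(clusters) > 1:
        stage += 1
        best = None
        for a, b in combinations(sorted(clusters), 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            d = sum(math.dist(points[i], points[j]) for i, j in pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        rows.append([stage, a, b, d, born[a], born[b], 0])
        for label in (a, b):  # record "next stage" on the rows that created a and b
            if born[label]:
                rows[born[label] - 1][6] = stage
        clusters[a] = clusters[a] + clusters.pop(b)
        born[a] = stage
        del born[b]
    return rows

def clusters_at(schedule, n_cases, cut):
    """Groups left when only merges with rescaled coefficient <= cut are applied."""
    top = max(row[3] for row in schedule)
    groups = {i: {i} for i in range(1, n_cases + 1)}
    for row in schedule:
        a, b, coefficient = row[1], row[2], row[3]
        if coefficient * 25 / top <= cut:  # assumed proportional rescaling to 0~25
            groups[a] |= groups.pop(b)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical data: five cases described by two made-up indicators.
points = [(0, 0), (0, 1), (10, 0), (10, 1.5), (14, 0)]
schedule = average_linkage_schedule(points)
for row in schedule:
    print(row)
print(clusters_at(schedule, len(points), cut=5))
print(clusters_at(schedule, len(points), cut=10))
```

On this made-up data, a cut at 5 leaves three groups and a cut at 10 leaves two, which mirrors how the scale values 5 and 10 are read off the dendrogram in item 6. The grouping itself is a property of the invented points, not a result from the paper.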

IV. Conclusion

The five salt mines in Pingdingshan should be divided into an appropriate number of economic zones; neither more nor fewer zones is automatically better. The purpose of dividing economic zones is to guide economic activity according to the resource characteristics and the exploration and development status of each salt mine economic zone, so that economic activity matches local conditions and each zone plays to its own strengths, achieving more output with less input and creating good economic and social benefits. With too many zones the division loses its meaning; with too few it is difficult to target measures. The cluster analysis above yields three schemes, of which two are more suitable and can be selected.

Scheme 1 (scale value 5, three categories): the Tianzhuang salt section and the Mazhuang salt section in Ye County form one category, the Louzhuang salt section and the Wulibao salt section in Ye County form another, and the Yaozhai salt mine in Ye County forms a category of its own. The cluster analysis gives the first scheme of the Pingdingshan salt mining area classification map.

Scheme 2 (scale value 10, two categories): the Tianzhuang salt section and the Mazhuang salt section in Ye County form one category, and the Louzhuang, Wulibao, and Yaozhai salt mines in Ye County form the other. The cluster analysis gives the second scheme of the Pingdingshan salt mining area classification map.

The principle of the cluster analysis is to group mining areas with similar ore quality, resource reserves, exploration stage, and utilization status, and the results are intuitive and clear. In view of the actual administrative divisions of Pingdingshan City and the characteristics of its mining enterprises, the division of the salt mining areas is adjusted so that theory and practice are more closely combined and the results guide practice better.

1. The Tianzhuang salt section and the Mazhuang salt section in Ye County have the same deposit scale, similar resource reserves, similar exploration and development stages, and a similar degree of utilization, so they are classified into one category.

2. The Louzhuang salt mine and the Wulibao salt mine in Ye County are at the same exploration and development stage and form one category.

3. The Yaozhai salt mine in Ye County forms a category of its own. It has large reserves and a high salt grade, and its exploration and exploitation planning differs from that of the other two categories.

Generally speaking, the application of cluster analysis is basically successful, and most of the classification agrees with reality. Based on the above discussion, the division of salt mining areas is shown in the following table:

Of course, cluster analysis has both advantages and disadvantages.

(1) Advantages: the cluster analysis model is intuitive and concise.

(2) Disadvantages: when the sample size is large, it is difficult to reach a clustering conclusion. In addition, the similarity coefficient is only an index of the internal relationship between the objects under study. In practice, the data may suggest a close relationship between objects that have no real internal connection. In that case it is clearly inappropriate to draw conclusions from the distance or similarity coefficient, and the cluster analysis model itself cannot detect the error.
