If the goal is to cluster variables (that is, columns or items) rather than samples, hierarchical clustering (also called systematic clustering) should be used, with the dendrogram (cluster tree diagram) supporting an overall judgment of the result. For example, suppose 8 referees each rate 300 players, and we want to cluster the 8 referees to find categories of scoring preference or style. This calls for hierarchical clustering.
Several points need particular attention in hierarchical clustering:
1. Hierarchical clustering applies only to quantitative data.
2. If the variables differ greatly in units or scale, standardize the data first and then cluster the standardized data.
3. Because all the data are quantitative, the Pearson correlation coefficient should in principle be used to measure closeness in hierarchical clustering. The larger the correlation coefficient, the closer two items are; the smaller it is, the farther apart they are. By default, SPSSAU uses the Pearson correlation coefficient to represent distance.
4. SPSSAU uses the group average (average linkage) method for hierarchical clustering. Broadly speaking, the two items with the strongest correlation are merged first (the first merged cluster). At each later step the closest remaining pair is merged, typically the item most strongly correlated with an existing merged cluster, giving the second merged cluster, then the third, and so on until every item has been merged.
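The procedure described in points 3 and 4 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not SPSSAU's implementation: the function names `pearson` and `average_linkage`, the toy scores, and the conversion of a correlation r into a distance 1 - r are all chosen for this example only.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_linkage(columns, n_clusters):
    """Merge the closest pair of clusters until n_clusters remain.

    columns maps an item name to its list of scores.
    Distance between two items is assumed to be 1 - r (an assumption
    of this sketch); distance between two clusters is the average of
    all item-to-item distances across them (group average).
    """
    n_clusters = max(1, n_clusters)
    names = list(columns)
    dist = {
        (a, b): 1.0 - pearson(columns[a], columns[b])
        for a in names for b in names if a != b
    }
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(dist[p] for p in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters

# Hypothetical scores from four referees for five players.
scores = {
    "referee_A": [1, 2, 3, 4, 5],
    "referee_B": [2, 3, 4, 5, 7],
    "referee_C": [5, 4, 3, 2, 1],
    "referee_D": [5, 4, 3, 1, 2],
}
print(average_linkage(scores, n_clusters=2))
```

In this toy data, referees A and B score in the same direction and referees C and D score in the opposite direction, so the two pairs end up in separate clusters. For real data, a library routine such as `scipy.cluster.hierarchy.linkage` with `method="average"` would normally be preferred over a hand-written loop.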
In this example, 8 referees score 300 players, with a lowest possible score of 1 and a highest of 10. The aim is to cluster the 8 referees in order to identify referee style types. The data consist of 8 columns (one per referee) and 300 rows (one per player). Because all scores range from 1 to 10 and the 8 columns share the same unit, there is no need to standardize the data before analysis (although standardizing does no harm).
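For cases where standardization is wanted, a z-score transform is one common choice. The sketch below is an assumption-laden illustration: the helper names `zscore` and `pearson`, the use of the sample standard deviation, and the two example columns are hypothetical and are not taken from SPSSAU. It also checks that the Pearson correlation is unchanged by this rescaling, which is consistent with the remark that standardizing does no harm here.

```python
import math
import statistics

def zscore(values):
    """Rescale values to mean 0 and sample standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1), an assumption
    return [(v - mean) / sd for v in values]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical columns measured on very different scales.
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
score_1_to_10 = [3.0, 5.0, 4.0, 8.0, 9.0]

r_raw = pearson(height_cm, score_1_to_10)
r_std = pearson(zscore(height_cm), zscore(score_1_to_10))

# The correlation is the same before and after standardization.
print(math.isclose(r_raw, r_std))
```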
This example involves 8 items in total; the screenshot of the SPSSAU operation is as follows.
By default, SPSSAU produces three clusters and displays the results in tables. If you want more clusters, you can set the number yourself.
SPSSAU output results
SPSSAU first outputs basic descriptive statistics for the clustered items, then the cluster to which each item belongs, and finally a dendrogram, as shown below:
The table above shows basic information for all 8 analysis items (that is, the data of the 8 referees), including the mean, maximum, minimum, and median, which gives a general picture of the data. Overall, the average scores of all eight referees are above 8.
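A per-item summary of this kind can be reproduced with the Python standard library. The sketch below is illustrative only: the function name `describe` and the two short score columns are hypothetical and do not come from the example data.

```python
import statistics

def describe(columns):
    """Return mean, min, max, and median for each named column."""
    summary = {}
    for name, values in columns.items():
        summary[name] = {
            "mean": statistics.fmean(values),
            "min": min(values),
            "max": max(values),
            "median": statistics.median(values),
        }
    return summary

# Hypothetical scores from two referees for five players.
scores = {
    "referee_1": [8, 9, 7, 10, 9],
    "referee_2": [9, 9, 8, 10, 8],
}

for name, stats in describe(scores).items():
    print(name, stats)
```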
In total, the clustering yields three clusters, and the table above shows which analysis items belong to each. Referee 8 forms a cluster alone; referees 5, 3, and 7 are grouped together; and referees 1, 6, 2, and 4 are grouped together.
The table above gives the correspondence between clusters and analysis items; the dendrogram provides further detail. What each cluster should be called depends on the characteristics of the items it contains.
The figure above is a dendrogram, which shows the clustering process graphically. The numbers in the top row are only a scale, representing relative distance; each node represents one merge step.
To read the dendrogram, it helps to draw a vertical line across it: the branches the line crosses are the clusters, and the items under each branch show which analysis items belong to which cluster. For example, the red vertical line splits the tree into three clusters: the first corresponds to referee 8; the second to referees 5, 3, and 7; and the third to referees 1, 6, 2, and 4.
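Drawing a vertical line across the dendrogram amounts to keeping only the merges that happen at or below the chosen distance. The sketch below illustrates this idea under stated assumptions: the function name `cut_dendrogram`, the merge-record format (one item from each merged branch plus the merge distance), and the item labels and distances are all hypothetical and are not read from the SPSSAU output.

```python
def cut_dendrogram(items, merges, height):
    """Return the clusters obtained by cutting the tree at `height`.

    merges is a list of (item_in_left_branch, item_in_right_branch,
    merge_distance) records. Only merges at or below `height` are kept.
    """
    parent = {item: item for item in items}

    def find(x):
        # Follow parent links to the representative of x's cluster.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for left, right, merge_distance in merges:
        if merge_distance <= height:
            parent[find(left)] = find(right)

    groups = {}
    for item in items:
        groups.setdefault(find(item), []).append(item)
    return sorted(groups.values())

# Hypothetical merge history for five items (distances are made up).
items = ["A", "B", "C", "D", "E"]
merges = [
    ("A", "B", 0.10),
    ("C", "D", 0.20),
    ("A", "C", 0.70),
    ("A", "E", 0.90),
]

print(cut_dendrogram(items, merges, height=0.5))
```

Moving the line to a larger distance keeps more merges and so yields fewer, larger clusters; moving it to a smaller distance yields more, smaller clusters.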