How to compare the similarity of two groups of high-dimensional data in data mining
This problem looks fairly complicated, and simple classification or clustering alone probably will not solve it.

Can the question be understood as comparing the similarity of two sets of data, mainly with respect to the decision variable d ("heart disease") = Y/N? In other words, how do the two different sets of indices differ in what they say about d?

If each of the two data sets includes the value of d ("heart disease"), you can directly compare how accurately each one identifies the Y cases. Wouldn't that comparison serve as the similarity of the two data sets?
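As a rough sketch of that idea in Python: suppose each group of indices has already been turned into a Y/N prediction for every record (how that is done is left open here). The names yes_accuracy and agreement, and the toy lists below, are made up for illustration and do not come from the original question. "Accuracy of Yes" is read here as the share of true Y records that a group also marks as Y, which is only one possible reading.

```python
def yes_accuracy(predicted, actual):
    """Share of records with actual d == 'Y' that were also predicted 'Y'."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    yes_total = sum(1 for a in actual if a == "Y")
    if yes_total == 0:
        raise ValueError("no records with d == 'Y'")
    yes_hits = sum(1 for p, a in zip(predicted, actual) if a == "Y" and p == "Y")
    return yes_hits / yes_total


def agreement(pred_a, pred_b):
    """Share of records on which the two groups give the same prediction."""
    if len(pred_a) != len(pred_b):
        raise ValueError("both prediction lists must have the same length")
    same = sum(1 for a, b in zip(pred_a, pred_b) if a == b)
    return same / len(pred_a)


# Hypothetical toy data: the true value of d and the predictions
# derived from index group A and index group B for the same 8 records.
actual = ["Y", "Y", "N", "Y", "N", "N", "Y", "N"]
pred_a = ["Y", "Y", "N", "N", "N", "Y", "Y", "N"]
pred_b = ["Y", "N", "N", "Y", "N", "N", "Y", "N"]

acc_a = yes_accuracy(pred_a, actual)
acc_b = yes_accuracy(pred_b, actual)
agree = agreement(pred_a, pred_b)

print("Y accuracy, group A:", acc_a)
print("Y accuracy, group B:", acc_b)
print("record-level agreement:", agree)
```

In this toy case both groups reach the same accuracy on the Y cases but disagree on several individual records, so it may be worth looking at record-level agreement as well before calling the two groups similar.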

If you are writing a paper, you can make the problem more elaborate and academic. If it is only for a practical application, you don't need to insist on elegant mathematical models and complicated solution procedures; solving the practical problem is enough, right?