Beida Jade Bird Java Training: What is the relationship between big data and data mining?
Data mining builds on database theory, machine learning, artificial intelligence and modern statistics, and it has been applied in many fields.

It involves many algorithms, such as neural networks and decision trees, which come from machine learning; support vector machines, which are based on statistical learning theory; and classification and regression trees, association analysis and so on.

Data mining is defined as the discovery of meaningful patterns or knowledge in massive data.

Big data has three important characteristics: a large volume of data, a complex structure and a fast rate of data updates.

With the development of web technology, the data generated by web users is saved automatically and sensors collect data continuously. The mobile Internet has further accelerated automatic data collection and storage, so the amount of data in the world keeps growing. Storing and processing this data is beyond the capacity of a single computer, whether a minicomputer or a mainframe. This challenges the implementation of data mining technology, which has generally run on a single minicomputer or mainframe, although parallel computation is also possible there.

Google proposed a distributed file system for storage, and the concepts of cloud storage and cloud computing developed from there.

Big data has to be split into small units, each of which is processed separately (the map step), and the partial results are then combined (the reduce step). This is the so-called map-reduce framework.
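The split-then-combine idea can be illustrated with a small single-process sketch. It only imitates the map, shuffle and reduce phases in plain Python and is not a distributed implementation; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample text chunks are assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(chunks):
    # In a real cluster each chunk would be mapped on a different machine;
    # here the chunks are simply processed one after another.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(mapped))

# Hypothetical input: three small "units" of a larger data set.
chunks = ["big data needs mining", "data mining finds patterns", "big data"]
counts = word_count(chunks)
print(counts)
```

Because each chunk is mapped independently, the map step could in principle run on many machines at once, with only the grouped intermediate pairs brought together for the reduce step.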

The computation on each individual computer still relies on data mining techniques. The difference is that some of the original techniques cannot easily be embedded in the map-reduce framework, so some algorithms need to be adjusted.
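As one illustration of such an adjustment, consider the variance. The textbook method needs the overall mean before it can sum the squared deviations, which does not fit a setting where each machine sees only its own chunk. A common workaround, sketched below, is to let each chunk produce a small summary (count, mean, sum of squared deviations) and to merge summaries pairwise. The function names `summarize` and `merge` and the sample numbers are assumptions for illustration; the sketch also assumes every chunk is non-empty.

```python
from functools import reduce

def summarize(chunk):
    """Map step: reduce one non-empty chunk to (count, mean, m2), where m2
    is the sum of squared deviations from the chunk's own mean."""
    n = len(chunk)
    mean = sum(chunk) / n
    m2 = sum((x - mean) ** 2 for x in chunk)
    return n, mean, m2

def merge(a, b):
    """Reduce step: combine two summaries without revisiting the raw data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    # The last term corrects for the two chunks having different means.
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2

# Hypothetical data, already split into three chunks.
chunks = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]
n, mean, m2 = reduce(merge, map(summarize, chunks))
variance = m2 / n  # population variance of all values together
print(n, mean, variance)
```

The merged result matches what a single pass over all eight values would give, up to floating-point rounding, yet no step ever needs more than one chunk or two summaries in memory.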

In addition, the growing capacity to process big data poses new challenges to statistics.

Statistical theory is often built on samples, but in the era of big data it may be possible to obtain the entire population rather than a sample drawn from it.