Big data storage stage: HBase, Hive, Sqoop.
Big data architecture design stage: distributed Flume, ZooKeeper, Kafka.
Big data real-time computing stage: Mahout, Spark, Storm.
Big data collection stage: Python, Scala.
Big data business practice stage: hands-on work with enterprise big data processing scenarios, requirements analysis, solution implementation, and combined application of the technologies above.
Big data, also called mega data or massive data, refers to massive, fast-growing, and diverse information assets that require new processing models in order to deliver stronger decision-making, insight, and process-optimization capabilities. In The Age of Big Data, co-authored by Viktor Mayer-Schönberger and Kenneth Cukier, big data means analyzing and processing all of the data rather than taking the shortcut of random analysis (sample surveys). The 5V characteristics of big data are volume (massive scale), velocity (high speed), variety (diversity), value (low value density), and veracity (authenticity).
These "V" characteristics of big data can be described at four levels:
First, the amount of data is huge.
Data volumes have jumped from the terabyte (TB) scale to the petabyte (PB) scale.
Second, there are many data types.
These include the aforementioned blogs, videos, pictures, geographic information, and so on.
Third, the value density is low.
Take video as an example: during continuous surveillance, the data that may actually be useful can amount to only one or two seconds.
Fourth, the processing speed is fast.
This is the so-called "one-second rule": results are expected within about a second. This last point is what fundamentally distinguishes big data from traditional data mining technology. The industry sums these up as four "V"s: volume, variety, value, and velocity.
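The volume and value points above can be made concrete with a little arithmetic. The sketch below is only illustrative: the decimal (SI) byte units, the 24 hours of footage, and the function names scale_factor and value_density are assumptions for the example, not figures from the text.

```python
# Illustrative arithmetic only; footage length and byte units are assumptions.

TB = 10**12  # bytes in one terabyte (decimal SI definition)
PB = 10**15  # bytes in one petabyte (decimal SI definition)


def scale_factor(from_bytes: int, to_bytes: int) -> float:
    """Return how many times larger to_bytes is than from_bytes."""
    return to_bytes / from_bytes


def value_density(useful_seconds: float, total_seconds: float) -> float:
    """Return the fraction of a recording that carries useful information."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    return useful_seconds / total_seconds


if __name__ == "__main__":
    # Volume: how large the jump from TB to PB is.
    print(f"1 PB = {scale_factor(TB, PB):,.0f} TB")

    # Value: assume 2 useful seconds in 24 hours of continuous footage.
    seconds_per_day = 24 * 60 * 60
    density = value_density(2, seconds_per_day)
    print(f"Value density: {density:.6%}")
```

Under these assumptions the jump from TB to PB is a factor of one thousand, and two useful seconds in a full day of footage is roughly 0.0023% of the recording, which is why low value density forces systems to process far more data than they ultimately keep.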
The Internet of Things, cloud computing, the mobile Internet, the Internet of Vehicles, mobile phones, tablets, PCs, and sensors of every kind around the world are all sources or carriers of data.