Can you explain mapreduce in hadoop?
Hello, let me answer for you:

MapReduce is a data-processing paradigm first described by Jeff Dean and colleagues at Google. Doug Cutting, who later joined Yahoo!, implemented an open-source version of MapReduce, and that work grew into Hadoop.

Hadoop includes an open-source MapReduce computing framework and a distributed file system, HDFS.

The essence of MapReduce is parallel processing, built on a simple observation: moving the program to where the data lives is cheaper than moving the data to the program.
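To make the map/shuffle/reduce idea concrete, here is a minimal single-machine sketch of the classic word-count job in Python. It only simulates the three phases in one process (the function names `map_phase`, `shuffle`, and `reduce_phase` are my own labels, not Hadoop APIs); a real Hadoop job runs the map and reduce steps in parallel across the cluster.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every input split."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle/sort: group the intermediate pairs by key,
    which the MapReduce framework does between map and reduce."""
    for key, group in groupby(sorted(pairs, key=itemgetter(0)),
                              key=itemgetter(0)):
        yield key, [value for _, value in group]

def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped}

docs = ["hello hadoop", "hello mapreduce"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'hadoop': 1, 'hello': 2, 'mapreduce': 1}
```

Because the map output is just independent key-value pairs, the framework can run many mappers on different data blocks at the same time, which is where the parallelism comes from.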

If you only need simple statistics (counts, group-bys, sorts, averages, maximums, minimums, and so on), Hive will suit you better. Once you have imported all 500 GB into Hive, you can type queries directly at the Hive command line in HiveQL (not strictly SQL, but very close to it) and run the query you want.
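As an illustration of the kind of statistics query Hive handles well, here is a hedged HiveQL sketch; the table name `access_log` and its columns are hypothetical, invented only for this example. Hive compiles a query like this into MapReduce jobs for you.

```sql
-- Hypothetical table and columns, for illustration only.
SELECT user_id,
       COUNT(*)      AS visits,
       MAX(duration) AS longest_visit,
       AVG(duration) AS average_visit
FROM   access_log
GROUP  BY user_id
ORDER  BY visits DESC;
```

The point is that for grouping, sorting, and aggregation you never have to write map and reduce functions by hand.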

Hive and Pig are both data-analysis tools built on top of Hadoop, and both depend on it, but Hadoop itself is not limited to analysis and statistics. Google, for example, originally used MapReduce to build its search index.

If my answer doesn't help you, please feel free to ask a follow-up question.