Hadoop originated from the Apache Nutch project, which began in 2002 as a subproject of Apache Lucene. In 2004, Google published the paper "MapReduce: Simplified Data Processing on Large Clusters" at the Symposium on Operating Systems Design and Implementation (OSDI).
Inspired by the paper, Doug Cutting and others set out to implement a MapReduce computing framework and to combine it with NDFS (the Nutch Distributed File System) to support the core algorithms of the Nutch engine, and both NDFS and MapReduce proved effective within Nutch.
In February 2006, NDFS and MapReduce were split out of Nutch into a complete, independent project named Hadoop. By early 2008, Hadoop had become a top-level Apache project encompassing many subprojects, and it was in use at many Internet companies, including Yahoo!.
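To make the MapReduce programming model concrete, the following is a minimal word-count sketch written against the standard org.apache.hadoop.mapreduce API: the map phase emits a (word, 1) pair for each token of the input, and the reduce phase sums the counts for each word. The input and output paths are assumed here to arrive as command-line arguments.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: for each token in an input line, emit the pair (token, 1).
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce phase: sum all counts emitted for the same word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // assumed input path
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // assumed output path
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The user writes only the map and reduce functions; the framework handles partitioning the input, shuffling intermediate pairs by key, and re-executing failed tasks, which is what makes the model attractive at cluster scale.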
Development status of Hadoop
Hadoop was designed from the outset for high reliability, high scalability, high fault tolerance, and high efficiency. These inherent design strengths made Hadoop popular with many large companies soon after it appeared, and they have also attracted extensive attention from the research community.
Hadoop is now widely used across the Internet industry. Baidu, for example, uses Hadoop to process roughly 200 TB of data per week for search log analysis and web data mining. Universities and research institutes in China conduct Hadoop-based research on data storage, resource management, job scheduling, performance optimization, high availability, and security, and most of the resulting work is contributed back to the Hadoop community as open source.