Hadoop began as just one part of Nutch, which was itself a sub-project of Apache Lucene.
Lucene is an open source full-text search engine toolkit, one that anyone who has built a search feature for a Java web application has probably come across.
It provides a complete query engine along with several text analysis engines.
Nutch builds on Lucene and adds web crawling and page parsing, which is enough to develop a working search engine. In production, however, a search engine must respond within a very short time and must analyze and process hundreds of millions of web pages quickly, which means distributed task processing, fault recovery, and load balancing all have to be considered.
Later, Doug Cutting drew on two Google papers, "The Google File System" and "MapReduce: Simplified Data Processing on Large Clusters", reimplemented the techniques they describe, and named the result Hadoop.