Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault tolerant and is designed to run on low-cost hardware. It also provides high-throughput access to application data, which makes it well suited to applications with large data sets.
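As a minimal illustration of how an application accesses data stored in HDFS, the following sketch reads a file with the standard Hadoop FileSystem Java API. The NameNode address hdfs://namenode:9000 and the path /user/demo/input.txt are hypothetical placeholders, not values taken from this article.

// Minimal sketch: read a file from HDFS and stream it to stdout.
// The cluster address and file path below are hypothetical placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Normally picked up from core-site.xml; set explicitly here for clarity.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path("/user/demo/input.txt"))) {
            // Copy the file contents to stdout in 4 KB chunks.
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
    }
}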
Characteristics
1. Snapshots: a snapshot stores a copy of the data as it existed at a specific point in time, so that a failed cluster can be rolled back to an earlier, known-good state. HDFS already supports metadata snapshots (a snapshot sketch follows this list).
2. HDFS is designed for large files. Applications that run on HDFS work with large data sets: they typically write data once and then read it one or more times, and those reads need to be served at streaming rates. HDFS therefore follows a write-once, read-many access model. The typical HDFS block size is 64 MB, so an HDFS file is split into blocks of up to 64 MB each, and each block can be placed on a different DataNode where necessary (a block-location sketch follows this list).
3. Staging: a client's request to create a file is not forwarded to the NameNode immediately. Instead, the HDFS client first caches the file data in a local temporary file (a staged-write sketch appears below).
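The following is a hedged sketch of the snapshot workflow from point 1, using the Hadoop Java API. The directory /data/logs and the snapshot name "nightly" are hypothetical, and enabling snapshots on a directory is normally an administrator operation.

// Sketch: enable snapshots on a directory and record its state at a point in time.
// Directory and snapshot name are hypothetical placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SnapshotExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // placeholder cluster address
        FileSystem fs = FileSystem.get(conf);

        Path dir = new Path("/data/logs");
        // Snapshots must first be allowed on the directory (an admin operation).
        if (fs instanceof DistributedFileSystem) {
            ((DistributedFileSystem) fs).allowSnapshot(dir);
        }
        // Capture the directory's current state; it can later be used to restore data.
        Path snapshot = fs.createSnapshot(dir, "nightly");
        System.out.println("Created snapshot: " + snapshot);
        fs.close();
    }
}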
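For point 2, the sketch below asks HDFS how an existing file has been split into blocks and which DataNodes hold each block's replicas, using FileSystem.getFileBlockLocations. The file path is a hypothetical placeholder.

// Sketch: list the blocks of a large file and the DataNodes that store them.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // placeholder cluster address
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/big-dataset.bin"); // hypothetical large file
        FileStatus status = fs.getFileStatus(file);

        // One BlockLocation per block; each lists the hosts holding a replica.
        BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
        for (BlockLocation block : blocks) {
            System.out.printf("offset=%d length=%d hosts=%s%n",
                    block.getOffset(), block.getLength(),
                    String.join(",", block.getHosts()));
        }
        fs.close();
    }
}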
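For point 3, this last sketch shows a simple client-side write. Data written to the output stream is buffered by the client rather than sent to the NameNode directly; exactly how it is staged before reaching the DataNodes depends on the HDFS version. The output path is hypothetical.

// Sketch: write a small file through the HDFS client.
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StagedWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:9000"); // placeholder cluster address
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/user/demo/output.txt"))) {
            // Writes land in client-side buffers first, not at the NameNode.
            out.write("hello hdfs\n".getBytes(StandardCharsets.UTF_8));
            // hflush() pushes buffered data out to the DataNode pipeline early if needed.
            out.hflush();
        } // close() completes the file and makes it visible in the namespace.
    }
}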