Object-based storage model
Supports interoperability between different versions.
Users can operate on different parts of the same file concurrently.
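Concurrent access to disjoint parts of one file relies on byte-range locking, which Lustre enforces cluster-wide through its distributed lock manager. Below is a minimal single-machine sketch of the idea using Java's local file locks; the file name is made up and this only mirrors the mechanism, not Lustre's implementation:

```java
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class RangeLockDemo {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile("shared.dat", "rw")) {
            FileChannel ch = raf.getChannel();
            // Lock bytes [0, 1024) and [1024, 2048) independently;
            // the ranges are disjoint, so neither lock blocks the other.
            try (FileLock first = ch.lock(0, 1024, false);
                 FileLock second = ch.lock(1024, 1024, false)) {
                System.out.println("disjoint ranges locked: "
                        + first.isValid() + ", " + second.isValid());
            }
        }
    }
}
```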
Hadoop:
Streaming data access
Supports large files and datasets with tens of millions of files.
A "write once, read many" file access model (sketched in code after this list).
Safe mode
Pipeline replication
Browser and Java interfaces
Files deleted within a retention period can be recovered.
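A minimal sketch of the write-once, read-many model through HDFS's Java FileSystem API; the NameNode address and file path are hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteOnce {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; replace with your cluster's.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");
        FileSystem fs = FileSystem.get(conf);

        Path p = new Path("/data/events.log");
        // Write once: create() opens a new file for streaming writes.
        try (FSDataOutputStream out = fs.create(p)) {
            out.writeUTF("first and only write pass");
        }
        // Read many: any number of sequential, streaming reads.
        try (FSDataInputStream in = fs.open(p)) {
            System.out.println(in.readUTF());
        }
        // Random in-place updates are not part of the model; append()
        // exists in later HDFS versions, but overwrite-in-place does not.
    }
}
```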
GlusterFS:
Linear scale-out
Replaces the metadata server (MDS) with a dynamic hashing algorithm that runs on every node (illustrated after this list).
Supports multiple storage and file protocols
Based on FUSE
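A rough sketch of metadata-server-free placement: every node derives a file's location from its path alone, so no MDS lookup is needed. This shows only the core idea; GlusterFS's actual elastic hashing assigns hash ranges to bricks via extended attributes rather than taking a simple modulus, and the brick names here are made up:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.List;

public class HashPlacement {
    // location = f(path): any client can compute this independently,
    // so there is no central metadata server to consult.
    static int pickNode(String path, int nodeCount) throws Exception {
        byte[] d = MessageDigest.getInstance("MD5")
                .digest(path.getBytes(StandardCharsets.UTF_8));
        // Fold the first digest bytes into an int, then reduce it
        // modulo the number of storage nodes.
        int h = ((d[0] & 0xff) << 24) | ((d[1] & 0xff) << 16)
              | ((d[2] & 0xff) << 8) | (d[3] & 0xff);
        return Math.abs(h % nodeCount);
    }

    public static void main(String[] args) throws Exception {
        List<String> nodes = List.of("brick-a", "brick-b", "brick-c");
        String file = "/exports/photos/2021/img_0042.jpg";
        // Every node computes the same answer for the same path.
        System.out.println(file + " -> " + nodes.get(pickNode(file, nodes.size())));
    }
}
```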
Stage-by-stage comparison of Hadoop and Lustre
A MapReduce job is divided into four stages: map input, map output, reduce input, and reduce output.
1. Map input: read/write
(1) File block location information is available.
Hadoop: each read/write task is performed as a stream, with almost no remote network I/O.
Lustre: each read/write task is performed in parallel through each client.
(2) File block location information is not available.
Hadoop: each read/write task is still performed as a stream, but the data must be fetched remotely, incurring remote network I/O.
Lustre: each read/write task is executed by the clients in parallel, incurring less remote network I/O than Hadoop.
Exposing file block location information localizes reads and writes as much as possible, minimizing network traffic and improving read/write speed.
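Hadoop exposes this location information through FileSystem.getFileBlockLocations, which the scheduler uses to place map tasks on nodes holding a replica. A small sketch; the input path is hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockLocality {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/data/input.txt");   // hypothetical input file
        FileStatus st = fs.getFileStatus(p);
        // One BlockLocation per block, each listing the hosts that hold
        // a replica; running the map task on one of those hosts avoids
        // remote network I/O entirely.
        for (BlockLocation b : fs.getFileBlockLocations(st, 0, st.getLen())) {
            System.out.printf("offset=%d len=%d hosts=%s%n",
                    b.getOffset(), b.getLength(),
                    String.join(",", b.getHosts()));
        }
    }
}
```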
2. Map output: read/write
HDFS: map output is written to the local Linux file system, not to HDFS itself.
Lustre: map output is written to Lustre, which every node shares.
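Where intermediate map output lands is governed by configuration rather than application code. A sketch using the real mapreduce.cluster.local.dir key; the directory values are illustrative, and pointing the key at a shared Lustre mount is the assumption behind the Lustre setup:

```java
import org.apache.hadoop.conf.Configuration;

public class SpillDirs {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Stock Hadoop: spill map output to local Linux disks.
        conf.set("mapreduce.cluster.local.dir", "/disk1/mapred,/disk2/mapred");
        // With a shared Lustre mount, the same key can point at Lustre,
        // so reducers on other nodes can see the files directly:
        // conf.set("mapreduce.cluster.local.dir", "/mnt/lustre/mapred");
        System.out.println(conf.get("mapreduce.cluster.local.dir"));
    }
}
```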
3. Reduce input (shuffle stage): read/write
HDFS: map output is fetched from remote map nodes over HTTP.
Lustre: hard links to the map output are created instead of copying it.
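A sketch of the shared-file-system shuffle idea: because every node sees the same Lustre namespace, a reducer can hard-link a mapper's output instead of pulling it over HTTP. The paths are made up, and this only mirrors the mechanism on a local file system:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SharedFsShuffle {
    public static void main(String[] args) throws Exception {
        Path mapOutput = Paths.get("/mnt/lustre/job42/map_0007/part.out");
        Path reduceSide = Paths.get("/mnt/lustre/job42/reduce_0001/map_0007.in");
        Files.createDirectories(reduceSide.getParent());
        // Hard link: zero-copy; both names point at the same data blocks,
        // so no bytes cross the network during the shuffle.
        Files.createLink(reduceSide, mapOutput);
    }
}
```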
4. Reduce output: write
HDFS: the reduce tasks write their results to HDFS; each reducer produces its own serially numbered output file.
Lustre: the reduce tasks write their results to Lustre, and the reducers write in parallel.
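Each HDFS reducer's numbered output file follows Hadoop's FileOutputFormat naming convention (part-r-00000, part-r-00001, ...). A small sketch that prints the names; the output directory is hypothetical:

```java
import org.apache.hadoop.fs.Path;

public class ReduceOutputNames {
    public static void main(String[] args) {
        Path jobOutput = new Path("/results/job42"); // hypothetical dir
        // One file per reducer, keyed by its partition (serial) number.
        for (int partition = 0; partition < 3; partition++) {
            Path file = new Path(jobOutput,
                    String.format("part-r-%05d", partition));
            System.out.println("reducer " + partition + " -> " + file);
        }
    }
}
```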