The difference between Gluster and Hadoop
Lustre:

Object-based storage model

Supports interoperability between different versions.

Users can operate on different parts of the same file at the same time.

Hadoop:

Streaming data access

Supports large data sets: very large files and tens of millions of files.

A "write once, read many" file access model

Safe mode

Pipeline replication

Browser and Java interfaces

Deleted files can be recovered within a certain period of time.
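
Pipeline replication can be illustrated with a small sketch: the client hands a block to the first node only, and each node stores it and forwards it to the next node in the chain. The `DataNode` class and its `receive` method are hypothetical names made up for this illustration; they are not part of the Hadoop API.

```python
class DataNode:
    """Toy stand-in for a storage node (hypothetical, not a Hadoop class)."""

    def __init__(self, name):
        self.name = name
        self.blocks = []

    def receive(self, block, downstream):
        # Store the block locally, then pass it on to the next node, if any.
        self.blocks.append(block)
        if downstream:
            return 1 + downstream[0].receive(block, downstream[1:])
        return 1


nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
# The client talks only to the first node; the rest is node-to-node forwarding.
copies = nodes[0].receive(b"block-0", nodes[1:])
print("replicas written:", copies)
```

The point of the chain is that the client sends each block once, and the cost of the extra copies is spread across the storage nodes.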

Gluster:

Linear scaling

Replaces the metadata server (MDS) with a dynamic algorithm that runs on every node.

Supports multiple storage and file protocols

Based on FUSE
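
The idea of replacing a metadata server with an algorithm that every node can run is easier to see in code. The sketch below is a simplification and not Gluster's actual hashing scheme: the function `pick_brick`, the brick names, and the use of MD5 with a modulo are all assumptions chosen for illustration. What it shows is that any node can compute a file's location from the file name alone, without asking a central server.

```python
import hashlib


def pick_brick(path, bricks):
    """Map a file path to a storage brick using only a hash of the path."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(bricks)
    return bricks[index]


# Hypothetical brick list; every node is assumed to hold the same list.
bricks = ["node1:/export/brick", "node2:/export/brick", "node3:/export/brick"]
for name in ("/data/a.log", "/data/b.log"):
    print(name, "->", pick_brick(name, bricks))
```

Because the result depends only on the path and the brick list, two nodes that share the same list always agree on where a file lives.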

Stage-by-stage comparison of Hadoop and Lustre

A Hadoop job is divided into four stages: map input, map output, reduce input, and reduce output.

1. Map input: read/write

(1) Location information for file blocks is available.

Hadoop: Each read/write task runs as a stream, with almost no remote network I/O.

Lustre: Each read/write task is executed in parallel by each client.

(2) Location information for file blocks is not available.

Hadoop: Each read/write task runs as a stream, with almost no remote network I/O.

Lustre: Each read/write task is executed in parallel by each client, with less remote network I/O than Hadoop.

Adding file block location information keeps read and write operations local wherever possible, which minimizes network traffic and improves read/write speed.
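
The benefit of block location information can be sketched as a scheduling rule: when a task needs a block, prefer a free node that already stores it. The function `pick_node` and the node names below are hypothetical and only illustrate the idea; they are not Hadoop's or Lustre's scheduler.

```python
def pick_node(block_hosts, free_nodes):
    """Return (node, is_local): prefer a free node that already holds the block."""
    for node in free_nodes:
        if node in block_hosts:
            return node, True       # local read, no network transfer needed
    return free_nodes[0], False     # fall back to a remote read


# With location information, the task lands on a node that has the block.
print(pick_node({"n2", "n3"}, ["n1", "n2"]))
# Without it (empty host set), the choice is blind and the read is remote.
print(pick_node(set(), ["n1", "n2"]))
```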

2. Map output: read/write

HDFS: Map output is written to the local Linux file system, not to HDFS itself.

Lustre: Map output is written to Lustre.

3. Reduce input (shuffle stage): read/write

HDFS: Map output is fetched from remote map nodes over HTTP.

Lustre: A hard link to the map output is created.
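
On a shared file system, the map output is already visible to the reducer, so the shuffle can give the reducer a second name for the same file instead of copying it. A minimal sketch of that idea using an ordinary POSIX hard link follows; the file names and the `publish_map_output` helper are made up for illustration, and a temporary local directory stands in for a Lustre mount.

```python
import os
import tempfile


def publish_map_output(map_output, reduce_input):
    """Expose the map output under the reducer's name without copying data."""
    os.link(map_output, reduce_input)  # both paths must be on one file system


with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "map_0.out")
    dst = os.path.join(workdir, "reduce_0.in")
    with open(src, "w") as f:
        f.write("key\tvalue\n")
    publish_map_output(src, dst)
    same_file = os.path.samefile(src, dst)   # both names refer to one file
    link_count = os.stat(src).st_nlink       # number of names for that file
    print(same_file, link_count)
```

No file data moves in this step; only a directory entry is added, which is why it avoids the HTTP transfer used in the HDFS case.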

4. Reduce output: write

HDFS: The reduce task writes its results to HDFS, and each reducer writes serially.

Lustre: The reduce task writes its results to Lustre, and each reducer writes in parallel.