HadoopExam Learning Resources


Hadoop pipeline write and parallel read?

We know that a client in Hadoop reads data in parallel, but data is written in a pipeline fashion, where one datanode writes the data to the next. I understand that parallel reads make the system more fault tolerant and faster. But what is the benefit of a pipelined write? Why doesn't the HDFS client itself write the data to each datanode?

Suppose you have a 128 MB file that you want to write to HDFS.

The client machine first splits the file into blocks, say Block A and Block B. The client then contacts the NameNode and asks where to place these blocks (Block A and Block B). The NameNode responds with a list of datanodes to which the client can write the data.
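The splitting and allocation steps above can be sketched as a small simulation. This is not the real HDFS API; the names (`MockNameNode`, `allocate`) and the round-robin placement are illustrative assumptions, and the 64 MB block size is the classic default, under which a 128 MB file splits into exactly two blocks.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # assuming the classic 64 MB default block size
REPLICATION = 3                # HDFS default replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes splits into."""
    blocks = []
    remaining = file_size
    while remaining > 0:
        blocks.append(min(block_size, remaining))
        remaining -= block_size
    return blocks

class MockNameNode:
    """Hypothetical stand-in for the NameNode: hands out a datanode list per block."""
    def __init__(self, datanodes):
        self.datanodes = datanodes

    def allocate(self, block_id, replication=REPLICATION):
        # Real placement is rack-aware; simple round-robin here for illustration.
        start = block_id % len(self.datanodes)
        return [self.datanodes[(start + i) % len(self.datanodes)]
                for i in range(replication)]

file_size = 128 * 1024 * 1024
blocks = split_into_blocks(file_size)          # two 64 MB blocks: Block A, Block B
nn = MockNameNode(["dn1", "dn2", "dn3", "dn4"])
pipelines = [nn.allocate(i) for i in range(len(blocks))]
```

Each entry of `pipelines` is the ordered list of datanodes the client would use as the write pipeline for that block.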

The client then writes the first block to the first datanode in the list, and that datanode replicates the block to the next datanode in the pipeline. Once the second datanode has received the replicated block, it sends an acknowledgement back to the primary datanode, and the primary datanode reports the block to the NameNode. This pipeline is the answer to the question above: the client has to push each block over the network only once, because the replication traffic flows between the datanodes over their own links rather than multiplying the load on the client's single connection, and the client still receives one acknowledgement once every replica in the pipeline holds the block.
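The forward-then-ack behaviour of the pipeline can be sketched as follows. This is a toy simulation, not the real DataNode transfer protocol: the function names and log format are invented for illustration, but the ordering matches the description above, with data flowing down the pipeline and acknowledgements flowing back up.

```python
def pipeline_write(block, pipeline, log):
    """Simulate sending `block` through `pipeline` (an ordered datanode list).

    The client pushes the block once, to pipeline[0]; each datanode forwards
    it to the next one, then waits for the downstream ack before acking
    upstream. Returns True once pipeline[0]'s ack reaches the client.
    """
    def send(i):
        node = pipeline[i]
        log.append(f"{node} stored {block}")
        if i + 1 < len(pipeline):
            if not send(i + 1):              # forward to the next datanode
                return False
        log.append(f"{node} acked {block}")  # ack travels back upstream
        return True

    return send(0)

log = []
ok = pipeline_write("Block A", ["dn1", "dn2", "dn3"], log)
# Stores happen dn1 -> dn2 -> dn3; acks come back dn3 -> dn2 -> dn1.
```

Note that the client sees a single `True`/ack only after the last replica is in place, which is how the pipeline gives durability without the client contacting every datanode itself.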


NameNode keeps the information about files and their associated blocks.
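That file-and-block bookkeeping can be pictured as two mappings, shown here as plain dictionaries. The path, block names, and `locate` helper are purely illustrative, not the NameNode's actual data structures; the point is only that the NameNode maps each file to its blocks and each block to the datanodes holding its replicas, which is what lets a client read different blocks from different datanodes in parallel.

```python
# file -> ordered list of its blocks
namespace = {
    "/user/data/file.txt": ["blk_A", "blk_B"],
}

# block -> datanodes holding a replica of it
block_locations = {
    "blk_A": ["dn1", "dn2", "dn3"],
    "blk_B": ["dn2", "dn3", "dn4"],
}

def locate(path):
    """Return (block, replica list) pairs for a file, as a reading client sees them."""
    return [(blk, block_locations[blk]) for blk in namespace[path]]

info = locate("/user/data/file.txt")
```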

