HadoopExam Learning Resources

Hadoop pipeline write and parallel read?

We know that a client in Hadoop reads data in parallel, but data is written through a pipeline in which one DataNode forwards the data to the next. I understand that parallel reads make the system more fault tolerant and speed up reading. But what is the benefit of a pipelined write? Why doesn't the HDFS client itself write the data to each node?

Suppose you have a 128 MB file and you want to write it to HDFS.

The client first splits the file into blocks, say Block A and Block B (two blocks if the block size is 64 MB). It then asks the NameNode where to place these blocks, and the NameNode returns a list of DataNodes to which the client can write the data.
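The block split described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not HDFS client code: the function name `split_into_blocks` is hypothetical, and the 64 MB block size is an assumption chosen so that a 128 MB file yields two blocks (the real value comes from the cluster configuration).

```python
MB = 1024 * 1024

def split_into_blocks(file_size, block_size):
    """Return (offset, length) pairs that cover a file of file_size bytes.

    Hypothetical helper for illustration; not part of any Hadoop API.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = []
    offset = 0
    while offset < file_size:
        # The last block may be shorter than block_size.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

# Assumed block size of 64 MB: a 128 MB file becomes Block A and Block B.
blocks = split_into_blocks(128 * MB, 64 * MB)
print(len(blocks))  # 2
```

Only the last block can be smaller than the block size; a 130 MB file under the same assumption would give two full blocks plus one 2 MB block.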

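The client then picks the first DataNode from that list and writes the first block to it, and that DataNode replicates the block to another DataNode. Once the second DataNode has received the replica, it sends a block-received acknowledgement back to the first DataNode, and the acknowledgement is passed back to the client. The DataNodes report the stored block to the NameNode. Because the client sends each block only once and the DataNodes forward it among themselves, the client's outgoing bandwidth is not multiplied by the number of replicas, which is the main benefit of the pipeline over the client writing to every node itself.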
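To make the difference between a pipelined write and a client writing to every node concrete, here is a small in-memory simulation. It is a sketch under stated assumptions, not the real HDFS protocol: the `DataNode` class, `pipeline_write`, and `fanout_write` are hypothetical names, three replicas are assumed, and a whole block is passed in one call for simplicity.

```python
class DataNode:
    """Toy stand-in for a DataNode (hypothetical, in-memory only)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}      # block id -> bytes stored on this node
        self.bytes_sent = 0   # bytes this node forwarded downstream

    def receive(self, block_id, data, downstream):
        """Store the block, forward it down the pipeline, return the acks."""
        self.blocks[block_id] = data
        acks = []
        if downstream:
            nxt, rest = downstream[0], downstream[1:]
            self.bytes_sent += len(data)
            acks = nxt.receive(block_id, data, rest)
        # Acks travel back upstream, so this node's ack comes after
        # the acks of the nodes behind it in the pipeline.
        return acks + [self.name]


def pipeline_write(block_id, data, datanodes):
    """Client sends the block once, to the first DataNode only."""
    acks = datanodes[0].receive(block_id, data, datanodes[1:])
    return len(data), acks


def fanout_write(block_id, data, datanodes):
    """Alternative: client sends a separate copy to every DataNode."""
    acks = []
    for dn in datanodes:
        acks.extend(dn.receive(block_id, data, []))
    return len(data) * len(datanodes), acks


data = b"x" * 1024  # stand-in for one block

nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
print(pipeline_write("blk_A", data, nodes))
# (1024, ['dn3', 'dn2', 'dn1'])

nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
print(fanout_write("blk_A", data, nodes))
# (3072, ['dn1', 'dn2', 'dn3'])
```

In both cases every node ends up with a replica, but in the pipelined version the client uploads 1024 bytes while in the fan-out version it uploads 3072. The forwarding cost is spread across the DataNodes instead of landing on the client's single network link.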


The NameNode keeps the metadata about files and their associated blocks.
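As a rough mental model of that bookkeeping, the NameNode's view can be pictured as two mappings: file path to block list, and block to the DataNodes holding it. The sketch below is purely illustrative; the class `NameNodeMetadata`, its methods, and the path and block names are hypothetical and do not reflect the NameNode's actual internal structures.

```python
class NameNodeMetadata:
    """Toy model of file -> blocks -> DataNode locations (hypothetical)."""

    def __init__(self):
        self.file_blocks = {}      # file path -> ordered list of block ids
        self.block_locations = {}  # block id -> DataNode names with a replica

    def add_block(self, path, block_id, datanodes):
        """Record that block_id belongs to path and where its replicas live."""
        self.file_blocks.setdefault(path, []).append(block_id)
        self.block_locations[block_id] = list(datanodes)

    def locate(self, path):
        """Return (block id, locations) pairs in file order, as a reader needs."""
        return [(b, self.block_locations[b]) for b in self.file_blocks[path]]


meta = NameNodeMetadata()
meta.add_block("/data/example.bin", "blk_A", ["dn1", "dn2"])
meta.add_block("/data/example.bin", "blk_B", ["dn2", "dn3"])
print(meta.locate("/data/example.bin"))
# [('blk_A', ['dn1', 'dn2']), ('blk_B', ['dn2', 'dn3'])]
```

A reader that gets this list back can fetch different blocks from different DataNodes, which is what makes parallel reads possible.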
