hadoop - How HDFS removes over-replicated blocks


For example, suppose I wrote a file to HDFS with a replication factor of 2. The node I wrote from holds one copy of every block of the file, and the other copies of the blocks are scattered around the remaining nodes in the cluster; that is the default HDFS placement policy. What happens if I lower the replication factor of the file to 1? How does HDFS decide which blocks on which nodes to delete? I hope it tries to delete blocks from the nodes that hold the highest count of the file's blocks?

Why I'm asking: if it does work that way, it would make sense, because it would ease the processing of the file. If there is only one copy of each block and all of those blocks sit on the same node, it is harder to process the file with MapReduce because of the data transfer to other nodes in the cluster. A sketch of how lowering the replication factor is actually requested is shown below.
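For reference, the replication factor of an existing file can be lowered from the command line with hdfs dfs -setrep 1 /path/to/file, or through the Java FileSystem API. A minimal sketch follows; the cluster configuration on the classpath and the /data/example.txt path are assumptions for illustration, and the actual deletion of surplus replicas happens asynchronously after the call returns.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LowerReplication {
        public static void main(String[] args) throws Exception {
            // Picks up core-site.xml / hdfs-site.xml from the classpath.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/data/example.txt"); // hypothetical path
            short targetReplication = 1;

            // Asks the NameNode to change the file's target replication factor.
            // The NameNode then schedules the extra replicas for removal.
            boolean accepted = fs.setReplication(file, targetReplication);
            System.out.println("setReplication accepted: " + accepted);

            fs.close();
        }
    }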

When a block becomes over-replicated, the NameNode chooses a replica to remove. The NameNode prefers not to reduce the number of racks that host replicas, and secondly prefers to remove a replica from the DataNode with the least amount of available disk space. This may help rebalance the load across the cluster.

Source: The Architecture of Open Source Applications
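To check where the surviving replicas actually ended up after the NameNode has trimmed the extra copies, you can inspect the block locations, either with hdfs fsck /path/to/file -files -blocks -locations or via the FileSystem API. The sketch below reuses the same hypothetical /data/example.txt path as above and simply prints the DataNode hosts per block.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlockLocations {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/data/example.txt"); // hypothetical path

            FileStatus status = fs.getFileStatus(file);
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

            // Print the DataNode hosts that store each block so you can see
            // how the remaining replicas are spread across the cluster.
            for (BlockLocation block : blocks) {
                System.out.println("block at offset " + block.getOffset()
                        + " -> " + String.join(", ", block.getHosts()));
            }

            fs.close();
        }
    }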

