hadoop - how HDFS removes over-replicated blocks
For example, I wrote a file to HDFS with a replication factor of 2. The node I was writing from holds one copy of every block of the file; the other copies of the blocks are scattered around the remaining nodes in the cluster. That's the default HDFS placement policy. What happens if I lower the replication factor of the file to 1? How does HDFS decide which blocks on which nodes to delete? I hope it tries to delete blocks from the nodes that hold the highest count of that file's blocks?
Why I'm asking: if it does, it would make sense, because it would ease processing of the file. If there is only one copy of each block and all the blocks are located on the same node, it would be harder to process the file with MapReduce because of the data transfer to the other nodes in the cluster.
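For context, this is a minimal sketch of how the replication factor can be lowered programmatically through the standard org.apache.hadoop.fs.FileSystem API (the path /user/me/data.txt is only a placeholder); the same change can be requested from the shell with hdfs dfs -setrep 1 <path>:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LowerReplication {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/user/me/data.txt");  // placeholder path
            short newReplication = 1;

            // Ask the name node to change the target replication factor.
            // Deletion of the now over-replicated blocks happens
            // asynchronously, driven by the name node.
            boolean accepted = fs.setReplication(file, newReplication);
            System.out.println("Replication change accepted: " + accepted);

            fs.close();
        }
    }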
When a block becomes over-replicated, the name node chooses a replica to remove. The name node prefers not to reduce the number of racks that host replicas, and secondly prefers to remove a replica from the data node with the least amount of available disk space. This may help rebalance the load across the cluster.
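A rough illustration of that preference order is below. This is only a sketch of the policy as described above, not the actual name node code; the Replica record and its fields are invented for the example.

    import java.util.Comparator;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    // Hypothetical view of a replica location: which rack it sits on and
    // how much free disk space its data node has.
    record Replica(String dataNode, String rack, long freeSpaceBytes) {}

    class ReplicaRemovalSketch {
        // Pick one replica to delete from an over-replicated block.
        static Replica chooseReplicaToDelete(List<Replica> replicas) {
            // Group replicas by rack; a rack holding more than one replica
            // can lose a replica without reducing the number of racks covered.
            Map<String, List<Replica>> byRack = replicas.stream()
                    .collect(Collectors.groupingBy(Replica::rack));

            List<Replica> candidates = byRack.values().stream()
                    .filter(list -> list.size() > 1)
                    .flatMap(List::stream)
                    .collect(Collectors.toList());

            // If every rack has exactly one replica, dropping a rack is
            // unavoidable, so consider all replicas.
            if (candidates.isEmpty()) {
                candidates = replicas;
            }

            // Among the candidates, prefer the data node with the least free space.
            return candidates.stream()
                    .min(Comparator.comparingLong(Replica::freeSpaceBytes))
                    .orElseThrow();
        }
    }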