hadoop - how HDFS removes over-replicated blocks
For example, I wrote a file to HDFS with a replication factor of 2. The node I was writing from holds one copy of every block of the file; the other copies of the blocks are scattered around the remaining nodes in the cluster. That's the default HDFS placement policy. What happens if I lower the replication factor of the file to 1? How does HDFS decide which blocks on which nodes to delete? I hope it tries to delete blocks from the nodes that hold the highest count of that file's blocks?
Why I'm asking: if it does, it would make sense, because it would ease processing of the file. If there is only one copy of each block and all the blocks are located on the same node, it would be harder to process the file with MapReduce because of the data transfer to the other nodes in the cluster.
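For context, this is a minimal sketch of how the replication factor can be lowered programmatically through the standard org.apache.hadoop.fs.FileSystem API (the path /user/me/data.txt is only a placeholder); the same change can be requested from the shell with hdfs dfs -setrep 1 <path>:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LowerReplication {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/user/me/data.txt");  // placeholder path
            short newReplication = 1;

            // Ask the name node to change the target replication factor.
            // Deletion of the now over-replicated blocks happens
            // asynchronously, driven by the name node.
            boolean accepted = fs.setReplication(file, newReplication);
            System.out.println("Replication change accepted: " + accepted);

            fs.close();
        }
    }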
When a block becomes over-replicated, the name node chooses a replica to remove. The name node prefers not to reduce the number of racks that host replicas, and secondly prefers to remove a replica from the data node with the least amount of available disk space. This may help rebalance the load across the cluster.
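A rough illustration of that preference order is below. This is only a sketch of the policy as described above, not the actual name node code; the Replica record and its fields are invented for the example.

    import java.util.Comparator;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    // Hypothetical view of a replica location: which rack it sits on and
    // how much free disk space its data node has.
    record Replica(String dataNode, String rack, long freeSpaceBytes) {}

    class ReplicaRemovalSketch {
        // Pick one replica to delete from an over-replicated block.
        static Replica chooseReplicaToDelete(List<Replica> replicas) {
            // Group replicas by rack; a rack holding more than one replica
            // can lose a replica without reducing the number of racks covered.
            Map<String, List<Replica>> byRack = replicas.stream()
                    .collect(Collectors.groupingBy(Replica::rack));

            List<Replica> candidates = byRack.values().stream()
                    .filter(list -> list.size() > 1)
                    .flatMap(List::stream)
                    .collect(Collectors.toList());

            // If every rack has exactly one replica, dropping a rack is
            // unavoidable, so consider all replicas.
            if (candidates.isEmpty()) {
                candidates = replicas;
            }

            // Among the candidates, prefer the data node with the least free space.
            return candidates.stream()
                    .min(Comparator.comparingLong(Replica::freeSpaceBytes))
                    .orElseThrow();
        }
    }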