Replication causes data redundancy and consumes a lot of space, so why is it pursued in HDFS?
Answer / Gyan Chand
In HDFS, replication is pursued to ensure data durability and high availability. By storing multiple copies of each data block across different datanodes, HDFS can withstand datanode failures without losing data, and this redundancy helps maintain data integrity even during network partitions or hardware failures. Despite the storage cost, replication also improves performance: with several copies of a block available, reads and computation can be scheduled on whichever node holds a nearby replica.
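The replication factor that governs this trade-off is configurable. As a minimal sketch, a cluster-wide default can be set in `hdfs-site.xml` via the standard `dfs.replication` property (the value shown is illustrative; stock Hadoop defaults to 3):

```xml
<!-- hdfs-site.xml: cluster-wide default replication factor -->
<property>
  <name>dfs.replication</name>
  <!-- each block is stored on this many datanodes -->
  <value>3</value>
</property>
```

It can also be changed per file after the fact with `hdfs dfs -setrep` (the path here is hypothetical), e.g. `hdfs dfs -setrep -w 2 /data/example.csv`, which lowers that file's replication to 2 and waits for the change to complete.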
Related questions:
How is data written into HDFS?
Is the namenode also commodity hardware?
If the source data gets updated every now and then, how will you synchronize the data in HDFS that is imported by Sqoop?
What does a heartbeat in HDFS mean?
How is indexing done in HDFS?
If data is already present in HDFS and a replication factor is defined, how can we change the replication factor?
What is throughput in HDFS?
What do you mean by the High Availability of a NameNode in Hadoop HDFS?
Why is reading done in parallel in HDFS but writing is not?
What is "non-DFS used" in the HDFS web console?
How does HDFS achieve good throughput?
List the files associated with metadata in HDFS.