What is the benefit of the Distributed Cache? Why can't we just keep the file in HDFS and have the application read it?
Answer / Vikalp Chauhan
Hadoop's Distributed Cache lets a job ship side files to every node where its tasks will run: read-only data the application needs for processing but that is not part of its input or output (lookup tables, configuration files, small dictionaries). Before the job starts, the framework copies each cached file from HDFS to the local disk of every node once; all tasks on that node then read the local copy. If tasks instead opened the file directly from HDFS, each of the possibly thousands of tasks would pull it over the network separately, adding traffic and latency. With the cache, the transfer happens once per node rather than once per task, reducing network traffic and improving performance.
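As a rough sketch of how this looks in the MapReduce Java API: the driver registers the file with `Job.addCacheFile`, and the mapper reads the locally materialized copy once in `setup()`. The file path `hdfs:///data/lookup.txt` and the class names are illustrative placeholders, and the snippet assumes the Hadoop MapReduce libraries are on the classpath (it is not runnable without a Hadoop environment).

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheDemo {

  // The mapper loads the side file once per task in setup(), reading the
  // node-local cached copy instead of opening the file over the network.
  public static class CacheMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void setup(Context context) throws IOException {
      // The "#lookup" fragment in the URI below asks Hadoop to create a
      // symlink named "lookup" in the task's working directory.
      try (BufferedReader r = new BufferedReader(new FileReader("lookup"))) {
        String line;
        while ((line = r.readLine()) != null) {
          // ... load the side data (e.g. a lookup table) into memory ...
        }
      }
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "cache-demo");
    job.setJarByClass(CacheDemo.class);
    // Register the HDFS file with the distributed cache; the framework
    // copies it to each node's local disk once per job, not once per task.
    job.addCacheFile(new URI("hdfs:///data/lookup.txt#lookup"));
    job.setMapperClass(CacheMapper.class);
    // ... input/output paths, reducer, job submission, etc. would go here ...
  }
}
```

The same effect can be had from the command line with the generic `-files` option, which Hadoop's tool runner also routes through the distributed cache.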
How does a client read/write data in HDFS?
What is the difference between an "HDFS Block" and an "Input Split"?
How to keep files in HDFS?
Explain how HDFS communicates with the Linux native file system.
Why does HDFS store data on commodity hardware despite the higher chance of failures?
What do you mean by metadata in HDFS?
Define Hadoop archives. What is the command for archiving a group of files in HDFS?
What is active and passive NameNode in HDFS?
Why is block size set to 128 MB in HDFS?
How does data transfer happen from HDFS to Hive?
How to Delete directory from HDFS?