Why is HDFS used for applications with large data sets and not for many small files?
Answer / Jaiprakash
HDFS is designed for storing large data sets. The NameNode keeps the metadata for every file, directory, and block in memory, so a large number of small files inflates this metadata, consumes NameNode heap, and slows namespace operations. Each small file also occupies its own block regardless of its size. Techniques such as Hadoop Archives (HAR files), SequenceFiles, or CombineFileInputFormat pack many small files into larger units to mitigate these issues. (DistCp, by contrast, is a tool for copying data between clusters, not for consolidating small files.)
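To make the metadata overhead concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per namespace object (file, directory, or block); the exact figure varies by Hadoop version, and the helper function name is just for illustration.

```python
# Rough estimate of NameNode heap consumed by namespace metadata.
# Assumption: ~150 bytes per object (file or block) -- a commonly
# cited rule of thumb, not an exact per-version figure.
BYTES_PER_OBJECT = 150

def namenode_metadata_bytes(num_files, blocks_per_file=1):
    """Each file contributes one file object plus its block objects."""
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT

# 10 GB stored as 10 million 1 KB files: each tiny file still
# occupies its own block, so metadata grows with the file count.
small = namenode_metadata_bytes(10_000_000, blocks_per_file=1)

# The same 10 GB stored as 80 files of 128 MB (one block each).
large = namenode_metadata_bytes(80, blocks_per_file=1)

print(f"10M small files: ~{small / 1024**2:.0f} MB of NameNode heap")
print(f"80 large files:  ~{large / 1024:.1f} KB of NameNode heap")
```

Under these assumptions, the same 10 GB of data costs the NameNode roughly gigabytes of heap as small files but only tens of kilobytes as large files, which is why consolidating small files matters.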
What do you mean by metadata in HDFS?
Can you explain heartbeat in HDFS?
How will you perform the inter-cluster data copying work in HDFS?
How much metadata will be created on the NameNode in Hadoop?
Can multiple clients write into an HDFS file simultaneously?
Mention what is the best way to copy files between HDFS clusters?
How to copy a file into HDFS with a different block size to that of the existing block size configuration?
Can multiple clients write into an HDFS file concurrently?
What is HDFS?
What is the difference between an input split and an HDFS block?
Why is HDFS only suitable for large data sets and not the correct tool to use for many small files?
How is HDFS different from traditional file systems?