What is the problem in having lots of small files in HDFS?
Answer / Awadhesh Kumar Singh
In HDFS (Hadoop Distributed File System), a large number of small files causes metadata overhead and inefficient processing. The NameNode holds the metadata for every file, directory, and block (name, replication factor, permissions, block locations, etc.) in memory, and each such object costs on the order of 150 bytes of heap, so millions of small files can exhaust NameNode memory long before the cluster's disk capacity is used. Small files also waste processing capacity: because HDFS is organized at the block level, each small file occupies its own block, and frameworks such as MapReduce typically launch one task per block, so many small files mean many short-lived tasks and extra network round trips for data retrieval. Common mitigations are to pack small files into larger containers, for example Hadoop Archives (HAR) or SequenceFiles, or to combine them at read time with CombineFileInputFormat.
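To make the metadata cost concrete, here is a minimal back-of-envelope sketch. It assumes the commonly cited rule of thumb of roughly 150 bytes of NameNode heap per metadata object (file inode or block); the exact figure varies by Hadoop version and is an approximation, not an API value.

```python
# Rough estimate of NameNode heap consumed by file metadata.
# Assumption (rule of thumb, not exact): ~150 bytes per metadata
# object, where each file contributes one inode object plus one
# object per block it occupies.
BYTES_PER_OBJECT = 150

def namenode_memory_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Approximate NameNode heap used by num_files files."""
    objects = num_files * (1 + blocks_per_file)  # inode + its blocks
    return objects * BYTES_PER_OBJECT

# Same 10 GB of data, stored two ways:
# (a) ten million 1 KB files, one block each
small_files = namenode_memory_bytes(10_000_000)
# (b) one file split into 128 MB blocks (10 GB / 128 MB = 80 blocks)
one_big_file = namenode_memory_bytes(1, blocks_per_file=80)

print(small_files)   # ~3 GB of NameNode heap for the small-file layout
print(one_big_file)  # a few KB for the single large file
```

The point of the comparison: the data volume is identical, but the small-file layout needs roughly five orders of magnitude more NameNode memory, which is why packing files into HARs or SequenceFiles helps.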
What is Fault Tolerance in Hadoop HDFS?
What is the throughput?
How is data or a file written into HDFS?
Which one is the master node in HDFS? Can it be commodity hardware?
What is the difference between an Input Split and an HDFS Block?
How to delete a file from HDFS?
Can you explain the indexing process in HDFS?
What are the key features of HDFS?
Explain how HDFS communicates with the Linux native file system?
Explain the HDFS architecture and list the various HDFS daemons in an HDFS cluster?
How to keep files in HDFS?
Can multiple clients write into an HDFS file concurrently?