Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
Because Hadoop achieves parallelism by dividing tasks across many nodes, a few slow nodes can rate-limit the rest of the program and slow it down. What mechanism does Hadoop provide to combat this?
Is it possible to provide multiple inputs to Hadoop? If yes, how can you give multiple directories as input to a Hadoop job?
What happens when you issue a delete command in HBase?
What is the difference between MapReduce and Pig?
What is speculative execution?
What is the Disk Balancer in Apache Hadoop?
What exactly does Kafka do?
Can you explain the sort-merge-bucket (SMB) join in Hive?
What is Apache Hadoop YARN?
State some advantages of Impala.
How does HBase handle write failures?
Explain why we need Hadoop.
Explain the concept of RDD (Resilient Distributed Dataset). Also, state how you can create RDDs in Apache Spark.
What is a Secondary NameNode?
How can I import large objects (BLOB and CLOB objects) in Apache Sqoop?
What are the tools used in big data?
Define the role of the file system in any framework.
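The slow-node question and "What is speculative execution?" both concern Hadoop's speculative execution: when a task runs noticeably slower than its peers, the framework can launch a duplicate (backup) attempt of the same task on another node, keep whichever attempt finishes first, and kill the other. The toy Python model below is a minimal sketch of why this shortens a job. It is a single-process simulation, not Hadoop code: the function name `job_makespan`, the `slow_factor` threshold, launching backups at the median task time, and backups taking the median time are all hypothetical simplifications. In Hadoop MRv2, speculation is switched on or off per phase with the `mapreduce.map.speculative` and `mapreduce.reduce.speculative` properties.

```python
from statistics import median


def job_makespan(task_times, speculate=True, slow_factor=1.5):
    """Toy model (hypothetical policy, not Hadoop's scheduler).

    Every task's original attempt starts at t=0 on its own node and takes
    task_times[i] time units. With speculation on, at t = median(task_times)
    any task whose expected time exceeds slow_factor * median gets a backup
    attempt on a free node, assumed to take median(task_times). The first
    attempt to finish wins. Returns (job finish time, indices of backed-up tasks).
    """
    typical = median(task_times)
    finish_times = []
    backed_up = []
    for i, t in enumerate(task_times):
        finish = t
        if speculate and t > slow_factor * typical:
            # Backup launched at t = typical and (by assumption) runs for `typical`.
            backup_finish = typical + typical
            # Whichever attempt finishes first counts; the other would be killed.
            finish = min(t, backup_finish)
            backed_up.append(i)
        finish_times.append(finish)
    # The job is done only when its last task is done.
    return max(finish_times), backed_up


if __name__ == "__main__":
    times = [10, 11, 9, 10, 60]  # one straggler on a slow node
    print(job_makespan(times, speculate=False))  # (60, [])
    print(job_makespan(times))                   # (20, [4])
```

With a single straggler, the job without speculation waits the full 60 units for the slow node; with a backup attempt, the job finishes at 20. The trade-off, which the model ignores, is that backup attempts consume extra cluster capacity.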