How did you debug your Hadoop code?
What is the difference between Kafka and MQ?
Why can't aggregation be done in the Mapper?
What is the difference between a node, a cluster, and a data centre?
What are the limitations of Hadoop?
What are the different elements of a row in Cassandra?
What other technologies have you used in the Hadoop stack?
What happens if one of the DataNodes has a much slower CPU?
Explain why we need Hadoop.
How is the processing of streaming data achieved in Apache Spark? Explain.
Hadoop achieves parallelism by dividing tasks across many nodes, so a few slow nodes can rate-limit the rest of the program and slow it down. What mechanism does Hadoop provide to combat this?
How are Pig programs converted into MapReduce jobs?
When to use Hive?
What are the different tombstone markers in HBase?
Why is Pig used in Hadoop?
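For the question about aggregating in the Mapper, here is a minimal, framework-free Python sketch. It uses plain functions, not the Hadoop API, and the names `map_split`, `shuffle` and `reduce_counts` are hypothetical. It shows that a mapper only sees its own input split, so any total it computes is partial. The complete set of values for a key only comes together after the shuffle, at the reducer. Hadoop's combiner can pre-aggregate within one mapper's output, but a reducer still has to merge those partial results.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def map_split(lines: Iterable[str]) -> List[Tuple[str, int]]:
    # A mapper sees only the records of its own input split and
    # emits one (word, 1) pair per word; it never sees other splits.
    return [(word, 1) for line in lines for word in line.split()]

def shuffle(mapper_outputs: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    # The framework groups every value for the same key across all mappers.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapper_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return dict(grouped)

def reduce_counts(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    # Only after the shuffle are all values for a key available together.
    return {key: sum(values) for key, values in grouped.items()}

splits = [["big data", "big cluster"], ["data node", "big node"]]
mapped = [map_split(s) for s in splits]

# Aggregating inside a single mapper gives only a partial answer.
partial: Dict[str, int] = defaultdict(int)
for word, one in mapped[0]:
    partial[word] += one
print(dict(partial))  # {'big': 2, 'data': 1, 'cluster': 1}

print(reduce_counts(shuffle(mapped)))
# {'big': 3, 'data': 2, 'cluster': 1, 'node': 2}
```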
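For the question about streaming in Apache Spark: Spark Streaming (the DStream API) divides incoming data into small batches at a fixed batch interval and processes each batch with the regular Spark engine. Structured Streaming also uses micro-batch execution by default. Below is a plain-Python sketch of that micro-batching idea. It is not the Spark API, and `micro_batches` and `process_batch` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Tuple

Event = Tuple[float, str]  # (arrival time in seconds, payload)

def micro_batches(events: List[Event], batch_interval: float) -> Dict[int, List[str]]:
    # Assign each event to the batch whose time window contains its arrival
    # time, so the stream becomes a sequence of per-interval batches.
    # Intervals with no events simply produce no entry in this sketch.
    batches: Dict[int, List[str]] = {}
    for ts, payload in sorted(events):
        batches.setdefault(int(ts // batch_interval), []).append(payload)
    return batches

def process_batch(records: List[str]) -> Counter:
    # Any ordinary batch computation can run on a micro-batch; here, a word count.
    return Counter(word for record in records for word in record.split())

events = [(0.2, "spark streaming"), (0.9, "spark"), (1.4, "kafka spark"), (2.7, "kafka")]
for batch_id, records in sorted(micro_batches(events, batch_interval=1.0).items()):
    print(batch_id, dict(process_batch(records)))
# 0 {'spark': 2, 'streaming': 1}
# 1 {'kafka': 1, 'spark': 1}
# 2 {'kafka': 1}
```

A shorter batch interval lowers latency but means more, smaller batches, each paying the scheduling overhead of a batch job.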
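For the question about slow nodes rate-limiting a job: Hadoop's mechanism is speculative execution. It launches a backup attempt of a lagging task on another node, keeps whichever attempt finishes first, and kills the other. In Hadoop 2 and later this is toggled with the `mapreduce.map.speculative` and `mapreduce.reduce.speculative` properties. The deterministic simulation below is only a sketch. Its straggler rule ("slower than 1.5x the median task") is a hypothetical simplification, because Hadoop's real speculator estimates task progress rates. The function name `job_completion_time` is also illustrative.

```python
import statistics
from typing import Dict, List

def job_completion_time(durations: Dict[str, float], backup_duration: float,
                        slowness_factor: float = 1.5,
                        speculative: bool = True) -> float:
    # All tasks start at t=0; the job finishes when its last task finishes.
    # Hypothetical straggler rule: a task slower than slowness_factor x median.
    threshold = slowness_factor * statistics.median(durations.values())
    finish_times: List[float] = []
    for duration in durations.values():
        if speculative and duration > threshold:
            # Straggler detected at t=threshold: a backup attempt starts on a
            # healthy node, and the earlier of the two attempts wins.
            duration = min(duration, threshold + backup_duration)
        finish_times.append(duration)
    return max(finish_times)

tasks = {"m1": 10.0, "m2": 11.0, "m3": 9.0, "m4": 40.0}  # m4 runs on a slow node
print(job_completion_time(tasks, backup_duration=10.0, speculative=False))  # 40.0
print(job_completion_time(tasks, backup_duration=10.0))                      # 25.75
```

The trade-off is extra cluster work: backup attempts consume slots that other jobs could use, which is why speculation can be switched off per job.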