How do you create an RDD?
Explain the flatMap operation on an Apache Spark RDD.
What is Presto?
Explain how Cassandra writes changed data into the commit log.
What is Hadoop MapReduce?
What is an InputSplit in Hadoop? Explain.
Which Scala library is used for functional programming?
How can a multi-hop agent be set up in Flume?
List the relational operators in Pig.
Explain the operations available on an Apache Spark RDD.
Explain the various levels of persistence in Apache Spark.
When should you use explode in Hive?
Explain what Hive is.
Explain edge nodes in Hadoop.
Hadoop achieves parallelism by dividing tasks across many nodes, so a few slow nodes can rate-limit the rest of the program and slow it down. What mechanism does Hadoop provide to combat this?