Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
What characteristic of the Hadoop Streaming API makes it flexible enough to run MapReduce jobs in languages like Perl, Ruby, Awk, etc.?
Explain the LOAD keyword in a Pig script.
What do you understand by Cassandra?
Name the three layers that Ambari supports.
What is speculative execution?
Can you use Spark for an ETL process?
What is the NameNode? How does the NameNode handle DataNode failures in Hadoop?
Explain ZooKeeper queues.
What happens if a DataNode loses its network connection for a few minutes?
What load do concurrent queries produce on the NameNode?
When optimizing queries in Hive, in what order of size should the tables appear in a join query?
Explain reliability and failure handling in Apache Flume.
Can we change the body of a Flume event?
What is the difference between Scala and Spark?
Can you define a block and a block scanner in HDFS?