Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
Explain the process to trigger automatic clean-up in Spark to manage accumulated metadata.
How do you invoke the Command Line Interface?
What is a topic in Kafka?
What is the best practice for deciding the number of column families for an HBase table?
What are the cases where Apache Spark surpasses Hadoop?
What is a rack awareness algorithm?
What is the difference between the MapReduce engine and the HDFS cluster?
Explain what happens in text format.
How will you update the rows that are already exported?
What is the role of “ambari-qa” user?
Does Impala use caching?
How should you handle session_expired?
Why is lazy evaluation good in Spark?
What happens when a DataNode fails?
State some command line options.