Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
What is the importance of the split-by clause in running parallel import tasks in Sqoop?
What are the most memory-intensive operations?
Why is HDFS only suitable for large data sets and not the correct tool to use for many small files?
What is the bottom layer of abstraction in the Spark Streaming API?
Does Spark require Hadoop?
What is the difference between a Reducer and a Combiner in Hadoop MapReduce?
How can Hive improve performance with ORC format tables?
What daemons run on master nodes?
Can you use Spark to access and analyse data stored in Cassandra databases?
Can you mention some features of Spark?
How do you develop a MapReduce application?
Can you explain data versioning?
What is Apache Cassandra?
Explain the different cluster managers in Apache Spark.
Which one will you choose for a project: Hadoop MapReduce or Apache Spark?