Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
What are the benefits of Spark lazy evaluation?
Explain what happens when Hadoop spawns 50 tasks for a job and one of the tasks fails.
What is throughput? How does HDFS provide good throughput?
Explain the Spark map() transformation.
Compare MapReduce and Spark.
Is it possible to use Apache Spark for accessing and analyzing data stored in Cassandra databases?
What is the use of the DISTRIBUTE BY clause in Hive?
What is the default Spark executor memory?
What are the libraries of Spark SQL?
Is a reduce-only job possible in Hadoop MapReduce?
List some commonly used machine learning algorithms in Apache Spark.
What is a Bloom filter?
What are the various functions of Spark Core?
What are the modules that constitute the Apache Hadoop 2.0 framework?
What is Rack Awareness? Why is it needed in Hadoop?