Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
Does Impala performance improve as it is deployed to more hosts in a cluster, in much the same way that Hadoop performance does?
What is an RDD in Apache Spark? How are RDDs computed in Spark? What are the various ways in which one can be created?
What role does a worker node play in an Apache Spark cluster? And why does a worker node need to register with the driver program?
What are the capabilities of Kafka?
What is a PipelinedRDD?
What job does the conf class do?
When should you use SequenceFileInputFormat?
How can we kill a topology?
What are the main components of a Hadoop Application?
Name the commonly used components of the Spark ecosystem.
What are the disadvantages of using Spark?
Which Ambari components can be used for automation as well as integration?
What are paired RDDs?
What are the actions in Spark?
Can you explain Spark Streaming?
Why use the HColumnDescriptor class?
What is an input reader in reference to MapReduce?
Is Spark used for machine learning?