What should the HDFS block size be to get maximum performance from a Hadoop cluster?
Explain what Sqoop is in Hadoop.
How does the NameNode handle DataNode failures in Hadoop?
How can you start a consumer in Kafka?
How do you handle compression in Pig?
What is a combiner, and when should you use one in a MapReduce job?
Why is HDFS only suitable for large data sets and not the correct tool to use for many small files?
Why do we need Spark?
What is the DataFrame API?
Explain cqlsh.
What is the Spark driver application?
What is meant by in-memory processing in Spark?
What is a bag in Pig?
Does Spark use YARN?
What is your favourite tool in the Hadoop ecosystem?
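The combiner question above can be illustrated with a small in-memory simulation. This is plain Python, not Hadoop code: it models a hypothetical word-count job, and the names `map_words`, `combine`, `reduce_all` and `run` are illustrative only, not part of any Hadoop API. It shows that a combiner that sums counts locally on each mapper's output leaves the final result unchanged while shrinking the number of records shuffled to the reducers. This works because addition is commutative and associative; Hadoop may run a combiner zero, one, or several times, so a combiner must never change the final answer.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# A key/value record as emitted by the (simulated) mapper.
Pair = Tuple[str, int]


def map_words(line: str) -> List[Pair]:
    """Mapper: emit (word, 1) for every whitespace-separated word."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs: Iterable[Pair]) -> List[Pair]:
    """Combiner: pre-aggregate one mapper's output locally, before the shuffle."""
    totals: Counter = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def reduce_all(pairs: Iterable[Pair]) -> Dict[str, int]:
    """Reducer: sum the counts for each word across all mappers."""
    totals: Dict[str, int] = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


def run(splits: List[str], use_combiner: bool) -> Tuple[Dict[str, int], int]:
    """Simulate one job; return the final counts and the number of shuffled records."""
    shuffled: List[Pair] = []
    for split in splits:  # one simulated mapper per input split
        mapped = map_words(split)
        shuffled.extend(combine(mapped) if use_combiner else mapped)
    return reduce_all(shuffled), len(shuffled)


if __name__ == "__main__":
    splits = ["to be or not to be", "be quick be brief"]
    plain_counts, plain_records = run(splits, use_combiner=False)
    combined_counts, combined_records = run(splits, use_combiner=True)
    assert plain_counts == combined_counts  # same answer either way
    print(plain_records, combined_records)  # prints "10 7"
```

With these two sample splits, the job shuffles 10 records without the combiner and 7 with it, and both runs produce identical word counts. The saving grows with how often keys repeat within a single mapper's output.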