What is shuffling and sorting in MapReduce?
What data formats does Apache Tajo support?
Explain various Apache Spark ecosystem components. In which scenarios can we use these components?
What is a session in Cassandra?
What is the typical size of an HDFS block?
What is the HDFS block size, and what did you choose in your project?
Explain the HDFS “write once, read many” pattern.
Which relational operators does Pig Latin provide for combining and splitting data?
What is a JMX connector?
What do you understand by data replication in Cassandra?
What is the rack-aware replica placement policy?
What do you mean by logging in Cassandra?
Why does HDFS perform replication, even though it results in data redundancy?
Define the commit log.
Are results returned as they become available, or all at once when a query completes?