Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS: Hadoop Distributed File System (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
State the use cases of Impala.
How do I load a large CSV file into a partitioned table?
Name the operating system(s) supported for production Hadoop deployments.
Explain the role of the offset in Kafka.
What is a TaskInstance?
What are the two main components of ResourceManager?
Explain erasure coding in Hadoop.
What is a cluster in the Cassandra data model?
What are combiners? When should I use a combiner in my MapReduce Job?
Name any two features of Flume.
How would you calculate the number of executors required for real-time processing with Apache Spark? What factors should be considered when deciding on the number of nodes for real-time processing?
How does Hadoop MapReduce work?
How does Impala process join queries on large tables?
Does Google use Spark?