How can Hive improve performance with ORC-format tables?
What is the use of an RDD in Spark?
Why can't aggregation be done in the Mapper?
What is logistic regression?
What is Apache Spark SQL?
What are accumulators and broadcast variables in Spark?
What is a Secondary Namenode? Is it a substitute for the Namenode?
Why is the number of splits equal to the number of maps?
Explain the difference between an HDFS block and an input split.
What is a TaskTracker in Hadoop?
Give some important features of SPM.
Is there another way to check whether Namenode is working?
How does Impala process join queries for large tables?
What is a Mapper? How can we compress Mapper output in Hadoop?
What are the important modes of Hadoop?