Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
What is the process for changing the split size when storage space on commodity hardware is limited?
Why do we perform partitioning in Hive?
At which level is the HBase block size configured?
How many operational commands are there in HBase?
What is the history of Apache Mahout? When did it start?
How do I stop a Flume agent?
What happens if one of the DataNodes has a much slower CPU?
When to use Hive?
In MapReduce, ideally how many mappers should be configured on a slave?
Why Apache Spark?
What is the Reducer used for?
How would you calculate the number of unique visitors for each hour by mining a huge Apache log? You can post-process the output of the MapReduce job.
What are InputSplit and RecordReader, and what do they do?
What are the abstractions of Apache Spark?
What is DStream in Apache Spark Streaming?