What is the default replication factor, and how do you change it?
Does Hadoop follow the UNIX pattern?
Have you ever run into a lopsided job that resulted in an out-of-memory error? If so, how did you handle it?
What is the difference between Spark ML and Spark MLlib?
When executing Hive queries from different directories, why is metastore_db created in every directory from which Hive is launched?
Explain the HDFS architecture and list the various HDFS daemons in an HDFS cluster.
Explain how you can improve the throughput of a remote consumer?
What is meant by an SSTable?
What do you mean by replication factor?
Can we set the number of reducers to zero in MapReduce?
What are the actions in Spark?
If you run a SELECT * query in Hive, why does it not run MapReduce?
Explain how Hive deserializes and serializes data.
What are any two limitations of Flume?
What is SimpleStrategy?