What do you use Spark for?
What are the main hdfs-site.xml properties?
What is Databricks, and how does it relate to Apache Spark?
Write a Hive UDF that returns a sentiment score for a review. For example, if good = 1, bad = -1, and average = 0, then for a restaurant review that states "Good food, bad service," the score would be 1 - 1 = 0.
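A minimal sketch of the scoring logic, assuming the review is split into lowercase words and that only the three weights from the question matter (good = 1, bad = -1, average = 0); every other word scores 0. The class name `SentimentScorer` is hypothetical, and it is plain Java rather than a Hive class so the logic can be tested on its own.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: scores a review by summing per-word weights.
final class SentimentScorer {
    // Assumed weights, taken from the question: good = 1, bad = -1, average = 0.
    private static final Map<String, Integer> WEIGHTS =
            Map.of("good", 1, "bad", -1, "average", 0);

    private SentimentScorer() {}

    static Integer score(String review) {
        if (review == null) {
            return null; // return NULL for NULL input, as Hive functions usually do
        }
        int total = 0;
        // Split on runs of non-letters so "Good," and "bad" both match a weight.
        for (String token : review.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            total += WEIGHTS.getOrDefault(token, 0); // unknown words add 0
        }
        return total;
    }
}
```

To expose this to Hive, the same logic would typically go into an `evaluate` method of a class extending `org.apache.hadoop.hive.ql.exec.UDF` (or a `GenericUDF`), packaged into a JAR, and registered with `ADD JAR` and `CREATE TEMPORARY FUNCTION`. Plain word matching like this ignores negation such as "not good".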
How often do you need to reformat the NameNode?
What are the modes in which Hadoop runs?
What mechanism does the Hadoop framework provide to synchronize changes made to the distributed cache during the runtime of an application?
Do the JobTracker and TaskTrackers run on separate machines?
What is the Catalyst query optimizer in Apache Spark?
How can you ensure logical grouping of cells in HBase?
What is the difference between an HDFS block and an input split?
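An HDFS block is the physical chunk in which a file is stored, while an input split is the logical byte range that one mapper processes. The sketch below is a simplified Java model of how the new-API `FileInputFormat` derives splits for a splittable file: the split size is `max(minSize, min(maxSize, blockSize))`, and the last split may run up to 10% over (`SPLIT_SLOP = 1.1`) rather than producing a tiny extra split. It ignores block locations, non-splittable (for example, some compressed) files, and zero-length files; the class name `SplitSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of FileInputFormat split computation (not the Hadoop API itself).
final class SplitSketch {
    // The final split may be up to 10% larger than splitSize.
    private static final double SPLIT_SLOP = 1.1;

    private SplitSketch() {}

    // Same formula as FileInputFormat.computeSplitSize.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns {offset, length} pairs, one per logical split, in any consistent unit.
    static List<long[]> splits(long fileLength, long splitSize) {
        List<long[]> result = new ArrayList<>();
        long remaining = fileLength;
        // Emit full-size splits while more than 1.1 splits' worth of data remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            result.add(new long[] {fileLength - remaining, splitSize});
            remaining -= splitSize;
        }
        // Whatever is left (at most 1.1 * splitSize) becomes the last split.
        if (remaining != 0) {
            result.add(new long[] {fileLength - remaining, remaining});
        }
        return result;
    }
}
```

With a 128 MB block size, a 300 MB file yields splits of 128, 128, and 44 MB, matching its three blocks, but a 135 MB file stored in two blocks yields a single 135 MB split, so one mapper reads across the block boundary. Records that straddle a boundary are handled by the RecordReader, not by this calculation.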
How does the Secondary NameNode compare with the Checkpoint Node in Hadoop?
Explain what a row key is.
What is a Spark Dataset?
What are combiners, and what is their purpose?
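A combiner acts as an optional local reducer on each mapper's output, shrinking the data sent across the network during the shuffle. The following is a plain-Java simulation, not Hadoop's API: `map` plays the role of a word-count mapper emitting (word, 1) pairs, and `combine` sums them locally the way a combiner would. The class name `CombinerSketch` is hypothetical. In a real job the combiner is set with `Job.setCombinerClass(...)`, often reusing the reducer class. Because Hadoop may run the combiner zero, one, or several times, the operation should be associative and commutative, like summing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a word-count mapper followed by a combiner.
final class CombinerSketch {
    private CombinerSketch() {}

    // Map phase: emit (word, 1) for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Combine phase: sum counts per key on the mapper side,
    // so fewer records need to be shuffled to the reducers.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapped) {
        Map<String, Integer> combined = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapped) {
            combined.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return combined;
    }
}
```

For the line "the cat saw the other cat", the mapper emits six records, while the combined output has four, so only four records would leave this mapper.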