Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
What is the role of the Kafka Producer API?
What is KeyValueTextInputFormat in Hadoop?
What are the differences between relational databases and Impala?
Write a query to insert a new column (new_col INT) into a Hive table (htab) at a position before an existing column (x_col).
What load do concurrent queries produce on the NameNode?
What is OutputFormat in Hadoop?
Why should we use the ‘ORDER BY’ keyword in Pig scripts?
Define the term ‘sparse vector.’
What is parallelize in Spark?
Why does Hadoop perform replication even though it results in data redundancy?
List some key features of Apache Cassandra.
What are the features and characteristics of Apache Spark?
How do you create a custom key and a custom value in a MapReduce job?
How can native libraries be included in YARN jobs?
What are catalog tables in HBase?