Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Presto (15)
Apache Tajo (26)
Hadoop General (407)
What are the basic steps to writing a UDF in Pig?
Suppose a file of size 514 MB is stored in HDFS (Hadoop 2.x) using the default block size configuration and the default replication factor. How many blocks will be created in total, and what will be the size of each block?
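A worked answer to the question above, assuming the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`):

```python
import math

BLOCK_SIZE_MB = 128   # Hadoop 2.x default dfs.blocksize
REPLICATION = 3       # default dfs.replication
FILE_SIZE_MB = 514

# Number of logical blocks: ceil(514 / 128) = 5
num_blocks = math.ceil(FILE_SIZE_MB / BLOCK_SIZE_MB)

# Block sizes: four full 128 MB blocks plus one final 2 MB block.
# HDFS does not pad the last block out to the full block size.
sizes = [BLOCK_SIZE_MB] * (FILE_SIZE_MB // BLOCK_SIZE_MB)
remainder = FILE_SIZE_MB % BLOCK_SIZE_MB
if remainder:
    sizes.append(remainder)

# With replication, the cluster stores 5 * 3 = 15 physical block copies.
total_replicas = num_blocks * REPLICATION

print(num_blocks)      # 5
print(sizes)           # [128, 128, 128, 128, 2]
print(total_replicas)  # 15
```

So the file occupies 5 blocks (four of 128 MB and one of 2 MB), and replication brings the total number of stored block copies to 15.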
How is recovery achieved in Ambari?
What is the block size in Hadoop?
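For reference, the block size is controlled by the `dfs.blocksize` property in `hdfs-site.xml`; a sketch setting it explicitly to the Hadoop 2.x default of 128 MB (the value is in bytes):

```xml
<property>
  <name>dfs.blocksize</name>
  <value>134217728</value> <!-- 128 MB, the Hadoop 2.x default -->
</property>
```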
In how many ways can you create an RDD in Spark?
Is there a module to implement SQL in Spark? How does it work?
What are the independent extensions that have been contributed to the Ambari codebase?
What is the procedure for NameNode recovery?
How does Impala achieve its performance improvements?
What is a bookie in BookKeeper?
What is HBase Shell?
What is the difference between an input split and an HDFS block?
A Hive table's partition has been modified to point to a new directory location. Do I have to move the data to the new location, or will the data be moved automatically?
What is Sqoop import? Explain its purpose.
Can you use Spark to access and analyze data stored in Cassandra databases?
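The scenario in this question typically arises from an `ALTER TABLE ... SET LOCATION` statement; a sketch with a hypothetical table `sales` and a hypothetical HDFS path:

```sql
-- Repoints the metastore entry for this partition only;
-- table name and path here are illustrative
ALTER TABLE sales PARTITION (dt='2024-01-01')
SET LOCATION 'hdfs://namenode:8020/warehouse/new/sales/dt=2024-01-01';
```

Note that this statement changes only the metadata in the Hive metastore: Hive does not move the existing files, so the data must be relocated separately (e.g. with `hdfs dfs -mv`).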