Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
Does Impala performance improve as it is deployed to more hosts in a cluster, in much the same way that Hadoop performance does?
What is an RDD in Apache Spark? How are RDDs computed in Spark? What are the various ways in which an RDD can be created?
What role does a worker node play in an Apache Spark cluster? Why does a worker node need to register with the driver program?
What are Spark executor cores?
Whenever we run a Hive query, a new metastore_db is created. Why?
What makes Apache Spark good at low-latency workloads like graph processing and machine learning?
What happens when the node running the map task fails before the map output has been sent to the reducer?
Is it possible to share data files between different components?
When the NameNode is down, what happens to the JobTracker?
How do you specify the table creator name when creating a table in Hive?
Can you explain the benefits of Spark over MapReduce?
What is data integrity? How does HDFS ensure the integrity of the data blocks it stores?
In MapReduce, how do you change the name of the output file from part-r-00000?
What are the features of Apache Mahout?
On what concept does the Hadoop framework work?
What are the various types of shared variables in Apache Spark?
What is the use of the TRACING command in cqlsh for Cassandra?
Discuss write-ahead logging in Apache Spark Streaming.