What is a data pipeline in Spark?
What are the port numbers of the NameNode, JobTracker, and TaskTracker?
In case of embedded Hive, can the same metastore be used by multiple users?
What is Hadoop serialization?
What is RDD map?
How does Spark work with Python?
How do you write a CREATE INDEX statement in Apache Tajo?
What is the difference between Scala and Spark?
What is the purpose of the dfsadmin tool?
What do you understand by a worker node?
What is map in Apache Spark?
Explain the process to trigger automatic clean-up in Spark to manage accumulated metadata.
How does a client read/write data in HDFS?
What is TTL (time to live) in HBase?
When does Impala hold on to or return memory?