Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
Explain the distinct(), union(), intersection() and subtract() transformations in Spark.
Does Impala support generic JDBC?
What is a shuffle in Spark?
Define consistency.
State the difference between persist() and cache() functions.
How does Cassandra delete data?
Which is the reliable channel in Flume to ensure that there is no data loss?
What is the maximum recommended cell size?
What are Pig statistics?
Can there be no Reducer?
What is the difference between DAG and Lineage?
Why do we need passwordless SSH in a fully distributed environment?
What are the BookKeeper elements and concepts?
Can multiple clients write to an HDFS file concurrently in Hadoop?
What features of RDD make it an important abstraction in Spark?