What is bloom filter?
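A Bloom filter is a probabilistic set-membership structure. It uses a bit array of size m and k hash functions. Adding an item sets k bits. A lookup checks those same k bits: if any bit is clear, the item is definitely absent; if all are set, the item is only possibly present. So false positives can happen, but false negatives cannot. HBase, for example, can use Bloom filters to skip HFiles that cannot contain a requested row.

Below is a minimal sketch. It assumes SHA-256 with double hashing to derive the k positions. That is one common choice, not the scheme any particular Hadoop component uses. The class name and its methods are illustrative.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: m bits, k bit positions per item."""

    def __init__(self, m: int, k: int) -> None:
        if m < 1 or k < 1:
            raise ValueError("m and k must be >= 1")
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # all bits start cleared

    def _positions(self, item: str):
        # Derive k positions by double hashing two 64-bit halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so it is nonzero
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        # Set every bit position derived from the item.
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        # False: definitely never added. True: possibly added (false positives allowed).
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))
```

For example, after `bf = BloomFilter(m=1024, k=3)` and `bf.add("row-42")`, the call `bf.might_contain("row-42")` is always `True`. A key that was never added usually returns `False`, but it may occasionally return `True`.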
What is the use of ZooKeeper?
What are the components of a Flume agent?
Can aluminum cause a spark?
Elaborate on CQL (Cassandra Query Language).
How does Kafka communicate with clients and servers?
What is the importance of the --split-by clause in running parallel import tasks in Sqoop?
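The --split-by option names the column Sqoop uses to partition a table for parallel import. If it is omitted, Sqoop falls back to the table's primary key. Sqoop queries the minimum and maximum values of that column and divides the range among the mappers (--num-mappers, default 4). Each mapper then imports its own slice through a WHERE clause.

A column with evenly distributed values gives balanced tasks. A skewed column leaves some mappers with most of the rows.

The sketch below is a simplified model of integer range splitting. It is not Sqoop's actual splitter code, and the function names `split_ranges` and `where_clauses` are hypothetical.

```python
def split_ranges(lo: int, hi: int, num_mappers: int) -> list[tuple[int, int]]:
    """Split the inclusive range [lo, hi] into at most num_mappers contiguous slices."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be >= 1")
    if hi < lo:
        return []  # empty table: nothing to import
    total = hi - lo + 1
    n = min(num_mappers, total)  # never create more slices than values
    base, extra = divmod(total, n)
    ranges = []
    start = lo
    for i in range(n):
        # The first `extra` slices take one more value so the sizes differ by at most 1.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def where_clauses(column: str, ranges: list[tuple[int, int]]) -> list[str]:
    """Build one bounded predicate per mapper, mirroring a per-slice WHERE clause."""
    return [f"{column} >= {a} AND {column} <= {b}" for a, b in ranges]
```

With MIN(id) = 1, MAX(id) = 100 and four mappers, `split_ranges(1, 100, 4)` yields `(1, 25)`, `(26, 50)`, `(51, 75)` and `(76, 100)`. Each slice becomes one mapper's predicate.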
Is it possible to perform real-time analysis on big data collected by Flume directly? If yes, explain how.
Replication causes data redundancy, so why is it used in HDFS?
Tell me about the types of HBase operations.
What is Disk Balancer in Hadoop?
Define the Parquet file format. How do you convert data to Parquet format?
What is Impala?
What is the maximum recommended cell size in HBase?
Explain reliability and failure handling in Apache Flume.