Hadoop General (407)
What is a rack?
What is throughput? How does HDFS provide good throughput?
What do you understand by transformations in Spark?
What is the Parquet file format? How do you convert data to Parquet format?
What is a Spark application?
What are Flume and Sqoop?
What is a map side join?
What is Spark in Python?
What is the traditional method of message transfer?
Hadoop achieves parallelism by dividing tasks across many nodes, so a few slow nodes can rate-limit the rest of the program and slow it down. What mechanism does Hadoop provide to combat this?
Explain the TextLoader function.
Explain a scenario where you would use Spark Streaming.
What is your favourite tool in the Hadoop ecosystem?
What do sorting and shuffling do?
How do I start Flume in Hadoop?