Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)

Does the MapReduce programming model provide a way for reducers to communicate with each other? In a MapReduce job, can a reducer communicate with another reducer?
How would you calculate the number of unique visitors for each hour by mining a huge Apache log? You can use post-processing on the output of the MapReduce job.
If reducers do not start before all mappers finish, why does the progress of a MapReduce job show something like map(50%) reduce(10%)? Why is a reducer progress percentage displayed when the mappers have not finished yet?
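For the unique-visitors-per-hour question above, the following is a minimal sketch of one possible approach, not a definitive answer. It simulates the map, shuffle, and reduce phases in a single Python process. It assumes the log uses the Apache Common Log Format and that a "visitor" is identified by the client address in the first field; the function names (`map_line`, `reduce_visitors`, `unique_visitors_per_hour`) are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Assumption: Common Log Format lines, e.g.
# 10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512
# Captures the client address, the date, and the hour.
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):(\d{2}):\d{2}:\d{2} [^\]]+\]'
)


def map_line(line):
    """Map step: emit ((date, hour), visitor) for each parseable line."""
    match = LOG_PATTERN.match(line)
    if match:
        visitor, date, hour = match.groups()
        yield (date, hour), visitor


def reduce_visitors(key, visitors):
    """Reduce step: count the distinct visitors seen for one (date, hour) key."""
    return key, len(set(visitors))


def unique_visitors_per_hour(lines):
    """Run map, shuffle (group by key), and reduce in a single process."""
    grouped = defaultdict(list)
    for line in lines:
        for key, visitor in map_line(line):
            grouped[key].append(visitor)
    return dict(reduce_visitors(key, visitors) for key, visitors in grouped.items())
```

In this sketch the reducer holds every visitor for an hour in a set, which may not scale for very busy hours. An alternative in the spirit of the hint in the question is to have the job emit each distinct (hour, visitor) pair once and count the pairs per hour in a post-processing step.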
What is the throughput?
How many layers of Hadoop components are supported by Apache Ambari and what are they?
List some benefits of Apache Kafka.
Can multiple clients write into a Hadoop HDFS file concurrently?
Explain how you override the replication factor.
Explain BagToString in Pig.
What is the primary purpose of Flume in the Hadoop architecture?
What is Hive?
What do you mean by metadata in HDFS? Where is it stored in Hadoop?
Explain partitions.
What do you understand by Consistency in Cassandra?
Explain the HCatalog Create Table CLI command along with its syntax.
Give some key points about Hive for Hadoop.
The web UI shows that half of the DataNodes are in decommissioning mode. What does that mean? Is it safe to remove those nodes from the network?
How is it different from doing machine learning in R or SAS?