Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
Does the MapReduce programming model provide a way for reducers to communicate with each other? In a MapReduce job, can a reducer communicate with another reducer?
680
How would you calculate the number of unique visitors for each hour by mining a huge Apache log? You can post-process the output of the MapReduce job.
826
If reducers do not start before all mappers finish, why does the progress of a MapReduce job show something like map(50%) reduce(10%)? Why is a reducer progress percentage displayed when the mappers have not finished yet?
715
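For the unique-visitors-per-hour question above, one possible approach can be sketched in plain Python. This is a minimal single-process illustration of the map and reduce steps, not Hadoop API code. It assumes the log is in Common Log Format and that a visitor is identified by client IP; the names `LOG_LINE`, `map_line`, and `unique_visitors_per_hour` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed Common Log Format prefix: client IP, two ignored fields,
# then a timestamp such as [10/Oct/2023:13:55:36 +0000].
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[(\d{2}/\w{3}/\d{4}):(\d{2}):')

def map_line(line):
    """Map step: emit ((date, hour), visitor_ip) for each parseable line."""
    m = LOG_LINE.match(line)
    if m:
        ip, day, hour = m.groups()
        yield (day, hour), ip

def unique_visitors_per_hour(lines):
    """Shuffle and reduce step: group IPs by hour, then count distinct IPs."""
    visitors = defaultdict(set)
    for line in lines:
        for key, ip in map_line(line):
            visitors[key].add(ip)
    return {key: len(ips) for key, ips in visitors.items()}

sample = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 512',
    '10.0.0.2 - - [10/Oct/2023:13:58:01 +0000] "GET /a HTTP/1.1" 200 128',
    '10.0.0.1 - - [10/Oct/2023:13:59:59 +0000] "GET /b HTTP/1.1" 200 64',
    '10.0.0.1 - - [10/Oct/2023:14:00:02 +0000] "GET / HTTP/1.1" 200 512',
]
counts = unique_visitors_per_hour(sample)
```

In a real job the per-hour sets could grow large. An alternative that matches the post-processing hint is to have the job emit only distinct (hour, IP) pairs and count the pairs per hour in a second pass.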
What is Yum?
What is the port number for NameNode?
Can you join multiple fields in Apache
Which file systems does Spark support?
What is the difference between Hadoop and Spark?
When are the reducers started in a MapReduce job?
What is a Combiner?
Can you explain how to minimize data transfers while working with Spark?
Hadoop uses replication to achieve fault tolerance. How is this achieved in Apache Spark?
How will you format HDFS?
What are the different CQL data definition commands in Cassandra?
How much metadata will be created on the NameNode in Hadoop?
What is the use of the "void close()" method?
What is a Parquet file?
What is spooldir in Flume?