Explain how ‘map’ and ‘reduce’ work.
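The map/reduce flow can be illustrated with a toy word count. This is a minimal conceptual sketch in plain Python, not Hadoop code: the map phase emits (key, value) pairs, the framework groups values by key, and the reduce phase aggregates each group.

```python
from collections import defaultdict

def map_phase(line):
    # Emit (word, 1) pairs, as a word-count Mapper would.
    for word in line.split():
        yield (word, 1)

def reduce_phase(key, values):
    # Sum all counts for one key, as a word-count Reducer would.
    return (key, sum(values))

lines = ["the quick brown fox", "the lazy dog"]

# Grouping step (done by the framework between map and reduce):
# collect all values emitted for the same key.
groups = defaultdict(list)
for line in lines:
    for key, value in map_phase(line):
        groups[key].append(value)

result = dict(reduce_phase(k, vs) for k, vs in sorted(groups.items()))
print(result)  # {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```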
What is an identity mapper and identity reducer?
What is Hadoop Streaming?
In MapReduce, why does the map task write its output to local disk instead of HDFS?
What is KeyValueTextInputFormat in Hadoop MapReduce?
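KeyValueTextInputFormat splits each input line at the first occurrence of a separator (tab by default): the part before it becomes the key, the part after it the value. A small Python sketch of that splitting rule (not the Hadoop class itself):

```python
def key_value_split(line, separator="\t"):
    # Split at the FIRST separator only; everything after it, including
    # further separators, stays in the value. If the separator is absent,
    # the whole line becomes the key and the value is empty.
    key, _sep, value = line.partition(separator)
    return key, value

print(key_value_split("user42\tclicked\tbutton"))  # ('user42', 'clicked\tbutton')
print(key_value_split("no-separator-here"))        # ('no-separator-here', '')
```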
What is Data Locality in Hadoop?
What do you understand by compute and storage nodes?
What is the relationship between Job and Task in Hadoop?
What do sorting and shuffling do?
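What shuffle and sort accomplish can be sketched in a few lines: the shuffle routes each mapper output pair to the reducer that owns its key, and the sort orders each reducer's input by key so all values for one key arrive together. The partitioning function below is a deterministic stand-in for Hadoop's default HashPartitioner (hash of key modulo number of reduce tasks), used here only so the example is reproducible.

```python
def partition_for(key, num_reducers):
    # Deterministic stand-in for HashPartitioner:
    # hash(key) mod numReduceTasks.
    return sum(map(ord, key)) % num_reducers

mapper_output = [("b", 2), ("a", 1), ("b", 3), ("c", 5), ("a", 4)]
num_reducers = 2

# Shuffle: route each (key, value) pair to the reducer owning that key.
partitions = [[] for _ in range(num_reducers)]
for key, value in mapper_output:
    partitions[partition_for(key, num_reducers)].append((key, value))

# Sort: order each reducer's input by key, so every value for a given
# key is contiguous when reduce() is called.
for p in partitions:
    p.sort(key=lambda kv: kv[0])

print(partitions)  # [[('b', 2), ('b', 3)], [('a', 1), ('a', 4), ('c', 5)]]
```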
What is the Hadoop MapReduce API contract for a key and value class?
What is the input type/format in MapReduce by default?
Which of the two is preferable for a project: Hadoop MapReduce or Apache Spark?
Explain what conf.setMapperClass does in MapReduce.
Where is Mapper output stored?