Why is HDFS used for applications with large data sets, but not when there are a lot of small files?
Does the MapReduce programming model provide a way for reducers to communicate with each other? That is, in a MapReduce job, can one reducer communicate with another?
What is the difference between coalesce and repartition?
Explain the components of the Apache ZooKeeper architecture.
How many reducers run in a MapReduce job?
Why should we use the ‘distinct’ keyword in Pig scripts?
Can you define YARN?
Can you give some examples of columnar databases?
Are Spark DataFrames distributed?
What does immutable mean?
Give examples of the SerDe classes that Hive uses to serialize and deserialize data.
Can Impala be used for complex event processing?
Explain the different types of transformations on DStreams.
What is salting in Spark?
What is the rack awareness algorithm, and why is it used in Hadoop?
Elaborate on identifiers.