Explain how to create an index.
List the various daemons in an HDFS cluster.
What is the use of ZooKeeper?
In a given Spark program, how will you identify whether a given operation is a transformation or an action?
Do you need to install Spark on all nodes of a YARN cluster?
Why not just use ZooKeeper for everything?
What is the next step after Mapper or MapTask?
How would you use Map/Reduce to split a very large graph into smaller pieces and parallelize the computation of edges according to the fast/dynamic change of data?
Explain how the JobTracker schedules a task.
How can an RDD be created in Spark?
How is it different from doing machine learning in R or SAS?
Which companies mostly use Hive?
What do you understand by a node in Cassandra?
Explain the top() and takeOrdered() operations.
How does NameNode tackle DataNode failures?