Data Engineer
Given a list of followers in the format:
123, 345
234, 678
345, 123
…
Here, column one is the ID of the follower and column two is the ID of the followee. Find all mutual following pairs (the pair 123, 345 in the example above). How would you use Map/Reduce to solve the problem when the list does not fit in memory?
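One possible way to approach the mutual-follow question is sketched below. It is a minimal, single-process simulation of the map, shuffle, and reduce phases, not a real Hadoop job; the function names (`map_edge`, `reduce_pair`, `find_mutual_pairs`) are hypothetical and chosen only for illustration. The idea is that the mapper keys each edge by the unordered pair of IDs, so both directions of a relationship land in the same reducer, and the reducer only has to hold the values for one pair at a time. In an actual cluster the framework's shuffle would do the grouping across machines and on disk, which is what lets the approach work when the full list does not fit in memory.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Edge = Tuple[int, int]  # (follower_id, followee_id)
Pair = Tuple[int, int]  # unordered pair stored as (smaller_id, larger_id)


def map_edge(edge: Edge) -> Iterator[Tuple[Pair, int]]:
    """Map phase: key each edge by its unordered ID pair, value is the follower."""
    follower, followee = edge
    if follower == followee:
        return  # assumption: a self-follow is not a "mutual pair"
    key = (min(follower, followee), max(follower, followee))
    yield key, follower


def reduce_pair(key: Pair, followers: Iterable[int]) -> Iterator[Pair]:
    """Reduce phase: the pair is mutual if both IDs appear as a follower."""
    if set(followers) == set(key):
        yield key


def find_mutual_pairs(edges: Iterable[Edge]) -> List[Pair]:
    """Simulate map -> shuffle -> reduce in one process."""
    # Shuffle: group mapper output by key. A real framework does this
    # step across nodes; a dict is used here only for illustration.
    groups: Dict[Pair, List[int]] = defaultdict(list)
    for edge in edges:
        for key, value in map_edge(edge):
            groups[key].append(value)

    result: List[Pair] = []
    for key in sorted(groups):
        result.extend(reduce_pair(key, groups[key]))
    return result


if __name__ == "__main__":
    sample = [(123, 345), (234, 678), (345, 123)]
    print(find_mutual_pairs(sample))
```

Because each reducer sees at most the two directions of one pair (plus any duplicate rows), its memory use stays constant regardless of the size of the input list.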
How would you use Map/Reduce to split a very large graph into smaller pieces and parallelize the computation of edges when the data changes quickly and dynamically?
How do you exit the vi editor?
Does the HDFS client or the NameNode decide the input split?
Which files are used by the startup and shutdown commands?
What is Cloudera and why is it used?
What is a spill factor with respect to the RAM?
Can we have multiple entries in the master files?
On which port does SSH work?
Do we need to give a password even if the key has been added in SSH?
What are the port numbers of the NameNode, JobTracker, and TaskTracker?
Is fs.mapr.working.dir a single directory?
How can we look for the NameNode in the browser?
What do slaves consist of?
How can we check whether the NameNode is working or not?