Data Engineer
Given a list of followers in the format:
123, 345
234, 678
345, 123
…
where column one is the ID of the follower and column two is the ID of the followee, find all mutual following pairs (the pair 123, 345 in the example above). How would you use Map/Reduce to solve the problem when the list does not fit in memory?
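One possible approach, sketched here as a local Python simulation rather than a real Hadoop job: the map step keys every edge by its unordered pair of IDs and records the direction as the value, and the reduce step emits a pair only when both directions show up under the same key. Because each key is reduced independently, the full list never has to sit in one machine's memory. The function names (map_edge, reduce_pair, mutual_pairs) are hypothetical, and the in-memory dictionary only stands in for the framework's shuffle phase.

```python
from collections import defaultdict
from typing import Iterable, Iterator

Edge = tuple[int, int]


def map_edge(follower: int, followee: int) -> tuple[Edge, int]:
    """Map step: key the edge by its unordered pair; the value marks direction."""
    key = (min(follower, followee), max(follower, followee))
    direction = 0 if follower < followee else 1
    return key, direction


def reduce_pair(key: Edge, directions: Iterable[int]) -> Iterator[Edge]:
    """Reduce step: emit the pair only if both directions were seen."""
    if set(directions) == {0, 1}:
        yield key


def mutual_pairs(edges: Iterable[Edge]) -> list[Edge]:
    """Local stand-in for the shuffle: group mapper output by key, then reduce."""
    groups: dict[Edge, list[int]] = defaultdict(list)
    for follower, followee in edges:
        key, direction = map_edge(follower, followee)
        groups[key].append(direction)

    result: list[Edge] = []
    for key in sorted(groups):
        result.extend(reduce_pair(key, groups[key]))
    return result


if __name__ == "__main__":
    edges = [(123, 345), (234, 678), (345, 123)]
    print(mutual_pairs(edges))  # [(123, 345)]
```

Using a set in the reducer means duplicate input rows do not produce false matches, and a self-follow (same ID in both columns) is never reported because it only ever contributes one direction.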
How would you use Map/Reduce to split a very large graph into smaller pieces and parallelize the computation of edges when the data changes quickly and dynamically?
How do you define "block" in HDFS?
What is metadata?
What is the purpose of the RawComparator interface?
What are the characteristics of the Hadoop framework?
What is HBase in Hadoop?
What should be the ideal replication factor in Hadoop?
Shouldn't DFS be able to handle large volumes of data already?
Which data storage components are used by Hadoop?
What if a NameNode has no data?
Explain use cases where the SequenceFile class can be a good fit.
On what basis will the NameNode decide which DataNode to write to?
Can the balancer be run while Hadoop is in use?
How do you enable the trash/recycle bin in Hadoop?
What is the use of a Combiner?
Explain the difference between Gen1 and Gen2 Hadoop with regard to the NameNode.