Which Spark library allows reliable file sharing at memory speed across different cluster frameworks?
Data Engineer: Given a list of followers in the format:
123, 345
234, 678
345, 123
…
where column one is the ID of the follower and column two is the ID of the followee, find all mutual following pairs (the pair 123, 345 in the example above). How would you use Map/Reduce to solve the problem when the list does not fit in memory?
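One possible approach to the mutual-followers question, sketched in plain Python rather than on a real cluster: the map step keys every record by the unordered pair of IDs and records which direction the follow goes, and the reduce step emits a pair only when both directions show up under the same key. The function names (`map_edge`, `reduce_pair`, `mutual_pairs`) are made up for illustration, and the in-memory dictionary only stands in for the shuffle phase, which a real Map/Reduce framework would perform on disk across machines so that the full list never has to fit in memory.

```python
from collections import defaultdict
from typing import Iterable, Iterator, List, Tuple

Edge = Tuple[int, int]


def map_edge(follower: int, followee: int) -> Tuple[Edge, int]:
    """Map step: key is the unordered pair, value records the direction."""
    key = (min(follower, followee), max(follower, followee))
    direction = 0 if follower < followee else 1
    return key, direction


def reduce_pair(key: Edge, directions: Iterable[int]) -> Iterator[Edge]:
    """Reduce step: emit the pair only if both directions were seen."""
    if set(directions) == {0, 1}:
        yield key


def mutual_pairs(edges: Iterable[Edge]) -> List[Edge]:
    # Stand-in for the shuffle phase (an assumption of this sketch):
    # group the mapped values by key in a local dictionary.
    groups = defaultdict(list)
    for follower, followee in edges:
        key, direction = map_edge(follower, followee)
        groups[key].append(direction)

    result: List[Edge] = []
    for key in sorted(groups):
        result.extend(reduce_pair(key, groups[key]))
    return result


if __name__ == "__main__":
    sample = [(123, 345), (234, 678), (345, 123)]
    print(mutual_pairs(sample))
```

Because each reducer only ever sees the values for a single pair of IDs, the memory needed per key stays small no matter how long the follower list is. Duplicate records and self-follows are ignored by the set comparison in `reduce_pair`.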
For what purpose would an engineer use Spark?
What is the use of Apache Flume?
What are the components of the Presto architecture?
How can we scale Apache Mahout in the cloud?
What is the stable version of Hive?
What is Cloudera and why is it used?
What do you understand by CQL?
When do reducers play their role in a MapReduce task?
What is the purpose of Sqoop list-tables?
Explain the main difference between Kafka and Flume.
How many Mappers run for a MapReduce job?
What are the identity mapper and the chain mapper?
What is the difference between Kafka and MQ?
How do we create RDDs in Spark?
What is the maximum size of the string data type supported by Hive? Mention the binary formats that Hive supports.