Which Spark library allows reliable file sharing at memory speed across different cluster frameworks?
Given a list of followers in the format:
123, 345
234, 678
345, 123
…
where column one is the ID of the follower and column two is the ID of the followee, find all mutual following pairs (the pair 123, 345 in the example above). How would you use Map/Reduce to solve the problem when the list does not fit in memory?
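One common way to approach the mutual-followers question is to have the mapper emit each edge under a canonical key (smaller ID first), so that both directions of a relationship meet at the same reducer. The sketch below simulates that idea in plain Python and is only an illustration: the function names (`map_phase`, `shuffle`, `reduce_phase`, `mutual_pairs`) are hypothetical, and the in-memory `shuffle` stands in for the grouping a real Map/Reduce framework would perform on disk across the cluster.

```python
from collections import defaultdict


def map_phase(edges):
    """Emit (canonical_pair, original_edge) for each follower -> followee edge."""
    for follower, followee in edges:
        if follower == followee:
            continue  # a self-follow can never form a mutual pair
        key = (min(follower, followee), max(follower, followee))
        yield key, (follower, followee)


def shuffle(mapped):
    """Group values by key; a real framework does this step out of core."""
    groups = defaultdict(set)  # a set drops duplicate edges
    for key, value in mapped:
        groups[key].add(value)
    return groups


def reduce_phase(groups):
    """Keep only the pairs for which both directions were seen."""
    for key, directions in groups.items():
        if len(directions) == 2:
            yield key


def mutual_pairs(edges):
    return sorted(reduce_phase(shuffle(map_phase(edges))))


edges = [(123, 345), (234, 678), (345, 123)]
print(mutual_pairs(edges))  # [(123, 345)]
```

Because each reducer sees at most two distinct values per key, no single step needs the full list in memory; the framework's sort-and-shuffle handles the grouping.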
Why does a Mapper run in a heavyweight process and not in a thread in MapReduce?
If MapReduce is inferior to Spark, is there any benefit in learning it?
What is meant by a “data centre” in Cassandra?
Define consistency.
Name the most common input formats defined in Hadoop. Which one is the default?
Can Spark work without Hadoop?
What are the various input and output types supported by MapReduce?
What are compute and storage nodes?
Which one would you choose for a project: Hadoop MapReduce or Apache Spark?
Explain the different types of transformations on DStreams.
Name the different types of primary keys in Cassandra.
Explain the sum(), max(), and min() operations in Apache Spark.
What are the differences between the different types of primary keys in Cassandra?
What are the data components used by Hadoop?
What are the 5 Vs of big data?