Big Data Interview Questions
Questions Answers Views Company eMail

Why is BlinkDB used?

218

What is the advantage of a Parquet file?

215

What are the key features of Apache Spark that you like?

259

What do you understand by SchemaRDD?

221

How can you achieve high availability in Apache Spark?

283

Define a worker node.

239

Name a few companies that use Apache Spark in production.

249

What is the difference between persist() and cache()?

221

Which Spark library allows reliable file sharing at memory speed across different cluster frameworks?

188

What does the Spark Engine do?

212

How does Spark use Akka?

217

How does Spark handle monitoring and logging in Standalone mode?

217

What is Hadoop serialization?

Capital One

401

Explain a simple Map/Reduce problem.

Capital One

443

Data Engineer: Given a list of followers in the format:
123, 345
234, 678
345, 123
…
where column one is the ID of the follower and column two is the ID of the followee, find all mutual following pairs (the pair 123, 345 in the example above). How would you use Map/Reduce to solve the problem when the list does not fit in memory?

Twitter

486
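
One possible way to approach the mutual-followers question above is to key every edge on its unordered pair of IDs, so that both directions of a relationship arrive at the same reducer. The sketch below simulates that in plain Python; the function names (map_edge, reduce_pair, mutual_pairs) are hypothetical, and the in-memory dictionary only stands in for the shuffle phase that a real Map/Reduce framework would perform across disks and machines.

```python
from collections import defaultdict


def map_edge(follower, followee):
    """Mapper: emit the unordered pair as key and the directed edge as value."""
    key = (min(follower, followee), max(follower, followee))
    return key, (follower, followee)


def reduce_pair(key, edges):
    """Reducer: the pair is mutual only if both directions were seen."""
    a, b = key
    directions = set(edges)  # duplicates of the same edge collapse here
    if a != b and (a, b) in directions and (b, a) in directions:
        return key
    return None


def mutual_pairs(edges):
    """Illustrative driver: group mapper output by key, then reduce each group."""
    groups = defaultdict(list)  # stand-in for the framework's shuffle step
    for follower, followee in edges:
        key, value = map_edge(follower, followee)
        groups[key].append(value)
    results = []
    for key in sorted(groups):
        pair = reduce_pair(key, groups[key])
        if pair is not None:
            results.append(pair)
    return results


if __name__ == "__main__":
    sample = [(123, 345), (234, 678), (345, 123)]
    print(mutual_pairs(sample))  # [(123, 345)]
```

Because each reducer only ever sees the edges for one pair of IDs, no single machine needs to hold the whole list, which is the property the question is probing for.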


Un-Answered Questions { Big Data }

Why does the Mapper run in a heavyweight process and not in a thread in MapReduce?

378


If MapReduce is inferior to Spark, is there any benefit to learning it?

182


What do you mean by “data centre” in Cassandra?

76


Define consistency.

92


Name the most common input formats defined in Hadoop. Which one is the default?

245


Can Spark work without Hadoop?

194


What are the various input and output types supported by MapReduce?

368


What are compute and storage nodes?

706


Which would you choose for a project: Hadoop MapReduce or Apache Spark?

358


Explain the different types of transformations on DStreams.

227


Name the different types of primary keys in Cassandra.

52


Explain the sum(), max(), and min() operations in Apache Spark.

208


What are the differences between the different types of primary keys in Cassandra?

63


What are the data components used by Hadoop?

236


What are the 5 Vs of big data?

205