Big Data Interview Questions

What are shared variables in Spark?

What is the future of Apache Spark?

How can I improve Spark performance?

What is the Apache Spark architecture?

Why is Spark faster than Hive?

What happens if an RDD partition is lost due to a worker node failure?

What is a pair RDD in Spark?

What is the difference between cache and persist in Spark?

Is bigger than spark.driver.maxResultSize?

Does Spark use Java?

How do you process big data with Spark?

What is a Spark shuffle?

Why do we need Apache Spark?

How do I optimize my Spark code?

What is the difference between client mode and cluster mode in Spark?


Unanswered Questions (Big Data)

What is Spark slang for?


What is the difference between Python and Spark?


What are the major features/characteristics of RDDs (Resilient Distributed Datasets)?


List the components of the Hive query processor.


What are the various InputFormats in Hadoop?


How does a Bloom filter help when searching for rows?


How would you calculate the number of unique visitors for each hour by mining a huge Apache log? You can use post-processing on the output of the MapReduce job.


Which command is available to show the current HBase user?


Is it possible to do an incremental import using Sqoop?


When should you choose an external table in Hive?


How do you resolve "IOException: Cannot create directory"?


Is Kafka an AMQP implementation?


How does HDFS ensure the integrity of data blocks stored in HDFS?


List all Apache Kafka operations.


In MapReduce, is sorting done on the mapper node or the reducer node?