Big Data Interview Questions
Questions Answers Views Company eMail

What are shared variables in Spark?

212

What is the future of Apache Spark?

193

How can I improve my Spark performance?

188

What is the Apache Spark architecture?

216

Why is Spark faster than Hive?

188

What happens if an RDD partition is lost due to worker node failure?

306

What is a pair RDD in Spark?

200

What is the difference between cache and persist in Spark?

192

Is bigger than spark.driver.maxResultSize?

216

Does Spark use Java?

201

How do you process big data with Spark?

179

What is a Spark shuffle?

210

Why do we need Apache Spark?

191

How do I optimize my Spark code?

199

What is the difference between client mode and cluster mode in Spark?

205


Un-Answered Questions { Big Data }

What happens to existing data in my cluster when I add new nodes?

121


Explain the term HCatalog?

5


What is the difference between HDFS block and input split?

465


Describe HDFS Federation?

26


What is the difference between Hive and Spark?

191


Where are RDDs stored?

196


What is the difference between Cassandra, Hadoop big data, MongoDB, and CouchDB?

255


Why does my select statement fail?

41


Which database does the Sqoop metastore run on?

5


What is Replication Factor in Cassandra?

48


What is HDFS?

26


What are shuffling and sorting in MapReduce?

410


Explain Apache Ambari?

41


What is "Parquet" in Spark?

209


When a large data set is maintained?

423