Big Data Interview Questions
Questions Views

What is the Catalyst query optimizer in Apache Spark?

195

What are the various types of shared variables in Apache Spark?

183

What are the common mistakes developers make when using Apache Spark?

197

What is the role of the Spark driver, and where does it run on the cluster?

213

What is speculative execution in Spark?

235

Explain the write-ahead log (journaling) in Spark.

186

Explain the values() operation in Apache Spark.

270

Define the level of parallelism in Spark Streaming. Why is it needed?

232

What is SparkSession in Apache Spark? Why is it needed?

198

Describe the different transformations on DStreams in Apache Spark Streaming.

202

In HADOOP_PID_DIR, what does PID stand for?

238

What are the network requirements for Hadoop?

252

What does hadoop-env.sh do?

244

What are the three modes in which Hadoop can be run?

249

Where is the hadoop-env.sh file located?

234


Unanswered Questions (Big Data)

What is Spark Databricks?

198


How do you set which framework is used to run a MapReduce program?

402


How will you update the rows that are already exported?

5


How can you compare Hadoop and Spark in terms of ease of use?

194


What is the difference between Spark and Python?

190






Why is reading done in parallel in HDFS while writing is not?

22


Have you ever used counters in Hadoop?

231


What do the master class and the output class do?

386


What is the JobTracker in Hadoop? What steps does Hadoop follow?

219


Explain JobConf in MapReduce.

515


What are the main methods of data transfer in Hadoop Sqoop?

5


What is network topology strategy?

48


Explain the BloomMapFile.

345


What is the difference between the TextInputFormat and KeyValueTextInputFormat classes?

227


What are barriers?

1