Big Data Interview Questions
Questions Answers Views Company eMail

By default, how many partitions are created for an RDD in Apache Spark?

208

What are broadcast variables in Apache Spark? Why do we need them?

194

Is it necessary to start Hadoop to run an Apache Spark application?

197

What is a write-ahead log (journaling)?

210

Does Apache Spark provide checkpoints?

191

What is the Apache Spark machine learning library?

211

What is the use of the map transformation?

209

Explain the run-time architecture of Spark.

205

List the advantages of Parquet files.

184

Name the Spark Library which allows reliable file sharing at memory speed across different cluster frameworks.

209

Please explain DStream in Spark.

196

List the languages supported by Apache Spark.

196

Explain the Parquet file format in Apache Spark. When is it best to choose this format?

240

Explain the lineage graph.

212

Is the following approach correct? Is the sqrt Of Sum Of Sq a valid reducer?

241


Un-Answered Questions { Big Data }

What is Project Tungsten in Spark?

201


Give a detailed description of the Reducer phases.

578


What does /*streamtable(table_name)*/ do?

475


What is the use of ILLUSTRATE in Pig?

288


Where are Hadoop's configuration files located? List them.

222






What is meant by streaming access?

282


What is Grunt shell?

330


What is sc.parallelize?

209


What is the Metastore in Hive?

400


What is a custom partitioner in Hadoop?

724


List some common problems faced by data analysts.

244


Define RDD.

228


Can you define a Parquet file?

187


Define streaming.

375


Where does Big Data come from?

294