Big Data Interview Questions
Questions Views

What is the Catalyst query optimizer in Apache Spark?

199

What are the various types of shared variables in Apache Spark?

185

What are the common mistakes developers make when using Apache Spark?

203

What is the role of the Spark driver, and where does it run on the cluster?

213

What is speculative execution in Spark?

235

Explain the write-ahead log (journaling) in Spark.

188

Explain the values() operation in Apache Spark.

272

Define the level of parallelism in Spark Streaming and explain why it is needed.

234

What is SparkSession in Apache Spark, and why is it needed?

198

Describe the different transformations on DStreams in Apache Spark Streaming.

204

In HADOOP_PID_DIR, what does PID stand for?

244

What are the network requirements for Hadoop?

256

What does hadoop-env.sh do?

248

What are the three modes in which Hadoop can run?

253

Where is the hadoop-env.sh file located?

238


Unanswered Questions (Big Data)

What is the Hive Data Definition Language (DDL)?

594


Is HBase OS-independent?

127


When should secondary indexes be used?

50


What are the differences between a node, a cluster, and a datacenter in Cassandra?

61


What is the difference between persist() and cache()?

221






Explain clustering in Hive.

417


What is the role of Zookeeper in HBase architecture?

558


What is a Spark RDD?

223


Do Pig scripts support distributed file systems?

297


What does the conf class do?

497


Can the name of a view be the same as the name of a Hive table?

456


In Hadoop, what is an InputSplit?

382


What type of data should be put in the distributed cache? When should data be put there, and how much?

249


What is a bloom filter?

121


Why is the comparison of types important for MapReduce?

680