What are the advantages of DataFrames over RDDs in Apache Spark?
What is the map() transformation in Apache Spark?
What are the commands to start and stop an interactive Spark shell?
What are the different modes in which Apache Spark can run?
What are the ways to run Spark on Hadoop?
What is the Catalyst query optimizer in Apache Spark?
What are the types of shared variables in Apache Spark?
What are the common mistakes developers make when using Apache Spark?
What is the role of the Spark driver, and where does it run on the cluster?
What is speculative execution in Spark?
What is the write-ahead log (journaling) in Spark?
What does the values() operation do in Apache Spark?
What is the level of parallelism in Spark Streaming, and why is it needed?
What is SparkSession in Apache Spark, and why is it needed?
What are the different transformations on DStreams in Apache Spark Streaming?