Can you explain the benefits of Spark over MapReduce?
Name three data sources available in Spark SQL.
Explain the Catalyst query optimizer in Apache Spark.
Can you run Spark without Hadoop?
In a very large text file, you only want to check whether a particular keyword exists. How would you do this using Spark?
What is the RDD map operation?
What operations does an RDD support?
What are the features and characteristics of Apache Spark?
What is the difference between Scala and Spark?
Can you explain Spark SQL?
Describe the different transformations on DStreams in Apache Spark Streaming.
What does DAG stand for?
How will you calculate the number of executors required to do real-time processing using Apache Spark? What factors need to be considered for deciding on the number of nodes for real-time processing?
What are the types of cluster managers in Spark?
What is the Spark driver?