Unanswered Questions { Big Data }

What happens when the data set exceeds available memory?

85


How much memory is required?

42


Are results returned as they become available, or all at once when a query completes?

52


Why do I have to use REFRESH and INVALIDATE METADATA, and what do they do?

31


Why does my SELECT statement fail?

41


Does Impala performance improve as it is deployed to more hosts in a cluster, in much the same way that Hadoop performance does?

96


What is an RDD in Apache Spark? How are RDDs computed in Spark? What are the various ways in which an RDD can be created?

221


What role does a worker node play in an Apache Spark cluster? And why does a worker node need to register with the driver program?

224


What is SparkSession in Apache Spark? Why is it needed?

219


What is the task of the Spark engine?

232


What is the use of SparkContext?

225


How is the processing of streaming data achieved in Apache Spark? Explain.

193


Can you do real-time processing with Spark SQL?

195


Discuss the role of the Spark driver in a Spark application.

200


What are the features of RDDs that make them an important abstraction in Spark?

192