If we want to reuse the same data again and again across different transformations, what would improve performance?
Answer / Mahima Singh
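"Caching the DataFrame or Dataset can significantly improve performance, because Spark keeps the computed data and reuses it across multiple actions instead of recomputing it from the source each time. cache() uses the default storage level, which is MEMORY_AND_DISK for DataFrames and Datasets and MEMORY_ONLY for RDDs. For finer control, use the persist() method with an explicit storage level such as MEMORY_ONLY, MEMORY_ONLY_SER, or MEMORY_AND_DISK, depending on the memory footprint you can afford and on whether partitions that do not fit in memory should be spilled to disk or recomputed. Both calls are lazy: the data is stored only when the first action runs. Call unpersist() once the data is no longer needed to free the storage."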