What is the difference between cache and persist in spark?

Question Posted / Ravita Rani

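Answer Posted / Ravita Rani

cache() and persist() both keep an RDD (Resilient Distributed Dataset) stored after it is first computed, so later actions can reuse it instead of recomputing it from its lineage. The difference is the storage level. cache() takes no arguments and always uses the default level, which is MEMORY_ONLY for RDDs. RDD.cache() is simply shorthand for persist(StorageLevel.MEMORY_ONLY). persist() lets you choose the StorageLevel yourself, for example MEMORY_ONLY (deserialized objects in memory), MEMORY_ONLY_SER (serialized objects in memory, Java/Scala), MEMORY_AND_DISK (partitions that do not fit in memory spill to disk), DISK_ONLY, or OFF_HEAP (serialized data in off-heap memory). Levels ending in _2, such as MEMORY_AND_DISK_2, also replicate each partition on two nodes.

In both cases the stored partitions live on the executors that computed them, not only in the JVM that created the RDD. Both calls are lazy: nothing is stored until the first action runs, and unpersist() releases the storage. Once an RDD has been assigned a storage level, it cannot be switched to a different one without calling unpersist() first. For DataFrames and Datasets, cache() defaults to MEMORY_AND_DISK rather than MEMORY_ONLY.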

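The relationship is easier to see in a small model. The sketch below is a toy, pure-Python stand-in written only for illustration, not Spark's API: the class `ToyRDD`, its methods, and the level constants are hypothetical, and the "level" is just a label (this model keeps everything in a Python list no matter which level is chosen). It illustrates that cache() merely delegates to persist() with a default level, and that both are lazy: data is stored only when the first action runs.

```python
from typing import Callable, List, Optional

# Hypothetical labels that mirror Spark's level names; in this toy model
# they do not change where the data is kept (always a Python list).
NONE = "NONE"
MEMORY_ONLY = "MEMORY_ONLY"
MEMORY_AND_DISK = "MEMORY_AND_DISK"


class ToyRDD:
    """Hypothetical stand-in for an RDD. This is NOT the Spark API."""

    def __init__(self, compute: Callable[[], List[int]]):
        self._compute = compute      # the "lineage": how to rebuild the data
        self._level = NONE           # no storage level assigned yet
        self._stored: Optional[List[int]] = None
        self.compute_count = 0       # how many times the lineage has run

    def persist(self, level: str = MEMORY_ONLY) -> "ToyRDD":
        # Only records the level; nothing is computed or stored yet (lazy).
        if self._level != NONE and self._level != level:
            raise ValueError("cannot change storage level once assigned")
        self._level = level
        return self

    def cache(self) -> "ToyRDD":
        # cache() is just persist() with the default level.
        return self.persist(MEMORY_ONLY)

    def unpersist(self) -> "ToyRDD":
        # Drop stored data and clear the level so a new one may be set.
        self._level = NONE
        self._stored = None
        return self

    def get_storage_level(self) -> str:
        return self._level

    def collect(self) -> List[int]:
        # An "action": reuse stored data if present, otherwise recompute.
        if self._stored is not None:
            return list(self._stored)
        self.compute_count += 1
        data = self._compute()
        if self._level != NONE:
            self._stored = data      # materialized on the first action
        return list(data)


# Usage: the second action reuses the stored data instead of recomputing.
squares = ToyRDD(lambda: [x * x for x in range(5)]).cache()
print(squares.get_storage_level())   # MEMORY_ONLY
squares.collect()
squares.collect()
print(squares.compute_count)         # 1
```

On a real cluster the corresponding PySpark calls are `rdd.cache()`, `rdd.persist(StorageLevel.MEMORY_AND_DISK)` (with `from pyspark import StorageLevel`), `rdd.getStorageLevel()` and `rdd.unpersist()`.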