What is the difference between cache() and persist() in Spark?
Answer Posted / Ravita Rani
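cache() and persist() both mark an RDD (Resilient Distributed Dataset) to be kept once it has been computed, so later actions can reuse it instead of recomputing it. The only difference is how much control you get. cache() takes no arguments and is shorthand for persist() with the default storage level: MEMORY_ONLY for RDDs, MEMORY_AND_DISK for DataFrames and Datasets. persist() accepts a StorageLevel, so you can choose where and how the data is kept: in memory only (MEMORY_ONLY), in memory as serialized bytes (MEMORY_ONLY_SER, Scala/Java API), in memory with spill to disk (MEMORY_AND_DISK), on disk only (DISK_ONLY), or off-heap (OFF_HEAP). Neither call is tied to the JVM that created the RDD: each partition is stored on whichever executor computes it. Both are lazy, so nothing is stored until an action runs, and unpersist() releases the stored data.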
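As an illustration, here is a minimal PySpark sketch of the two calls side by side. It assumes PySpark is installed and runs Spark in local mode; the function name demo, the app name, and the sample RDDs are made up for this example and are not from the original answer.

```python
def demo():
    # Imported inside the function so this file still loads where PySpark
    # is not installed (an assumption of this sketch, not a Spark requirement).
    from pyspark import StorageLevel
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[2]")            # local mode, two worker threads
        .appName("cache-vs-persist")   # hypothetical app name
        .getOrCreate()
    )
    sc = spark.sparkContext
    try:
        # cache(): no arguments, uses the default storage level for RDDs.
        squares = sc.parallelize(range(1000)).map(lambda x: x * x)
        squares.cache()
        print("cache()   ->", squares.getStorageLevel())

        # persist(): the caller picks the storage level explicitly.
        cubes = sc.parallelize(range(1000)).map(lambda x: x ** 3)
        cubes.persist(StorageLevel.MEMORY_AND_DISK)
        print("persist() ->", cubes.getStorageLevel())

        # Both calls are lazy: partitions are stored only when an action runs.
        squares.count()
        cubes.count()

        # Release the stored partitions when they are no longer needed.
        squares.unpersist()
        cubes.unpersist()
    finally:
        spark.stop()
```

Calling demo() prints the storage level chosen by each call. Two separate RDDs are used because Spark does not allow changing the storage level of an RDD that already has one assigned; call unpersist() first if you need to switch levels.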