What is the difference between cache and persist in Spark?
Answer / Rajeshwar Prasad
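cache() and persist() both mark an RDD, DataFrame, or Dataset to be kept after it is first computed, so later actions can reuse it instead of recomputing it. The only difference is how the storage level is chosen. cache() takes no arguments and is shorthand for persist() with the default storage level: MEMORY_ONLY for RDDs and MEMORY_AND_DISK for DataFrames and Datasets. persist() lets you pass a storage level explicitly (e.g., MEMORY_ONLY, MEMORY_AND_DISK, DISK_ONLY), so you decide whether the data lives in memory, on disk, or both. Neither method discards data after a fixed period of time. Cached partitions stay until you call unpersist() or until Spark evicts them under memory pressure in least-recently-used order. Both methods are lazy: nothing is stored until an action runs.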