How is RDD in Spark different from Distributed Storage Management?
Answer Posted / Anurag Singh Chauhan
An RDD (Resilient Distributed Dataset) in Apache Spark is an immutable, partitioned collection of objects that can be processed in parallel across a cluster, while distributed storage management refers to how data is stored, replicated, and organized across multiple nodes. RDDs are Spark's core abstraction for distributed computation, but they do not manage durable storage themselves. An RDD is typically created from data held in an external system such as HDFS, S3, or Cassandra, and its results can be written back to such a system; it can also be cached in memory or on local disk for faster reuse. The resilience comes from lineage: Spark records the transformations that produced each RDD and recomputes lost partitions from that record instead of relying on replicated copies. The choice of storage system is therefore separate from how RDDs are created and transformed.
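To make the separation concrete, here is a minimal sketch in plain Python. It is a toy model and not Spark's API: `ToyRDD`, `STORAGE`, `from_storage`, and `evict` are hypothetical names invented for this illustration. The storage layer is just a dictionary standing in for something like HDFS or S3, and the "RDD" only holds a recipe (its lineage) for computing data from that storage.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for an external storage system (HDFS, S3, ...).
# The toy RDD below never owns this data; it only knows how to read it.
STORAGE: Dict[str, List[int]] = {"numbers.txt": [1, 2, 3, 4, 5]}


class ToyRDD:
    """Toy model of an RDD: a recipe for computing data, not the data itself."""

    def __init__(self, compute: Callable[[], List[int]], lineage: List[str]):
        self._compute = compute          # how to rebuild the data on demand
        self.lineage = lineage           # human-readable record of the steps
        self._cache: Optional[List[int]] = None
        self._cache_enabled = False

    @staticmethod
    def from_storage(path: str) -> "ToyRDD":
        # Nothing is read here; the read happens when collect() is called.
        return ToyRDD(lambda: list(STORAGE[path]), [f"read({path})"])

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        parent = self
        return ToyRDD(lambda: [fn(x) for x in parent.collect()],
                      self.lineage + ["map"])

    def filter(self, pred: Callable[[int], bool]) -> "ToyRDD":
        parent = self
        return ToyRDD(lambda: [x for x in parent.collect() if pred(x)],
                      self.lineage + ["filter"])

    def cache(self) -> "ToyRDD":
        # Optional optimisation: keep the computed result in memory.
        self._cache_enabled = True
        return self

    def evict(self) -> None:
        # Simulate losing the cached copy (for example, a failed node).
        self._cache = None

    def collect(self) -> List[int]:
        if self._cache is not None:
            return list(self._cache)
        result = self._compute()         # recompute from lineage
        if self._cache_enabled:
            self._cache = result
        return list(result)


rdd = (ToyRDD.from_storage("numbers.txt")
       .map(lambda x: x * 10)
       .filter(lambda x: x > 20)
       .cache())

first = rdd.collect()    # computed from storage, then cached
rdd.evict()              # cached copy is lost
second = rdd.collect()   # rebuilt from lineage, same result
print(rdd.lineage, first, second)
```

The point of the sketch is that losing the cached copy does not lose the dataset: the result is rebuilt from the recorded steps plus whatever the storage layer still holds. Real Spark does this per partition and across machines, which this toy model does not attempt to show.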