What are the features of an RDD that make it an important abstraction in Spark?
Answer / Neelam
RDD (Resilient Distributed Dataset) is the fundamental data structure in Apache Spark. Its key features include: (1) Immutable: Once created, an RDD cannot be modified; instead, transformations create new RDDs from existing ones. (2) Distributed and partitioned: The data in an RDD is split into partitions that are spread across the nodes of a cluster, so they can be processed in parallel. (3) Fault-tolerant: Spark does not replicate partitions by default. Instead, it records the lineage of each RDD, that is, the sequence of transformations used to build it. When a partition is lost, Spark recomputes it from that lineage. Replicated copies are kept only if you persist the RDD with a replicated storage level. (4) Rich API: RDDs provide a rich set of transformations (such as map and filter) and actions (such as collect and count) that are easy to use and extend.
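The features above can be illustrated with a small toy model. This is a minimal sketch in plain Python, not Spark code and not how Spark is implemented: the names ToyRDD and parallelize are hypothetical and exist only for this example. It assumes that each partition can be described by a function that rebuilds it from scratch, which is enough to show immutability, partitioning, and recovery of a lost partition by recomputation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ToyRDD:
    """Hypothetical toy stand-in for an RDD (illustration only, not Spark)."""
    num_partitions: int
    # Rebuilds the contents of one partition from scratch, given its index.
    compute: Callable[[int], List[int]]
    # Pointer to the dataset this one was derived from (its lineage).
    parent: Optional["ToyRDD"] = None

    def map(self, f):
        # A transformation returns a NEW ToyRDD; self is left untouched.
        return ToyRDD(self.num_partitions,
                      lambda i: [f(x) for x in self.compute(i)],
                      parent=self)

    def filter(self, pred):
        return ToyRDD(self.num_partitions,
                      lambda i: [x for x in self.compute(i) if pred(x)],
                      parent=self)

    def collect(self):
        # An action: evaluates every partition and gathers the results.
        return [x for i in range(self.num_partitions) for x in self.compute(i)]


def parallelize(data, num_partitions):
    """Split a local collection into contiguous partitions (toy helper)."""
    data = list(data)
    size = -(-len(data) // num_partitions)  # ceiling division
    return ToyRDD(num_partitions, lambda i: data[i * size:(i + 1) * size])


base = parallelize(range(1, 9), num_partitions=2)
squares_of_evens = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

result = squares_of_evens.collect()      # [4, 16, 36, 64]
# If partition 1 were "lost", it can be rebuilt from lineage alone,
# without any stored copy of its data.
recovered = squares_of_evens.compute(1)  # [36, 64]
```

In this sketch, base is unchanged after the filter and map calls, and the second partition is rebuilt by replaying the transformations on its slice of the input. Real Spark applies the same idea across the machines of a cluster.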
What is a "worker node"?
Does Spark use YARN?
What does Apache Spark stand for?
What is RDD?
What are the components of Spark?
What is a PipelinedRDD?
What is DStream in Apache Spark Streaming?
What do you understand by receivers in Spark Streaming?
What is a Row RDD in Spark?
What is an executor in Spark?
Explain the use of broadcast variables.
What is cluster mode in Spark?