Answer Posted / Hariom Akash Sahay
In Apache Spark, a partition is a logical chunk of a Resilient Distributed Dataset (RDD). Each partition resides on a single worker node and is processed as a unit by one task, which is what lets Spark operate on the RDD in parallel. A Partitioner determines how the records of a key-value RDD are distributed across partitions. By default, shuffle operations use a HashPartitioner, which assigns each record to a partition based on the hash of its key, spreading the data roughly evenly.
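The hashing scheme described above can be sketched in plain Python, with no Spark installation required. This is only an illustration of the idea: Spark's real HashPartitioner calls the JVM `hashCode` of the key and takes a non-negative modulus, whereas here Python's built-in `hash()` stands in for it, so the exact partition indices will differ from a real cluster.

```python
def hash_partition(key, num_partitions):
    """Return the partition index for `key`.

    Mirrors the HashPartitioner idea: hash the key, then take it
    modulo the partition count. Python's % already yields a
    non-negative result for a positive modulus.
    """
    return hash(key) % num_partitions


def partition_records(records, num_partitions):
    """Distribute (key, value) records into `num_partitions` buckets.

    All records sharing a key land in the same partition, which is
    the property shuffle operations like reduceByKey rely on.
    """
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[hash_partition(key, num_partitions)].append((key, value))
    return partitions
```

Note that because all occurrences of a key hash to the same index, per-key aggregations can run independently on each partition without any further data movement.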