Answer Posted / Smit Agarwal
In PySpark, RDDs (Resilient Distributed Datasets) are the basic building blocks of Spark. An RDD is an immutable, distributed collection of data that can be created from datasets in Hadoop storage systems such as HDFS, or from local files and in-memory collections. RDDs are processed through transformations (e.g., map, filter), which lazily build new RDDs, and actions (e.g., count, saveAsTextFile), which trigger computation and return or persist a result.