What is the difference between coalesce and repartition in Spark?
Answer Posted / Prem Shankar Jha
Coalesce and repartition are operations used to change the number of partitions of a DataFrame or RDD. Repartition performs a full shuffle that redistributes the data evenly across the new partitions, and it can either increase or decrease the partition count. Coalesce merges existing partitions into fewer, larger ones and, by default, avoids a full shuffle, so it is normally used only to reduce the partition count. The main difference is therefore cost versus balance: repartition is more expensive because every record may move, but it yields evenly sized partitions, while coalesce is cheaper but can leave the resulting partitions unevenly sized.
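To make the contrast concrete, here is a small pure-Python toy model. It is not Spark code and not how Spark implements these operations; partitions are modelled as plain lists, and the helper names coalesce and repartition below are hypothetical stand-ins for the real calls (for example df.coalesce(2) and df.repartition(2) on a DataFrame). The sketch assumes coalesce merges neighbouring partitions whole and repartition deals records out round-robin.

```python
from itertools import chain

def coalesce(partitions, n):
    """Toy coalesce: merge whole neighbouring partitions into at most n groups.

    Records are never split away from the partition they started in,
    which mimics avoiding a shuffle.
    """
    n = min(n, len(partitions))  # without a shuffle the count cannot grow
    if n == 0:
        return []
    merged = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Map parent partition i to one of n contiguous groups.
        merged[i * n // len(partitions)].extend(part)
    return merged

def repartition(partitions, n):
    """Toy repartition: every record may move (a full shuffle), dealt round-robin."""
    out = [[] for _ in range(n)]
    for i, record in enumerate(chain.from_iterable(partitions)):
        out[i % n].append(record)
    return out

# Four skewed partitions: one large, three small.
parts = [[1, 2, 3, 4, 5, 6], [7], [8], [9, 10]]

coalesced = coalesce(parts, 2)        # sizes 7 and 3: cheap but uneven
repartitioned = repartition(parts, 2)  # sizes 5 and 5: balanced, all data moved

print([len(p) for p in coalesced])
print([len(p) for p in repartitioned])
```

In this model, asking coalesce for more partitions than exist simply returns the original count, whereas repartition can grow the count freely. That mirrors the usual guidance: use coalesce to cheaply shrink the partition count (for example before writing a small output), and repartition when you need more partitions or evenly balanced ones.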