How might you limit data movement (shuffling) when working with Spark?
Answer Posted / Kavita Bhasker
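To limit data movement in Spark, you can use several strategies:
1) Cache intermediate results that are reused (with cache() or persist()), so later actions read them instead of recomputing them, including any shuffles in their lineage.
2) Use repartition() and coalesce() judiciously. repartition() rebalances data across nodes but always performs a full shuffle, while coalesce() only reduces the number of partitions by merging existing ones and avoids a full shuffle.
3) Use broadcast variables to share read-only data that fits in each executor's memory (for example, a lookup table). Each executor receives one copy instead of the data being shipped with every task.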