What is the difference between coalesce and repartition in Spark?
Answer / Prem Shankar Jha
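Coalesce and repartition are both used to change the number of partitions of a DataFrame or RDD. Repartition performs a full shuffle: every record can be moved to any of the new partitions. It can therefore increase or decrease the partition count, and it usually produces evenly sized partitions. Coalesce is meant for reducing the partition count. It merges existing partitions into fewer, larger ones without a full shuffle, so it is cheaper, but the resulting partitions can be uneven. On a DataFrame, asking coalesce for more partitions than currently exist leaves the count unchanged. The main difference is that repartition always shuffles the data, while coalesce avoids a shuffle by combining partitions that already exist. On an RDD, coalesce(n, shuffle = true) behaves like repartition(n).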
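The contrast can be illustrated without a Spark cluster. The following is a simplified model in plain Python, not Spark's implementation. Partitions are plain lists, and the hypothetical functions `coalesce` and `repartition` only mimic the data movement described above. The round-robin placement in `repartition` is an assumption that roughly matches `DataFrame.repartition(n)`. Spark's real partitioners and task scheduling are more involved.

```python
from typing import List, TypeVar

T = TypeVar("T")


def coalesce(partitions: List[List[T]], n: int) -> List[List[T]]:
    """Toy model (hypothetical, not Spark code): merge neighbouring partitions into n groups.

    Each input partition is kept whole and appended to one output group,
    so records are never redistributed across the dataset (no shuffle).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if n >= len(partitions):
        # Like DataFrame.coalesce, this model cannot increase the partition count.
        return [list(p) for p in partitions]
    groups: List[List[T]] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Assign whole input partitions to output groups in contiguous blocks.
        groups[i * n // len(partitions)].extend(part)
    return groups


def repartition(partitions: List[List[T]], n: int) -> List[List[T]]:
    """Toy model (hypothetical, not Spark code): deal every record round-robin into n partitions.

    Every record gets a new destination, which stands in for a full shuffle.
    The partition count may go up or down.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    out: List[List[T]] = [[] for _ in range(n)]
    position = 0
    for part in partitions:
        for record in part:
            out[position % n].append(record)
            position += 1
    return out


if __name__ == "__main__":
    parts = [[1, 2], [3], [4, 5, 6], [7]]
    print(coalesce(parts, 2))     # [[1, 2, 3], [4, 5, 6, 7]]
    print(repartition(parts, 3))  # [[1, 4, 7], [2, 5], [3, 6]]
```

In the `coalesce` model each output partition is a union of whole input partitions, so no record is exchanged between them. In the `repartition` model every record's destination is recomputed, which corresponds to the shuffle in Spark. In actual Spark code the corresponding calls are `df.coalesce(2)` and `df.repartition(3)`.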