When should you use coalesce and repartition in Spark?
Answer / Shakher Chaudhary
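Use coalesce when you only need to reduce the number of partitions, for example to avoid producing many small files just before writing to an output format like CSV or Parquet. It merges existing partitions and avoids a full shuffle, so it is cheap, but the resulting partitions can be uneven in size. Use repartition when you need to increase the number of partitions, rebalance skewed data, or partition by a column such as the join key before a join. It performs a full shuffle, which costs more, but when no column is given it spreads the rows roughly evenly across the new partitions.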
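The difference in data movement can be sketched with a toy model in plain Python. This is not Spark's implementation: the functions `coalesce` and `repartition` below are hypothetical stand-ins that treat partitions as lists, and the sample data `skewed` is made up for illustration. In PySpark the corresponding calls are `df.coalesce(numPartitions)` and `df.repartition(numPartitions, *cols)`.

```python
from typing import Callable, List, Optional


def coalesce(partitions: List[list], n: int) -> List[list]:
    """Toy coalesce: merge whole neighbouring partitions, never split one."""
    current = len(partitions)
    if n >= current:
        # Without a shuffle the partition count cannot grow, so keep it as is.
        return [list(p) for p in partitions]
    merged: List[list] = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        # Each old partition is moved as a unit into one new partition.
        merged[i * n // current].extend(part)
    return merged


def repartition(
    partitions: List[list], n: int, key: Optional[Callable] = None
) -> List[list]:
    """Toy repartition: every record is reassigned (a full shuffle)."""
    out: List[list] = [[] for _ in range(n)]
    records = [rec for part in partitions for rec in part]
    for i, rec in enumerate(records):
        # With a key, equal keys land together; otherwise go round-robin.
        target = hash(key(rec)) % n if key is not None else i % n
        out[target].append(rec)
    return out


# Hypothetical skewed input: one large partition and three small ones.
skewed = [[1, 2, 3, 4, 5, 6], [7], [8], [9, 10]]

coalesced = coalesce(skewed, 2)
rebalanced = repartition(skewed, 2)

print([len(p) for p in coalesced])   # [7, 3]  fewer partitions, still uneven
print([len(p) for p in rebalanced])  # [5, 5]  fewer partitions, balanced
```

In this model coalesce keeps the skew because it only glues existing partitions together, while repartition touches every record and so can both rebalance the data and raise the partition count.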