How is fault tolerance achieved in Apache Spark?
Answer / Isha Vishnoi
Apache Spark achieves fault tolerance primarily through the lineage of resilient distributed datasets (RDDs). An RDD is an immutable collection of data partitioned across the nodes of a cluster, and each RDD records the lineage of deterministic transformations (e.g. map, filter, join) that produced it from its parent RDDs or from stable storage. If a node fails and some partitions are lost, Spark does not rely on replicated copies of the data: it recomputes just the lost partitions on other available nodes by replaying the lineage. Failed tasks are likewise retried automatically (on another executor if necessary), so the final result remains consistent. For very long lineage chains, checkpointing an RDD to reliable storage truncates the lineage graph and bounds recovery time.
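To make the lineage idea concrete, here is a toy sketch in plain Python. This is NOT Spark's real API or implementation; the `ToyRDD` class and its methods are hypothetical, and exist only to show how remembering a parent plus a transformation lets a lost partition be recomputed on demand instead of being replicated.

```python
# Toy sketch of lineage-based recovery (NOT Spark's real API):
# each "RDD" remembers its parent and the transformation that
# produced it, so any lost partition can be recomputed on demand.

class ToyRDD:
    def __init__(self, partitions, parent=None, transform=None):
        self._partitions = partitions      # a None entry means "lost"
        self.parent = parent               # lineage pointer
        self.transform = transform         # function applied per element

    @classmethod
    def from_data(cls, data, num_partitions=2):
        # Split the input into roughly equal partitions.
        size = (len(data) + num_partitions - 1) // num_partitions
        parts = [data[i:i + size] for i in range(0, len(data), size)]
        return cls(parts)

    def map(self, fn):
        # Lazily derived RDD: no data is computed yet, only lineage is recorded.
        return ToyRDD([None] * len(self._partitions), parent=self, transform=fn)

    def get_partition(self, i):
        if self._partitions[i] is None:
            # Partition is missing (e.g. its node failed):
            # recompute it from the parent via the recorded transformation.
            parent_part = self.parent.get_partition(i)
            self._partitions[i] = [self.transform(x) for x in parent_part]
        return self._partitions[i]

    def collect(self):
        return [x for i in range(len(self._partitions))
                for x in self.get_partition(i)]


base = ToyRDD.from_data([1, 2, 3, 4], num_partitions=2)
doubled = base.map(lambda x: x * 2)
print(doubled.collect())            # [2, 4, 6, 8]

# Simulate losing a computed partition, then recover it via lineage.
doubled._partitions[0] = None
print(doubled.collect())            # [2, 4, 6, 8] again
```

Real Spark works at the level of a DAG of stages and recomputes only the minimal set of lost partitions, but the recovery principle is the same: because transformations are deterministic and RDDs immutable, replaying the lineage reproduces exactly the same data.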
Can you mention some features of spark?
When creating an RDD, what goes on internally?
What is SparkSession in Apache Spark? Why is it needed?
How can you launch Spark jobs inside Hadoop MapReduce?
What is paired rdd in spark?
What is map side join?
What is shuffle read and shuffle write in spark?
What is executor memory in a spark application?
Explain the run-time architecture of Spark?
What is a spark shuffle?
Can you explain spark streaming?
Is spark a language?