How does Spark RDD work?
Answer / Avneesh Kumar Maurya
RDD (Resilient Distributed Dataset) is the fundamental data structure in Apache Spark. RDDs are immutable, distributed collections of objects that are split into partitions and processed in parallel across the cluster. Operations on RDDs come in two kinds: transformations (such as map and filter), which are lazy and only record how a new RDD is derived from its parents, and actions (such as collect and count), which trigger the actual computation. RDDs are fault-tolerant because each one tracks this lineage of transformations: if a worker node fails, Spark recomputes only the lost partitions from their parent RDDs rather than replicating the data up front.
What are the various modes in which Spark runs on YARN? (Local vs Client vs Cluster Mode)
Who created Spark?
Who is the founder of Spark?
What are shared variables in Spark?
In what ways is SparkSession different from SparkContext?
How do you parse data in XML? Which kind of class do you use with Java to parse data?
What is a Spark job?
Hadoop uses replication to achieve fault tolerance. How is this achieved in Apache Spark?
Does Spark load all data in memory?
Explain the Spark coalesce() operation?
Describe Partition and Partitioner in Apache Spark?
What do you know about SchemaRDD?