What is an RDD?
Answer / Jaideep Shrivastava
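A Resilient Distributed Dataset (RDD) is an immutable, partitioned collection of records that Spark distributes across the nodes of a cluster and processes in parallel. It is the fundamental data structure in Apache Spark. An RDD can be created from Hadoop datasets (for example, files in HDFS), from collections in the driver program, from custom data sources, or by transforming an existing RDD. It is "resilient" because Spark records the lineage of transformations that produced it and can recompute lost partitions after a failure.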
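To make "immutable" and "processed in parallel" concrete, here is a minimal toy sketch in plain Python. It is not Spark's API: `ToyRDD` is a hypothetical class written only for illustration, and its method names merely mirror Spark's `parallelize`, `map`, and `collect`. Unlike a real RDD it runs on one machine, evaluates eagerly instead of lazily, and tracks no lineage.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRDD:
    """Hypothetical toy stand-in for an RDD: an immutable, partitioned collection."""

    def __init__(self, partitions):
        # Store tuples of tuples so the data cannot be modified in place.
        self._partitions = tuple(tuple(p) for p in partitions)

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        # Split a local collection into roughly equal-sized partitions.
        data = list(data)
        size = max(1, -(-len(data) // num_partitions))  # ceiling division
        return cls(data[i:i + size] for i in range(0, len(data), size))

    def map(self, fn):
        # Apply fn to every partition in parallel and return a NEW ToyRDD;
        # the original object is left untouched.
        with ThreadPoolExecutor() as pool:
            new_parts = list(
                pool.map(lambda part: [fn(x) for x in part], self._partitions)
            )
        return ToyRDD(new_parts)

    def collect(self):
        # Gather all partitions back into a single local list.
        return [x for part in self._partitions for x in part]


numbers = ToyRDD.parallelize(range(1, 7), num_partitions=3)
squares = numbers.map(lambda x: x * x)

print(numbers.collect())  # the source collection is unchanged
print(squares.collect())  # the transformed data lives in a new object
```

The point of the sketch is the shape of the abstraction: a transformation never edits the existing dataset, it produces a new one, and the work is done partition by partition so that partitions can be handled independently.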