What is an RDD?
Answer / Vikas Saini
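An RDD (Resilient Distributed Dataset) is the fundamental data abstraction in Apache Spark. It is an immutable collection of objects, split into partitions that are distributed across the nodes of a cluster and processed in parallel. "Resilient" refers to fault tolerance: Spark records the lineage of transformations that produced an RDD, so a lost partition can be recomputed instead of being restored from a replica.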
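To make the idea concrete, here is a minimal single-machine sketch in plain Python. It is not Spark code: `ToyRDD` is a hypothetical class invented for illustration, and threads stand in for cluster nodes. Only the method names (`parallelize`, `map`, `filter`, `collect`) are borrowed from Spark's RDD API. It tries to show three properties: data is split into partitions, transformations return a new object rather than modifying the old one, and each result remembers its parent so a partition can be recomputed on demand.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyRDD:
    """Hypothetical single-machine stand-in for an RDD (not Spark's API)."""

    def __init__(self, partitions=None, parent=None, transform=None):
        # A base dataset stores its partitions as tuples, so they cannot be
        # changed in place. A derived dataset stores only its parent and the
        # function to apply to each parent partition (its "lineage").
        self._partitions = (
            None if partitions is None else tuple(tuple(p) for p in partitions)
        )
        self._parent = parent
        self._transform = transform

    @classmethod
    def parallelize(cls, data, num_partitions):
        """Split a local collection into contiguous partitions."""
        items = list(data)
        size = -(-len(items) // num_partitions)  # ceiling division
        chunks = [items[i * size:(i + 1) * size] for i in range(num_partitions)]
        return cls(partitions=chunks)

    def num_partitions(self):
        if self._parent is None:
            return len(self._partitions)
        return self._parent.num_partitions()

    def map(self, fn):
        # Returns a NEW ToyRDD; nothing is computed yet.
        return ToyRDD(parent=self, transform=lambda part: [fn(x) for x in part])

    def filter(self, pred):
        return ToyRDD(
            parent=self, transform=lambda part: [x for x in part if pred(x)]
        )

    def _compute(self, index):
        # Rebuild one partition by walking the lineage back to the base data.
        if self._parent is None:
            return list(self._partitions[index])
        return self._transform(self._parent._compute(index))

    def collect(self):
        # Process every partition in parallel, then concatenate in order.
        with ThreadPoolExecutor() as pool:
            parts = list(pool.map(self._compute, range(self.num_partitions())))
        return [x for part in parts for x in part]


numbers = ToyRDD.parallelize(range(1, 11), num_partitions=3)
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
result = even_squares.collect()
print(result)  # [4, 16, 36, 64, 100]
```

Calling `numbers.collect()` afterwards still returns 1 through 10, because `map` and `filter` never touch the original dataset. In real Spark the partitions would live on different executors and the scheduler would decide where each one is computed; this sketch only mimics the shape of the abstraction.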