What is an RDD?
Answer / Bhaskar Shukla
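An RDD (Resilient Distributed Dataset) is an immutable, partitioned collection of records that can be operated on in parallel across a cluster. It is the fundamental data structure in Apache Spark. An RDD can be created by loading external data such as HDFS files or local text files, by parallelizing an in-memory collection, or by transforming an existing RDD. It is "resilient" because Spark records the lineage of transformations that produced each RDD and can recompute a lost partition from that lineage instead of relying on replicated copies.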
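To make the idea concrete, here is a minimal single-process sketch in plain Python. It is a toy model, not Spark's API: the class name `ToyRDD` and its methods are hypothetical and only loosely mirror the names Spark uses. It assumes round-robin partitioning and runs everything in one process, so it illustrates only three ideas: data split into partitions, transformations that record lineage without computing anything, and recomputing a single partition from that lineage.

```python
class ToyRDD:
    """Hypothetical single-process stand-in for an RDD (not Spark's API)."""

    def __init__(self, partitions=None, parent=None, fn=None):
        self._partitions = partitions  # only a source ToyRDD holds data
        self._parent = parent          # lineage: the ToyRDD this one derives from
        self._fn = fn                  # per-partition function applied to the parent

    @classmethod
    def parallelize(cls, data, num_partitions):
        items = list(data)
        # Assumption: deal elements round-robin into num_partitions slices.
        parts = [items[i::num_partitions] for i in range(num_partitions)]
        return cls(partitions=parts)

    def num_partitions(self):
        if self._parent is None:
            return len(self._partitions)
        return self._parent.num_partitions()

    def map(self, f):
        # Lazy: returns a new ToyRDD that remembers its parent; nothing runs yet.
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred):
        # Lazy, like map; the original ToyRDD is never modified.
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def compute_partition(self, index):
        # Walk the lineage back to the source and rebuild just this partition.
        if self._parent is None:
            return list(self._partitions[index])
        return self._fn(self._parent.compute_partition(index))

    def collect(self):
        # Action: evaluate every partition and gather the results.
        out = []
        for i in range(self.num_partitions()):
            out.extend(self.compute_partition(i))
        return out


numbers = ToyRDD.parallelize(range(1, 11), 3)
squares_of_evens = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
result = sorted(squares_of_evens.collect())
# A "lost" partition can be rebuilt on its own from the recorded lineage.
rebuilt = squares_of_evens.compute_partition(1)
```

In real Spark the partitions live on different executors and the scheduler decides where each one is computed; the sketch keeps only the bookkeeping that makes recomputation possible.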