What is an RDD in Apache Spark? How are RDDs computed in Spark? What are the various ways in which an RDD can be created?
Answer / Neeraj Kumar Soni
An RDD (Resilient Distributed Dataset) in Apache Spark is an immutable, fault-tolerant, distributed collection of data that can be manipulated using transformations and actions. An RDD is divided into smaller chunks called partitions, which are distributed across the nodes of the cluster and processed in parallel. Computation is lazy: transformations only build up a lineage graph, and the actual work is performed when an action is invoked. An RDD can be created in three main ways: by parallelizing an in-memory collection (sc.parallelize), by loading data from external storage such as a local file or HDFS (sc.textFile), or by applying a transformation to an existing RDD.
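The three creation methods above can be sketched in Scala as follows (a minimal example assuming a local Spark runtime; the file path is a placeholder, not a real dataset):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddCreationExamples {
  def main(args: Array[String]): Unit = {
    // Local master for illustration only; in a cluster this would differ
    val conf = new SparkConf().setAppName("rdd-creation").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // 1. Parallelize an in-memory collection into an RDD with 2 partitions
    val numbers = sc.parallelize(Seq(1, 2, 3, 4, 5), numSlices = 2)

    // 2. Load an external dataset (placeholder path; uncomment with a real file)
    // val lines = sc.textFile("hdfs:///path/to/input.txt")

    // 3. Derive a new RDD from an existing one via a transformation (lazy)
    val squares = numbers.map(n => n * n)

    // Nothing has been computed yet; the action below triggers the lineage
    println(squares.reduce(_ + _)) // sums the squared values

    sc.stop()
  }
}
```

Note that `squares` is not materialized until `reduce` runs; this laziness is what lets Spark recover lost partitions by replaying the lineage.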
Related questions:

What is a tuple in spark?
What is dataproc cluster?
Which is the best spark certification?
Can you define rdd lineage?
What is spark lineage?
What are broadcast variables in spark?
Is apache spark going to replace hadoop?
How is Apache Spark better than Hadoop?
What is the difference between dataframe and dataset in spark?
How spark works on hadoop?
Compare Transformation and Action in Apache Spark?
What does MLlib do?