Explain the various transformations on an Apache Spark RDD, such as distinct(), union(), intersection(), and subtract().
Answer / Padmabahadur Yadav
These transformations on an Apache Spark RDD behave as follows:
1. distinct(): It removes duplicate elements from an RDD, so the resulting RDD contains each element exactly once. Because equal elements may sit in different partitions, this requires a shuffle.
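2. union(): It combines two RDDs into a single RDD containing every element of both. Duplicates are kept, so an element present in both inputs appears more than once; apply distinct() afterwards if only unique elements are wanted. The order of elements in the output RDD should not be relied on.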
3. intersection(): It returns an RDD that contains only the elements present in both of the given RDDs. The output contains no duplicates, even if the inputs did. If there are no common elements, it returns an empty RDD.
4. subtract(): It returns an RDD containing all elements of the first RDD that do not exist in the second RDD. Duplicates in the first RDD are kept unless the value also appears in the second.
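The duplicate-handling rules above can be checked without a cluster. The following is a minimal plain-Python sketch that models the semantics of the four transformations on ordinary lists; it is not Spark code, and the helper functions and sample data are hypothetical stand-ins named after the RDD methods they imitate. Results are sorted only to make the output predictable; Spark itself gives no ordering guarantee.

```python
# Plain-Python model of four RDD transformations (illustrative only, not Spark).

def distinct(xs):
    # Models rdd.distinct(): every value is kept exactly once.
    # sorted() is used only for a stable display order.
    return sorted(set(xs))

def union(xs, ys):
    # Models rdd.union(other): all elements of both inputs, duplicates kept.
    return list(xs) + list(ys)

def intersection(xs, ys):
    # Models rdd.intersection(other): values found in both inputs,
    # de-duplicated even when the inputs contain repeats.
    return sorted(set(xs) & set(ys))

def subtract(xs, ys):
    # Models rdd.subtract(other): drops every element of xs whose value
    # occurs in ys; repeats in xs that survive are kept.
    excluded = set(ys)
    return [x for x in xs if x not in excluded]

a = [1, 2, 2, 3, 4]
b = [3, 4, 4, 5]

print(distinct(a))         # [1, 2, 3, 4]
print(union(a, b))         # [1, 2, 2, 3, 4, 3, 4, 4, 5]
print(intersection(a, b))  # [3, 4]
print(subtract(a, b))      # [1, 2, 2]
```

In PySpark, assuming an existing SparkContext named sc, the corresponding calls would be sc.parallelize(a).distinct(), sc.parallelize(a).union(sc.parallelize(b)), and likewise intersection() and subtract(), each followed by collect() to bring the result back to the driver.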