Explain distinct(), union(), intersection() and subtract() transformations in Spark?
Answer / Shalvendra Payal
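distinct(): Returns a new RDD (or DataFrame/Dataset) that keeps only one copy of each element, removing duplicates. On a DataFrame, two rows count as duplicates only if every column matches. It is useful when you need to work with unique values, but it requires a shuffle.

union(): Combines two RDDs (or DataFrames/Datasets) into one. It returns all the elements of both inputs and does NOT remove duplicates, so it behaves like SQL UNION ALL; call distinct() on the result if you need a set-style union. For DataFrames, columns are matched by position, not by name (use unionByName() to match by name). To combine more than two inputs, chain union() calls or pass a list of RDDs to SparkContext.union().

intersection(): Returns a new RDD containing only the elements present in both inputs. Elements are compared by exact equality (so string comparison is case-sensitive), and the output contains no duplicates even if the inputs did. It performs a shuffle internally. The DataFrame equivalent is intersect(); intersectAll() keeps duplicates.

subtract(): Returns a new RDD with the elements of the first RDD that are not present in the second. On an RDD, duplicates in the first input are kept. On a DataFrame, subtract() behaves like SQL EXCEPT DISTINCT, so the result contains no duplicate rows; in the Scala/Java Dataset API this method is called except(), and exceptAll() keeps duplicates.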
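To make the duplicate-handling rules concrete, here is a minimal, Spark-free sketch of the element-level semantics described above, written in plain Python so it runs without a cluster. The helper functions distinct, union, intersection and subtract are hypothetical stand-ins for the RDD methods of the same names, not Spark's implementation; results are sorted only to make the output deterministic, since Spark does not guarantee element order.

```python
# Hypothetical local model of RDD.distinct/union/intersection/subtract semantics.
# This does NOT use Spark; it only mimics which elements (and how many copies)
# each transformation returns. Sorting is for deterministic output only.

def distinct(xs):
    # One copy of each element, like rdd.distinct().
    return sorted(set(xs))

def union(xs, ys):
    # Plain concatenation, duplicates kept (SQL UNION ALL), like rdd.union(other).
    return sorted(xs + ys)

def intersection(xs, ys):
    # Elements present in both inputs, deduplicated, like rdd.intersection(other).
    return sorted(set(xs) & set(ys))

def subtract(xs, ys):
    # Elements of xs that do not appear in ys; duplicates in xs are kept,
    # like rdd.subtract(other) (unlike DataFrame.subtract, which deduplicates).
    exclude = set(ys)
    return sorted(x for x in xs if x not in exclude)

a = [1, 2, 2, 3, 4, 4]
b = [3, 4, 5]

print(distinct(a))         # → [1, 2, 3, 4]
print(union(a, b))         # → [1, 2, 2, 3, 3, 4, 4, 4, 5]
print(intersection(a, b))  # → [3, 4]
print(subtract(a, b))      # → [1, 2, 2]
```

With an existing SparkContext assumed to be named sc, the real PySpark calls would look like sc.parallelize(a).union(sc.parallelize(b)).collect(), with rdd.distinct(), rdd.intersection(other) and rdd.subtract(other) used the same way; collect() returns the elements in no guaranteed order.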
What is serialization in Spark?
Where is Apache Spark used?
What is a starvation scenario in Spark Streaming?
What is Hadoop Spark?
What is a driver and an executor in Spark?
What is Hadoop technology?
How can I speed up Spark?
Define SparkContext in Apache Spark.
Is Apache Spark part of Hadoop?
How does Apache Spark work?
What is the difference between Spark and Hadoop?
What is the difference between Dataset and DataFrame?