Explain the distinct(), union(), intersection() and subtract() transformations in Spark

Explain the distinct(), union(), intersection() and subtract() transformations in Spark?

Question Posted / sudhir yadav

Answer / Shalvendra Payal

"distinct()": Removes duplicate rows from a DataFrame or Dataset (or duplicate elements from an RDD) and returns a new one containing only unique values. Because identical records can sit in different partitions, it requires a shuffle.

"union()": Combines two DataFrames or Datasets into a single one containing all the rows of both. It does not remove duplicates (it behaves like UNION ALL in SQL); call distinct() on the result if you need a set-style union. On DataFrames, columns are matched by position, not by name.

"intersection()": Returns only the rows that are present in both inputs. intersection() is the RDD method; on a DataFrame or Dataset the equivalent is intersect(). Rows are compared for equality across all columns, so string values match only when they are identical, including case. The result contains no duplicates.

"subtract()": Returns the rows of the first DataFrame that do not appear in the second. On a DataFrame the result contains no duplicate rows, even if they exist in the first DataFrame; use exceptAll() if duplicates must be preserved.

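The duplicate handling described above is easy to get wrong, so here is a minimal sketch of it. This is a plain-Python model of the row-level semantics, not Spark code: rows are tuples, a "DataFrame" is a list of rows, and the helper functions and sample data are made up for illustration. Real Spark does not guarantee row order; the model keeps first-seen order only to make the output easy to read.

```python
# Plain-Python model of the DataFrame semantics; NOT the Spark API.
# The function names mirror the Spark methods they imitate.

def distinct(rows):
    """Models df.distinct(): keep one copy of each row."""
    return list(dict.fromkeys(rows))

def union(left, right):
    """Models df.union(other): like UNION ALL, duplicates are kept."""
    return left + right

def intersect(left, right):
    """Models df.intersect(other): rows found in both, without duplicates."""
    right_set = set(right)
    return [row for row in distinct(left) if row in right_set]

def subtract(left, right):
    """Models df.subtract(other): rows of left not in right, without duplicates."""
    right_set = set(right)
    return [row for row in distinct(left) if row not in right_set]

# Hypothetical sample data: (1, "x") appears twice in `a`.
a = [(1, "x"), (1, "x"), (2, "y"), (3, "z")]
b = [(2, "y"), (4, "w")]

print(distinct(a))            # [(1, 'x'), (2, 'y'), (3, 'z')]
print(len(union(a, b)))       # 6, the duplicate (1, 'x') is still there
print(distinct(union(a, b)))  # set-style union: 4 unique rows
print(intersect(a, b))        # [(2, 'y')]
print(subtract(a, b))         # [(1, 'x'), (3, 'z')]
```

In an actual Spark job the same calls would be made on DataFrames, for example a.union(b).distinct() for a set-style union.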