"distinct()": Removes duplicate records from a DataFrame or Datas

Explain the distinct(), union(), intersection() and subtract() transformations in Spark.

Question Posted / Shalvendra Payal


Answer Posted / Shalvendra Payal

"distinct()": Removes duplicate records from a DataFrame or Dataset. This function is often used when you need to work with unique values.nn"union()": Combines two or more DataFrames or Datasets into one single DataFrame or Dataset. It returns all the rows of both DataFrames and eliminates any duplicate records based on their order in the original DataFrames.nn"intersection()": Returns a new DataFrame that contains only the common rows between two DataFrames. This function is case-sensitive and performs an equi-join by default, meaning it only returns rows where columns have exact matches.nn"subtract()": Returns a new DataFrame with all the rows from the first input DataFrame, but excludes any rows that are present in both the first and second input DataFrames. The resultant DataFrame will not include duplicate rows, even if they exist in the first DataFrame."
