Explain the distinct(), union(), intersection() and subtract() transformations in Spark?
Answer Posted / Shalvendra Payal
"distinct()": Removes duplicate records from a DataFrame or Dataset. This function is often used when you need to work with unique values.nn"union()": Combines two or more DataFrames or Datasets into one single DataFrame or Dataset. It returns all the rows of both DataFrames and eliminates any duplicate records based on their order in the original DataFrames.nn"intersection()": Returns a new DataFrame that contains only the common rows between two DataFrames. This function is case-sensitive and performs an equi-join by default, meaning it only returns rows where columns have exact matches.nn"subtract()": Returns a new DataFrame with all the rows from the first input DataFrame, but excludes any rows that are present in both the first and second input DataFrames. The resultant DataFrame will not include duplicate rows, even if they exist in the first DataFrame."