List some commonly used machine learning algorithms.
Explain the filter() transformation.
What do you mean by a worker node?
What is an RDD lineage graph? How is it useful in achieving fault tolerance?
Explain transformations and actions in the context of RDDs.
What is the key difference between the textFile() and wholeTextFiles() methods?
What do you understand by a Parquet file?
If certain data is reused again and again in different transformations, what should we do to improve performance?
Explain partitions.
Explain the createOrReplaceTempView() API.
Define the Parquet file format. How do you convert data to Parquet format?
Explain mapPartitions() and mapPartitionsWithIndex().
Explain the pipe() operation. How does it write the result to standard output?
Explain transformations on RDDs. How is lazy evaluation helpful in reducing the complexity of the system?
How do you identify whether a given operation in your program is a transformation or an action?