Answer Posted / Saurabh Mishra
In Apache Spark, a lineage graph is a data structure that keeps track of the history of each dataset within a Spark application. It records every transformation operation (like map, filter, join) applied to a dataset and stores its input and output datasets. This information allows Spark to perform optimizations, like caching frequently used datasets or recomputing missing transformations when needed.
| Is This Answer Correct ? | 0 Yes | 0 No |
Post New Answer View All Answers