Please enumerate the various components of the Spark Ecosystem.
Answer / Rajkumar
1. Spark Core: the base engine, providing distributed task scheduling, memory management, fault tolerance, and the RDD API for data parallelism.
2. Spark SQL: provides APIs (DataFrames and SQL) for structured data processing.
3. Spark Streaming: allows near-real-time processing of live data streams.
4. MLlib: a machine learning library built on top of Spark Core.
5. GraphX: provides graph processing capabilities.
6. SparkR and PySpark: APIs for R and Python, respectively, for interfacing with Spark.
Explain the lineage graph.
What are the various modes in which Spark runs on YARN? (Local vs Client vs Cluster Mode)
How is Spark different from Hadoop?
Are Spark DataFrames distributed?
Do I need to know Hadoop to learn Spark?
What is a Spark Dataset?
Explain the Parquet file format in Apache Spark. When is it best to choose it?
What is the use of SparkContext?
Explain the countByValue() operation on an Apache Spark RDD.
Define the term ‘Lazy Evaluation’ with reference to Apache Spark.
What is a DStream?
Which storage level does the cache() function use?