Explain the execution plans of a Pig script.
or
What is the difference between the logical and physical plans of an Apache Pig script?
You have a file employee.txt with 100 records in an HDFS directory. You want to see only the first 10 records of employee.txt. How would you do this?
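On a cluster this is commonly done from the shell, for example `hdfs dfs -cat /path/to/employee.txt | head -n 10` (the path is a placeholder). The Python sketch below shows the same idea locally, assuming one record per line: it reads lazily and stops after the first N lines instead of scanning the whole file. The function name and sample data are hypothetical.

```python
import io
from itertools import islice
from typing import Iterable, List


def first_n_records(lines: Iterable[str], n: int = 10) -> List[str]:
    """Return the first n records, assuming one record per line.

    islice stops pulling from the iterator after n items, so iteration
    ends early instead of walking the whole file.
    """
    return [line.rstrip("\n") for line in islice(lines, n)]


if __name__ == "__main__":
    # Hypothetical stand-in for employee.txt: 100 records of the form "id,name".
    fake_file = io.StringIO("".join(f"{i},employee_{i}\n" for i in range(1, 101)))
    for record in first_n_records(fake_file):
        print(record)
```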
What type of data should be put in the distributed cache? When should data be put there, and how much of it?
What is the difference between map and reduce?
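The difference is easiest to see in the classic word-count job. The sketch below is a single-process Python simulation, not Hadoop's Java API: the map step turns each input line into (word, 1) pairs, a stand-in for the framework's shuffle groups those pairs by key, and the reduce step sums each key's values. All function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: called once per input record; emits intermediate (key, value) pairs."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: groups all values by key between the two phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: called once per key with all of that key's values; aggregates them."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "The dog sat"]))
    # {'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}
```

Mappers work record by record and never see other records, while each reducer sees every value for the keys it is assigned, which is why aggregation belongs in the reduce step.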
What is a data ingestion pipeline?
What is meant by metadata in HDFS? List the files associated with the metadata.
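The NameNode holds the namespace in memory and persists it as an fsimage checkpoint plus an edit log (edits) of changes made since that checkpoint; its name directory also contains small files such as VERSION and seen_txid. At startup the NameNode loads the fsimage and replays the edits, and checkpointing later merges them into a new fsimage. The toy Python model below illustrates only that replay step; the record layout and operation names are hypothetical, not the real on-disk format.

```python
from typing import Dict, List, Tuple

# Hypothetical, greatly simplified model: the namespace maps each file path to
# its replication factor. Real fsimage and edits files are binary and record
# much more (directories, permissions, block IDs, timestamps).
Namespace = Dict[str, int]
EditOp = Tuple[str, str, int]  # (operation, path, replication); names are illustrative


def replay(fsimage: Namespace, edits: List[EditOp]) -> Namespace:
    """Rebuild the current namespace by loading the checkpoint and applying
    each logged change in order, mimicking what the NameNode does at startup."""
    namespace = dict(fsimage)  # copy, so the checkpoint itself is untouched
    for op, path, replication in edits:
        if op in ("ADD", "SET_REPLICATION"):
            namespace[path] = replication
        elif op == "DELETE":
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return namespace


if __name__ == "__main__":
    checkpoint = {"/data/employee.txt": 3}
    edit_log = [
        ("ADD", "/data/dept.txt", 3),
        ("SET_REPLICATION", "/data/employee.txt", 2),
        ("DELETE", "/data/dept.txt", 0),
    ]
    print(replay(checkpoint, edit_log))  # {'/data/employee.txt': 2}
```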
What are the components of the Apache Spark ecosystem?
What are the features of Hive?
Name the Spark Library which allows reliable file sharing at memory speed across different cluster frameworks.
What is Pig used for?
What is the latest version of Sqoop?
How often do you need to reformat the NameNode?
When to use Hive?
Explain the different transformations on a DStream.
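Spark Streaming's DStream transformations fall broadly into stateless ones (for example map, filter, flatMap, reduceByKey), which act on each micro-batch independently, and stateful ones such as window operations (window, reduceByKeyAndWindow) and updateStateByKey, which combine data across batches. The plain-Python simulation below models a DStream as a list of batches to contrast the three behaviors; it does not use Spark, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# A DStream is a sequence of micro-batches; here each batch is a list of lines.
Batch = List[str]


def stateless_counts(batch: Batch) -> Dict[str, int]:
    """Stateless (like map + reduceByKey): each batch is processed on its own."""
    return dict(Counter(word for line in batch for word in line.split()))


def windowed_counts(batches: List[Batch], window: int, slide: int) -> List[Dict[str, int]]:
    """Windowed (like reduceByKeyAndWindow): each result covers the last `window`
    batches, and a new result is produced every `slide` batches.
    Simplification: only full windows are emitted."""
    results: List[Dict[str, int]] = []
    for end in range(window, len(batches) + 1, slide):
        counts: Counter = Counter()
        for batch in batches[end - window:end]:
            counts.update(stateless_counts(batch))
        results.append(dict(counts))
    return results


def running_totals(batches: List[Batch]) -> List[Dict[str, int]]:
    """Stateful (like updateStateByKey): state carries over across all batches."""
    state: Counter = Counter()
    history: List[Dict[str, int]] = []
    for batch in batches:
        state.update(stateless_counts(batch))
        history.append(dict(state))
    return history


if __name__ == "__main__":
    stream = [["a b"], ["a"], ["b b"]]
    print(windowed_counts(stream, window=2, slide=1))
    print(running_totals(stream))
```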
What does the mapred.job.tracker configuration property do?
What are the side data distribution techniques?
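Side data is extra read-only data a job needs in addition to its main input. Two common ways Hadoop ships it are setting small values in the job configuration and the distributed cache, which copies files and archives to each node before tasks run. Both suit small, read-only data, because every task loads a full copy. A typical use is a map-side join; the Python sketch below simulates one mapper with a hypothetical department lookup table, assuming comma-separated employee records of the form `id,name,dept_id`.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical small lookup table (dept_id -> dept_name). In Hadoop this is
# the kind of file that would be shipped to every node via the distributed cache.
DEPARTMENTS: Dict[str, str] = {"10": "Sales", "20": "Engineering"}


def join_mapper(records: Iterable[str], lookup: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Map-side join: the mapper holds the whole lookup table in memory, so each
    record is enriched locally without a shuffle or reduce phase."""
    for record in records:
        emp_id, name, dept_id = record.split(",")
        yield emp_id, f"{name},{lookup.get(dept_id, 'UNKNOWN')}"


if __name__ == "__main__":
    for key, value in join_mapper(["1,Asha,10", "2,Ravi,20"], DEPARTMENTS):
        print(key, value)
```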
What does RDD stand for in logistics?