Hadoop General (407)
Why do we use HDFS for applications with large data sets, but not when there are a lot of small files?
According to IBM, what are the three characteristics of Big Data?
Give a brief overview of Hadoop's history.
Can you define a checkpoint?
What are the barriers?
What is the difference between traditional RDBMS and Hadoop?
What is Pig Statistics?
What is SPF?
What is the purpose of RecordReader in Hadoop?
What is the abstraction of Spark Streaming?
What is a Spark RDD?
Name the job control options specified by MapReduce.
How do you set which framework is used to run a MapReduce program?
What are the advantages of Datasets in Spark?
Do we require two servers for the NameNode and the DataNodes?
Which machine learning algorithms does Apache Mahout support?