Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS: Hadoop Distributed File System (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
Why is Hive not suitable for OLTP systems?
Which would you choose for a project: Hadoop MapReduce or Apache Spark?
What size is recommended for each node?
Replication causes data redundancy and consumes a lot of space, so why is it used in HDFS?
Is there a date data type in Hive?
What are the various libraries available on top of Apache Spark?
Since data is replicated three times in HDFS, does that mean any computation done on one node is also replicated on the other two?
Explain the features of Pig.
What is the use of SparkContext?
Explain a map-only job.
What is the maximum number of JVMs that can run on a slave node?
What do we mean by Parquet?
How many reducers should be configured?
What mechanisms are available for applications to connect to Hive when it runs as a server?
How do file permissions work in HDFS?