Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS (Hadoop Distributed File System) (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
Explain the process to trigger automatic clean-up in Spark to manage accumulated metadata.
What is the maximum size of a message that Kafka can receive?
What is a partitioner, and how can the user control which key goes to which reducer?
How do you enable bucketing in Hive?
What are the best features of Impala?
What tasks can we perform to manage hosts using the Ambari Hosts tab?
What is distributed copy (distcp)?
What are the ways of creating an RDD in Apache Spark?
What is a keyspace in Cassandra?
What is the importance of dfs.namenode.name.dir in HDFS?
When should you use coalesce and when repartition in Spark?
Which files are used by the startup and shutdown commands?
What is HCatalog used for?
If a replica stays out of the ISR for a long time, what does that signify?
How much faster is Apache Spark than Hadoop?