Why do we use HDFS for applications with large data sets, but not when there are a lot of small files?
Knox and Hadoop Development Tools?
Data Engineer: Given a list of followers, one pair per line, in the format: 123, 345 / 234, 678 / 345, 123 / … where column one is the ID of the follower and column two is the ID of the followee, find all mutual following pairs (the pair 123, 345 in the example above). How would you use Map/Reduce to solve the problem when the list does not fit in memory?
What are the steps to submit a Hadoop job?
What is Apache Hadoop?
What is a secondary namenode?
How do you resolve "IOException: Cannot create directory"?
How do you enable the recycle bin in Hadoop?
What is the difference between HDFS and NAS?
What do the master class and the output class do?
What is YARN in Hadoop?
What is the function of ApplicationMaster?
How will you make changes to the default configuration files?
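For the mutual-followers question above, one possible approach is sketched below. It is not an official answer, only an illustration under stated assumptions: each input line is "follower, followee", the mapper keys every edge by the canonical (smaller ID, larger ID) pair and records the edge direction as the value, and the reducer emits a pair only when both directions were seen. The names map_edge, reduce_pair, and run_local are hypothetical, and run_local merely simulates the shuffle in memory for demonstration; on a real cluster the framework would group the keys, so the full list never needs to fit in memory on one machine.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[int, int]


def map_edge(line: str) -> Iterator[Tuple[Pair, int]]:
    """Map step: key each edge by (smaller_id, larger_id), value = direction."""
    follower, followee = (int(field) for field in line.split(","))
    if follower == followee:
        return  # assumption: self-follows are ignored
    key = (min(follower, followee), max(follower, followee))
    # 0 means the smaller ID follows the larger one, 1 means the reverse.
    direction = 0 if follower < followee else 1
    yield key, direction


def reduce_pair(key: Pair, directions: Iterable[int]) -> Iterator[Pair]:
    """Reduce step: a pair is mutual only if both directions were seen."""
    if {0, 1} <= set(directions):
        yield key


def run_local(lines: Iterable[str]) -> List[Pair]:
    """Local stand-in for the shuffle phase (demonstration only)."""
    groups: Dict[Pair, List[int]] = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        for key, value in map_edge(line):
            groups[key].append(value)
    result: List[Pair] = []
    for key in sorted(groups):
        result.extend(reduce_pair(key, groups[key]))
    return result


if __name__ == "__main__":
    sample = ["123, 345", "234, 678", "345, 123"]
    print(run_local(sample))  # [(123, 345)]
```

Keying on the canonical pair means both directions of a relationship arrive at the same reducer, and each reducer holds at most a handful of values per key, so duplicate edges in one direction do not produce a false match.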