Apache Hadoop (394)
MapReduce (354)
Apache Hive (345)
Apache Pig (225)
Apache Spark (991)
Apache HBase (164)
Apache Flume (95)
Apache Impala (72)
Apache Cassandra (392)
Apache Mahout (35)
Apache Sqoop (82)
Apache ZooKeeper (65)
Apache Ambari (93)
Apache HCatalog (34)
Apache HDFS Hadoop Distributed File System (214)
Apache Kafka (189)
Apache Avro (26)
Apache Presto (15)
Apache Tajo (26)
Hadoop General (407)
Data Engineer
Given a list of followers in the format:
123, 345
234, 678
345, 123
…
where column one is the ID of the follower and column two is the ID of the followee, find all mutual following pairs (the pair 123, 345 in the example above). How would you use Map/Reduce to solve the problem when the list does not fit in memory?
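One possible approach to the mutual-follower question, sketched in plain Python rather than against a real Hadoop API: the mapper keys every edge by its unordered pair, so both directions of a relationship arrive at the same reducer, and the reducer emits the pair only when it has seen both directions. The function names (map_edge, reduce_pair, mutual_pairs) are hypothetical, and the in-memory dictionary only stands in for the shuffle phase, which a real cluster would perform on disk across nodes.

```python
from collections import defaultdict


def map_edge(follower, followee):
    # Key each edge by its unordered pair so (a, b) and (b, a) meet at one reducer.
    key = (min(follower, followee), max(follower, followee))
    yield key, (follower, followee)


def reduce_pair(key, edges):
    # The pair is mutual only if both directions were seen; set() ignores duplicates.
    if len(set(edges)) == 2:
        yield key


def mutual_pairs(edges):
    # Stand-in for the shuffle: group mapper output by key (assumption: fits in
    # memory here, whereas a real framework spills and partitions this step).
    groups = defaultdict(list)
    for follower, followee in edges:
        for key, value in map_edge(follower, followee):
            groups[key].append(value)
    result = []
    for key in sorted(groups):
        result.extend(reduce_pair(key, groups[key]))
    return result


edges = [(123, 345), (234, 678), (345, 123)]
print(mutual_pairs(edges))  # [(123, 345)]
```

Because each reducer only ever holds the edges for a single pair (at most two distinct values), memory per key stays constant no matter how large the follower list is.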
How would you use Map/Reduce to split a very large graph into smaller pieces and parallelize the computation of edges when the data changes quickly and dynamically?
Write a Hive UDF that returns a sentiment score. For example, if good = 1, bad = -1, and average = 0, and a restaurant review states "Good food, bad service," the score would be 1 - 1 = 0.
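The scoring logic alone can be sketched as follows. This is plain Python, not an actual Hive UDF: a real UDF would typically wrap the same logic in a Java class, or the function could be called from a script run through Hive's TRANSFORM clause. The word weights come from the question; the names SCORES and sentiment_score, the tokenization rule, and the choice to score unknown words and NULL reviews as 0 are assumptions.

```python
import re

# Word weights taken from the question; any other word is assumed to score 0.
SCORES = {"good": 1, "bad": -1, "average": 0}


def sentiment_score(review):
    # Treat a missing (NULL) review as neutral.
    if review is None:
        return 0
    # Lowercase and keep alphabetic runs only, so "Good" and "bad," both match.
    words = re.findall(r"[a-z]+", review.lower())
    return sum(SCORES.get(word, 0) for word in words)


print(sentiment_score("Good food, bad service"))  # 0
```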
Suppose that your data is stored in collections; for instance, some binary data, message data, and metadata are all keyed on the same value. Would you use HBase for this?
What is the significance of the "IF EXISTS" clause when dropping a table?
Can an RDD be shared between SparkContexts?
Can Apache Kafka be used without ZooKeeper?
Explain the term Cluster?
How can you configure the log cleaner?
What is a local repository, and where is it useful in an Ambari environment?
What is the use of CQL collections in Cassandra?
Discuss and explain the various types of partitioners in Cassandra.
Name the most common input formats defined in Hadoop. Which one is the default?
How does HDFS provide good throughput?
What is a cluster in Cassandra?
What is the difference between Apache Sqoop and Flume?
What are the steps to submit a Hadoop job?
What is HDFS (Hadoop Distributed File System)?
What is the Derby database?