Tunable consistency is supported by Cassandra. Explain.
How do you copy data between clusters in HDFS?
What is the difference between transform and map on a Spark DStream?
What do you understand by Thrift?
Does Spark need Hadoop?
What is the Parquet file format? How do you convert data to Parquet format?
Which tasks can you perform to manage services using the Ambari Services tab?
What are the advantages of DataFrames over RDDs in Apache Spark?
What is a table in HBase?
Data Engineer: Given a list of followers in the format: 123, 345; 234, 678; 345, 123; … where column one is the ID of the follower and column two is the ID of the followee, find all mutual following pairs (the pair 123, 345 in the example above). How would you use MapReduce to solve the problem when the list does not fit in memory?
Explain textFile vs. wholeTextFiles in Spark.
Explain the basic parameters of the mapper and reducer functions.
What are the features of Hadoop?
What do you understand by memtable in Cassandra?
How can you change a column's data type in Hive?