Hadoop achieves parallelism by dividing tasks across many nodes, so a few slow nodes can rate-limit the rest of the program. What mechanism does Hadoop provide to combat this?
What are the key components of the Hive architecture?
Describe the distinct(), union(), intersection(), and subtract() transformations in Apache Spark RDD.
How does the JobTracker assign tasks to the TaskTracker?
How to compress mapper output in Hadoop?
What is a TaskTracker in Hadoop?
What is the difference between Apache Mahout and Cloudera Oryx?
Compare Hive, HBase, and Impala.
Define data integrity.
Can you explain broadcast variables?
Define a cell in HBase.
What are the basic commands in Apache Sqoop and their uses?
What is a NoSQL database?
What are the different table structures available in Hive?
What is a replication tool, and what are its types?