What does shuffling do?
What is the basic difference between a traditional RDBMS and Hadoop?
What happens if rack 2 and a DataNode fail?
What is a JobTracker?
What are the features of fully distributed mode?
What are the characteristics of the Hadoop framework?
How can we change the split size if our commodity hardware has less storage space?
How is Hadoop different from other data processing tools?
What is a DataNode?
Is the secondary NameNode a substitute for the NameNode?
What is a secondary NameNode?
What is unstructured data?
Can you give a brief overview of Hadoop's history?
How does YARN work with Spark?
What happens when we submit a Spark job?