What is structured data?
Define streaming access.
Define fault tolerance.
What does sorting do?
What is NLineInputFormat?
Explain why we need Hadoop.
Explain the features of pseudo-distributed mode.
What does shuffling do?
Explain the basic difference between a traditional RDBMS and Hadoop.
Explain what happens if rack 2 and a DataNode fail.
Define the JobTracker.
Explain the features of fully distributed mode.
What are the characteristics of the Hadoop framework?
Explain how we can change the split size if our commodity hardware has less storage space.
Explain how Hadoop is different from other data processing tools.