How would you pipeline large amounts of data?
What do the master class and the output class do?
What is speculative execution?
What is the role of variety in big data?
Why is Apache Spark so fast?
What is the difference between an RDD and a DataFrame?
What are combiners? When should I use a combiner in my MapReduce job?
How is data or a file written into HDFS?
What are broadcast variables in Spark?
Which tools are most useful for data analysis?
What is Hive in big data?
What is a Memtable in Cassandra?
Which modes can Hadoop run in? List a few features of each mode.
What characteristic of the streaming API makes it flexible enough to run MapReduce jobs in languages like Perl, Ruby, Awk, etc.?
What are the features of a Spark RDD?