What is a JobTracker in Hadoop? How many instances of JobTracker run on a Hadoop Cluster?
Why do we need Spark?
What do you think about speculative execution?
What is PageRank in GraphX?
Can we say that COGROUP groups more than one data set?
Is Spark an ETL tool?
What is the default partitioner in Hadoop?
If a DataNode is full, how is it identified?
Who developed Apache Avro?
What do you understand by a node in Cassandra?
Why should we use Presto?
What is a primary key, and what are its different types?
How would a Hadoop administrator deploy the various components of Hadoop in production?
What are the ways to create RDDs in Apache Spark? Explain.
What is the maximum size of the string data type supported by Hive? Mention the binary formats that Hive supports.