Ideally, what should the replication factor be in a Hadoop cluster?
How can you change a column's data type in Hive?
What is the difference between client mode and cluster mode in Spark?
Which database is used in Hadoop?
What are the major differences between Hadoop 2 and Hadoop 3?
What is the difference between an HDFS block and an input split?
What are consumers in Kafka?
What, in your view, is a common mistake Apache Spark developers make when using Spark?
Replication causes data redundancy and consumes a lot of space, so why is it used in HDFS?
What are storage and compute nodes?
What are the responsibilities of a data analyst?
Who created Spark?
What is sc.parallelize in Spark?
Do we need to give a password even if the key is added in SSH?
Where are RDDs stored?