Big Data Interview Questions
Questions Answers Views Company eMail

How does HDFS provide good throughput?

28

Can existing files in HDFS have different replication factors?

38

How will you perform inter-cluster data copying in HDFS?

22

Suppose a file of size 514 MB is stored in HDFS (Hadoop 2.x) using the default block size configuration and default replication factor. How many blocks will be created in total, and what will be the size of each block?

87

List the files associated with metadata in HDFS.

22

Define data integrity. How does HDFS ensure the integrity of the data blocks it stores?

33

What is a rack awareness algorithm, and why is it used in Hadoop?

22

What is a block?

34

What is throughput? How does HDFS provide good throughput?

48

Replication causes data redundancy and consumes a lot of space, so why is it used in HDFS?

29

Define Hadoop archives. What is the command for archiving a group of files in HDFS?

21

What do you mean by the high availability of a NameNode?

15

What is the difference between NAS (network-attached storage) and HDFS?

37

What is a rack awareness algorithm?

47

What is the problem with having lots of small files in HDFS?

32


Un-Answered Questions { Big Data }

What is a Bloom filter?

55


Define SparkContext in Apache Spark.

190


What is the role of the offset?

296


What do you understand by a node in Cassandra?

57


What is the role of “ambari-qa” user?

51


How do you submit extra files (JARs, static files) for a Hadoop MapReduce job at runtime?

384


Define taskinstance.

186


What does illustrate do in Apache Pig?

618


What are the three types of tombstone markers in HBase?

156


What do you understand by Executor Memory in a Spark application?

258


Is Apache Spark a database?

218


How will you backup an HBase cluster?

102


What are the limitations of Pig?

318


What is the MapReduce algorithm?

322


What is a block and block scanner in HDFS?

705