Big Data Interview Questions
Questions Answers Views Company eMail

What is a reduce-side join in MapReduce?

312

What do you mean by InputFormat?

343

What are the various configuration parameters required to run a MapReduce job?

358

What is the distributed cache in the MapReduce framework?

342

What do you mean by data locality?

369

How can we ensure that all values for a particular key go to the same reducer?

360

What is Pig statistics?

279

List the relational operators in Pig.

386

What are all the stats classes available in the Java API package?

294

List the diagnostic operators in Pig.

307

Why do we need indexing?

403

What happens if you do not issue the command ‘set hive.enforce.bucketing=true;’ before bucketing a table in Apache Hive 0.x or 1.x?

432

What is HBase fsck?

149

What are the different tombstone markers in HBase?

105

What is the use of the get() method?

102


Un-Answered Questions { Big Data }

Explain the write-ahead log (journaling) in Spark.

184


Can Hadoop replace a relational database?

225


What is the role of ZooKeeper in a Kafka cluster?

293


Is Spark based on Hadoop?

193


Is it possible to create multiple tables in Hive for the same data?

415


Name any two features of Flume.

64


What features of RDD make it an important abstraction in Spark?

189


Is it possible to iterate through the rows of an HBase table in reverse order?

151


What is HBase Shell?

126


What are the different execution modes available in Pig?

289


Explain the AvroStorage function.

314


How many maps are there in a particular job?

248


How to use Avro?

60


What are the various data sources available in Spark SQL?

194


What are the four basic parameters of a reducer?

356