Big Data Interview Questions
Questions Answers Views Company eMail

Why is the rack awareness algorithm used in Hadoop?

20

Can you change the block size of HDFS files?

32

What are an identity mapper and an identity reducer?

345

What are the advantages of using MapReduce with Hadoop?

338

What do you know about NLineInputFormat?

401

Why is the output of map tasks stored (spilled) to local disk and not in HDFS?

371

What is the role of RecordReader in Hadoop MapReduce?

460

What happens when the node running the map task fails before the map output has been sent to the reducer?

371

Define speculative execution.

398

Is it legal to set the number of reducer tasks to zero? Where will the output be stored in this case?

374

What are the advantages of using a map-side join in MapReduce?

342

What is a map-side join?

379

What is a combiner, and where should you use it?

352

When should you use SequenceFileInputFormat?

385

What is the purpose of TextInputFormat?

430


Un-Answered Questions { Big Data }

What does high availability of a NameNode mean?

224


Explain the level of parallelism in Spark Streaming.

206


How can Flume be used with HBase?

89


What are the data types in Pig?

626


What is throughput in HDFS?

47






Explain what the master class and the output class do.

365


Is Hive similar to SQL?

418


What will be the output of cast ('XYZ' as INT)?

422


What are the components of Spark?

180


Explain the distinct(), union(), intersection() and subtract() transformations in Spark.

195


Name the most common input formats defined in Hadoop.

244


What is your cluster size?

1131


How can the columns of a table in Hive be written to a file?

417


Explain what a sequence file is in Hadoop.

258


What is the difference between DROP and TRUNCATE in CQLSH?

45