Does Impala performance improve as it is deployed to more hosts in a cluster, in much the same way that Hadoop performance does?
What is an RDD in Apache Spark? How are RDDs computed, and what are the various ways to create one?
What role does a worker node play in an Apache Spark cluster? And why does a worker node need to be registered with the driver program?
Why are transformations lazy operations on an Apache Spark RDD? How is this useful?
What are the relational operators for loading and storing data in the Pig language?
Why do we need Hadoop for big data analytics?
What is a partitioning key?
How does A/B testing work?
What is meant by schema declaration?
What does the AdminClient API do in Kafka?
What does conf.setMapperClass do?
In which language is Cassandra written?
What is a RegionServer?
Name the different types of data models.
What are column families? What happens if you alter the block size of a column family on an already populated database?
What are the ways to run Spark over Hadoop?
What are the TaskTracker and JobTracker?
What happens to existing data in my cluster when I add new nodes?
What is a pipelined RDD?