HadoopExam Blogs

HadoopExam Learning Resources

Question 3: You have been given the following code, written in Scala for Spark. Below is the content of the IBM.csv file.





Now you have written the following code in the interactive shell:

val myRDD = sc.textFile("data.csv")

val splittedRDD = myRDD.map(_.split(","))

val distinctRDD = splittedRDD.map(x => (x(0), 1)).distinct()

val priceDataRDD = myRDD.map(x => x.split(",")(1))

In the above program, which of the following RDDs should be cached?

1. myRDD

2. splittedRDD

3. distinctRDD

4. priceDataRDD

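Correct Answer: 1

Explanation: If the same RDD is used again and again, it is advisable to cache or persist it. Here both splittedRDD and priceDataRDD are derived from myRDD, so without caching the input file would be read once for each of them. Caching is lazy: the RDD is computed by the first action that needs it, and its partitions are then kept in memory. Later jobs reuse those partitions instead of recomputing them from the source file, at the cost of the memory needed to hold them.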
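As a rough sketch of what this answer implies, the standalone program below caches myRDD before the other RDDs are derived from it. The local SparkSession, the temporary file, the sample rows, and the object and method names are all assumptions made for illustration; they are not the real file contents or part of the original question.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object CacheExample {
  // Returns (number of distinct first-column keys, number of price values).
  def run(): (Long, Long) = {
    // Assumption: a local session; in spark-shell, `sc` already exists.
    val spark = SparkSession.builder()
      .appName("CacheExample")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical sample rows written to a temporary file.
    val path = Files.createTempFile("data", ".csv")
    Files.write(path, "IBM,100.5\nIBM,101.0\nAAPL,200.0\n".getBytes("UTF-8"))

    // Mark myRDD for caching. Nothing is read yet because cache() is lazy.
    val myRDD = sc.textFile(path.toString).cache()

    // Both branches below start from myRDD.
    val splittedRDD = myRDD.map(_.split(","))
    val distinctRDD = splittedRDD.map(x => (x(0), 1)).distinct()
    val priceDataRDD = myRDD.map(_.split(",")(1))

    // The first action reads the file and fills the cache.
    val distinctCount = distinctRDD.count()
    // The second action can reuse the cached partitions of myRDD.
    val priceCount = priceDataRDD.count()

    // Release the cached partitions and shut down.
    myRDD.unpersist()
    spark.stop()
    Files.deleteIfExists(path)

    (distinctCount, priceCount)
  }

  def main(args: Array[String]): Unit = {
    val (distinctCount, priceCount) = run()
    println(s"distinct keys: $distinctCount, price rows: $priceCount")
  }
}
```

In the interactive shell the same idea reduces to calling myRDD.cache() right after sc.textFile(...). Using persist() with an explicit storage level is an alternative when the data may not fit in memory.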
