HadoopExam Blogs

HadoopExam Learning Resources

Question 1: You have been given the following code written in Scala and Spark. Below is the content of the data.csv file:

IBM,101,20150112

Google,400,20150112

IBM,107,20150113

Apple,230,20150112

You have written the following code in the interactive shell:

val myRDD = sc.textFile("data.csv")

val splittedRDD = myRDD.map(_.split(","))

val value = splittedRDD.map(x => x(0)).XXXXX.count()

Please replace XXXXX with the correct function so that the output value is 3.

1. map(x=>len(x))

2. distinct()

3. filter(X=>X.contains("IBM"))

4. No function is needed; it would be a redundant call

Correct Answer: 2. Explanation: The steps are as follows.

1. Load the data.csv file into myRDD (each line becomes one record in the RDD).

2. Split each record on the comma (,). This creates an RDD of arrays: [[IBM,101,20150112], ...]

3. Select only the first element of each row, which is the stock name.

4. Using the distinct() function, keep only the distinct stock names. Without it, count() would return 4, because IBM appears twice.

5. Once we have the distinct stock names (IBM, Google, Apple), call count() on the result. This produces the desired output of 3.
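
The steps above can be tried out with a minimal sketch like the one below. It assumes you are in spark-shell, where sc (the SparkContext) is already defined, and it uses sc.parallelize with the four sample lines in place of reading data.csv, so no file is needed. The names lines, fields and names are illustrative only and are not part of the original question.

```scala
// Assumption: run inside spark-shell, where `sc` is predefined.
// The four sample lines stand in for the contents of data.csv.
val lines = sc.parallelize(Seq(
  "IBM,101,20150112",
  "Google,400,20150112",
  "IBM,107,20150113",
  "Apple,230,20150112"
))

// Split each line on the comma: RDD[Array[String]]
val fields = lines.map(_.split(","))

// Keep only the first column (the stock name)
val names = fields.map(x => x(0))

// Without distinct(), every row is counted, including the second IBM row
println(names.count())   // 4

// With distinct(), only IBM, Google and Apple remain
val value = names.distinct().count()
println(value)           // 3
```

Note that count() returns a Long, and that distinct() triggers a shuffle, so on large data it costs more than a plain map or filter.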
