Question 1: You have been given the following code written in Scala and Spark. Below is the content of the data.csv file





Now you have written the following code in the interactive shell

val myRDD = sc.textFile("data.csv")

val splittedRDD = myRDD.map(_.split(","))

val value = splittedRDD.map(x => x(0)).XXXXX.count()

Please replace XXXXX with the correct function, so that the output value is 3

1. map(x=>len(x))

2. distinct()

3. filter(X=>X.contains("IBM"))

4. No function is needed; it would be a redundant call

Correct Answer : 2 Exp : The steps are as follows

1. Load the data.csv file into myRDD (each line becomes one record in the RDD)

2. Split each record on the comma (,). This creates an RDD of arrays [[IBM,101,20150112].................]

3. Now select only the first element of each row, which is the stock name

4. Using the distinct() function, we keep only the distinct stock names

5. Once we have the distinct stock names, we call the count() function on them, which generates the desired output.
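
The steps above can be tried in spark-shell with a small in-memory sample. This is only a sketch: the sample rows below are hypothetical stand-ins (only the IBM,101,20150112 row shape comes from the explanation), sc.parallelize replaces sc.textFile("data.csv"), and the names sampleLines and names are introduced here for illustration.

```scala
// Run inside spark-shell, where `sc` (the SparkContext) is already defined.
// Hypothetical sample rows standing in for the CSV lines; not the original file content.
val sampleLines = Seq(
  "IBM,101,20150112",
  "IBM,102,20150113",
  "MSFT,45,20150112",
  "MSFT,46,20150113",
  "ORCL,40,20150112"
)

val myRDD = sc.parallelize(sampleLines)     // stand-in for sc.textFile("data.csv")
val splittedRDD = myRDD.map(_.split(","))   // RDD[Array[String]]
val names = splittedRDD.map(x => x(0))      // first field of each row (the stock name)
val value = names.distinct().count()        // number of distinct stock names

println(value)          // 3 for the sample rows above
println(names.count())  // 5: without distinct(), every row is counted
```

With this sample, leaving out distinct() (option 4) counts every row rather than every distinct name, which is why the call is not redundant.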


