HadoopExam Learning Resources

HadoopExam Training, Interview Questions, Certifications, Projects, POC and Hands On exercise access


RDD to DF not working in Scala

I want to create persistent tables in spark-shell, so I am converting an RDD to a DataFrame in order to save it as a Parquet file, but I am getting an error. Please check the following steps and suggest a solution.

scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext

scala> 15/05/15 19:51:25 INFO BlockManagerMasterActor: Registering block manager ubuntu:47169 with 267.3 MB RAM, BlockManagerId(0, ubuntu, 47169)

scala> val sqlContext= new org.apache.spark.sql.SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@251d69db

scala> import sqlContext._
import sqlContext._

scala> case class  Person(name :String,age:Int)
defined class Person


scala> val people=sc.textFile("person.txt").map(_.split(",")).map(p=>Person(p(0),p(1).trim.toInt)).toDF()
<console>:22: error: value toDF is not a member of org.apache.spark.rdd.RDD[Person]
       val people=sc.textFile("person.txt").map(_.split(",")).map(p=>Person(p(0),p(1).trim.toInt)).toDF()

The `toDF` method is not defined on RDDs directly; it is added through an implicit conversion. On Spark 1.3+ you need `import sqlContext.implicits._` (a plain `import sqlContext._` is not enough), and on Spark 1.2 `toDF` does not exist at all, because the DataFrame API was only introduced in 1.3. That is why this line fails:
val people=sc.textFile("person.txt").map(_.split(",")).map(p=>Person(p(0),p(1).trim.toInt)).toDF()
Alternatively, I would do it like this:

case class PeopleTable(name: String, id: Int)
val people = sc.textFile(file).map(_.split(",")).map(... trim.toInt))
val op = sqlContext.sql("select ...")
val c = op.collect()
val rdd = sc.parallelize(c)
val dftest = rdd.toDF()


rdd.saveAsTextFile("opfilename")
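Since the original goal was a persistent Parquet table, note that on Spark 1.3 the `dftest` DataFrame above can be written out and queried directly. A minimal sketch, where "people.parquet" is a hypothetical output path:

```scala
// Spark 1.3: persist the DataFrame as a Parquet file
// ("people.parquet" is a hypothetical path, not from the original thread)
dftest.saveAsParquetFile("people.parquet")

// register the DataFrame as a temporary table so it can be queried via SQL
dftest.registerTempTable("people")
val adults = sqlContext.sql("SELECT name FROM people WHERE age >= 18")
```

In Spark 1.4+ the `saveAsParquetFile` call was deprecated in favor of `df.write.parquet(path)`.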

I was using Spark 1.2, which does not support DataFrames. I upgraded from 1.2 to 1.3, and that solved my problem.
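For reference, on Spark 1.3+ the original conversion works once the implicit conversions are in scope. A minimal spark-shell sketch, assuming `person.txt` contains comma-separated "name,age" rows as in the question:

```scala
// spark-shell, Spark 1.3+: sc is provided by the shell
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._   // brings toDF() into scope for RDDs of case classes

case class Person(name: String, age: Int)

// parse each "name,age" line into a Person, then convert the RDD to a DataFrame
val people = sc.textFile("person.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))
  .toDF()
```

With `sqlContext.implicits._` imported, `toDF()` is resolved on `RDD[Person]` and the "value toDF is not a member" error goes away.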

Visit the home page http://hadoopexam.com for more details.
