
RDD to DF not working in Scala

I want to create persistent tables in spark-shell, so I'm converting an RDD to a DataFrame in order to save it as a Parquet file, but I'm getting an error. Please check the following steps and suggest a solution.

scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext

scala> 15/05/15 19:51:25 INFO BlockManagerMasterActor: Registering block manager ubuntu:47169 with 267.3 MB RAM, BlockManagerId(0, ubuntu, 47169)

scala> val sqlContext= new org.apache.spark.sql.SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@251d69db

scala> import sqlContext._
import sqlContext._

scala> case class Person(name: String, age: Int)
defined class Person


scala> val people=sc.textFile("person.txt").map(_.split(",")).map(p=>Person(p(0),p(1).trim.toInt)).toDF()
<console>:22: error: value toDF is not a member of org.apache.spark.rdd.RDD[Person]
       val people=sc.textFile("person.txt").map(_.split(",")).map(p=>Person(p(0),p(1).trim.toInt)).toDF()
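
For reference, there are two usual causes of this error: Spark 1.2 has no DataFrame API at all, so toDF does not exist there; and on Spark 1.3 or later, toDF on an RDD of a case class needs the SQLContext implicit conversions in scope, which import sqlContext._ alone does not provide. A minimal sketch of the missing piece, assuming Spark 1.3+:

import sqlContext.implicits._   // brings the RDD-to-DataFrame conversion into scope, enabling .toDF()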

You need to come back to the Scala Spark context from the SQL context and then try the DF conversion; while you are still in the Spark SQL context you can't use toDF on this line:
val people=sc.textFile("person.txt").map(_.split(",")).map(p=>Person(p(0),p(1).trim.toInt)).toDF()
I would do it like this:
case class PeopleTable(name: String, id: Int)

val people = sc.textFile(file).map(_.split(",")).map(p => PeopleTable(p(0), p(1).trim.toInt))

import sqlContext.implicits._                 // required for toDF() on Spark 1.3+
val op = sqlContext.sql("select ...")         // query against a table you have already registered
val c = op.collect()                          // pulls every result row to the driver, so keep the result small
val rdd = sc.parallelize(c.map(r => PeopleTable(r.getString(0), r.getInt(1))))  // assumes the select returns (name, id)
val dftest = rdd.toDF()

rdd.saveAsTextFile("opfilename")

I was using Spark 1.2, which does not support the DataFrame API. Upgrading from 1.2 to 1.3 solved my problem.
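
For completeness, a minimal end-to-end sketch of the original goal (RDD to DataFrame to a Parquet file) on Spark 1.3, assuming person.txt holds comma-separated name,age rows:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._            // needed for toDF() on an RDD of a case class

case class Person(name: String, age: Int)

val people = sc.textFile("person.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))
  .toDF()

people.saveAsParquetFile("person.parquet")   // Spark 1.3 API; Spark 1.4+ prefers people.write.parquet(...)

Note that saveAsParquetFile only writes the files; for truly persistent tables you would use a HiveContext and saveAsTable instead of a plain SQLContext.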
