www.HadoopExam.com

Spark: HBase manipulation

I have an HBase scanner:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{Result, Scan}
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes

val scan = new Scan()
scan.setStartRow(startScan)
// the stop row is exclusive; appending a zero byte makes the scan include endScan itself
scan.setStopRow(Bytes.add(endScan, Array[Byte](0)))
scan.setCaching(100)

val hConf = HBaseConfiguration.create()
hConf.set(TableInputFormat.INPUT_TABLE, tableName)
hConf.set(TableInputFormat.SCAN, convertScanToString(scan))
val hbaseRDD = sc.newAPIHadoopRDD(hConf, classOf[TableInputFormat],
  classOf[ImmutableBytesWritable], classOf[Result])
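The `convertScanToString` helper is not shown above. A common implementation (assuming the HBase 0.96+ protobuf client API) serializes the `Scan` and Base64-encodes it, which is the string form `TableInputFormat.SCAN` expects:

```scala
import org.apache.hadoop.hbase.client.Scan
import org.apache.hadoop.hbase.protobuf.ProtobufUtil
import org.apache.hadoop.hbase.util.Base64

// Serialize a Scan to the Base64 string expected by TableInputFormat.SCAN.
def convertScanToString(scan: Scan): String = {
  val proto = ProtobufUtil.toScan(scan)
  Base64.encodeBytes(proto.toByteArray)
}
```

This mirrors what `TableMapReduceUtil` does internally when configuring a scan-based MapReduce job.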

How do I look up a given row of my schemaRDD in the hbaseRDD? The intended HBase manipulation is described in the comments below.

val eventsRDD = sqlContext.sql("SELECT eventId, id, type, date1, date2 from parquetFile")

// row is an org.apache.spark.sql.Row
eventsRDD.foreach(row => {

    // find row(1).toString in the hbaseRDD ?????????????????????????????????

    // if not found, create an HBase entry
    val hPutRow = new Put(Bytes.toBytes(row(1).toString))
    hPutRows += hPutRow

    // if found:
    //     if the HBase entry has no date1, update it to add the row's date1
    //     else if the row's date1 > HBase's date1:
    //         rename the HBase row key to rowkey-old
    //         create an HBase entry for the new row
    //     else if the row's date1 < HBase's date1, ignore the row

})
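One way to express the lookup without calling HBase from inside `foreach` (which runs on the executors, so it also cannot append to a driver-side `hPutRows` buffer) is to key both RDDs on the row key and join them. This is only a sketch under assumptions from the question: the id column is at position 1 of the `SELECT`, and the match/no-match branches still need the date1 logic from the comments filled in:

```scala
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.SparkContext._   // pair-RDD operations such as leftOuterJoin

// Key the HBase rows by their row key, decoded to a String.
val hbaseByKey = hbaseRDD.map { case (key, result) =>
  (Bytes.toString(key.get()), result)
}

// Key the SQL rows by the id column (position 1 in the SELECT).
val eventsByKey = eventsRDD.map(row => (row(1).toString, row))

// leftOuterJoin keeps every event; the Option[Result] says whether
// a matching HBase row already exists.
val puts = eventsByKey.leftOuterJoin(hbaseByKey).flatMap {
  case (rowKey, (row, None)) =>
    // not found: create a new HBase entry
    Some(new Put(Bytes.toBytes(rowKey)))
  case (rowKey, (row, Some(result))) =>
    // found: apply the date1 comparison logic here,
    // emitting Puts for the update / rename cases as needed
    None
}
```

The resulting `puts` RDD can then be collected to the driver (if small) or written out directly from the executors.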

// push the buffered HBase updates (creates and updates) from the driver

val hConf = HBaseConfiguration.create()
val hTable = new HTable(hConf, tableName)

hTable.put(hPutRows)   // hPutRows must be a java.util.List[Put]
hTable.flushCommits()
hTable.close()
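If the `Put`s are produced inside an RDD transformation rather than in a driver-side buffer, a driver-side `HTable` cannot collect them. A common alternative, sketched here with a hypothetical `putsRDD: RDD[Put]`, is to write the RDD directly through `TableOutputFormat`:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
import org.apache.hadoop.mapreduce.Job

val outConf = HBaseConfiguration.create()
outConf.set(TableOutputFormat.OUTPUT_TABLE, tableName)
val job = Job.getInstance(outConf)
job.setOutputFormatClass(classOf[TableOutputFormat[ImmutableBytesWritable]])

// putsRDD is assumed to be an RDD[Put] built on the executors
putsRDD
  .map(put => (new ImmutableBytesWritable(put.getRow), put))
  .saveAsNewAPIHadoopDataset(job.getConfiguration)
```

This keeps the writes distributed and avoids shipping all mutations through the driver.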

 
