www.HadoopExam.com

HadoopExam Learning Resources


Inserting Data Into Hive

I need to load this file, a dataset of movie reviews from Amazon, into a Hive table. I'm having problems constructing the regular expression to parse the .txt file and create the table with the correct column types.

The .txt file:

product/productId: B0001G6PZC
review/userId: A3F3THLLZXURQN
review/profileName: A. Y
review/helpfulness: 3/3
review/score: 4.0
review/time: 1199664000
review/summary: Good story, Good action. Good Drama. Good Movie
review/text: When I first heard of this movie, I didn't think it would be that great, so I never bothered to go see it in theaters. Later on, I ended up downloading the movie, and didn't think much of it.<br /><br />But now after watching the movie on BD, I think that the movie is quite outstanding. Its got a good story behind it, with some level of historical basis behind it with Samurai becoming phased out into Japan's modernization.<br /><br />It does a good job in immersing you into the conflicts that warriors must endure... and yet, find peace with the way of the Samurai as they are a warrior race and not savages.<br /><br />4/5 stars.

product/productId: B0001G6PZC
review/userId: A3J78KAIPW6KAH
review/profileName: Joan Paolo De Bastos "conde_almasy"
review/helpfulness: 3/3
review/score: 4.0
review/time: 1198540800
review/summary: Good Movie. Wonderful Visuals. A Great Way to SHOW OFF you Hi-Def System
review/text: Last Samurai is no masterpiece<br /><br />but technically it is<br /><br />the visuals, the sound effects, the music.<br /><br />If you want to show off to your friends what a great hi-def system you got, purchase this movie.<br /><br />If you want a classic, but lord of the rings or gone with the wind instead.

product/productId: B0001G6PZC
review/userId: A3F3B6HY9RJI04
review/profileName: James Duckett
review/helpfulness: 3/3
review/score: 5.0
review/time: 1192060800
review/summary: Great Movie, Fantastic HD Quality
review/text: After picking up my HD DVD player I've had troubles watching regular DVD movies.  I had heard some good things about this movie but couldn't pass it up once it was in high definition.<br /><br />The story is pretty good.  This is the story of Captain Algren who has been sent to Japan in the late 1800's in order to help them modernize the Japanese army as they go from fighting with swords and arrows to machine guns and cannons.<br /><br />After the "modern" Japanese army prematurely attacks the Samurai and lose horribly, Captain Algren is taken captive by the Samurai and introduced to their way of life and refusal to lay down the sword in the name of compliance. In time, Captain Algren finds himself wanting to become one of the Samurai and learning more of their way of life.<br /><br />The story is pretty good but what raises this up to the level of being outstanding is the high definition quality of the movie. It was fantastic, especially seeing the colorful Japanese landscape in all of its magnificence.<br /><br />If you like Tom Cruise action movies, this is one to pick up especially in high definition (whether it be Blu-Ray or HD DVD). The violence can be extremely graphic (hey, this is war) so if you are sensitive to that you may want to look for something else. Otherwise, the pacing of the movie is pretty good. It isn't an all out gore-fest... there is action and then it breaks and lets you relax and catch up a little bit and then goes back to action and so on and so forth.

This is my SQL:

CREATE EXTERNAL TABLE movies(id string, uId string, profileName string, helpfulness string, score float, time int, summary string, text string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' 
WITH serdeproperties("input.regex"="[ ].*","output.format.string"="%1$s %2$s %3$s %4$s %5$s %6$s %7$s %8$s")
location '/user/hduser/moviesTest';

However, Hive is not parsing it correctly, and SELECT * FROM movies gives me this result:

NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL

Can anyone tell me what I'm doing wrong?


 

All the fields of a table row have to come from one continuous line of input, but in this file each field starts on a new line. Hive applies the regular expression to one line at a time, so no single line ever supplies all eight fields, and every line of the file comes back as a row of NULLs. If all the fields of a review were put together on one line, the regular expression would then have to be changed to match that layout.
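
One way to put the fields of each review together on one line is to preprocess the file before loading it. The following is only a minimal sketch in Python, not a tested pipeline: it assumes records are separated by blank lines, that every field has the form "label: value" on a single line, and that tab is an acceptable delimiter. The function name flatten_records is made up for illustration.

```python
import re

# Field labels in the order they appear in each review record of the sample file.
FIELDS = [
    "product/productId",
    "review/userId",
    "review/profileName",
    "review/helpfulness",
    "review/score",
    "review/time",
    "review/summary",
    "review/text",
]


def flatten_records(text):
    """Turn blank-line-separated 'label: value' blocks into tab-separated lines.

    Hypothetical helper: assumes one field per line and blank lines between records.
    """
    rows = []
    # One or more blank lines separate consecutive records.
    for block in re.split(r"\n\s*\n", text.strip()):
        values = {}
        for line in block.splitlines():
            # Split at the first colon only, so colons inside the value survive.
            label, sep, value = line.partition(":")
            if sep and label.strip() in FIELDS:
                # A tab inside a value would break the delimiter, so replace it.
                values[label.strip()] = value.strip().replace("\t", " ")
        # Missing fields become empty strings so every row has eight columns.
        rows.append("\t".join(values.get(name, "") for name in FIELDS))
    return rows


# Small demonstration on an inline sample with two shortened records.
demo = (
    "product/productId: B0001G6PZC\n"
    "review/userId: A3F3THLLZXURQN\n"
    "review/score: 4.0\n"
    "\n"
    "product/productId: B0001G6PZC\n"
    "review/userId: A3J78KAIPW6KAH\n"
    "review/score: 4.0\n"
)
for row in flatten_records(demo):
    print(row)
```

With the data flattened like this, each review is a single line, so the table could be declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' instead of a regular expression, or the regular expression could be rewritten with eight capture groups that match one whole line.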
 
Can I change the regular expression without having to put all the data on a single line?
