HadoopExam Learning Resources


Pig: correlate the tuples in a group to compute a metric

I think this is a typical problem in Pig programming.

I have sample data like this (assume a simple schema; the group contains tuples for many ids):


For each tuple, I'd like to add a new field like the one below, which holds the largest number you can find beyond the number in the tuple.


I tried a nested foreach after grouping or joining, but it failed. Basically, I'm trying to take the minimum among the values that are greater than the referenced value, but technically I could not achieve it. I'd like to do this without using UDFs.

It looks simple, but it needs a parallel-programming mindset. Any tips?
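To pin down the metric described above (the minimum among the values greater than the tuple's own value, within the same id), here is a minimal Python sketch of the logic. It is not Pig code. The (id, value) schema, the function name `next_greater_per_group`, and the rows are made up for illustration and are not the sample data from the question. The steps are a self-join on id, a filter that keeps only larger values, and a minimum per original tuple, which roughly correspond to Pig's JOIN, FILTER, GROUP, and MIN, so the same shape may be expressible without a UDF.

```python
from collections import defaultdict


def next_greater_per_group(rows):
    """For each (id, value) row, find the smallest value in the same id
    group that is strictly greater than the row's own value.

    Returns a list of (id, value, next_value); next_value is None when
    no greater value exists in the group.
    """
    rows = list(rows)

    # "GROUP BY id": collect all values seen for each id.
    values_by_id = defaultdict(list)
    for key, value in rows:
        values_by_id[key].append(value)

    result = []
    for key, value in rows:
        # "Self-join on id + FILTER": candidates strictly greater than value.
        larger = [v for v in values_by_id[key] if v > value]
        # "MIN" over the surviving candidates, or None if there are none.
        result.append((key, value, min(larger) if larger else None))
    return result


# Hypothetical rows, invented for this sketch only.
rows = [("a", 1), ("a", 4), ("a", 9), ("b", 2), ("b", 3)]
result = next_greater_per_group(rows)
```

Note that a plain inner join followed by a filter would drop the tuples that have no greater value (the `None` rows here), so an outer join back to the original relation would probably be needed to keep them.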



What does 'beyond' mean? Wouldn't every field except for the last have 9 because 9 is the largest subsequent number?
