HadoopExam Learning Resources

Pig: correlate the tuples in a group to compute a metric

I think this is a common problem in Pig programming.

I have sample data like this (assume a simple schema, with a group of tuples for each of many ids):


For each tuple, I'd like to add a new field like the one below: the largest number you can find beyond the number in that tuple.


I tried a nested FOREACH after grouping and after joining, but neither worked. Basically, I'm trying to take the minimum among the values that are greater than the reference value, but I can't see how to express that technically. I want to do it without using UDFs.
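To pin down the metric described above, here is a minimal Python sketch of its semantics (not Pig code). It assumes tuples are `(id, value)` pairs and that the new field is the smallest value in the same id group that is strictly greater than the tuple's own value, with `None` when no such value exists. The function name `next_greater_per_group` and the input shape are hypothetical, since the original sample data is not shown. The loop mirrors the UDF-free relational plan one would try with Pig's built-in JOIN, FILTER, GROUP and MIN: self-join on id, keep only larger values, then take the minimum per original tuple.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical input shape: (id, value) pairs; the real sample data is not shown.
Row = tuple[str, int]


def next_greater_per_group(rows: list[Row]) -> list[tuple[str, int, Optional[int]]]:
    """For each (id, value), attach the smallest value in the same id group
    that is strictly greater than `value`, or None if there is none."""
    # Step 1: collect the values of each id (the key of the self-join).
    by_id: dict[str, list[int]] = defaultdict(list)
    for key, value in rows:
        by_id[key].append(value)

    result: list[tuple[str, int, Optional[int]]] = []
    for key, value in rows:
        # Steps 2-3: pair the tuple with every value of its id (self-join),
        # then keep only the strictly larger ones (filter).
        larger = [other for other in by_id[key] if other > value]
        # Step 4: MIN over the filtered values; None stands in for the
        # group's maximum, which has no larger partner after the filter.
        result.append((key, value, min(larger) if larger else None))
    return result


# Made-up example input, for illustration only.
print(next_greater_per_group([("a", 3), ("a", 7), ("a", 5), ("b", 2), ("b", 9)]))
```

With this made-up input it prints `[('a', 3, 5), ('a', 7, None), ('a', 5, 7), ('b', 2, 9), ('b', 9, None)]`. The pairwise comparison is quadratic within each group; sorting each group would be faster, but the join-filter-min shape is the one that maps onto built-in operators without a UDF. Note that a plain inner join followed by the filter drops each group's maximum entirely, so in a dataflow version that row has to be brought back (for example with an outer join or COGROUP) if it should appear with an empty field.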

It looks simple, but it seems to need a parallel-programming mindset. Any tips?



What does "beyond" mean? Wouldn't every field except the last be 9, since 9 is the largest subsequent number?
