Pig: apply a FOREACH operator to each element within a bag

Example: I have a relation "class", with a nested bag of students:

class: {teacher_name: chararray,students: {(firstname: chararray, lastname: chararray)}}

I want to perform an operation on each student while leaving the overall structure untouched, i.e., obtain:

class: {teacher_name: chararray,students: {(fullname: chararray)}}

where for each student, fullname = CONCAT(firstname, lastname)

My understanding is that a nested FOREACH would not be my solution here, as it still only generates 1 record per input tuple, whereas I want something that would apply within each bag item.

This is pretty easy to do with a UDF, but I wondered whether it is possible in pure Pig Latin.

Liable answered 24/8, 2012 at 9:18 Comment(0)

In Pig 0.10 this is possible without a UDF, because a FOREACH can be nested inside another FOREACH. Here is an example:

inpt = load '~/pig/data/bag_concat.dat' as (k : chararray, c1 : chararray, c2 : chararray);
dump inpt;
1   q   w
1   s   d
2   q   a
2   t   y
2   u   i
2   o   p

bags = group inpt by k;
describe bags;

bags: {group: chararray,inpt: {(k: chararray,c1: chararray,c2: chararray)}}

result = foreach bags {
    concat = foreach inpt generate CONCAT(c1, c2); --it will iterate only over the records of the inpt bag
    generate group, concat;
};
dump result;

(1,{(qw),(sd)})
(2,{(qa),(ty),(ui),(op)})
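
Applied to the schema from the question, the same pattern might look like the sketch below. The input path 'class.dat' and the aliases fullnames and result are hypothetical, and the schema is simply copied from the question, so treat this as an untested illustration rather than a verified script:

```pig
-- Hypothetical input file; the schema is the one given in the question.
class = LOAD 'class.dat' AS (
    teacher_name: chararray,
    students: bag{t: tuple(firstname: chararray, lastname: chararray)}
);

result = FOREACH class {
    -- The inner FOREACH runs over the tuples of the students bag only.
    fullnames = FOREACH students GENERATE CONCAT(firstname, lastname) AS fullname;
    -- One output record per input record; the bag structure is kept.
    GENERATE teacher_name, fullnames AS students;
};
DESCRIBE result;
```

The outer FOREACH still emits one record per class, which is what keeps the overall structure intact; only the contents of the bag are transformed.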
Rusch answered 25/8, 2012 at 20:45 Comment(2)
What is the use of the nested foreach? Whatever you have done can be done in the generate after the group, so it does not seem to make much sense. Could you please explain? - Battle
The nested foreach iterates through the elements of the bag and thus preserves the bag. If you do not need to preserve the bag, then flatten and concat, but that was not the question. - Rusch