A common pattern in my data processing is to group by some set of columns, apply a filter, then flatten again. For example:
my_data_grouped = group my_data by some_column;
my_data_grouped = filter my_data_grouped by <some expression>;
my_data = foreach my_data_grouped generate flatten(my_data);
The problem here is that if my_data starts with a schema like (c1, c2, c3), after this operation it will have a schema like (my_data::c1, my_data::c2, my_data::c3). Is there a way to easily strip off the "my_data::" prefix if the column names are unique?
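A minimal reproduction of the sequence above might look like the sketch below. The load path 'input', the int column types, and the COUNT-based filter are assumptions made up for illustration; they stand in for the real data and the real filter expression.

```pig
-- Hypothetical input: the path and column types are placeholders.
my_data = load 'input' as (c1:int, c2:int, c3:int);

-- Group, filter the groups, then flatten the bag back into rows.
my_data_grouped = group my_data by c1;
my_data_grouped = filter my_data_grouped by COUNT(my_data) > 1;  -- stand-in filter
my_data = foreach my_data_grouped generate flatten(my_data);

-- The schema printed here should now list the fields with the
-- my_data:: prefix rather than the bare names c1, c2, c3.
describe my_data;
```

Running describe before the group and again after the flatten shows where the prefix is introduced.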
I know I can do something like this:
my_data = foreach my_data generate c1 as c1, c2 as c2, c3 as c3;
However, that gets awkward and hard to maintain for data sets with lots of columns, and it is impossible for data sets with variable columns.