Calculate Average using PIG

I am new to Pig and want to calculate the average of a single column of data that looks like this:

0
10.1
20.1
30
40
50
60
70
80.1

I wrote this Pig script:

dividends = load 'myfile.txt' as (A);
dump dividends
grouped   = group dividends by A;
avg       = foreach grouped generate AVG(grouped.A);
dump avg

It parses data as

(0)
(10.1)
(20.1)
(30)
(40)
(50)
(60)
(70)
(80.1)

but gives this error for average

2013-03-04 15:10:58,289 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: 
<file try.pig, line 4, column 41> Invalid scalar projection: grouped
Details at logfile: /Users/PreetiGupta/Documents/CMPS290S/project/pig_1362438645642.log

Any ideas?

Jurel answered 4/3, 2013 at 23:15 Comment(0)
S
23

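The built-in AVG function takes a bag as input. Your GROUP statement currently groups elements by the value of A, but what you really want is to group all the elements into one bag. After grouping, that bag is named after the original relation (dividends), not after the grouped relation, which is why grouped.A is rejected.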

Pig's GROUP ALL is what you want to use:

dividends = load 'myfile.txt' as (A);
dump dividends
grouped   = group dividends all;
avg       = foreach grouped generate AVG(dividends.A);
dump avg
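
To see why dividends.A is the right projection, it can help to inspect the grouped relation's schema. The following is a minimal sketch, not part of the original answer; it assumes the column is declared as double (the original leaves it untyped) and uses a hypothetical alias, mean, for the result.

```pig
-- Assumption: A is declared as double; the original script leaves it untyped.
dividends = LOAD 'myfile.txt' AS (A:double);

-- GROUP ... ALL puts every row into a single group.
grouped = GROUP dividends ALL;

-- The grouped relation has two fields: group, and a bag named dividends
-- that holds the original tuples. There is no field called grouped.
DESCRIBE grouped;

-- Project A out of the dividends bag and average it.
mean = FOREACH grouped GENERATE AVG(dividends.A);
DUMP mean;
```

Because grouped is the name of a relation rather than a field, Pig reads grouped.A as a scalar projection of that relation, which is what the "Invalid scalar projection" message refers to.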
Saccharose answered 5/3, 2013 at 1:25 Comment(0)
S
5

The following will calculate the average:

dividends = load 'myfile.txt' as (A);
grouped   = GROUP dividends all;
avg       = foreach grouped generate AVG(dividends);
dump avg
Sexcentenary answered 6/3, 2013 at 7:41 Comment(0)
L
1

You have to reference the original relation name rather than the grouped relation's name. In the FOREACH line, I am using AVG(dividends.A) instead of AVG(grouped.A). Note that grouping by A produces one average per distinct value of A, so to get a single average over the whole column the script groups with ALL. Here is the solution script:

dividends = load 'myfile.txt' as (A);
dump dividends

grouped   = group dividends all;
avg = foreach grouped generate AVG(dividends.A);
dump avg
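
The difference between grouping by A and grouping with ALL can be sketched side by side. This is an illustrative sketch only; the aliases by_value, per_value, everything, and overall are hypothetical names, and the double type on A is an assumption.

```pig
-- Assumption: A is declared as double.
dividends = LOAD 'myfile.txt' AS (A:double);

-- One group per distinct value of A, so each average covers
-- only the rows that share that value.
by_value  = GROUP dividends BY A;
per_value = FOREACH by_value GENERATE group, AVG(dividends.A);

-- One group containing every row, so there is a single overall average.
everything = GROUP dividends ALL;
overall    = FOREACH everything GENERATE AVG(dividends.A);

DUMP per_value;
DUMP overall;
```

Both versions parse, because both project A from the dividends bag, but only the ALL grouping yields one row for the whole column.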
Lyckman answered 6/6, 2021 at 4:10 Comment(0)
