Could not infer COUNT function
Asked Answered
N

3

9

I'm trying to write a pig latin script to pull the count of a dataset that I've filtered.

Here's the script so far:

/* scans by title */

scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
scancount       = FOREACH productscans GENERATE COUNT($0);
DUMP scancount;

For some reason, I get the error:

Could not infer the matching function for org.apache.pig.builtin.COUNT as multiple or none of them fit. Please use an explicit cast.

What am I doing wrong here? I'm assuming it has something to do with the type of the field I'm passing in, but I can't seem to resolve this.

TIA, Jason

Neutrality answered 22/3, 2012 at 16:19 Comment(0)
C
16

Is this what you're looking for (group by all to bring everything into one bag, then count the items):

scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
grouped         = GROUP productscans ALL;
count           = FOREACH grouped GENERATE COUNT(productscans);
dump count;
Ciliata answered 23/3, 2012 at 1:58 Comment(1)
That's it (minus the "FOREACH g" should be "FOREACH grouped") - thanks Chris!Neutrality
C
7

COUNT requires a preceding GROUP ALL statement for global counts and a GROUP BY statement for group counts.

You can use any of below :

scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
grouped         = GROUP productscans ALL;
count           = FOREACH grouped GENERATE COUNT(productscans);
DUMP scancount;

Or

scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
grouped         = GROUP productscans ALL;
count           = FOREACH grouped GENERATE COUNT($1);
DUMP scancount;
Charming answered 20/11, 2014 at 7:29 Comment(0)
P
0

Maybe

/* scans by title */

scans           = LOAD '/hive/scans/*' USING PigStorage(',') AS (thetime:long,product_id:long,lat:double,lon:double,user:chararray,category:chararray,title:chararray);
productscans    = FILTER scans BY (title MATCHES 'proactiv');
scancount       = FOREACH productscans GENERATE COUNT(productscans);
DUMP scancount;
Parsec answered 22/3, 2012 at 19:31 Comment(1)
thanks Jake - unfortunately, no luck. that gives me: Invalid scalar projection: productscans : A column needs to be projected from a relation for it to be used as a scalarNeutrality

© 2022 - 2024 — McMap. All rights reserved.