apache-pig Questions

2

Solved

Apache Pig can load data from Hadoop sequence files using the PiggyBank SequenceFileLoader:
REGISTER /home/hadoop/pig/contrib/piggybank/java/piggybank.jar;
DEFINE SequenceFileLoader org.apache.pi...
Cinthiacintron asked 11/3, 2010 at 9:52

1

I have a collection of tuples of the form (t,a,b) that I want to group by b in Pig. Once grouped, I want to filter out b from the tuples in each group and generate a bag of filtered tuples per grou...
Incomprehension asked 29/5, 2012 at 23:39
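
One way to do this in Pig Latin is to project two fields out of the grouped bag. Projecting t and a removes b from every inner tuple, while the group key still carries b. The sketch below assumes a hypothetical comma-separated file 'tuples.txt' with columns t, a and b. The file name and the chararray types are placeholders.

```pig
-- Hypothetical input: one (t,a,b) record per line, comma-separated.
X = LOAD 'tuples.txt' USING PigStorage(',') AS (t:chararray, a:chararray, b:chararray);

-- Each output row is (group, X): bag X holds the full (t,a,b) tuples that share one b.
G = GROUP X BY b;

-- X.(t, a) keeps only t and a from every tuple in the bag, so b no longer appears inside.
R = FOREACH G GENERATE group AS b, X.(t, a) AS ta;

DUMP R;
```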

2

Solved

I have log files in a tarball (access.logs.tar.gz) loaded into my Hadoop cluster. Is there a way to load it directly into Pig without untarring it?
Virtually asked 17/4, 2012 at 4:21

1

Solved

What I want to do is to sum values of a field in all rows in an alias. This must be simple but somehow I can't find the answer. This is probably because what I want is a scalar value while PIG hand...
Gaylagayle asked 27/3, 2012 at 22:37
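
The usual Pig pattern for a single total is GROUP ... ALL followed by SUM. GROUP ... ALL collapses the whole alias into one group, and SUM over that group's bag then produces the single value. The sketch below assumes a hypothetical tab-separated file 'data.txt' with fields name and amount.

```pig
A = LOAD 'data.txt' USING PigStorage('\t') AS (name:chararray, amount:long);

-- GROUP ALL puts every row of A into one group whose key is the literal 'all'.
all_rows = GROUP A ALL;

-- SUM(A.amount) aggregates that single bag, so this relation holds exactly one tuple.
total = FOREACH all_rows GENERATE SUM(A.amount) AS total_amount;

DUMP total;
```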

1

Solved

I read somewhere that Hadoop has built-in support for compression and decompression, but I guess it is about mapper output (by setting some properties)? I wonder if there is any particular PIG l...
Preside asked 27/3, 2012 at 19:45

1

Solved

I've been playing with Hive for a few days now, but I still have a hard time with partitions. I've been recording Apache logs (Combined format) in Hadoop for a few months. They are stored in raw text for...
Ingra asked 8/3, 2012 at 23:36

2

Solved

(Even more basic than Difference between Pig and Hive? Why have both?) I have a data processing pipeline written in several Java map-reduce tasks over Hadoop (my own custom code, derived from Hado...
Maegan asked 7/11, 2011 at 14:38

3

Solved

I've been using either Pig or Java for MapReduce exclusively for running jobs against a Hadoop cluster thus far. I've recently tried out Python MapReduce through Hadoop Streaming and th...
Permian asked 5/3, 2012 at 15:14

1

Solved

grunt> dump jn;
(k1,k4,10)
(k1,k5,15)
(k2,k4,9)
(k3,k4,16)
grunt> jn = group jn by $1;
grunt> dump jn;
(k4,{(k1,k4,10),(k2,k4,9),(k3,k4,16)})
(k5,{(k1,k5,15)})
Now, from here I want ...
Florist asked 3/2, 2012 at 7:18

1

Solved

I am getting the error "scalars can only be used with projection" while using FOREACH. How can I resolve this error? How can I use LIMIT within FOREACH? Please suggest something. Thanks in advance. Ed...
Dander asked 2/2, 2012 at 6:44
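
To apply LIMIT per group, a common approach is a nested FOREACH block over a grouped relation. Inside the block, ORDER and LIMIT operate on each group's bag. This sketch does not diagnose the scalar error itself, since the script that caused it is elided. It assumes a hypothetical events file with user, ts and action fields and keeps the three most recent events per user.

```pig
A = LOAD 'events.txt' USING PigStorage(',') AS (user:chararray, ts:long, action:chararray);

G = GROUP A BY user;

-- Nested block: the inner statements run once per group, on that group's bag A.
top3 = FOREACH G {
    sorted = ORDER A BY ts DESC;   -- newest events first within the group
    latest = LIMIT sorted 3;       -- keep at most three tuples per group
    GENERATE group AS user, latest;
};

DUMP top3;
```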

6

Pig is a dataflow programming environment for processing very large files. Pig's language is called Pig Latin. Does anyone know of a good reference manual for PigLatin? I'm looking for somet...
Teri asked 15/12, 2008 at 13:57

1

I am very new to PIG and I am having what feels like a very basic problem. I have a line of code that reads:
A = load 'Sites/trial_clustering/shortdocs/*' AS (word1:chararray, word2:chararray, ...
Spangler asked 11/11, 2011 at 19:36

1

Solved

A = load 'a.txt' as (id, a1);
B = load 'b.txt' as (id, b1);
C = join A by id, B by id;
D = foreach C generate id,a1,b1;
dump D;
The 4th line fails on: Invalid field projection. Projected field [id] do...
Implicative asked 8/11, 2011 at 13:32
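
Here is a sketch of the usual fix, using the same two files. After JOIN, both inputs contribute a field named id. Pig therefore prefixes the joined columns with their source alias (A::id, B::id), and a bare id cannot be resolved. Naming one of the prefixed fields explicitly fixes the projection. Fields that are unique across both inputs, such as a1 and b1, can still be referenced without a prefix.

```pig
A = LOAD 'a.txt' AS (id, a1);
B = LOAD 'b.txt' AS (id, b1);

-- Schema of C is (A::id, A::a1, B::id, B::b1).
C = JOIN A BY id, B BY id;

-- Pick one of the prefixed id fields; both hold the same value after an inner join.
D = FOREACH C GENERATE A::id AS id, a1, b1;

DUMP D;
```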

2

I have the following tuple H1, and I want to STRSPLIT its $0 into a tuple. However, I always get an error message:
DUMP H1:
(item32;item31;,1)
m = FOREACH H1 GENERATE STRSPLIT($0, ";", 50);
ERROR 1000...
Formulism asked 14/4, 2011 at 22:03
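
A likely cause worth checking: Pig Latin string constants are written in single quotes, so the double-quoted ";" in the snippet would be rejected by the parser. The sketch below uses a single-quoted delimiter. It assumes H1 is loaded from a hypothetical comma-separated file 'h1.txt' with fields items and n, which matches the shape (item32;item31;,1) shown above.

```pig
-- Hypothetical load producing tuples like (item32;item31;,1).
H1 = LOAD 'h1.txt' USING PigStorage(',') AS (items:chararray, n:int);

-- STRSPLIT(string, regex, limit): the regex ';' is a single-quoted Pig string constant.
m = FOREACH H1 GENERATE STRSPLIT(items, ';', 50);

DUMP m;
```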

1

Solved

Are there any advantages (with respect to performance / number of MapReduce jobs) when I use COGROUP instead of JOIN in Pig? http://developer.yahoo.com/hadoop/tutorial/module6.html talks about the difference in t...
Fated asked 21/9, 2011 at 7:23

1

Solved

So, I've seen a couple of tutorials for this online, but each seems to say to do something different. Also, none of them seems to specify whether you're trying to get things to work on a rem...
Whencesoever asked 1/9, 2011 at 23:07

1

Solved

I have a Pig job in which I need to filter the data by finding a word in it. Here is the snippet:
A = LOAD '/home/user/filename' USING PigStorage(',');
B = FOREACH A GENERATE $27,$38;
C = FILTER B...
Dashpot asked 16/9, 2011 at 13:58
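
This sketch shows one way to write such a filter. It assumes the word may appear anywhere in the first projected field, and 'searchword' is a placeholder for the actual word. MATCHES applies a Java regular expression to the whole string, which is why the pattern needs .* on both sides. The cast is there because fields loaded without a schema are bytearrays.

```pig
A = LOAD '/home/user/filename' USING PigStorage(',');
B = FOREACH A GENERATE $27, $38;

-- Keep rows whose first projected field contains 'searchword' (hypothetical word).
C = FILTER B BY (chararray)$0 MATCHES '.*searchword.*';

DUMP C;
```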

2

Solved

I'm attempting to get Apache Pig up and running on my Hadoop cluster, and am encountering a permissions problem. Pig itself is launching and connecting to the cluster just fine: from within the Pig...
Maziar asked 25/8, 2011 at 16:38

1

Solved

I'm trying to join a set that has the number of days in each month with a data set on the year-month key. After I join them and try to do a FOREACH over the set, I get an ERROR: 1066 ... Backend error : ...
Kirbykirch asked 29/7, 2011 at 18:15

1

Solved

I would like to know how to run Pig queries on data stored in Hive format. I have configured Hive to store compressed data (using this tutorial http://wiki.apache.org/hadoop/Hive/CompressedStorage). Befo...
Vomer asked 21/4, 2011 at 7:50

1

Solved

As I've noted previously, Pig doesn't cope well with empty (0-byte) files. Unfortunately, there are lots of ways that these files can be created (even within Hadoop utilities). I thought that I ...
Marciamarciano asked 21/4, 2011 at 23:05

1

Hey all, I followed the steps here: http://wiki.apache.org/pig/PiggyBank to build the PiggyBank jar, but I keep getting the output below. I also built the Pig project from source and reference that i...
Scrubby asked 9/4, 2011 at 16:08

1

Solved

I have a set of records that I am loading from a file, and the first thing I need to do is get the max and min of a column. In SQL I would do this with a subquery like this: select c.state, c...
Kroll asked 7/3, 2011 at 18:17
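
In Pig, GROUP ... ALL followed by MAX and MIN produces a one-row relation with both extremes. The sketch assumes a hypothetical comma-separated file with fields state and amount. The final FILTER assumes Pig 0.8 or later, where a single-row relation can be referenced as a scalar (stats.max_amount). That reference plays roughly the role of the SQL subquery.

```pig
c = LOAD 'records.txt' USING PigStorage(',') AS (state:chararray, amount:long);

-- Collapse all records into one group so MAX/MIN see the whole column.
all_c = GROUP c ALL;
stats = FOREACH all_c GENERATE MAX(c.amount) AS max_amount, MIN(c.amount) AS min_amount;

-- stats holds exactly one tuple, so its fields can be used as scalars in later statements.
top = FILTER c BY amount == stats.max_amount;

DUMP stats;
DUMP top;
```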

2

I have some very long Apache Pig (Latin) expressions. Is there a way of splitting these over multiple lines? I've tried a trailing backslash to no avail; as soon as I press Enter, the (inc...
Bukharin asked 27/1, 2011 at 12:43

3

I read in a CSV file that contains fields with numbers like this: "3". Can I convert these fields from "3" to 3 with Pig Latin? I need this to use the SUM() function. Thanks for your help!
Kalliekallista asked 8/12, 2010 at 16:04
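
This sketch assumes the double quotes are literally present in the file. It first strips them with REPLACE and then casts the remaining text to int, so SUM receives numbers. The file name 'numbers.csv' and the field name qty are hypothetical. If the values are not quoted in the file, the plain cast (int)qty alone would be enough.

```pig
-- Hypothetical file: one quoted number per line, e.g. "3".
raw = LOAD 'numbers.csv' USING PigStorage(',') AS (qty:chararray);

-- REPLACE(string, regex, replacement) removes the quote characters; the cast makes the value an int.
nums = FOREACH raw GENERATE (int)REPLACE(qty, '"', '') AS qty;

-- Sum the converted column over all rows.
grouped = GROUP nums ALL;
total = FOREACH grouped GENERATE SUM(nums.qty) AS total_qty;

DUMP total;
```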

© 2022 - 2024 — McMap. All rights reserved.