apache-pig Questions

7

Solved

I did something like this to count the number of rows in an alias in Pig: logs = LOAD 'log'; logs_w_one = foreach logs generate 1 as one; logs_group = group logs_w_one all; logs_count = foreach log...
Etoile asked 28/3, 2012 at 3:29

2

If a simple GROUP BY script in Pig, running over terabytes of data, gets stuck at, say, 70%, what can be done to diagnose the problem?
Astrograph asked 12/5, 2015 at 18:14

2

I am a newbie to Pig. I am trying to figure out how to define a bag or tuple with hard-coded values, without loading data from a file. Every example that I have encountered starts with: a = L...
Sphincter asked 1/8, 2014 at 19:34

6

Solved

I have a basic understanding of what the Pig and Hive abstractions are, but I don't have a clear idea of the scenarios that require Hive, Pig, or native MapReduce. I went through a few articles which basically...
Iorgo asked 30/7, 2013 at 14:47

19

My background - 4 weeks old in the Hadoop world. Dabbled a bit in Hive, Pig and Hadoop using Cloudera's Hadoop VM. Have read Google's paper on Map-Reduce and GFS (PDF link). I understand tha...
Equity asked 28/7, 2010 at 18:42

2

Solved

Does anyone have any experience using Stata and Hadoop? Stata 13 now has a Java Plugin API, so I think it should be straightforward to get them to play nice. I am particularly interested in being ...
Tieback asked 3/10, 2013 at 17:41

3

Solved

I installed CDH 5.4 on a single node following the instructions here. I also put the hive-metastore in local mode using these instructions, and everything works perfectly, except when I tried to connec...
Bulwerlytton asked 1/5, 2015 at 15:48

2

Solved

Is it possible to (efficiently) select a random tuple from a bag in Pig? I can just take the first result of a bag (as it is unordered), but in my case I need a proper random selection. One (not e...
Boron asked 30/1, 2013 at 12:43

2

Solved

I am new to Pig and I want to convert a bag of tuples to a map, with a specific value in each tuple as the key. Basically I want to change: {(id1, value1),(id2, value2), ...} into [id1#value1, id2#value2...
Jarrod asked 25/7, 2013 at 2:23

5

I am writing a Pig script and want to execute a set of statements only if a certain condition is satisfied. I have set a variable and am checking its value. Suppose if...
Vraisemblance asked 16/7, 2013 at 6:31

3

My dataset looks like the following: ( A, (1,2) ) ( B, (2,9) ) I would like to "flatten" the tuples in Pig, basically repeating each record for each value found in the inner-tuple, such that the...
Tinney asked 15/5, 2012 at 4:28

4

Solved

I'm writing a Pig Latin script similar to the following: A = load 'data' using PigStorage('\t'); store A into my_data using PigStorage(); This outputs (Bob, 10, 4.0) (Jim, 11, 3.25) (Paul, 9, 2...
Ellisellison asked 7/1, 2013 at 21:24

4

Solved

I'm working with Pig Latin, using grunt, and every time I 'dump' something, my console gets clobbered with blah blah, blah non-info. Is there a way to suppress all that? grunt> A = LOAD 'testingData...
Viscous asked 7/5, 2013 at 2:51

4

I am processing data from a set of files which contain a date stamp as part of the filename. The data within the file does not contain the date stamp. I would like to process the filename and add i...
Vicarious asked 17/3, 2012 at 16:4

2

Pig: 0.8.1-cdh3u2 Hadoop: 0.20.2-cdh3u0 I am debugging FIELD_DISCARDED_TYPE_CONVERSION_FAILED warnings, but I can't seem to get individual warnings printed anywhere. Disabling aggregation via -w or a...
Ophiology asked 14/12, 2011 at 19:58

2

Solved

After loading and grouping records, how can I store those grouped records into several files, one per group (=userid)? records = LOAD 'input' AS (userid:int, ...); grouped_records = GROUP records ...
Fertility asked 16/2, 2012 at 15:52

3

Apache Pig v0.7 can read gzipped files with no extra effort on my part, e.g.: MyData = LOAD '/tmp/data.csv.gz' USING PigStorage(',') AS (timestamp, user, url); I can process that data and output...
Czarism asked 11/2, 2011 at 12:12

2

Solved

What is the actual difference between running Pig scripts locally and on MapReduce? I understand MapReduce mode is when you run it on a cluster that has HDFS installed. Does this mean local mode d...
Wavawave asked 26/7, 2012 at 12:33

1

Solved

Related to "Spark - Joining 2 PairRDD elements". When doing a regular join in Pig, the last table in the join is not brought into memory but streamed through instead, so if A has small cardinality pe...
Durham asked 24/2, 2015 at 11:24

1

Solved

I would like to concatenate a string to all the data in a field. For example, a dataset mydata contains the following fields (id, name, email), and I would like to add the string test as a prefix to all the data in the field name. I...
Magnitogorsk asked 30/1, 2015 at 0:47

1

Solved

I have a comma-separated .txt file, and I want to DUMP the AVG age of all males. records = LOAD 'file:/home/gautamshaw/Documents/PigDemo_CommaSep.txt' USING PigStorage(',') AS (firstname:chararray,las...
Brana asked 30/1, 2015 at 1:27

3

I use HCatalog version 0.4. I have a table 'abc' in Hive which has a column with datatype 'timestamp'. When I try to run a Pig script like this "raw_data = load 'abc' using org.apache.hcatalog.pig....
Commandeer asked 20/2, 2014 at 0:41

2

Solved

Our workflow uses an AWS Elastic MapReduce cluster to run a series of Pig jobs to manipulate a large amount of data into aggregated reports. Unfortunately, the input data is potentially inconsistent...
Capo asked 20/4, 2011 at 23:20

2

If I use the hbase shell and issue: put 'test', 'rowkey1','cf:foo', 'bar' scan 'test' I will see the result as a string, not in bytes. If I use happybase and issue: import happybase connection...
Hardware asked 14/1, 2014 at 23:50

1

What is the output schema to return a dictionary from a Python UDF when using Apache Pig? I have a dictionary of dictionaries, something like this: dict = {x:{a:1,b:2,c:3}, y:{d:1,e:3,f:9}} and ...
Diachronic asked 12/11, 2012 at 19:55
