apache-pig - 3

7

Solved

PIG how to count a number of rows in alias

I did something like this to count the number of rows in an alias in PIG: logs = LOAD 'log' logs_w_one = foreach logs generate 1 as one; logs_group = group logs_w_one all; logs_count = foreach log...

hadoop apache-pig

Etoile asked 28/3, 2012 at 3:29

2

How can I debug a pig script

If a simple GROUP BY script in Pig gets stuck at, say, 70% while running over terabytes of data, what can be done to diagnose the problem?

hadoop apache-pig bigdata

Astrograph asked 12/5, 2015 at 18:14

2

Pig Script without load

I am a newbie to Pig. I am trying to figure out how to define a bag or tuple with hard-coded values, without loading data from a file. Every example that I have encountered starts with: a = L...

hadoop apache-pig

Sphincter asked 1/8, 2014 at 19:34

6

Solved

Pig vs Hive vs Native Map Reduce

I have a basic understanding of what the Pig and Hive abstractions are, but I don't have a clear idea of the scenarios that require Hive, Pig or native map reduce. I went through a few articles which basically...

hadoop mapreduce hive apache-pig

Iorgo asked 30/7, 2013 at 14:47

19

Difference between Pig and Hive? Why have both? [closed]

My background - 4 weeks old in the Hadoop world. Dabbled a bit in Hive, Pig and Hadoop using Cloudera's Hadoop VM. Have read Google's paper on Map-Reduce and GFS (PDF link). I understand tha...

hadoop hive apache-pig

Equity asked 28/7, 2010 at 18:42

2

Solved

Hadoop and Stata

Does anyone have any experience using Stata and Hadoop? Stata 13 now has a Java Plugin API, so I think it should be straightforward to get them to play nice. I am particularly interested in being ...

hadoop hive apache-pig stata

Tieback asked 3/10, 2013 at 17:41

3

Solved

Flag -useHCatalog not working

I installed CDH5.4 on a single node following the instructions here. I also put the hive-metastore in local mode using these instructions, and everything works perfectly, except when I tried to connec...

hadoop apache-pig cloudera-cdh

Bulwerlytton asked 1/5, 2015 at 15:48

2

Solved

Selecting random tuple from bag

Is it possible to (efficiently) select a random tuple from a bag in pig? I can just take the first result of a bag (as it is unordered), but in my case I need a proper random selection. One (not e...

apache-pig

Boron asked 30/1, 2013 at 12:43

2

Solved

Transform bag of key-value tuples to map in Apache Pig

I am new to Pig and I want to convert a bag of tuples to a map with specific value in each tuple as key. Basically I want to change: {(id1, value1),(id2, value2), ...} into [id1#value1, id2#value2...

dictionary apache-pig

Jarrod asked 25/7, 2013 at 2:23

5

Is there any Conditional IF like operator in Apache PIG?

I am writing a Pig script and want to execute a set of statements if one of the conditions is satisfied. I have set one variable and am checking for some value of that variable. Suppose if...

hadoop apache-pig

Vraisemblance asked 16/7, 2013 at 6:31

3

Flatten tuple like a bag

My dataset looks like the following: ( A, (1,2) ) ( B, (2,9) ) I would like to "flatten" the tuples in Pig, basically repeating each record for each value found in the inner-tuple, such that the...

hadoop apache-pig flatten

Tinney asked 15/5, 2012 at 4:28

4

Solved

How can I add a header row to files created from Pig (Hadoop)?

I'm writing a pig latin script similar to the following: A = load 'data' using PigStorage('\t'); store A into my_data using PigStorage(); This outputs (Bob, 10, 4.0) (Jim, 11, 3.25) (Paul, 9, 2...

hadoop apache-pig

Ellisellison asked 7/1, 2013 at 21:24

4

Solved

How do I suppress the bloat of useless information when using the DUMP command while using grunt via 'pig -x local'?

I'm working with PigLatin, using grunt, and every time I 'dump' stuff, my console gets clobbered with blah, blah, blah non-info. Is there a way to suppress all that? grunt> A = LOAD 'testingData...

dump apache-pig gruntjs verbosity

Viscous asked 7/5, 2013 at 2:51

4

How can I incorporate the current input filename into my Pig Latin script?

I am processing data from a set of files which contain a date stamp as part of the filename. The data within the file does not contain the date stamp. I would like to process the filename and add i...

apache-pig

Vicarious asked 17/3, 2012 at 16:4

2

Pig non-aggregated warnings output location?

Pig: 0.8.1-cdh3u2 Hadoop: 0.20.2-cdh3u0 I am debugging FIELD_DISCARDED_TYPE_CONVERSION_FAILED warnings, but I can't seem to get individual warnings printed anywhere. Disabling aggregation via -w or a...

hadoop apache-pig

Ophiology asked 14/12, 2011 at 19:58

2

Solved

How to store grouped records into multiple files with Pig?

After loading and grouping records, how can I store those grouped records into several files, one per group (=userid)? records = LOAD 'input' AS (userid:int, ...); grouped_records = GROUP records ...

java hadoop apache-pig

Fertility asked 16/2, 2012 at 15:52

3

How do I store gzipped files using PigStorage in Apache Pig?

Apache Pig v0.7 can read gzipped files with no extra effort on my part, e.g.: MyData = LOAD '/tmp/data.csv.gz' USING PigStorage(',') AS (timestamp, user, url); I can process that data and output...

apache-pig

Czarism asked 11/2, 2011 at 12:12

2

Solved

Difference between PIG local and mapreduce mode

What is the actual difference between running PIG scripts locally and on mapreduce? I understand mapreduce mode is when you run it on a cluster that has hdfs installed. Does this mean local mode d...

hadoop mapreduce hdfs apache-pig

Wavawave asked 26/7, 2012 at 12:33

1

Solved

In spark join, does table order matter like in pig?

Related to Spark - Joining 2 PairRDD elements When doing a regular join in pig, the last table in the join is not brought into memory but streamed through instead, so if A has small cardinality pe...

hadoop apache-spark apache-pig bigdata

Durham asked 24/2, 2015 at 11:24

1

Solved

concatenate a string to a field in pig

I would like to concatenate a string to all data in a field. For example, a dataset mydata contains the following fields (id, name, email), and I would like to add the string test as a prefix to all the data in the field name. I...

hadoop apache-pig

Magnitogorsk asked 30/1, 2015 at 0:47

1

Solved

Pig - ERROR 1045: AVG as multiple or none of them fit. Please use an explicit cast

I have a comma-separated .txt file, and I want to DUMP the AVG age of all males. records = LOAD 'file:/home/gautamshaw/Documents/PigDemo_CommaSep.txt' USING PigStorage(',') AS (firstname:chararray,las...

hadoop mapreduce apache-pig bigdata

Brana asked 30/1, 2015 at 1:27

3

Type conversion pig hcatalog

I use HCatalog version 0.4. I have a table 'abc' in Hive which has a column with datatype 'timestamp'. When I try to run a Pig script like this "raw_data = load 'abc' using org.apache.hcatalog.pig....

hive apache-pig hcatalog

Commandeer asked 20/2, 2014 at 0:41

2

Solved

How do you deal with empty or missing input files in Apache Pig?

Our workflow uses an AWS Elastic MapReduce cluster to run a series of Pig jobs to manipulate a large amount of data into aggregated reports. Unfortunately, the input data is potentially inconsistent...

hadoop apache-pig

Capo asked 20/4, 2011 at 23:20

2

How to have Pig store rows in HBase as strings not bytes?

If I use the hbase shell and issue: put 'test', 'rowkey1','cf:foo', 'bar' scan 'test' I will see the result as a string, not in bytes. If I use happybase and issue: import happybase connection...

hbase apache-pig hbasestorage

Hardware asked 14/1, 2014 at 23:50

1

How to : Python UDF dictionary return schema in PIG

What is the output schema to return a dictionary from a Python UDF when using Apache Pig? I have a dictionary of dictionaries, something like this: dict = {x:{a:1,b:2,c:3}, y:{d:1,e:3,f:9}} and ...

python dictionary schema user-defined-functions apache-pig

Diachronic asked 12/11, 2012 at 19:55
