apache-pig Questions
7
Solved
I did something like this to count the number of rows in an alias in Pig:
logs = LOAD 'log';
logs_w_one = foreach logs generate 1 as one;
logs_group = group logs_w_one all;
logs_count = foreach log...
Etoile asked 28/3, 2012 at 3:29
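A minimal sketch of one common way to count the rows of an alias, assuming the same 'log' input as in the question. It groups the relation with GROUP ... ALL and applies the built-in COUNT_STAR; the alias n is only an illustrative name.

```pig
-- 'log' is the input path taken from the question
logs = LOAD 'log';

-- GROUP ... ALL puts every row into a single group
logs_group = GROUP logs ALL;

-- COUNT_STAR counts every tuple in the bag, including tuples with null fields
logs_count = FOREACH logs_group GENERATE COUNT_STAR(logs) AS n;

DUMP logs_count;
```

COUNT could be used instead of COUNT_STAR, but it skips tuples whose first field is null, so the two can give different totals on data with nulls.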
2
If a simple GROUP BY script in Pig, running over terabytes of data, gets stuck at, say, 70%, what can be done to diagnose the problem?
Astrograph asked 12/5, 2015 at 18:14
2
I am a newbie to Pig. I am trying to figure out how to define a bag or tuple with hard-coded values, without loading data from a file. Every example that I have encountered starts with:
a = L...
Sphincter asked 1/8, 2014 at 19:34
6
Solved
I have a basic understanding of what the Pig and Hive abstractions are, but I don't have a clear idea of the scenarios that require Hive, Pig, or native MapReduce.
I went through a few articles which basically...
Iorgo asked 30/7, 2013 at 14:47
19
My background - 4 weeks old in the Hadoop world. Dabbled a bit in Hive, Pig and Hadoop using Cloudera's Hadoop VM. Have read Google's paper on Map-Reduce and GFS (PDF link).
I understand tha...
Equity asked 28/7, 2010 at 18:42
2
Solved
Does anyone have any experience using Stata and Hadoop? Stata 13 now has a Java Plugin API, so I think it should be straightforward to get them to play nice.
I am particularly interested in being ...
Tieback asked 3/10, 2013 at 17:41
3
Solved
I installed CDH 5.4 on a single node following the instructions here. I also put the hive-metastore in local mode using these instructions, and everything works perfectly, except when I tried to connec...
Bulwerlytton asked 1/5, 2015 at 15:48
2
Solved
Is it possible to (efficiently) select a random tuple from a bag in pig?
I can just take the first result of a bag (as it is unordered), but in my case I need a proper random selection.
One (not e...
Boron asked 30/1, 2013 at 12:43
2
Solved
I am new to Pig and I want to convert a bag of tuples to a map, with a specific value in each tuple as the key. Basically I want to change:
{(id1, value1),(id2, value2), ...} into [id1#value1, id2#value2...
Jarrod asked 25/7, 2013 at 2:23
5
I am writing a Pig script and want to execute a set of statements only if a condition is satisfied.
I have set a variable and am checking its value. Suppose
if...
Vraisemblance asked 16/7, 2013 at 6:31
3
My dataset looks like the following:
( A, (1,2) )
( B, (2,9) )
I would like to "flatten" the tuples in Pig, basically repeating each record for each value found in the inner-tuple, such that the...
Tinney asked 15/5, 2012 at 4:28
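A hedged sketch of one way to repeat each record per inner value, assuming the inner tuple always has exactly two fields. The path 'input' and the field names id, t, a, b, and val are hypothetical. FLATTEN applied directly to a tuple only widens the row, so the tuple is first turned into a bag with TOBAG.

```pig
-- Hypothetical path and schema: an id plus a fixed-size inner tuple
data = LOAD 'input' AS (id:chararray, t:tuple(a:int, b:int));

-- TOBAG wraps each inner field in its own tuple inside a bag;
-- FLATTEN on a bag then emits one output row per element
rows = FOREACH data GENERATE id, FLATTEN(TOBAG(t.a, t.b)) AS val;

DUMP rows;
```

For the two sample records this should produce one row per inner value, for example (A,1) and (A,2) for the first record. Tuples of varying length would need a different approach, such as a UDF.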
4
Solved
I'm writing a pig latin script similar to the following:
A = load 'data' using PigStorage('\t');
store A into 'my_data' using PigStorage();
This outputs
(Bob, 10, 4.0)
(Jim, 11, 3.25)
(Paul, 9, 2...
Ellisellison asked 7/1, 2013 at 21:24
4
Solved
I'm working with Pig Latin in grunt, and every time I 'dump' something, my console gets clobbered with non-informative output. Is there a way to suppress all that?
grunt> A = LOAD 'testingData...
Viscous asked 7/5, 2013 at 2:51
4
I am processing data from a set of files which contain a date stamp as part of the filename. The data within the file does not contain the date stamp. I would like to process the filename and add i...
Vicarious asked 17/3, 2012 at 16:4
2
Pig: 0.8.1-cdh3u2
Hadoop: 0.20.2-cdh3u0
I am debugging FIELD_DISCARDED_TYPE_CONVERSION_FAILED warnings, but I can't seem to get individual warnings printed anywhere. Disabling aggregation via -w or a...
Ophiology asked 14/12, 2011 at 19:58
2
Solved
After loading and grouping records, how can I store those grouped records into several files, one per group (=userid)?
records = LOAD 'input' AS (userid:int, ...);
grouped_records = GROUP records ...
Fertility asked 16/2, 2012 at 15:52
3
Apache Pig v0.7 can read gzipped files with no extra effort on my part, e.g.:
MyData = LOAD '/tmp/data.csv.gz' USING PigStorage(',') AS (timestamp, user, url);
I can process that data and output...
Czarism asked 11/2, 2011 at 12:12
2
Solved
What is the actual difference between running Pig scripts in local mode and in mapreduce mode?
I understand mapreduce mode is when you run it on a cluster that has HDFS installed. Does this mean local mode d...
Wavawave asked 26/7, 2012 at 12:33
1
Solved
Related to Spark - Joining 2 PairRDD elements
When doing a regular join in pig, the last table in the join is not brought into memory but streamed through instead, so if A has small cardinality pe...
Durham asked 24/2, 2015 at 11:24
1
Solved
How can I concatenate a string to all the data in a field?
For example, a dataset mydata contains the fields (id, name, email), and I would like to add the string prefix test to every value in the name field.
I...
Magnitogorsk asked 30/1, 2015 at 0:47
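A minimal sketch of prefixing a field with the built-in CONCAT function. The dataset name mydata and the fields id, name, and email come from the question; the load path, the comma delimiter, and the field types are assumptions made for illustration.

```pig
-- Hypothetical path, delimiter, and types; field names are from the question
mydata = LOAD 'mydata' USING PigStorage(',')
         AS (id:int, name:chararray, email:chararray);

-- CONCAT joins the literal prefix and the existing value of name
prefixed = FOREACH mydata GENERATE id, CONCAT('test', name) AS name, email;

DUMP prefixed;
```

If name is null for a record, CONCAT returns null for that record, so nulls may need separate handling.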
1
Solved
I have a comma-separated .txt file, and I want to DUMP the AVG age of all males.
records = LOAD 'file:/home/gautamshaw/Documents/PigDemo_CommaSep.txt' USING PigStorage(',') AS (firstname:chararray,las...
Brana asked 30/1, 2015 at 1:27
3
I use HCatalog version 0.4. I have a table in Hive, 'abc', which has a column with datatype 'timestamp'. When I try to run a Pig script like this "raw_data = load 'abc' using org.apache.hcatalog.pig....
Commandeer asked 20/2, 2014 at 0:41
2
Solved
Our workflow uses an AWS Elastic MapReduce cluster to run a series of Pig jobs to manipulate a large amount of data into aggregated reports. Unfortunately, the input data is potentially inconsistent...
Capo asked 20/4, 2011 at 23:20
2
If I use the hbase shell and issue:
put 'test', 'rowkey1','cf:foo', 'bar'
scan 'test'
I will see the result as a string, not in bytes.
If I use happybase and issue:
import happybase
connection...
Hardware asked 14/1, 2014 at 23:50
1
What output schema should be used to return a dictionary from a Python UDF in Apache Pig?
I have a dictionary of dictionaries, something like this:
dict = {x:{a:1,b:2,c:3}, y:{d:1,e:3,f:9}}
and ...
Diachronic asked 12/11, 2012 at 19:55