apache-pig Questions
2
Solved
I want to group by a given field and get the output with the grouped fields. Below is an example of what I am trying to achieve:
Imagine a table named 'sample_table' with two columns, as below:
F1 F...
Cassius asked 8/5, 2013 at 15:3
3
Solved
Let's say I have a table such as the one below, that may or may not contain duplicates for a given field:
ID URL
--- ------------------
001 http://example.com/adam
002 http://example.com/beth
002 ...
Respond asked 27/5, 2014 at 23:54
4
I would like to perform the equivalent of "keep all a in A where a.field == b.field for some b in B" in Apache Pig. I am implementing it like so:
AB_joined = JOIN A by field, B by field;
A2 = FORE...
Nari asked 30/5, 2012 at 23:23
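A common Pig idiom for this kind of semi-join is to COGROUP the two relations and keep only the keys whose B bag is non-empty. The sketch below assumes A and B are already loaded and that each has a column named field, as in the question; it is one possible approach, not necessarily what the asker ended up using.

```pig
-- Sketch only: assumes relations A and B are already loaded
-- and that each has a column called "field" (names taken from the question).
grouped = COGROUP A BY field, B BY field;

-- Keep a key only if at least one tuple of B shares it.
matched = FILTER grouped BY NOT IsEmpty(B);

-- Emit the original A tuples for the surviving keys.
-- Each A tuple appears once, however many B tuples share its key.
A2 = FOREACH matched GENERATE FLATTEN(A);
```

Compared with a JOIN followed by a projection, this avoids repeating an A tuple once per matching B tuple when B contains duplicate keys.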
3
Solved
Is there a way to create a small constant relation (table) in Pig?
I need to create a relation with only one tuple that contains constant values,
something along the lines of:
A = LOAD using Constan...
Attribution asked 16/11, 2012 at 9:56
1
Solved
Is there a way we can get the value of a map for variable keys using the field as the key?
For example, my company data has locale and name fields like this:
{"en_US", (["en_US" : "English Name"], ["fr_FR...
Leucoplast asked 12/2, 2017 at 20:15
8
I get multiple small files in my input directory, which I want to merge into a single file without using the local file system or writing MapReduce jobs. Is there a way I could do it using hadoop fs comm...
Dyad asked 23/8, 2010 at 13:59
3
I'm trying to load a simple file:
log = load 'file_1.gz' using TextLoader AS (line:chararray);
dump log;
And I get an error:
2014-04-08 11:46:19,471 [main] ERROR org.apache.pig.tools.pigstats.Simp...
Assai asked 8/4, 2014 at 9:54
2
Solved
I'm trying to run a simple job using Oozie. It will be a single, simple Pig action.
I have a file, FirstScript.pig, containing:
dual = LOAD 'default.dual' USING org.apache.hcatalog.pig.HCatLoader();
s...
Samp asked 30/1, 2014 at 14:11
1
I have a number of bags and I want to compute the pairwise similarities between the bags.
sequences = FOREACH raw GENERATE gen_bag(logs);
The relation is described as follows:
sequences: {t: (...
Gissing asked 1/11, 2016 at 11:7
4
I have implemented an Apache Pig script. When I execute the script it results in many mappers for a specific step, but has only one reducer for that step. Because of this condition (many mappers, o...
Uniflorous asked 7/11, 2013 at 12:0
3
Solved
A = LOAD 'eventnew.txt' USING HCatalogLoader();
2015-07-08 19:56:34,875 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve HCatalogLoader using imports: [, java.lang....
Gustav asked 8/7, 2015 at 14:40
2
Solved
I am using datetime in some Python UDFs that I use in my Pig script. So far so good. I use Pig 12.0 on Cloudera 5.5.
However, I also need to use the pytz or dateutil packages, and they don't ...
Handcrafted asked 26/8, 2016 at 22:52
1
I have set up a cluster using Amazon EMR.
I have a Python library (cloned from GitHub and not available on pip) on S3.
I want to submit a Pig job that uses a UDF which makes use of the library pre...
Devlen asked 7/8, 2016 at 2:42
2
In our organization, we have lately been trying to use Hadoop-ecosystem tools to implement ETLs. Although the ecosystem itself is quite big, we are using only a very limited set of tools at t...
Magnate asked 25/2, 2014 at 23:30
4
Can someone explain the computation of median/quantiles in MapReduce?
My understanding of DataFu's median is that the 'n' mappers sort the data and send the data to "1" reducer which is respons...
Astto asked 11/4, 2012 at 15:53
4
I am creating an external table in Hive using a Parquet file as storage:
hive> CREATE EXTERNAL TABLE test_data(
c1 string, c2 int, c3 string, c4 string, c5 string, c6 float,
c7 string, c8 string,...
Maltzman asked 14/8, 2013 at 7:6
4
Solved
I just started with Pig and am trying to load data from a file and then dump it. Loading seems to work; no error is thrown. Below is the query:
NYSE = LOAD '/root/Desktop/Works/NYSE-2000-2001....
Nilsanilsen asked 3/12, 2013 at 11:38
6
Solved
I am using Ubuntu 12.02 (32-bit) and have successfully installed Hadoop 2.2.0 and Pig 0.12. Hadoop runs properly on my system.
However, whenever I run this command:
data = load 'atoz.csv' using PigS...
Gaberones asked 23/1, 2014 at 6:4
1
I am trying to create a Hive table with schema string,string,double on a folder containing two Parquet files. The first parquet file schema is string,string,double and the schema of the second file...
Writeup asked 20/1, 2016 at 1:58
1
Just installed Pig 0.13 and I am attempting to use it with Hadoop 1.1.2. (Pig documentation states Pig 0.13 is compatible with Hadoop 1.1.2). Per the Pig install instructions, I set $PIG_CLASSPATH ...
Nazarene asked 3/8, 2014 at 18:23
2
Solved
I'm having a lot of trouble getting data out of Pig and into a CSV that I can use in Excel or SQL (or R, SPSS, etc.) without a lot of manipulation ...
I've tried using the following function:
...
Microgroove asked 4/12, 2012 at 4:21
5
Solved
I have a text file whose first row contains the header. Now I want to do some operations on the data, but when I load the file using PigStorage it picks up the header too. I just want to skip the ...
Maximomaximum asked 1/10, 2013 at 11:44
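One way to drop a header row while still loading with PigStorage is to filter out the row whose first field equals the literal header text. The sketch below assumes a comma-delimited file; the path data.csv and the column names id and name are hypothetical placeholders, not taken from the question.

```pig
-- Sketch only: 'data.csv', id and name are hypothetical; use the real
-- path, delimiter and header values of your file.
raw = LOAD 'data.csv' USING PigStorage(',') AS (id:chararray, name:chararray);

-- The header row is the one whose first field holds the literal column name.
data = FILTER raw BY id != 'id';
```

Declaring the first column as chararray keeps the header text intact so it can be matched; with a numeric type the header value would instead fail the cast and load as null, in which case filtering BY id IS NOT NULL plays the same role. Note that this also removes any data row whose first field happens to equal the header text.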
5
Solved
What exactly is the difference between Pig and Hive? I found that both have the same functional meaning because they are used for doing the same work. The only difference is the implementation, which is different for ...
Meemeece asked 23/4, 2012 at 11:47
3
Using Apache Pig version 0.10.1.21 (rexported). When I execute a Pig script, there are a lot of INFO logging lines that look like this:
2013-05-18 14:30:12,810 [Thread-28] INFO org.apache.hado...
Dunleavy asked 18/5, 2013 at 18:45
7
Solved
I have a lot of gzipped log files in S3 that have three types of log lines: b, c and i. Types i and c are both single-level JSON:
{"this":"that","test":"4"}
Type b is deeply nested JSON. I came across this gist ...
Bullbat asked 16/2, 2011 at 5:59
© 2022 - 2024 — McMap. All rights reserved.