apache-pig Questions

2

Solved

I want to group by a given field and get the output with grouped fields. Below is an example of what I am trying to achieve: Imagine a table named 'sample_table' with two columns as below: F1 F...
Cassius asked 8/5, 2013 at 15:3

3

Solved

Let's say I have a table such as the one below, that may or may not contain duplicates for a given field: ID URL --- ------------------ 001 http://example.com/adam 002 http://example.com/beth 002 ...
Respond asked 27/5, 2014 at 23:54

4

I would like to perform the equivalent of "keep all a in A where a.field == b.field for some b in B" in Apache Pig. I am implementing it like so, AB_joined = JOIN A by field, B by field; A2 = FORE...
Nari asked 30/5, 2012 at 23:23

3

Solved

Is there a way to create a small constant relation (table) in Pig? I need to create a relation with only 1 tuple that contains constant values. Something along the lines of: A = LOAD using Constan...
Attribution asked 16/11, 2012 at 9:56

1

Solved

Is there a way to get the value of a map for variable keys, using a field as the key? E.g., my company data has locale and name fields like this: {"en_US", (["en_US" : "English Name"], ["fr_FR...
Leucoplast asked 12/2, 2017 at 20:15

8

I get multiple small files in my input directory, which I want to merge into a single file without using the local file system or writing mapreds. Is there a way I could do it using hadoop fs comm...
Dyad asked 23/8, 2010 at 13:59

3

I'm trying to load simple file: log = load 'file_1.gz' using TextLoader AS (line:chararray); dump log And I get an error: 2014-04-08 11:46:19,471 [main] ERROR org.apache.pig.tools.pigstats.Simp...
Assai asked 8/4, 2014 at 9:54

2

Solved

I'm trying to run a simple job using Oozie. It will be one simple Pig action. I have a file, FirstScript.pig, containing: dual = LOAD 'default.dual' USING org.apache.hcatalog.pig.HCatLoader(); s...
Samp asked 30/1, 2014 at 14:11

1

I have a number of bags and I want to compute the pairwise similarities between the bags. sequences = FOREACH raw GENERATE gen_bag(logs); The relation is described as follows: sequences: {t: (...
Gissing asked 1/11, 2016 at 11:7

4

I have implemented an Apache Pig script. When I execute the script it results in many mappers for a specific step, but has only one reducer for that step. Because of this condition (many mappers, o...
Uniflorous asked 7/11, 2013 at 12:0

3

Solved

A = LOAD 'eventnew.txt' USING HCatalogLoader(); 2015-07-08 19:56:34,875 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve HCatalogLoader using imports: [, java.lang....
Gustav asked 8/7, 2015 at 14:40

2

Solved

I am using datetime in some Python UDFs that I use in my Pig script. So far so good. I use pig 12.0 on Cloudera 5.5. However, I also need to use the pytz or dateutil packages as well and they don't ...
Handcrafted asked 26/8, 2016 at 22:52

1

I have set up a cluster using Amazon EMR. I have a Python library (cloned from GitHub and not available on pip) on S3. I want to submit a Pig job that uses a UDF which makes use of the library pre...

2

In our organization, we have been trying to use hadoop ecosystem based tools to implement ETLs lately. Although the ecosystem itself is quite big, we are using only a very limited set of tools at t...
Magnate asked 25/2, 2014 at 23:30

4

Can someone explain the computation of median/quantiles in map reduce? My understanding of Datafu's median is that the 'n' mappers sort the data and send the data to "1" reducer which is respons...
Astto asked 11/4, 2012 at 15:53

4

I am creating an external table in Hive using a Parquet file as storage: hive> CREATE EXTERNAL TABLE test_data( c1 string, c2 int, c3 string, c4 string, c5 string, c6 float, c7 string, c8 string,...
Maltzman asked 14/8, 2013 at 7:6

4

Solved

Just started with Pig; I am trying to load the data from a file and then dump it. Loading seems to work, and no error is thrown. Below is the query: NYSE = LOAD '/root/Desktop/Works/NYSE-2000-2001....
Nilsanilsen asked 3/12, 2013 at 11:38

6

Solved

I am using ubuntu 12.02 32bit and have installed hadoop2.2.0 and pig 0.12 successfully. Hadoop runs properly on my system. However, whenever I run this command : data = load 'atoz.csv' using PigS...
Gaberones asked 23/1, 2014 at 6:4

1

I am trying to create a Hive table with schema string,string,double on a folder containing two Parquet files. The first parquet file schema is string,string,double and the schema of the second file...
Writeup asked 20/1, 2016 at 1:58

1

Just installed Pig 0.13 and I am attempting to use it with Hadoop 1.1.2. (Pig documentation states Pig 0.13 is compatible with Hadoop 1.1.2). Per the Pig install instructions, I set $PIG_CLASSPATH ...
Nazarene asked 3/8, 2014 at 18:23

2

Solved

I'm having a lot of trouble getting data out of pig and into a CSV that I can use in Excel or SQL (or R or SPSS etc etc) without a lot of manipulation ... I've tried using the following function: ...
Microgroove asked 4/12, 2012 at 4:21

5

Solved

I have a text file and its first row contains the header. Now I want to do some operation on the data, but while loading the file using PigStorage it takes the HEADER too. I just want to skip the ...
Maximomaximum asked 1/10, 2013 at 11:44

5

Solved

What is the exact difference between Pig and Hive? I found that both serve the same functional purpose because they are used for doing the same work. The only thing is the implementation, which is different for ...
Meemeece asked 23/4, 2012 at 11:47

3

Using Apache Pig version 0.10.1.21 (rexported). When I execute a Pig script, there are a lot of INFO logging lines that look like this: 2013-05-18 14:30:12,810 [Thread-28] INFO org.apache.hado...
Dunleavy asked 18/5, 2013 at 18:45

7

Solved

I have a lot of gzip'd log files in S3 that have 3 types of log lines: b, c, i. i and c are both single-level JSON: {"this":"that","test":"4"} Type b is deeply nested JSON. I came across this gist ...
Bullbat asked 16/2, 2011 at 5:59
