apache-pig - 2

2

Solved

How to get array/bag of elements from Hive group by operator?

I want to group by a given field and get the output with grouped fields. Below is an example of what I am trying to achieve: imagine a table named 'sample_table' with two columns as below: F1 F...

sql hadoop hive apache-pig bigdata

Cassius asked 8/5, 2013 at 15:3

3

Solved

In Apache Pig, select DISTINCT rows based on a single column

Let's say I have a table such as the one below, that may or may not contain duplicates for a given field: ID URL --- ------------------ 001 http://example.com/adam 002 http://example.com/beth 002 ...

group-by apache-pig distinct

Respond asked 27/5, 2014 at 23:54

4

Generating all fields from an alias after a JOIN in Pig

I would like to perform the equivalent of "keep all a in A where a.field == b.field for some b in B" in Apache Pig. I am implementing it like so, AB_joined = JOIN A by field, B by field; A2 = FORE...

hadoop apache-pig

Nari asked 30/5, 2012 at 23:23

3

Solved

how to create a small constant relation(table) in pig?

Is there a way to create a small constant relation (table) in Pig? I need to create a relation with only 1 tuple that contains constant values. Something along the lines of: A = LOAD using Constan...

apache-pig

Attribution asked 16/11, 2012 at 9:56

1

Solved

How to get the value for a variable key from a pig map?

Is there a way we can get the value of a map for variable keys using the field as the key? Eg : My company data has locale and name fields like this {"en_US", (["en_US" : "English Name"], ["fr_FR...

hadoop apache-pig

Leucoplast asked 12/2, 2017 at 20:15

8

Merging multiple files into one within Hadoop

I get multiple small files into my input directory which I want to merge into a single file without using the local file system or writing mapreds. Is there a way I could do it using hadoop fs comm...

hadoop apache-pig

Dyad asked 23/8, 2010 at 13:59

3

Is it possible to manage NO FILE error in Pig?

I'm trying to load a simple file: log = load 'file_1.gz' using TextLoader AS (line:chararray); dump log And I get an error: 2014-04-08 11:46:19,471 [main] ERROR org.apache.pig.tools.pigstats.Simp...

apache-pig

Assai asked 8/4, 2014 at 9:54

2

Solved

how to deploy and run oozie job?

I'm trying to do a simple job using oozie. It will be one simple Pig action. I have a file: FirstScript.pig containing: dual = LOAD 'default.dual' USING org.apache.hcatalog.pig.HCatLoader(); s...

hadoop apache-pig oozie

Samp asked 30/1, 2014 at 14:11

1

Apache Pig - nested FOREACH over same relation

I have a number of bags and I want to compute the pairwise similarities between the bags. sequences = FOREACH raw GENERATE gen_bag(logs); The relation is described as follows: sequences: {t: (...

python hadoop mapreduce apache-pig

Gissing asked 1/11, 2016 at 11:7

4

Apache Pig: FLATTEN and parallel execution of reducers

I have implemented an Apache Pig script. When I execute the script it results in many mappers for a specific step, but has only one reducer for that step. Because of this condition (many mappers, o...

hadoop apache-pig

Uniflorous asked 7/11, 2013 at 12:0

3

Solved

Getting an error on running HCatalog

A = LOAD 'eventnew.txt' USING HCatalogLoader(); 2015-07-08 19:56:34,875 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve HCatalogLoader using imports: [, java.lang....

hadoop hive apache-pig hcatalog

Gustav asked 8/7, 2015 at 14:40

2

Solved

Pig: is it possible to use pytz or dateutils for Python udfs?

I am using datetime in some Python udfs that I use in my pig script. So far so good. I use pig 12.0 on Cloudera 5.5. However, I also need to use the pytz or dateutil packages as well and they don't ...

python apache-pig jython cloudera pytz

Handcrafted asked 26/8, 2016 at 22:52

1

AWS EMR import external library from S3

I have set up a cluster using Amazon EMR. I have a python library (cloned from github and not available on pip) on S3. I want to submit a pig job that uses a udf which makes use of the library pre...

python amazon-web-services amazon-s3 apache-pig amazon-emr

Devlen asked 7/8, 2016 at 2:42

2

Apache Sqoop/Pig Consistent Data Representation/Processing

In our organization, we have been trying to use hadoop ecosystem based tools to implement ETLs lately. Although the ecosystem itself is quite big, we are using only a very limited set of tools at t...

apache-pig sqoop

Magnate asked 25/2, 2014 at 23:30

4

Computing median in map reduce

Can someone explain the computation of median/quantiles in map reduce? My understanding of Datafu's median is that the 'n' mappers sort the data and send the data to "1" reducer which is respons...

hadoop statistics mapreduce apache-pig median

Astto asked 11/4, 2012 at 15:53

4

Getting NULL after create external table in Hive using parquet file as storage

I am creating an external table in Hive using a parquet file as storage: hive> CREATE EXTERNAL TABLE test_data( c1 string, c2 int, c3 string, c4 string, c5 string, c6 float, c7 string, c8 string,...

hadoop hive apache-pig

Maltzman asked 14/8, 2013 at 7:6

4

Solved

ERROR 1066: Unable to open iterator for alias - Pig

Just started Pig; trying to load the data from a file and dump it henceforth. Loading seems to be proper, no error is thrown. Below is the query: NYSE = LOAD '/root/Desktop/Works/NYSE-2000-2001....

apache-pig

Nilsanilsen asked 3/12, 2013 at 11:38

6

Solved

Error in pig while loading data

I am using ubuntu 12.02 32bit and have installed hadoop2.2.0 and pig 0.12 successfully. Hadoop runs properly on my system. However, whenever I run this command : data = load 'atoz.csv' using PigS...

hadoop apache-pig

Gaberones asked 23/1, 2014 at 6:4

1

Using hive table over parquet in Pig

I am trying to create a Hive table with schema string,string,double on a folder containing two Parquet files. The first parquet file schema is string,string,double and the schema of the second file...

hadoop hive apache-pig parquet hcatalog

Writeup asked 20/1, 2016 at 1:58

1

Pig 0.13 ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl

Just installed Pig 0.13 and I am attempting to use it with Hadoop 1.1.2. (Pig documentation states Pig 0.13 is compatible with Hadoop 1.1.2). Per the Pig install instructions, I set $PIG_CLASSPATH ...

hadoop apache-pig

Nazarene asked 3/8, 2014 at 18:23

2

Solved

Export from pig to CSV

I'm having a lot of trouble getting data out of pig and into a CSV that I can use in Excel or SQL (or R or SPSS etc etc) without a lot of manipulation ... I've tried using the following function: ...

excel csv apache-pig

Microgroove asked 4/12, 2012 at 4:21

5

Solved

Skipping the header while loading the text file using Piglatin

I have a text file and its first row contains the header. Now I want to do some operation on the data, but while loading the file using PigStorage it takes the HEADER too. I just want to skip the ...

hadoop apache-pig

Maximomaximum asked 1/10, 2013 at 11:44

5

Solved

What is the difference between Apache Pig and Apache Hive?

What is the exact difference between Pig and Hive? I found that both have the same functional meaning because they are used for doing the same work. The only thing is the implementation, which is different for ...

hadoop hive apache-pig

Meemeece asked 23/4, 2012 at 11:47

3

Pig Batch mode: how to set logging level to hide INFO log messages?

Using Apache Pig version 0.10.1.21 (rexported). When I execute a pig script, there are a lot of INFO logging lines that look like this: 2013-05-18 14:30:12,810 [Thread-28] INFO org.apache.hado...

apache-pig

Dunleavy asked 18/5, 2013 at 18:45

7

Solved

How do I parse JSON in Pig?

I have a lot of gzip'd log files in s3 that have 3 types of log lines: b, c, i. i and c are both single level json: {"this":"that","test":"4"} Type b is deeply nested json. I came across this gist ...

json apache-pig

Bullbat asked 16/2, 2011 at 5:59
