apache-pig - McMap

2

Solved

Counting elements for each group using Pig

I'm trying to group and count the frequency of terms for each group in Pig Latin, but I'm having trouble figuring out how to do it. I have a collection of objects with the following sche...

apache-pig

Flowery asked 29/7, 2014 at 9:44

3

ERROR 1066: Unable to open iterator for alias in Pig, Generic solution

A very common error message in Apache Pig is: ERROR 1066: Unable to open iterator for alias. There are several questions where this error is mentioned, but none of them give a generic approach...

debugging apache-pig hortonworks-data-platform hdp

Eyla asked 28/12, 2015 at 14:9

17

Solved

When to use Hadoop, HBase, Hive and Pig?

What are the benefits of using Hadoop, HBase, or Hive? From my understanding, HBase avoids using map-reduce and has column-oriented storage on top of HDFS. Hive is a SQL-like interface ...

hadoop hbase hive apache-pig

Quicksand asked 17/12, 2012 at 9:33

3

Solved

Calculate Average using PIG

I am new to Pig and want to calculate the average of a single column of data that looks like: 0 10.1 20.1 30 40 50 60 70 80.1. I wrote this Pig script: dividends = load 'myfile.txt' as (A); dump dividends g...

hadoop apache-pig

Jurel asked 4/3, 2013 at 23:15

1

Solved

Add folder name to output Pig Latin

I have the following directory structure in HDFS: logs_folder |---2021-03-01 |---log1 |---log2 |---log3 2021-03-02 |---log1 |---log2 2021-03-03 |---log1 |---log2 ... Logs are made up of text data...

date hadoop logging text apache-pig

Jason asked 30/3, 2021 at 15:23

4

Solved

finding mean using pig or hadoop

I have a huge text file of the following form (the data is saved in the directory data/ as data1.txt, data2.txt, and so on): merchant_id, user_id, amount 1234, 9123, 299.2 1233, 9199, 203.2 1234, 0124, 230 and so on. Wha...

hadoop apache-pig

Coincide asked 26/9, 2012 at 1:56

11

Solved

Pig Latin: Load multiple files from a date range (part of the directory structure)

I have the following scenario. Pig version used: 0.70. Sample HDFS directory structure: /user/training/test/20100810/<data files> /user/training/test/20100811/<data files> /user/traini...

hadoop apache-pig

Snowblink asked 18/8, 2010 at 18:39

2

Solved

HIVE Creating Table not null

This is my query in a DB2 database: CREATE TABLE MY_TABLE (COD_SOC CHAR(5) NOT NULL); Is it possible to reproduce the 'NOT NULL' in Hive? What about Pig?

hadoop db2 hive apache-pig

Streetcar asked 7/8, 2014 at 15:34

6

Solved

How to perform a DISTINCT in Pig Latin on a subset of columns?

I would like to perform a DISTINCT operation on a subset of the columns. The documentation says this is possible with a nested foreach: You cannot use DISTINCT on a subset of fields; to do this,...

apache-pig

Incondite asked 25/9, 2013 at 22:39

4

storing pig output into Hive table in a single instance

I would like to insert the Pig output into Hive tables (the tables in Hive are already created with the exact schema). I just need to insert the output values into the table. I don't want to use the usual method, wh...

hadoop hive apache-pig

Spheroidicity asked 8/7, 2015 at 9:30

4

Solved

Pig keeps trying to connect to job history server (and fails)

I'm running a Pig job that fails to connect to the Hadoop job history server. The task (usually any task with GROUP BY) runs for a while and then it starts with a message like: 2015-04-21 19:05:2...

hadoop apache-pig

Simian asked 21/4, 2015 at 22:46

2

Solved

Run Bash script on GCP Dataproc

I want to run a shell script on Dataproc which will execute my Pig scripts with arguments. These arguments are always dynamic and are calculated by the shell script. Currently these scripts are running ...

apache-pig google-cloud-dataproc

Michaelamichaele asked 14/10, 2019 at 12:17

4

Solved

I have an Errno 13 Permission denied with subprocess in python

The line with the issue is ret=subprocess.call(shlex.split(cmd)) cmd = /usr/share/java -cp pig-hadoop-conf-Simpsons:lib/pig-0.8.1-cdh3u1-core.jar:lib/hadoop-core-0.20.2-cdh3u1.jar org.apache.pig...

python bash error-handling hadoop apache-pig

Bounty asked 2/10, 2012 at 13:36

3

Solved

Junit External Resource @Rule Order

I want to use multiple external resources in my test class, but I have a problem with the ordering of external resources. Here is a code snippet: public class TestPigExternalResource { // hadoop ex...

java hadoop junit apache-pig rule

Natiha asked 4/10, 2013 at 7:11

6

PIG: ERROR 1000: Error during parsing

I have installed Pig 0.12 on my machine. When I run darwin$ pig grunt> ls /data/ hdfs://Nmame:10001/data/pg20417.txt<r 3> 674570 hdfs://Nname:10001/data/pg4300.txt<r 3> 1573150 hdf...

hadoop apache-pig

Underthecounter asked 1/7, 2014 at 0:41

2

Solved

Hadoop Pig - Removing csv header

My CSV files have a header in the first line. Loading them into Pig creates a mess in any subsequent functions (like SUM). As of today I first apply a filter on the loaded data to remove the rows cont...

csv hadoop apache-pig

Passel asked 29/3, 2015 at 22:24

1

pig is not visible inside hue

I have a Hadoop cluster. Pig is installed, but the Pig editor is not visible inside Hue (3.7). How can I fix it?

apache-pig cloudera-cdh hue

Starve asked 15/1, 2018 at 21:34

3

Solved

select count distinct using pig latin

I need help with this Pig script. I am just getting a single record. I am selecting 2 columns and doing a count(distinct) on another while also using a where like clause to find a particular descri...

hadoop apache-pig

Prosaic asked 12/2, 2012 at 7:55

4

Solved

Connection Error in Apache Pig

I am running Apache Pig .11.1 with Hadoop 2.0.5. Most simple jobs that I run in Pig work perfectly fine. However, whenever I try to use GROUP BY on a large dataset, or the LIMIT operator, I get t...

hadoop apache-pig

Conlon asked 29/7, 2013 at 17:42

6

Solved

Using IN clause with PIG FILTER

Does Pig support an IN clause? filtered = FILTER bba BY reason not in ('a','b','c','d'); Or should I split it up into multiple ORs? Thanks!

apache-pig

Presbyterian asked 24/8, 2011 at 16:45

5

How to change Tez job name when running query in HIVE

When I submit a Hive SQL query using Tez like below: hive (default)> select count(*) from simple_data; In the Resource Manager UI, the job name shows something like HIVE-9d1906a2-25dd-4a7c-9ea3-bf651036...

hadoop hive apache-pig

Coke asked 29/10, 2015 at 19:14

7

GUI for using Hadoop [closed]

Is there an easy way to use Hadoop other than with the command line? Which tools are you using and which one is the best?

user-interface hadoop hive apache-pig hue

Fosse asked 12/7, 2013 at 4:43

8

Solved

How can I add row numbers for rows in PIG or HIVE?

I have a problem when adding row numbers using Apache Pig. The problem is that I have a STR_ID column and I want to add a ROW_NUM column for the data in STR_ID, which is the row number of the STR_I...

hadoop hive apache-pig

Finished asked 15/2, 2012 at 5:58

1

Solved

Pig: efficient filtering by loaded list

In Apache Pig (version 0.16.x), what are some of the most efficient methods to filter a dataset by an existing list of values for one of the dataset's fields? For example, (Updated per @inquisitiv...

apache-pig

Mccue asked 13/6, 2017 at 21:48

4

Solved

Hadoop Pig: Passing Command Line Arguments

Is there a way to do this? E.g., pass the name of the file to be processed, etc.?

hadoop apache-pig

Dissuade asked 12/11, 2010 at 15:29
