apache-pig Questions

2

Solved

I'm trying to group and count the frequency of terms for each group in Pig Latin, but I'm having some trouble figuring out how to do it. I have a collection of objects with the following sche...
Flowery asked 29/7, 2014 at 9:44

3

A very common error message in Apache Pig is: ERROR 1066: Unable to open iterator for alias. There are several questions where this error is mentioned, but none of them give a generic approach...
Eyla asked 28/12, 2015 at 14:9

17

Solved

What are the benefits of using either Hadoop or HBase or Hive? From my understanding, HBase avoids using map-reduce and has a column-oriented storage on top of HDFS. Hive is a SQL-like interface ...
Quicksand asked 17/12, 2012 at 9:33

3

Solved

I am new to Pig and want to calculate the average of a single column of data that looks like: 0 10.1 20.1 30 40 50 60 70 80.1. I wrote this Pig script: dividends = load 'myfile.txt' as (A); dump dividends g...
Jurel asked 4/3, 2013 at 23:15

1

Solved

I have the following directory structure in HDFS: logs_folder |---2021-03-01 |---log1 |---log2 |---log3 2021-03-02 |---log1 |---log2 2021-03-03 |---log1 |---log2 ... Logs are made up of text data...
Jason asked 30/3, 2021 at 15:23

4

Solved

I have a huge text file of the following form (the data is saved in the directory data/ as data1.txt, data2.txt, and so on): merchant_id, user_id, amount 1234, 9123, 299.2 1233, 9199, 203.2 1234, 0124, 230 and so on. Wha...
Coincide asked 26/9, 2012 at 1:56

11

Solved

I have the following scenario. Pig version used: 0.70. Sample HDFS directory structure: /user/training/test/20100810/<data files> /user/training/test/20100811/<data files> /user/traini...
Snowblink asked 18/8, 2010 at 18:39

2

Solved

This is my query in a DB2 database: CREATE TABLE MY_TABLE (COD_SOC CHAR(5) NOT NULL); Is it possible to reproduce the 'NOT NULL' in Hive? What about Pig?
Streetcar asked 7/8, 2014 at 15:34

6

Solved

I would like to perform a DISTINCT operation on a subset of the columns. The documentation says this is possible with a nested foreach: You cannot use DISTINCT on a subset of fields; to do this,...
Incondite asked 25/9, 2013 at 22:39

4

I would like to insert the Pig output into Hive tables (the tables in Hive are already created with the exact schema). I just need to insert the output values into the table. I don't want to use the usual method, wh...
Spheroidicity asked 8/7, 2015 at 9:30

4

Solved

I'm running a Pig job that fails to connect to the Hadoop job history server. The task (usually any task with GROUP BY) runs for a while and then it starts with a message like: 2015-04-21 19:05:2...
Simian asked 21/4, 2015 at 22:46

2

Solved

I want to run a shell script on Dataproc which will execute my Pig scripts with arguments. These arguments are always dynamic and are calculated by the shell script. Currently these scripts are running ...
Michaelamichaele asked 14/10, 2019 at 12:17

4

Solved

The line with the issue is ret=subprocess.call(shlex.split(cmd)) cmd = /usr/share/java -cp pig-hadoop-conf-Simpsons:lib/pig-0.8.1-cdh3u1-core.jar:lib/hadoop-core-0.20.2-cdh3u1.jar org.apache.pig...
Bounty asked 2/10, 2012 at 13:36

3

Solved

I want to use multiple external resources in my test class, but I have a problem with the ordering of external resources. Here is a code snippet: public class TestPigExternalResource { // hadoop ex...
Natiha asked 4/10, 2013 at 7:11

6

I have installed Pig 0.12 on my machine. When I run darwin$ pig grunt> ls /data/ hdfs://Nmame:10001/data/pg20417.txt<r 3> 674570 hdfs://Nname:10001/data/pg4300.txt<r 3> 1573150 hdf...
Underthecounter asked 1/7, 2014 at 0:41

2

Solved

My CSV files have a header in the first line. Loading them into Pig creates a mess in any subsequent functions (like SUM). As of today I first apply a filter on the loaded data to remove the rows cont...
Passel asked 29/3, 2015 at 22:24

1

I have a Hadoop cluster. Pig is installed, but the Pig editor is not visible inside Hue (3.7). How can I fix it?
Starve asked 15/1, 2018 at 21:34

3

Solved

I need help with this Pig script. I am just getting a single record. I am selecting 2 columns and doing a count(distinct) on another while also using a WHERE ... LIKE clause to find a particular descri...
Prosaic asked 12/2, 2012 at 7:55

4

Solved

I am running Apache Pig 0.11.1 with Hadoop 2.0.5. Most simple jobs that I run in Pig work perfectly fine. However, whenever I try to use GROUP BY on a large dataset, or the LIMIT operator, I get t...
Conlon asked 29/7, 2013 at 17:42

6

Solved

Does Pig support an IN clause? filtered = FILTER bba BY reason not in ('a','b','c','d'); Or should I split it up into multiple ORs? Thanks!
Presbyterian asked 24/8, 2011 at 16:45

5

When I submit a Hive SQL using Tez like below: hive (default)> select count(*) from simple_data; In Resource Manager UI the job name shows something like HIVE-9d1906a2-25dd-4a7c-9ea3-bf651036...
Coke asked 29/10, 2015 at 19:14

7

Is there an easy way to use Hadoop other than with the command line? Which tools are you using and which one is the best?
Fosse asked 12/7, 2013 at 4:43

8

Solved

I have a problem when adding row numbers using Apache Pig. The problem is that I have a STR_ID column and I want to add a ROW_NUM column for the data in STR_ID, which is the row number of the STR_I...
Finished asked 15/2, 2012 at 5:58

1

Solved

In Apache Pig (version 0.16.x), what are some of the most efficient methods to filter a dataset by an existing list of values for one of the dataset's fields? For example, (Updated per @inquisitiv...
Mccue asked 13/6, 2017 at 21:48

4

Solved

Is there a way to do this? E.g., pass the name of the file to be processed, etc.?
Dissuade asked 12/11, 2010 at 15:29
