apache-pig Questions

1

Solved

This is my file: Col1, Col2, Col3, Col4, Col5. I need only Col2 and Col3. Currently I'm doing this: a = load 'input' as (Col1:chararray, Col2:chararray, Col3:chararray, Col4:chararray); b ...
Diplomacy asked 31/12, 2013 at 14:49
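A minimal sketch, assuming the file is comma-delimited and sits at the path 'input' used in the question: declare all five columns in the schema, then project only the two you need with FOREACH ... GENERATE.

```pig
-- Assumes a comma-delimited file at 'input' with five fields per line.
a = LOAD 'input' USING PigStorage(',')
        AS (Col1:chararray, Col2:chararray, Col3:chararray,
            Col4:chararray, Col5:chararray);

-- Keep only the two columns of interest.
b = FOREACH a GENERATE Col2, Col3;

DUMP b;
```

Without a schema, positional references do the same job: b = FOREACH a GENERATE $1, $2; (fields are zero-indexed). If the file really has a space after each comma, the values keep that leading space; the TRIM builtin can strip it.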

2

Solved

Currently, when I STORE into HDFS, it creates many part files. Is there any way to store out to a single CSV file?
Barrelhouse asked 28/3, 2012 at 15:34
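A minimal sketch, assuming hypothetical paths and a two-column schema: forcing the last job to run with a single reducer, here via ORDER ... PARALLEL 1, makes STORE write a single part file. The output is still a directory, but it holds only one part file.

```pig
-- Hypothetical input path and schema.
data = LOAD 'input' USING PigStorage(',') AS (id:int, name:chararray);

-- ORDER introduces a reduce phase; PARALLEL 1 runs it with one reducer,
-- so the STORE below produces a single part file.
sorted = ORDER data BY id PARALLEL 1;

STORE sorted INTO 'output_dir' USING PigStorage(',');
```

Funnelling everything through one reducer does not scale to large outputs. An alternative is to keep the parallel job and concatenate the part files afterwards with hadoop fs -getmerge output_dir merged.csv, which writes the merged result to the local filesystem.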

2

Solved

When developing Pig scripts that use the STORE command I have to delete the output directory for every run or the script stops and offers: 2012-06-19 19:22:49,680 [main] ERROR org.apache.pig.tools...
Riff asked 19/6, 2012 at 22:28
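A minimal sketch, assuming a hypothetical output path: Pig's rmf command removes a path if it exists and does not fail when it is missing, so placing it before STORE lets the script be re-run without manual cleanup.

```pig
-- Remove output left over from a previous run (no error if it is absent).
rmf output_dir

data = LOAD 'input' USING PigStorage(',');
STORE data INTO 'output_dir' USING PigStorage(',');
```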

2

Solved

I have a question about Pig Latin. Is there any way to invoke a Pig script from another Pig script? I know it is possible to run user defined functions (UDFs) like: REGISTER myudfs.jar; ...
Diminution asked 17/12, 2013 at 13:52
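Pig has run and exec commands for executing another script and, since Pig 0.9, macros that can be shared between scripts with IMPORT. A minimal sketch of the macro route; the file names my_macros.pig and main.pig, the macro name, the input path and the schema are all hypothetical.

```pig
-- File my_macros.pig (hypothetical): a reusable macro.
DEFINE group_and_count (A, group_key) RETURNS B {
    grouped = GROUP $A BY $group_key;
    $B = FOREACH grouped GENERATE group, COUNT($A);
};

-- File main.pig (hypothetical): pull in the macro and call it.
IMPORT 'my_macros.pig';

users  = LOAD 'users' AS (user:chararray, age:int);
counts = group_and_count(users, user);
DUMP counts;
```

If you simply want another script's statements to run, run other.pig executes it in the current context, so the aliases it defines stay available, while exec other.pig runs it in batch mode without that interaction.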

1

Solved

I want to add a new column to an alias, preserving all the existing ones. A = foreach A generate A.id as id, A.date as date, A.foo as foo, A.bar as bar, A.foo / A.bar as foobar; Can I d...
Whistle asked 11/12, 2013 at 19:42
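A minimal sketch, assuming a hypothetical input path and a schema matching the field names in the question: the * projection keeps every existing field and the computed column is appended after them.

```pig
-- Hypothetical input path and schema.
A = LOAD 'input' AS (id:int, date:chararray, foo:double, bar:double);

-- * projects every existing field; the computed column is appended last.
-- Inside FOREACH A, fields are referenced directly (foo, bar), not as A.foo.
B = FOREACH A GENERATE *, foo / bar AS foobar;

DESCRIBE B;
```

If foo and bar were declared as int, the division would be integer division; declaring them as double (or casting) avoids that.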

2

For a file of the form
    A B user1
    C D user2
    A D user3
    A D user1
I want to calculate the count of distinct values of field 3, i.e. count(distinct(user1, user2, user3, user1)) = 3. I am doing this usi...
Dipterous asked 15/10, 2012 at 11:25
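A minimal sketch, assuming the file is single-space-delimited and using hypothetical field names: group everything into one bag, de-duplicate the third field with a nested DISTINCT, and count what remains. On the four sample lines this should yield (3).

```pig
-- Hypothetical space-delimited input with three fields per line.
A = LOAD 'input' USING PigStorage(' ')
        AS (f1:chararray, f2:chararray, user:chararray);

grouped = GROUP A ALL;

result = FOREACH grouped {
    users = A.user;          -- bag of all field-3 values
    uniq  = DISTINCT users;  -- nested DISTINCT removes duplicates
    GENERATE COUNT(uniq) AS distinct_users;
};

DUMP result;
```

GROUP ... ALL sends every record to a single reducer. For large inputs, projecting the column, applying a top-level DISTINCT, and only then grouping and counting spreads the de-duplication work across reducers.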

2

I want some sort of unique identifier/line_number/counter to be generated/appended in my foreach construct while it iterates through the records. Is there a way to accomplish this without writing a UD...
Pinhole asked 3/10, 2011 at 15:44
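Pig 0.11 (released after this question was asked) added a RANK operator that does this without a UDF. A minimal sketch, assuming a hypothetical input path and schema; without a BY clause, RANK prepends a sequential number to each tuple, in a new first field named after the relation (rank_A here).

```pig
-- Hypothetical input path and schema.
A = LOAD 'input' AS (name:chararray, score:int);

-- RANK without BY prepends a sequential number (1, 2, 3, ...) to each tuple.
B = RANK A;

DESCRIBE B;
DUMP B;
```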

1

Solved

The manual/documentation uses the terms 'inner bag' and 'outer bag' extensively (say: http://pig.apache.org/docs/r0.11.1/basic.html ), and yet I haven't been able to pin down clearly the preci...
Dodgson asked 8/10, 2013 at 1:27
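A small illustration, assuming a hypothetical two-field input: a relation such as A is itself an outer bag of tuples, while GROUP produces, for each key, a tuple whose second field is an inner bag holding the matching tuples of A.

```pig
-- A is a relation, i.e. an outer bag of (user, url) tuples.
A = LOAD 'input' AS (user:chararray, url:chararray);

-- B is also an outer bag; each of its tuples is (group, A), where the
-- field A is an inner bag containing all tuples of A for that user.
B = GROUP A BY user;
DESCRIBE B;

-- Inner bags are processed per tuple of B, e.g. counted.
C = FOREACH B GENERATE group AS user, COUNT(A) AS visits;
DUMP C;
```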

1

Solved

I'm new to Pig, and I'm having an issue parsing my input and getting it into a format that I can use. The input file contains lines that have both fixed fields and KV pairs as follows: FF1|FF2|FF3...
Unlucky asked 27/9, 2013 at 18:00

1

Solved

I have a sample input of tab-separated key-value pairs as follows: B_1001@2012-06-15 [email protected] B_1001@2012-06-18 [email protected] B_1002@2012-09-26 [email protected] B_1002@...
Sniperscope asked 26/9, 2013 at 14:26

2

Solved

Since there is no else or default clause in Pig's SPLIT operator, what would be the most elegant way to do the following? I'm not a big fan of copy-pasting code. SPLIT rawish_data INTO ...
Grissel asked 20/9, 2013 at 9:51
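Newer Pig releases accept an OTHERWISE branch in SPLIT, which acts as the missing default. A minimal sketch, assuming a hypothetical input with a numeric field named value and made-up thresholds.

```pig
-- Hypothetical input with a numeric field named value.
rawish_data = LOAD 'input' AS (id:chararray, value:int);

-- OTHERWISE collects every tuple that matched none of the IF conditions.
SPLIT rawish_data INTO
    small  IF value < 10,
    medium IF value >= 10 AND value < 100,
    other  OTHERWISE;

STORE other INTO 'other_out';
```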

1

Solved

Trying to get this done in Pig (looking for the equivalent of MySQL's group_concat()). In my table, for example, I have this (3 fields: userid, clickcount, pagenumber):
    155 | 2 | 12
    155 | 3 | 133
    155...
Jetsam asked 13/9, 2013 at 7:02
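A minimal sketch using the BagToString builtin, which ships with newer Pig releases; it assumes the input is pipe-delimited without the spaces shown around the pipes, and the path is hypothetical. Group by userid, then join each user's page numbers into one string.

```pig
-- Hypothetical input: pipe-delimited lines such as 155|2|12
A = LOAD 'input' USING PigStorage('|')
        AS (userid:int, clickcount:int, pagenumber:int);

B = GROUP A BY userid;

-- BagToString joins the values in the inner bag with the given delimiter,
-- much like GROUP_CONCAT(pagenumber) in MySQL.
C = FOREACH B GENERATE group AS userid,
                       BagToString(A.pagenumber, ',') AS pages;

DUMP C;
```

The order of values inside a bag is not guaranteed. If it matters, sort inside a nested FOREACH first (ORDER A BY pagenumber) and apply BagToString to the sorted bag.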

3

I am looking to convert the ISO time format to yyyy-mm-dd hh:mm:ss.SSS. However, I'm not able to achieve the conversion. I am new to Pig and I'm trying to write a UDF to handle the conversion from ISO forma...
Claudieclaudina asked 6/9, 2013 at 11:33
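Pig 0.11 introduced a datetime type along with the ToDate and ToString builtins, which may remove the need for a UDF. A minimal sketch, assuming a hypothetical input with one ISO 8601 timestamp per line. Note that the pattern uses Joda-Time letters, so month is MM and the 24-hour clock is HH; the lowercase mm and hh in the question mean minutes and 12-hour clock.

```pig
-- Hypothetical input: one ISO 8601 timestamp per line,
-- e.g. 2012-07-21T14:00:00.000Z
A = LOAD 'input' AS (iso:chararray);

-- ToDate(chararray) parses ISO 8601 into a datetime; ToString formats it
-- with a Joda-Time pattern (MM = month, mm = minute, HH = 24-hour clock).
B = FOREACH A GENERATE ToString(ToDate(iso), 'yyyy-MM-dd HH:mm:ss.SSS') AS formatted;

DUMP B;
```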

2

Solved

I am trying to get Pig started and it fails:
$ pig
2013-05-10 18:03:22,972 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-05-10 18:03:22...
Layton asked 10/5, 2013 at 22:15

2

Solved

I am trying to take a logical match criterion like: (("Foo" OR "Foo Bar" OR FooBar) AND ("test" OR "testA" OR "TestB")) OR TestZ and apply it as a match against a file in Pig using result = fi...
Senile asked 1/9, 2013 at 11:22
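A minimal sketch of writing such a criterion by hand with Pig's MATCHES operator, assuming a hypothetical one-field input. MATCHES must match the whole string, so each term is wrapped in .* to act as a substring test.

```pig
-- Hypothetical input: one line of text per record.
data = LOAD 'input' AS (line:chararray);

-- MATCHES is a whole-string regex match, hence the leading and trailing .*
result = FILTER data BY
    ((line MATCHES '.*Foo.*' OR line MATCHES '.*Foo Bar.*' OR line MATCHES '.*FooBar.*')
     AND (line MATCHES '.*test.*' OR line MATCHES '.*testA.*' OR line MATCHES '.*TestB.*'))
    OR line MATCHES '.*TestZ.*';

DUMP result;
```

As substring tests, '.*Foo.*' already covers the 'Foo Bar' and 'FooBar' branches (and '.*test.*' covers 'testA'), so the expression could be shortened; it is kept literal here to mirror the original criterion.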

1

Solved

I am learning how to use Hadoop Pig now. If I have an input file like this:
    a,b,c,true
    s,c,v,false
    a,s,b,true
    ...
The last field is the one I need to count... So I want to know how many 'true' a...
Giana asked 7/8, 2013 at 22:45
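A minimal sketch, assuming a hypothetical input path and field names: filter on the last field, gather the survivors into one group, and count them.

```pig
-- Hypothetical input path and field names.
A = LOAD 'input' USING PigStorage(',')
        AS (f1:chararray, f2:chararray, f3:chararray, flag:chararray);

T = FILTER A BY flag == 'true';
G = GROUP T ALL;
C = FOREACH G GENERATE COUNT(T) AS true_count;

DUMP C;
```

To get the counts of 'true' and 'false' in one pass instead, group by the flag itself (GROUP A BY flag) and generate group, COUNT(A).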

2

Solved

I have a dataset, A, that has timestamp, visitor, URL: (2012-07-21T14:00:00.000Z, joe, hxxp:///www.aaa.com) (2012-07-21T14:01:00.000Z, mary, hxxp://www.bbb.com) (2012-07-21T14:02:00.000Z, joe, ...
Plascencia asked 1/8, 2013 at 20:38

1

Solved

I have some problems understanding how maps should be used. Following this tutorial I created a file containing the following text:
    [open#apache]
    [apache#hadoop]
Then I was able to load that...
Collenecollet asked 21/7, 2013 at 12:24
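A minimal sketch, assuming each line of a hypothetical input file holds one map literal such as [open#apache]: declare the field as map[] and look values up with the # operator. A key that is absent from a given map yields null, so here the first line should give apache for 'open' and the second hadoop for 'apache'.

```pig
-- Each line holds one map literal such as [open#apache].
A = LOAD 'input' AS (m:map[]);

-- # looks up a key; a missing key yields null.
B = FOREACH A GENERATE m#'open' AS open_value, m#'apache' AS apache_value;

DUMP B;
```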

1

When I write a Python UDF for Pig, how do I know which version of Python it is using? Is it possible to use a specific version of Python? Specifically, my problem is that in my UDF I need to use a functio...
Ezekiel asked 17/7, 2013 at 22:34
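A sketch of the two registration styles, assuming hypothetical file, namespace and function names. UDFs registered USING jython run inside the JVM, so the language level is that of the Jython jar Pig ships with (Python 2 syntax), not the system Python. Pig 0.12 and later also offer USING streaming_python, which runs the UDF in a separate CPython process; a file written for one style may need adjusting for the other (for example, how the outputSchema decorator is imported).

```pig
-- Jython: runs in the JVM with the Jython version bundled with Pig.
REGISTER 'jython_udfs.py' USING jython AS jy;

-- CPython via streaming (Pig 0.12+): runs in an external Python process.
REGISTER 'cpython_udfs.py' USING streaming_python AS cpy;

A = LOAD 'input' AS (x:chararray);

-- my_function is a hypothetical UDF defined in cpython_udfs.py.
B = FOREACH A GENERATE cpy.my_function(x);
DUMP B;
```

Inside a Jython UDF, returning sys.version from a small test function is a quick way to see which interpreter version is actually in use.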

2

I installed Hadoop and Pig using brew install hadoop and brew install pig. I read here that you will get the "Unable to load realm info from SCDynamicStore" error message unless you add: export HADO...
Clipping asked 5/2, 2013 at 21:08

2

Solved

I am trying to do a star-schema type of join in Pig, and below is my code. When I join multiple relations with different columns, I have to prefix the name of the previous join every time to get it ...
Deictic asked 22/6, 2013 at 22:19
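A minimal sketch, assuming a hypothetical fact table and two dimension tables: project and rename right after each JOIN so later steps see plain field names instead of chains of relation::field prefixes. Only names that are ambiguous across the join inputs strictly need the :: prefix.

```pig
-- Hypothetical star schema: one fact table and two dimensions.
fact = LOAD 'fact' AS (id:int, d1_id:int, d2_id:int, amount:double);
dim1 = LOAD 'dim1' AS (d1_id:int, d1_name:chararray);
dim2 = LOAD 'dim2' AS (d2_id:int, d2_name:chararray);

j1 = JOIN fact BY d1_id, dim1 BY d1_id;
-- Flatten the prefixed names (fact::id, dim1::d1_name, ...) straight away.
f1 = FOREACH j1 GENERATE fact::id AS id, fact::d2_id AS d2_id,
                         fact::amount AS amount, dim1::d1_name AS d1_name;

j2 = JOIN f1 BY d2_id, dim2 BY d2_id;
f2 = FOREACH j2 GENERATE f1::id AS id, f1::amount AS amount,
                         f1::d1_name AS d1_name, dim2::d2_name AS d2_name;

DESCRIBE f2;
```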

1

Solved

When I enter an erroneous command in the Pig interactive shell, it enters a listening mode (>>) like below. How do I safely get out of this command but still stay in the pig ...
Anchusin asked 17/3, 2013 at 3:02

1

Solved

I have a feed in the following format:
    Hour  Key  ID   Value
    1     K1   001  3
    1     K1   002  2
    2     K1   005  4
    1     K2   002  1
    2     K2   003  5
    2     K2   004  6
and I want to group the feed by (Hour, Key) then sum the Value bu...
Canoewood asked 19/6, 2013 at 8:33
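A minimal sketch of the grouping-and-summing part, assuming a hypothetical tab-delimited file; ID is kept as chararray so leading zeros survive. The remaining requirement in the question is cut off, so only the stated (Hour, Key) sum is shown.

```pig
-- Hypothetical tab-delimited feed with the four columns shown above.
A = LOAD 'feed' USING PigStorage('\t')
        AS (hour:int, key:chararray, id:chararray, value:int);

-- Group on the composite key, then unpack it and sum the values.
B = GROUP A BY (hour, key);
C = FOREACH B GENERATE FLATTEN(group) AS (hour, key), SUM(A.value) AS total;

DUMP C;
```

On the sample feed this gives 5 for (1, K1), 4 for (2, K1), 1 for (1, K2) and 11 for (2, K2), though DUMP does not guarantee the order of the rows.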

1

I have a Pig Streaming job where the number of mappers should equal the number of rows/lines in the input file. I know that setting set mapred.min.split.size 16 set mapred.max.split.size 16 set ...
Unvoice asked 11/6, 2013 at 22:25

3

Solved

I have a folder of files created daily that all store the same type of information. I'd like to make a script that loads the newest 10 of them, UNIONs them, and then runs some other code on them. S...
Louanneloucks asked 7/9, 2011 at 20:38
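A minimal sketch, assuming the list of the 10 newest files is computed outside Pig (for example by a shell script over hadoop fs -ls) and passed in as a parameter: LOAD accepts a comma-separated list of paths, so all files land in one relation and no UNION is needed. The parameter name inputs and the paths are hypothetical.

```pig
-- $inputs is supplied on the command line as a comma-separated list, e.g.
--   pig -param inputs=/data/2011-09-01,/data/2011-09-02 script.pig
daily = LOAD '$inputs' USING PigStorage('\t');

-- All listed files are read into a single relation.
preview = LIMIT daily 10;
DUMP preview;
```

Hadoop glob patterns such as '/data/{2011-09-01,2011-09-02}' are another way to name several inputs in a single LOAD.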

© 2022 - 2024 — McMap. All rights reserved.