apache-pig Questions
1
Solved
This is my file:
Col1, Col2, Col3, Col4, Col5
I need only Col2 and Col3.
Currently I'm doing this:
a = load 'input' as (Col1:chararray,
Col2:chararray,
Col3:chararray,
Col4:chararray);
b ...
Diplomacy asked 31/12, 2013 at 14:49
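A minimal sketch of one way to keep only two columns, assuming the file is comma-delimited with the five columns listed in the question. The `PigStorage(',')` delimiter and the 'input' path are assumptions, not details confirmed by the excerpt.

```pig
-- Assumption: 'input' is comma-delimited with five columns.
a = LOAD 'input' USING PigStorage(',')
    AS (Col1:chararray, Col2:chararray, Col3:chararray,
        Col4:chararray, Col5:chararray);

-- Project only the two columns of interest.
b = FOREACH a GENERATE Col2, Col3;

DUMP b;
```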
2
Solved
Currently, when I STORE into HDFS, it creates many part files.
Is there any way to store out to a single CSV file?
Barrelhouse asked 28/3, 2012 at 15:34
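STORE writes one part file per task of the final job, so one possible approach is to force that job down to a single reducer. The sketch below uses hypothetical paths and a hypothetical two-field schema purely for illustration.

```pig
-- Hypothetical relation, schema and paths, for illustration only.
data = LOAD 'input' USING PigStorage(',') AS (k:chararray, v:int);

-- ORDER is a reduce-side operator; PARALLEL 1 asks for a single reducer,
-- so the STORE below should write a single part file.
sorted = ORDER data BY k PARALLEL 1;

STORE sorted INTO 'output_single' USING PigStorage(',');
```

The result is still a directory holding one part file. Merging the part files afterwards with `hadoop fs -getmerge` is another commonly used route, and it avoids funnelling all the data through one reducer.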
2
Solved
When developing Pig scripts that use the STORE command I have to delete the output directory for every run or the script stops and offers:
2012-06-19 19:22:49,680 [main] ERROR org.apache.pig.tools...
Riff asked 19/6, 2012 at 22:28
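One option that may help here is the `rmf` command, which removes a path and does not complain when the path is missing. In this sketch, 'output_dir' and the one-field schema are placeholders.

```pig
-- 'output_dir' is a placeholder path; rmf does not fail if it is absent.
rmf output_dir

data = LOAD 'input' AS (line:chararray);
STORE data INTO 'output_dir';
```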
2
Solved
I have a question about Pig Latin. Is there any way to invoke one Pig script from another Pig script?
I know it is possible to run user defined functions (UDFs) like:
REGISTER myudfs.jar;
...
Diminution asked 17/12, 2013 at 13:52
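Pig's `exec` and `run` commands are one possible answer to this kind of question. The sketch below assumes a hypothetical file named other_script.pig. The commands are documented for the Grunt shell, and whether they behave identically inside a script file is an assumption worth checking against your Pig version.

```pig
-- 'other_script.pig' is a hypothetical file name.

-- exec runs the script in its own context; aliases it defines
-- are not visible to the caller afterwards.
exec other_script.pig

-- run executes the script in the current context, so aliases it
-- defines remain available to the statements that follow.
run other_script.pig
```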
1
Solved
I want to add a new column to an alias, preserving all the existing ones.
A = foreach A generate
A.id as id,
A.date as date,
A.foo as foo,
A.bar as bar,
A.foo / A.bar as foobar;
Can I d...
Whistle asked 11/12, 2013 at 19:42
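A short sketch of one way to append a column without listing every existing field, assuming a reduced version of the schema from the question (the load path and types are assumptions).

```pig
-- Assumed schema, using a subset of the field names in the question.
A = LOAD 'input' AS (id:int, foo:double, bar:double);

-- '*' projects every existing field; the new column is appended after them.
B = FOREACH A GENERATE *, foo / bar AS foobar;

DESCRIBE B;
```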
2
For a file of the form
A B user1
C D user2
A D user3
A D user1
I want to calculate the count of distinct values of field 3, i.e. count(distinct(user1, user2, user3, user1)) = 3
I am doing this usi...
Dipterous asked 15/10, 2012 at 11:25
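One common way to count distinct values is sketched below, assuming the sample is space-delimited with three fields. The field names f1, f2 and user are made up for illustration.

```pig
-- Assumption: space-delimited input with three fields, as in the sample.
data = LOAD 'input' USING PigStorage(' ')
       AS (f1:chararray, f2:chararray, user:chararray);

-- Keep only the third field, then drop duplicate values.
users = FOREACH data GENERATE user;
uniq  = DISTINCT users;

-- Collect everything into one group so COUNT sees all distinct users.
grouped = GROUP uniq ALL;
cnt     = FOREACH grouped GENERATE COUNT(uniq);

DUMP cnt;
```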
2
I want some sort of unique identifier/line_number/counter to be generated/appended in my foreach construct while it iterates through the records. Is there a way to accomplish this without writing a UD...
Pinhole asked 3/10, 2011 at 15:44
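In Pig 0.11 and later, the RANK operator may cover this without a UDF. This is only a sketch: the one-field schema is an assumption, and RANK postdates the question, so it would not have been available in older releases.

```pig
-- Assumed one-field schema, for illustration only.
data = LOAD 'input' AS (line:chararray);

-- RANK with no BY clause prepends a sequential number to each tuple.
ranked = RANK data;

DUMP ranked;
```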
1
Solved
The manual/documentation uses the language of 'inner bag' and 'outer bag' extensively (see: http://pig.apache.org/docs/r0.11.1/basic.html), and yet I haven't been able to pin down clearly the preci...
Dodgson asked 8/10, 2013 at 1:27
1
Solved
I'm new to pig, and I'm having an issue parsing my input and getting it into a format that I can use. The input file contains lines that have both fixed fields and KV pairs as follows:
FF1|FF2|FF3...
Unlucky asked 27/9, 2013 at 18:0
1
Solved
I have a sample input as tab separated key, value pair as follows
B_1001@2012-06-15 [email protected]
B_1001@2012-06-18 [email protected]
B_1002@2012-09-26 [email protected]
B_1002@...
Sniperscope asked 26/9, 2013 at 14:26
2
Solved
Since there are no else or default statements in Pig's SPLIT operation, what would be the most elegant way to do the following? I'm not a big fan of having to copy and paste code.
SPLIT rawish_data
INTO ...
Grissel asked 20/9, 2013 at 9:51
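Pig 0.10 and later accept an OTHERWISE branch in SPLIT, which acts as the default case. In the sketch below, the schema, the conditions and the output alias names are all hypothetical, since the original statement is truncated.

```pig
-- Hypothetical schema; the real conditions are not shown in the excerpt.
rawish_data = LOAD 'input' AS (kind:chararray, payload:chararray);

-- Tuples that match no IF condition fall through to the OTHERWISE alias.
SPLIT rawish_data INTO
    type_a IF kind == 'a',
    type_b IF kind == 'b',
    rest OTHERWISE;
```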
1
Solved
Trying to get this done on Pig. (Looking for the group_concat() equivalent of MySQL)
In my table, for example, I have this: (3 fields: userid, clickcount, pagenumber)
155 | 2 | 12
155 | 3 | 133
155...
Jetsam asked 13/9, 2013 at 7:2
3
I am looking to convert the ISO time format to yyyy-mm-dd hh:mm:ss.SSS. However, I am not able to achieve the conversion. I am new to Pig and I am trying to write a UDF to handle the conversion from ISO forma...
Claudieclaudina asked 6/9, 2013 at 11:33
2
Solved
I am trying to get pig started and failing:
$ pig
2013-05-10 18:03:22,972 [main] INFO org.apache.pig.Main - Apache Pig version 0.11.1 (r1459641) compiled Mar 22 2013, 02:13:53
2013-05-10 18:03:22...
Layton asked 10/5, 2013 at 22:15
2
Solved
I am trying to take a logical match criteria like:
(("Foo" OR "Foo Bar" OR FooBar) AND ("test" OR "testA" OR "TestB")) OR TestZ
and apply this as a match against a file in pig using
result = fi...
Senile asked 1/9, 2013 at 11:22
1
Solved
I am learning how to use Hadoop Pig now.
If I have an input file like this:
a,b,c,true
s,c,v,false
a,s,b,true
...
The last field is the one I need to count... So I want to know how many 'true' a...
Giana asked 7/8, 2013 at 22:45
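A sketch of one way to count the values of the last field, assuming comma-delimited input with four fields as in the sample. The field names are invented for illustration.

```pig
-- Assumption: comma-delimited input with four fields, as in the sample.
data = LOAD 'input' USING PigStorage(',')
       AS (c1:chararray, c2:chararray, c3:chararray, flag:chararray);

-- Group by the last field and count the rows in each group.
grouped = GROUP data BY flag;
counts  = FOREACH grouped GENERATE group AS flag, COUNT(data) AS n;

DUMP counts;
```

Grouping by the flag gives the count for every value in one pass. If only the 'true' rows matter, a FILTER on flag == 'true' followed by GROUP ... ALL would also work.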
2
Solved
I have a dataset, A, that has timestamp, visitor, URL:
(2012-07-21T14:00:00.000Z, joe, hxxp://www.aaa.com)
(2012-07-21T14:01:00.000Z, mary, hxxp://www.bbb.com)
(2012-07-21T14:02:00.000Z, joe, ...
Plascencia asked 1/8, 2013 at 20:38
1
Solved
I have some problems understanding how the map should be used.
Following this tutorial I created a file containing the following text:
[open#apache]
[apache#hadoop]
Then, I was able to load that...
Collenecollet asked 21/7, 2013 at 12:24
1
When I use a Python UDF with Pig, how do I know which version of Python it is using? Is it possible to use a specific version of Python?
Specifically my problem is in my UDF, I need to use a functio...
Ezekiel asked 17/7, 2013 at 22:34
2
I installed Hadoop and Pig using brew install hadoop and brew install pig.
I read here that you will get the Unable to load realm info from SCDynamicStore error message unless you add:
export HADO...
Clipping asked 5/2, 2013 at 21:8
2
Solved
I am trying to do a star schema type of join in pig and below is my code. When I join multiple relations with different columns, I have to prefix the name of the previous join every time to get it ...
Deictic asked 22/6, 2013 at 22:19
1
Solved
When I enter some erroneous command in a Pig interactive shell environment, it enters into listening mode (>>) like below. How do I safely come out of this command, but still stay in the pig ...
Anchusin asked 17/3, 2013 at 3:2
1
Solved
I have a feed in the following format:
Hour Key ID Value
1 K1 001 3
1 K1 002 2
2 K1 005 4
1 K2 002 1
2 K2 003 5
2 K2 004 6
and I want to group the feed by (Hour, Key) then sum the Value bu...
Canoewood asked 19/6, 2013 at 8:33
1
I have a Pig Streaming job where the number of mappers should equal the number of rows/lines in the input file. I know that setting
set mapred.min.split.size 16
set mapred.max.split.size 16
set ...
Unvoice asked 11/6, 2013 at 22:25
3
Solved
I have a folder of files created daily that all store the same type of information. I'd like to make a script that loads the newest 10 of them, UNIONs them, and then runs some other code on them. S...
Louanneloucks asked 7/9, 2011 at 20:38