apache-pig Questions

1

Solved

Consider the relation below: test = LOAD 'input' USING PigStorage(',') as (a:chararray, b:chararray); Is there a way to achieve the following: if (b == 1) { a = 'abc'; } else if (b == 2) { a = '...
Upheaval asked 7/6, 2013 at 2:11

3

Solved

I've been doing some investigation lately around using Hadoop, Hive, and Pig to do some data transformation. As part of that I've noticed that the schema of data files doesn't seem to be attached to f...
Martinmas asked 30/5, 2013 at 17:46

2

Solved

Using Apache Pig version 0.10.1.21 (reported), CentOS release 6.3 (Final), jdk1.6.0_31 (The Hortonworks Sandbox v1.2 on Virtualbox, with 3.5 GB RAM) $ cat data.txt 11,11,22 33,34,35 47,0,21 33,6,5...
Determine asked 11/5, 2013 at 16:36

1

Solved

I ran into an issue loading a set of JSON documents into Pig. What I have is a lot of JSON documents that all vary in the fields they have; the fields that I need are in most documents and in whare mi...
Ophiology asked 13/3, 2013 at 21:11

2

Solved

If one has data like this: A = LOAD 'data' AS (a1:int,a2:int,a3:int); DUMP A; (1,2,3) (4,2,1) And then a cross-join is done on A, A: B = CROSS A, A; DUMP B; (1,2,3) (4,2,1) Why is second ...
Nitaniter asked 6/3, 2013 at 19:48

1

I have a Pig script which generated a relation A: {x: chararray,B: {(y: chararray,z: int)}} I want to sort A based on B.y; however, the following piece gives me an error: Syntax error, unexpecte...
Trioecious asked 28/2, 2013 at 19:39

2

Solved

How do I find the MAX of a tuple in Pig? My code looks like this: A,20 B,10 C,40 D,5 data = LOAD 'myData.txt' USING PigStorage(',') AS key, value; all = GROUP data ALL; maxKey = FOREACH all GENE...
Apologete asked 27/12, 2012 at 14:6

1

Solved

I am using CDH4 in a pseudo-distributed mode and I have some trouble working with HBase and Pig together (but both work fine alone). I am following step by step this nice tutorial: http://blog.whi...
Hibernaculum asked 18/1, 2013 at 15:33

1

Solved

I want to find if a string contains another string in Pig. I found that there is a built-in index function, but it only searches for characters not strings. Is there any other alternative?
Implausibility asked 20/12, 2012 at 10:1

2

Solved

I wonder if it's possible to pivot a table in one pass in Apache Pig. Input: Id Column1 Column2 Column3 1 Row11 Row12 Row13 2 Row21 Row22 Row23 Output: Id Name Value 1 Column1 Row11 1 Column2 ...
Milk asked 26/6, 2012 at 18:18

3

Solved

I'm new to Pig and trying to correctly implement a somewhat common algorithm in which I need to pair every matching record in a set of records. In order to distill the question into its simplest fo...
Alit asked 2/12, 2012 at 14:9

2

A common pattern in my data processing is to group by some set of columns, apply a filter, then flatten again. For example: my_data_grouped = group my_data by some_column; my_data_grouped = filter...
Odor asked 11/6, 2012 at 22:42

4

It looks like a silly problem, but I can't find a way to filter null values from my rows. This is the result when I dump the object geoinfo: DUMP geoinfo; ([longitude#70.95853,latitude#30.9773...
Busiek asked 31/10, 2012 at 18:26

2

I recently ran into a case where Cassandra fits in perfectly to store time based events with custom ttls per event type (the other solution would be to save it in hadoop and do the bookkeeping manu...
Bonsai asked 1/11, 2012 at 9:45

1

Solved

I have this code in Pig (win, request and response are just tables loaded directly from filesystem): win_request = JOIN win BY bid_id, request BY bid_id; win_request_response = JOIN win_request BY...
Hoi asked 30/10, 2012 at 18:52

5

Solved

Is there a way to export the results from Pig directly to a database like mysql?
Ernie asked 10/1, 2011 at 16:11

1

Solved

I am new to Pig scripts. Say we have a file: [a#1,b#2,c#3] [a#4,b#5,c#6] [a#7,b#8,c#9] Pig script: A = LOAD 'txt' AS (in: map[]); B = FOREACH A GENERATE in#'a'; DUMP B; We know that we can take ...
Cascade asked 18/9, 2012 at 12:21

2

Solved

I have files that are named part-r-000[0-9][0-9] and that contain tab separated fields. I can view them using hadoop fs -text part-r-00000 but can't get them loaded using pig. What I've tried: x ...
Goodden asked 5/9, 2012 at 17:34

1

Solved

I recently ran into this problem at work; it's about Pig's FLATTEN. I'll use a simple example to illustrate it. Two files: ===file1=== 1_a 2_b 4_d ===file2 (tab separated)=== 1 a 2 b 3 c pig script 1: a ...
Sharyl asked 31/8, 2012 at 10:38

1

Example: I have a relation "class", with a nested bag of students: class: {teacher_name: chararray,students: {(firstname: chararray, lastname: chararray)}} I want to perform an operation on each...
Liable asked 24/8, 2012 at 9:18

2

Solved

I do outer joins on single columns in Pig like this: result = JOIN A by id LEFT OUTER, B by id; How do I join on two columns, something like WHERE A.id=B.id AND A.name=B.name? What is the pig...
Colotomy asked 7/11, 2011 at 15:45

2

Solved

I'm using PigLatin to filter some records. User1 8 NYC User1 9 NYC User1 7 LA User2 4 NYC User2 3 DC The script should remove the duplicate for users, and keep one of these records. Somethin...
Planetstruck asked 18/7, 2012 at 3:50

2

Solved

I am new to Hadoop/Pig and have a basic question: is there a logging facility in Pig UDFs? I have written a UDF which I need to verify, so I need to log certain statements to check the flow. Is there...
Fennec asked 12/6, 2012 at 21:17

2

Solved

I have a Pig script which produces four results, and I want to store all of them in a single file. I tried using UNION; however, when I use UNION I get four files part-m-00000, part-m-00001, part-m-0000...
Aspirant asked 8/6, 2012 at 19:20

1

Everyone knows that Pig supports DBStorage, but it only supports storing results from Pig into MySQL, like this: STORE data INTO DBStorage('com.mysql.jdbc.Driver', 'jdbc:mysql://host/db', 'I...
Ortegal asked 8/6, 2012 at 3:30
