apache-pig Questions

3

Solved

I'm trying to write a Pig Latin script to get the count of a dataset that I've filtered. Here's the script so far: /* scans by title */ scans = LOAD '/hive/scans/*' USING PigStorage(',') AS (t...
Neutrality asked 22/3, 2012 at 16:19
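A minimal Pig Latin sketch of one common approach: COUNT works on a bag, so the filtered relation is first collected into a single group with GROUP ... ALL. The schema and filter condition below are hypothetical, because the original AS (...) clause is truncated.

```pig
-- Hypothetical schema: the original AS (...) clause is truncated.
scans = LOAD '/hive/scans/*' USING PigStorage(',') AS (title:chararray, scan_count:int);

-- Hypothetical filter condition; substitute the real one.
filtered = FILTER scans BY scan_count > 0;

-- COUNT operates on a bag, so gather every row into one group first.
all_rows = GROUP filtered ALL;
total = FOREACH all_rows GENERATE COUNT(filtered) AS n;

DUMP total;
```

COUNT skips tuples whose first field is null; COUNT_STAR counts every tuple, which may be the safer choice if that field can be empty.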

4

Solved

I have a file in HDFS with 100 columns, which I want to process using Pig. I want to load this file into a tuple with column names in a separate Pig script, and reuse this script from other Pig scr...
Siren asked 26/9, 2011 at 15:33

1

Solved

How can I check in Pig Latin whether a bag contains an element? For example, in a bag of chararrays, how can I check if a token is present?
Thriller asked 15/10, 2014 at 19:09
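One way to test membership, sketched here under assumptions: a nested FILTER inside FOREACH keeps only the matching bag elements, and the bag contains the token exactly when that filtered bag is non-empty. The input file, field names and the token 'hadoop' are hypothetical.

```pig
-- Hypothetical input: an id plus a bag of single-field chararray tuples.
docs = LOAD 'docs.txt' AS (id:int, tokens:bag{t:tuple(token:chararray)});

checked = FOREACH docs {
    -- Nested FILTER keeps only the bag elements equal to the token we want.
    hits = FILTER tokens BY token == 'hadoop';
    -- The bag contains the token exactly when the filtered bag is non-empty.
    GENERATE id, (COUNT(hits) > 0 ? 'yes' : 'no') AS has_token;
};

DUMP checked;
```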

6

Solved

I'm about to start playing around with Pig Latin, and I was hoping to get syntax highlighting and similar support for it in Eclipse. Doing a quick Google search, I saw a couple of Eclipse plugins for it. A...
Artery asked 25/8, 2011 at 16:59

1

I wrote a program to execute an embedded Pig statement in Java. I executed the statement with registerQuery, but when I try to store the result, I get an error mentioning org.apache.hadoop.mapred.localClien...
Compost asked 7/6, 2014 at 11:48

2

Solved

I'm using CurrentTime(), which returns a datetime value. However, I need it as a chararray. I have the following: A = LOAD ... B = FOREACH A GENERATE CurrentTime() AS todaysDate; I've tried vario...
Orthodontist asked 29/5, 2013 at 16:37
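A minimal sketch, assuming Pig 0.11 or later (where the datetime type, CurrentTime and ToString are available): ToString(datetime, pattern) formats a datetime as a chararray using a Joda-Time style pattern. The input file and pattern are illustrative assumptions.

```pig
A = LOAD 'input.txt' AS (line:chararray);  -- hypothetical input

-- ToString(datetime, pattern) formats the datetime as a chararray.
B = FOREACH A GENERATE ToString(CurrentTime(), 'yyyy-MM-dd HH:mm:ss') AS todaysDate;

DESCRIBE B;  -- todaysDate should be reported as chararray
```

Calling ToString with only the datetime argument should produce an ISO 8601 string instead of a custom pattern.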

3

I am using Pig (0.9.1) with UDFs written in Python. The Python scripts import modules from the standard Python library. I have been able to run the Pig scripts that call the Python UDFs successfully...
Vickievicksburg asked 20/10, 2011 at 5:47

1

Solved

Let's say I JOIN two relations like: -- part looks like: -- 1,5.3 -- 2,4.9 -- 3,4.9 -- original looks like: -- 1,Anju,3.6,IT,A,1.6,0.3 -- 2,Remya,3.3,EEE,B,1.6,0.3 -- 3,akhila,3.3,IT,C,1.3,0.3 j...
Hypoglycemia asked 20/4, 2014 at 5:13

3

Solved

I would like to generate multiple tuples from a single tuple. What I mean is: I have a file with the following data in it. >> cat data ID | ColumnName1:Value1 | ColumnName2:Value2 so I load it by the...
Field asked 2/7, 2012 at 3:01
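A common pattern for turning one tuple into several, sketched under assumptions: TOBAG gathers the per-column pairs into a bag, and FLATTEN emits one output row per bag element. The field names below are hypothetical; the input layout follows the sample line in the question.

```pig
-- Hypothetical field names; each line looks like "ID | ColumnName1:Value1 | ColumnName2:Value2".
raw = LOAD 'data' USING PigStorage('|') AS (id:chararray, c1:chararray, c2:chararray);

-- TOBAG puts the two pairs in one bag; FLATTEN emits one output row per bag element.
pairs = FOREACH raw GENERATE TRIM(id) AS id, FLATTEN(TOBAG(TRIM(c1), TRIM(c2))) AS pair;

-- Split each "ColumnName:Value" pair into two separate fields.
kv = FOREACH pairs GENERATE id, FLATTEN(STRSPLIT(pair, ':', 2)) AS (name, value);

DUMP kv;
```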

3

I am relatively new to Pig scripting. I would like to know whether there is a way to pass parameters to Java UDFs in Pig. Here is the scenario: I have a log file which has different columns (each repre...
Piracy asked 31/10, 2012 at 17:38
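One option, sketched here with hypothetical jar and class names: DEFINE can pass constructor arguments (as strings) to a Java UDF when it is instantiated. This assumes the UDF class has a public constructor accepting those string arguments; per-call values can also simply be passed as extra arguments in the function call.

```pig
-- Hypothetical jar and class names.
REGISTER myudfs.jar;

-- DEFINE passes constructor arguments (as strings) to the Java UDF;
-- the class needs a public constructor that accepts them.
DEFINE PickColumn com.example.pig.PickColumn('3');

logs = LOAD 'access.log' USING PigStorage('\t');
picked = FOREACH logs GENERATE PickColumn(*);

DUMP picked;
```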

1

Solved

I understood that GROUP didn't work with multiple relations, which is why Pig has COGROUP. However, when I checked today, the GROUP command worked for me. I am using Pig 0.12.0. My commands and outputs are...
Allbee asked 30/7, 2014 at 4:09
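In Pig, GROUP and COGROUP are the same operator, and both accept one or more relations. A small sketch with hypothetical input files showing that the two statements are interchangeable:

```pig
a = LOAD 'a.txt' USING PigStorage(',') AS (id:int, x:chararray);  -- hypothetical inputs
b = LOAD 'b.txt' USING PigStorage(',') AS (id:int, y:chararray);

-- GROUP and COGROUP are the same operator; both accept one or more relations.
g1 = GROUP a BY id, b BY id;
g2 = COGROUP a BY id, b BY id;

-- Both produce (group, {bag of a tuples}, {bag of b tuples}).
DESCRIBE g1;
DESCRIBE g2;
```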

3

Solved

Can someone explain how to get the output below in a Pig script? My input file a.txt is: aaa.kyl,data,data bbb.kkk,data,data cccccc.hj,data,data qa.dff,data,data I am writing the Pig sc...
Ebberta asked 27/7, 2014 at 13:26

2

Solved

I can control the number of reducers by using the PARALLEL clause in statements that trigger a reduce phase. I want to control the number of mappers. The data source is already created, and I cannot...
Magalymagan asked 16/6, 2014 at 7:13
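A hedged sketch of one lever Pig itself exposes: split combination. Pig can merge small input splits into larger ones, and pig.maxCombinedSplitSize (in bytes) caps the size of each combined split, so raising it generally means fewer map tasks when the input has many small files. It does not split files more finely than the underlying InputFormat does. The input path is hypothetical.

```pig
-- Let Pig combine small input splits into larger ones (on by default).
SET pig.splitCombination true;
-- Cap each combined split at 256 MB (value in bytes).
SET pig.maxCombinedSplitSize 268435456;

data = LOAD '/path/to/input' USING PigStorage(',');  -- hypothetical path
```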

1

I'm having a problem using the Python library simplejson in Jython to write a Pig UDF. I need it because jython-standalone-2.5.2.jar doesn't come with a JSON library. I'm using Apache Pig version 0.11...
Deutzia asked 1/2, 2014 at 21:42

5

Let's say I have a data set of restaurant reviews: User,City,Restaurant,Rating Jim,New York,Mecurials,3 Jim,New York,Whapme,4.5 Jim,London,Pint Size,2 Lisa,London,Pint Size,4 Lisa,London,Rabbit Wh...
Swedenborgian asked 8/2, 2011 at 11:53

3

Solved

From this: (1, {(1,2), (1,3), (1,4)} ) (2, {(2,5), (2,6), (2,7)} ) ...How could we generate this? ((1,2),(1,3),(1,4)) ((2,5),(2,6),(2,7)) ...And how could we generate this? (1, 2, 3, 4) (2, ...
Interstitial asked 31/8, 2013 at 4:48

2

I am currently debugging a Pig script. I'd like to define a tuple directly in the Pig file (instead of using the basic LOAD statement). Is there a way to do it? I am looking for something like this: ...
Manual asked 14/9, 2012 at 11:14

2

Solved

I have data that's already grouped and aggregated, and it looks like this: user value count ---- -------- ------ Alice third 5 Alice first 11 Alice second 10 Alice fourth 2 ... Bob second 20 Bob third 1...
Clamp asked 15/7, 2013 at 13:56

2

Solved

I am using the RANK operator in Pig 0.11.0 to generate ranks for every ID in my data. I need the data ranked in a particular way: I want the rank to reset and start from 1 for every new ID. Is it pos...
Standardbearer asked 10/4, 2014 at 11:42

2

Solved

I have a data file and a corresponding schema file stored in separate locations. I would like to load the data using the schema in the schema file. I tried using A= LOAD '<file path>' USING ...
Gallinule asked 24/11, 2013 at 10:06

1

Solved

I'm loading a TSV file with a datetime column and a long column using: A = LOAD 'tweets-clean.txt' USING PigStorage('\t') AS (date:datetime, userid:long); DUMP A; An example line of input: Tue Feb...
Ainslee asked 26/2, 2014 at 20:31
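A minimal sketch, assuming Pig 0.11 or later: Pig's implicit cast to datetime expects an ISO 8601 string, so a timestamp in another format can be loaded as chararray and parsed explicitly with ToDate(string, pattern). The pattern below is an assumption based on a Twitter-style timestamp such as 'Tue Feb 25 20:31:00 +0000 2014' (a hypothetical example, since the sample line is truncated), and the English day and month names assume an English default locale.

```pig
-- Load the timestamp as text first; the implicit cast to datetime expects ISO 8601.
A = LOAD 'tweets-clean.txt' USING PigStorage('\t') AS (date_str:chararray, userid:long);

-- Parse with an explicit Joda-Time pattern (assumed input format, see above).
B = FOREACH A GENERATE ToDate(date_str, 'EEE MMM dd HH:mm:ss Z yyyy') AS tweet_date, userid;

DUMP B;
```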

1

I am trying to edit a large file on a Hadoop cluster and trim whitespace and special characters like ¦, *, @, " etc. from the file. I don't want to copyToLocal and use sed, as I have 1000s of such fil...
Fortunetelling asked 20/2, 2014 at 19:28
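One way to do the cleanup inside the cluster, sketched with hypothetical paths: REPLACE takes a Java regular expression, so a character class can strip the listed characters from each line, and TRIM removes leading and trailing whitespace. The ¦ character is written as the escape \u00A6 to avoid depending on the script file's encoding.

```pig
raw = LOAD '/path/to/input' USING TextLoader() AS (line:chararray);  -- hypothetical path

-- REPLACE takes a Java regex; the class removes ¦ (written as \u00A6), *, @ and ".
-- TRIM then strips leading and trailing whitespace only.
clean = FOREACH raw GENERATE TRIM(REPLACE(line, '[\u00A6*@"]', '')) AS line;

STORE clean INTO '/path/to/output' USING PigStorage();
```

Because the script runs as a MapReduce job over every file under the input path, this avoids copying each file locally.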

6

I installed Hadoop (1.0.2) for a single node on Windows 7 with Cygwin, and it is working. However, I cannot get Pig (0.10.0) to see the Hadoop installation. 1) "Error: JAVA_HOME is not set." I added this lin...
Propeller asked 13/7, 2012 at 11:46

1

Solved

First of all, I am relatively new to Big Data and the Hadoop world, and I have just started to experiment a little with the Hortonworks Sandbox (Pig and Hive so far). I was wondering in which c...
Ambush asked 29/1, 2014 at 18:02

0

Versions: Hadoop 2.2, Hbase 0.96.1, Pig 0.12 Whenever I run this pig script raw_data = LOAD 'sample_data.csv' USING PigStorage( ',' ) AS ( listing_id: chararray, fname: chararray, lname: chara...
Bindweed asked 15/1, 2014 at 13:23

© 2022 - 2024 — McMap. All rights reserved.