apache-pig Questions
3
Solved
I'm trying to write a Pig Latin script to pull the count of a dataset that I've filtered.
Here's the script so far:
/* scans by title */
scans = LOAD '/hive/scans/*' USING PigStorage(',') AS (t...
Neutrality asked 22/3, 2012 at 16:19
4
Solved
I have a file in HDFS with 100 columns, which I want to process using Pig. I want to load this file into a tuple with column names in a separate Pig script, and reuse this script from other Pig scr...
Siren asked 26/9, 2011 at 15:33
1
Solved
How can I check in Pig Latin whether a bag contains an element?
Example: In a bag of chararray, how can I check if a token is present?
Thriller asked 15/10, 2014 at 19:9
6
Solved
I'm about to start playing around with Pig Latin, and I was hoping to get some text highlighting and such for it in Eclipse. Doing a quick Google search, I saw a couple of Eclipse plugins for it. A...
Artery asked 25/8, 2011 at 16:59
1
I wrote a program to execute an embedded Pig statement in Java. I executed the Java statement registryQuery. But when I try to store the result, I get an error of org.apache.hadoop.mapred.localClien...
Compost asked 7/6, 2014 at 11:48
2
Solved
I'm using CurrentTime(), which is a datetime data type. However, I need it as a chararray. I have the following:
A = LOAD ...
B = FOREACH A GENERATE CurrentTime() AS todaysDate;
I've tried vario...
Orthodontist asked 29/5, 2013 at 16:37
3
I am using Pig (0.9.1) with UDFs written in Python. The Python scripts import modules from the standard Python library. I have been able to run the Pig scripts that call the Python UDFs successfully...
Vickievicksburg asked 20/10, 2011 at 5:47
1
Solved
Let's say I JOIN two relations like:
-- part looks like:
-- 1,5.3
-- 2,4.9
-- 3,4.9
-- original looks like:
-- 1,Anju,3.6,IT,A,1.6,0.3
-- 2,Remya,3.3,EEE,B,1.6,0.3
-- 3,akhila,3.3,IT,C,1.3,0.3
j...
Hypoglycemia asked 20/4, 2014 at 5:13
3
Solved
I would like to generate multiple tuples from a single tuple. What I mean is:
I have a file with the following data in it.
>> cat data
ID | ColumnName1:Value1 | ColumnName2:Value2
so I load it by the...
Field asked 2/7, 2012 at 3:1
3
I am relatively new to PigScript. I would like to know if there is a way of passing parameters to Java UDFs in Pig?
Here is the scenario:
I have a log file which has different columns (each repre...
Piracy asked 31/10, 2012 at 17:38
1
Solved
I understood Group didn't work with multiple tuples and hence we had COGROUP in PIG. However, while checking today the GROUP command works for me. I am using PIG-0.12.0.
My commands and outputs are...
Allbee asked 30/7, 2014 at 4:9
3
Solved
Can someone explain how to get the output below in a Pig script?
My input file is below:
a.txt
aaa.kyl,data,data
bbb.kkk,data,data
cccccc.hj,data,data
qa.dff,data,data
I am writing the pig sc...
Ebberta asked 27/7, 2014 at 13:26
2
Solved
I can control the number of reducers by using the PARALLEL clause in statements that result in reducers.
I want to control the number of mappers. The data source is already created, and I can not...
Magalymagan asked 16/6, 2014 at 7:13
1
I'm having a problem using the Python library simplejson in Jython to write a Pig UDF. I need it because jython-standalone-2.5.2.jar doesn't come with a JSON library. I'm using Apache Pig version 0.11...
Deutzia asked 1/2, 2014 at 21:42
5
Let's say I have a data set of restaurant reviews:
User,City,Restaurant,Rating
Jim,New York,Mecurials,3
Jim,New York,Whapme,4.5
Jim,London,Pint Size,2
Lisa,London,Pint Size,4
Lisa,London,Rabbit Wh...
Swedenborgian asked 8/2, 2011 at 11:53
3
Solved
From this:
(1, {(1,2), (1,3), (1,4)} )
(2, {(2,5), (2,6), (2,7)} )
...How could we generate this?
((1,2),(1,3),(1,4))
((2,5),(2,6),(2,7))
...And how could we generate this?
(1, 2, 3, 4)
(2, ...
Interstitial asked 31/8, 2013 at 4:48
2
I am currently debugging a pig script. I'd like to define a tuple in the Pig file directly (instead of the basic "Load" function).
Is there a way to do it?
I am looking for something like that:
...
Manual asked 14/9, 2012 at 11:14
2
Solved
I have data that's already grouped and aggregated, it looks like so:
user value count
---- -------- ------
Alice third 5
Alice first 11
Alice second 10
Alice fourth 2
...
Bob second 20
Bob third 1...
Clamp asked 15/7, 2013 at 13:56
2
Solved
I am using the Pig 0.11.0 rank function to generate ranks for every id in my data.
I need ranking of my data in a particular way. I want the rank to reset and start from 1 for every new ID.
Is it pos...
Standardbearer asked 10/4, 2014 at 11:42
2
Solved
I have a data file and a corresponding schema file stored in separate locations.
I would like to load the data using the schema in the schema-file. I tried using
A= LOAD '<file path>' USING ...
Gallinule asked 24/11, 2013 at 10:6
1
Solved
I'm loading a tsv file with a datetime column and long column with:
A = LOAD 'tweets-clean.txt' USING PigStorage('\t') AS (date:datetime, userid:long);
DUMP A;
An example line of input:
Tue Feb...
Ainslee asked 26/2, 2014 at 20:31
1
I am trying to edit a large file on a Hadoop cluster and trim whitespace and special characters like ¦, *, @, " etc. from the file.
I don't want to copyToLocal and use sed, as I have 1000s of such fil...
Fortunetelling asked 20/2, 2014 at 19:28
6
I installed Hadoop (1.0.2) for a single node on Windows 7 with Cygwin, and it is working. However, I cannot get Pig (0.10.0) to see Hadoop.
1) "Error: JAVA_HOME is not set."
I added this lin...
Propeller asked 13/7, 2012 at 11:46
1
Solved
First of all I am relatively new to Big Data and the Hadoop world and I have just started to experiment a little with the Hortonworks Sandbox (Pig and Hive so far).
I was wondering in which c...
Ambush asked 29/1, 2014 at 18:2
0
Versions: Hadoop 2.2, Hbase 0.96.1, Pig 0.12
Whenever I run this pig script
raw_data = LOAD 'sample_data.csv' USING PigStorage( ',' ) AS (
listing_id: chararray, fname: chararray, lname: chara...
Bindweed asked 15/1, 2014 at 13:23