mapreduce Questions
5
Solved
Hey I'm fairly new to the world of Big Data.
I came across this tutorial on
http://musicmachinery.com/2011/09/04/how-to-process-a-million-songs-in-20-minutes/
It describes in detail how to run...
Hsining asked 11/6, 2013 at 5:50
4
Solved
I'm a newbie in Hadoop. I'm trying out the Wordcount program.
Now, to try out multiple output files, I use MultipleOutputFormat. This link helped me do it: http://hadoop.apache.org/common/do...
10
Solved
I'm trying to run small spark application and am getting the following exception:
Exception in thread "main" java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.&...
Pinnacle asked 5/4, 2016 at 13:7
11
Solved
I am getting:
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
While trying to make a copy of a partitioned table using the commands in the hive console:
CR...
9
Solved
I have 3 data nodes running. While running a job, I am getting the error given below:
java.io.IOException: File /user/ashsshar/olhcache/loaderMap9b663bd9 could only be replicated to 0 ...
15
Solved
I commonly work with text files of ~20 GB in size, and I find myself counting the number of lines in a given file very often.
The way I do it now is just cat fname | wc -l, and it takes very long. I...
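The entry above describes counting lines in large files with `cat fname | wc -l` (in the shell, `wc -l fname` already avoids the extra `cat` process). A common Python sketch of the same count, streaming the file in fixed-size binary chunks so the whole file never sits in memory (function and chunk-size names are illustrative, not from the question):

```python
def count_lines(path, chunk_size=1 << 20):
    """Count newline bytes by reading the file in 1 MiB binary chunks,
    so memory use stays constant regardless of file size."""
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            count += chunk.count(b"\n")
    return count
```

Reading in binary mode skips text decoding entirely, which is usually the dominant cost for multi-gigabyte files.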
3
Solved
Is there something like sys.minint in Python, similar to sys.maxint?
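On the question above: Python 2 exposed sys.maxint but no sys.minint, because the smallest plain int was simply -sys.maxint - 1. In Python 3, ints are unbounded and sys.maxsize is the closest analogue. A short sketch:

```python
import sys

# Python 3: ints are arbitrary-precision, so there is no true minimum.
# sys.maxsize is the largest container index; its mirror image is the
# usual stand-in for the old Python 2 minimum (-sys.maxint - 1).
min_int = -sys.maxsize - 1

# For a "smaller than any number" sentinel, float("-inf") compares
# below every int.
assert float("-inf") < min_int
```

When the value is only used as an initial sentinel in a min/max search, `float("-inf")` (or `float("inf")`) is often clearer than a fabricated integer bound.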
5
Solved
Recently I've been trying out OpenCV for my graduation project.
I've had some success under the Windows environment. Because the Windows package of OpenCV comes with pre-built libraries, I don't ha...
4
Solved
We are migrating from Redshift to Spark. I have a table in Redshift that I need to export to S3. From S3 this will be fed to Apache Spark (EMR).
I found there is only one way to export data from ...
Mosera asked 25/10, 2016 at 10:27
8
Solved
My map tasks need some configuration data, which I would like to distribute via the Distributed Cache.
The Hadoop MapReduce Tutorial shows the usage of the DistributedCache class, roughly as follo...
3
Solved
I have a problem with Hadoop MapReduce in R, and in the logs I found this:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.ipc.Server).
log4j:WARN Please initialize the lo...
2
Solved
I want to print each step of my "map" to the console after it executes.
Something like
System.out.println("Completed Step one");
System.out.println("Completed Step two");
and so on
Is there ...
4
I'm using pyMongo 1.11 and MongoDB 1.8.2. I'm trying to do a fairly complex Map/Reduce. I prototyped the functions in Mongo and got it working, but when I tried transferring it to Python, I get:
-...
10
Solved
I am writing a Spark application and want to combine a set of Key-Value pairs (K, V1), (K, V2), ..., (K, Vn) into one Key-Multivalue pair (K, [V1, V2, ..., Vn]). I feel like I should be able to do ...
Egmont asked 18/11, 2014 at 19:15
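The entry above asks how to combine pairs (K, V1), (K, V2), ..., (K, Vn) into (K, [V1, V2, ..., Vn]); in Spark that is exactly the contract of RDD.groupByKey(). A plain-Python sketch of the same semantics (no Spark cluster assumed; the function name is illustrative):

```python
from collections import defaultdict

def group_by_key(pairs):
    """Collect (key, value) pairs into (key, [values]) pairs --
    the shape Spark's RDD.groupByKey() produces."""
    grouped = defaultdict(list)
    for k, v in pairs:
        grouped[k].append(v)
    return list(grouped.items())

pairs = [("K", 1), ("K", 2), ("K", 3)]
# group_by_key(pairs) → [("K", [1, 2, 3])]
```

In Spark itself, prefer reduceByKey or combineByKey when the values can be folded as they are grouped, since groupByKey materializes every value list in memory.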
2
Solved
I'm developing a simple financial app for keeping track of income and expenses.
For the sake of simplicity, let's suppose these are some of my documents:
{ description: "test1", amount: ...
Armyn asked 17/1, 2015 at 1:6
4
Solved
Here's the Hadoop word count Java map and reduce source code:
In the map function, I've gotten to where I can output all the words that start with the letter "c" and also the total number of times ...
4
Solved
I ran a wordcount example using MapReduce for the first time, and it worked. Then I stopped the cluster, started it back up after a while, and followed the same procedure.
Showed this error:
10P:/$ hado...
5
Solved
I want to debug a mapreduce script and, without going to much trouble, tried to put some print statements in my program. But I can't seem to find them in any of the logs.
5
Solved
I am trying to write a function that produces the factor array of a provided integer and then reduces that array (by multiplying each array element) to get the factorial.
For example:
factor(5) >>> [1, 2, 3,...
Sewel asked 3/3, 2016 at 19:5
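The factorial entry above describes a two-step approach: build the list [1, 2, ..., n], then reduce it by multiplication. A minimal sketch in Python (function names follow the question's factor(5) example; factorial is an illustrative addition):

```python
from functools import reduce
from operator import mul

def factor(n):
    """Build the list [1, 2, ..., n], as in factor(5) -> [1, 2, 3, 4, 5]."""
    return list(range(1, n + 1))

def factorial(n):
    """Reduce the factor array by multiplication; the initial value 1
    makes factorial(0) return 1, the usual convention."""
    return reduce(mul, factor(n), 1)
```

The initial value passed to reduce matters: without it, reduce raises TypeError on the empty list that factor(0) produces.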
2
Solved
As the title asks: how do I get the taskID or mapperID (something like the partitionID in Spark) in a Hive UDF?
Deliladelilah asked 22/6, 2021 at 7:22
3
Solved
Output files generated via the Spark SQL DataFrame.write() method begin with the "part" basename prefix. e.g.
DataFrame sample_07 = hiveContext.table("sample_07");
sample_07.write().parquet("sampl...
Vitkun asked 19/3, 2016 at 21:46
2
Solved
I am trying to pass a variable (not a property) using the -D command line option in Hadoop, like -Dmapred.mapper.mystring=somexyz. I am able to set a conf property in the Driver program and read it back in ma...
3
Solved
I understand how MRv1 works. Now I am trying to understand MRv2: what's the difference between the Application Manager and the Application Master in YARN?
Kadiyevka asked 21/6, 2015 at 17:19
2
Solved
A question for all the Cassandra experts out there.
I have a column family with about a million records.
I would like to query these records in such a way that I can perform a Not-Equal-To...
8
Solved
I'd like to find a good, robust MapReduce framework to use from Scala.
Osmo asked 7/6, 2009 at 15:14
© 2022 - 2025 — McMap. All rights reserved.