mapreduce Questions

5

Solved

Hey, I'm fairly new to the world of Big Data. I came across this tutorial at http://musicmachinery.com/2011/09/04/how-to-process-a-million-songs-in-20-minutes/. It describes in detail how to run...
Hsining asked 11/6, 2013 at 5:50

4

Solved

I'm a newbie in Hadoop and am trying out the WordCount program. Now, to try out multiple output files, I use MultipleOutputFormat. This link helped me do it: http://hadoop.apache.org/common/do...
Snowdrop asked 16/8, 2010 at 6:42

10

Solved

I'm trying to run a small Spark application and am getting the following exception: Exception in thread "main" java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.&...
Pinnacle asked 5/4, 2016 at 13:7

11

Solved

I am getting: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask while trying to make a copy of a partitioned table using the commands in the Hive console: CR...
Physics asked 25/6, 2012 at 8:4

9

Solved

I have 3 data nodes running, and while running a job I am getting the error given below: java.io.IOException: File /user/ashsshar/olhcache/loaderMap9b663bd9 could only be replicated to 0 ...
Drinkable asked 22/3, 2013 at 13:29

15

Solved

I commonly work with text files of ~20 GB in size and I find myself counting the number of lines in a given file very often. The way I do it now is just cat fname | wc -l, and it takes very long. I...
Tactile asked 3/10, 2012 at 20:42
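A common first fix is to drop the cat and run wc -l fname directly so the file is read only once. If you would rather stay in Python, a minimal sketch (the file name fname is just a placeholder) that counts newline bytes in large chunks might look like this:

```python
# Count lines by scanning the file in large binary chunks and counting
# newline bytes; avoids spawning cat/wc. "fname" is a placeholder path.
def count_lines(path, chunk_size=1024 * 1024):
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total

if __name__ == "__main__":
    print(count_lines("fname"))
```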

3

Solved

Is there something like sys.minint in Python, similar to sys.maxint?
Undersecretary asked 21/5, 2018 at 10:0
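There is no sys.minint. In Python 2 the symmetric lower bound would be -sys.maxint - 1; in Python 3 integers are unbounded and sys.maxsize is only the largest container index, so a sentinel such as float("-inf") is often what people actually want. A short sketch:

```python
import sys

# Python 3: ints are arbitrary precision, so there is no true minimum int.
# sys.maxsize is only the largest container index; its mirror image is:
smallest_index_like = -sys.maxsize - 1

# Python 2's equivalent of a "minint" would have been: -sys.maxint - 1

# For "smaller than anything" comparisons a float sentinel also works:
neg_inf = float("-inf")

print(smallest_index_like, neg_inf)
```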

5

Solved

Recently I've been trying OpenCV out for my graduation project. I've had some success under the Windows environment, and because the Windows package of OpenCV comes with pre-built libraries, I don't ha...
Apulia asked 30/6, 2013 at 2:26

4

Solved

We are migrating from Redshift to Spark. I have a table in Redshift that I need to export to S3. From S3 this will be fed to Apache Spark (EMR). I found there is only one way to export data from ...

8

Solved

My map tasks need some configuration data, which I would like to distribute via the Distributed Cache. The Hadoop MapReduce Tutorial shows the usage of the DistributedCache class, roughly as follo...
Wimmer asked 20/1, 2014 at 16:53

3

Solved

I have a problem with Hadoop MapReduce in R, and in the logs I found this: log4j:WARN No appenders could be found for logger (org.apache.hadoop.ipc.Server). log4j:WARN Please initialize the lo...
Starstudded asked 29/6, 2015 at 7:34

2

Solved

I want to print each step of my "map" after its execution on the console, something like System.out.println("Completed Step one"); System.out.println("Completed Step two"); and so on. Is there ...
Signatory asked 4/8, 2011 at 13:53

4

I'm using pyMongo 1.11 and MongoDB 1.8.2. I'm trying to do a fairly complex Map/Reduce. I prototyped the functions in Mongo and got it working, but when I tried transferring it to Python, I get: -...
Unsparing asked 5/8, 2011 at 21:45
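The excerpt cuts off before the error, but a frequent stumbling block when porting shell Map/Reduce to PyMongo is that the JavaScript functions must be passed as bson.Code strings, not Python callables. A hedged sketch against an older PyMongo that still exposes Collection.map_reduce (collection and field names here are made up; recent releases drop map_reduce in favour of the aggregation pipeline):

```python
from pymongo import MongoClient   # PyMongo 1.x used Connection() instead
from bson.code import Code

client = MongoClient()
db = client["test"]

# The shell JavaScript goes in verbatim, wrapped in bson.Code.
mapper = Code("""
    function () {
        emit(this.category, this.amount);   // hypothetical fields
    }
""")

reducer = Code("""
    function (key, values) {
        return Array.sum(values);
    }
""")

# Newer PyMongo versions require naming an output collection ("out").
result = db.things.map_reduce(mapper, reducer, "mr_results")
for doc in result.find():
    print(doc)
```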

10

Solved

I am writing a Spark application and want to combine a set of Key-Value pairs (K, V1), (K, V2), ..., (K, Vn) into one Key-Multivalue pair (K, [V1, V2, ..., Vn]). I feel like I should be able to do ...
Egmont asked 18/11, 2014 at 19:15
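In PySpark terms (the question's language isn't shown), groupByKey does exactly this grouping, and mapValues(list) turns the resulting iterable into a list. A minimal sketch with made-up data:

```python
from pyspark import SparkContext

sc = SparkContext(appName="combine-values")   # hypothetical app name
pairs = sc.parallelize([("K", "V1"), ("K", "V2"), ("K", "V3"), ("J", "V1")])

# groupByKey shuffles all values for a key to one place; mapValues(list)
# materialises them as a list.
grouped = pairs.groupByKey().mapValues(list)
print(grouped.collect())   # e.g. [('K', ['V1', 'V2', 'V3']), ('J', ['V1'])]

sc.stop()
```

If the values are later reduced anyway, reduceByKey or aggregateByKey usually scales better than collecting full lists.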

2

Solved

I'm developing a simple financial app for keeping track of income and expenses. For the sake of simplicity, let's suppose these are some of my documents: { description: "test1", amount: ...

4

Solved

Here's the Hadoop word count Java map and reduce source code. In the map function, I've gotten to where I can output all the words that start with the letter "c" and also the total number of times ...
Kalie asked 5/10, 2014 at 23:56
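The original code is Java, but the same filtering idea can be sketched as a Hadoop Streaming job in Python: the mapper only emits words that start with "c", and the reducer sums the counts per word. This is one reading of the truncated requirement, not the asker's actual code:

```python
# Hadoop Streaming sketch: run with "map" or "reduce" as the first argument.
import sys

def mapper():
    for line in sys.stdin:
        for word in line.split():
            if word.lower().startswith("c"):      # keep only "c" words
                print(f"{word.lower()}\t1")

def reducer():
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```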

4

Solved

I ran a wordcount example using MapReduce for the first time, and it worked. Then I stopped the cluster, started it back up a while later, and followed the same procedure. It showed this error: 10P:/$ hado...
Emeric asked 4/8, 2015 at 19:2

5

Solved

I want to debug a MapReduce script and, without going into much trouble, tried to put some print statements in my program. But I can't seem to find them in any of the logs.
Crawfish asked 8/7, 2010 at 19:34
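Assuming this is a Hadoop Streaming job with a Python script: stdout is the data channel, so debug output has to go to stderr, where it ends up in the task attempt logs; "reporter:" lines on stderr also bump job counters visible in the web UI. A short sketch under that assumption:

```python
import sys

for line in sys.stdin:
    # Debug output: goes to the task's stderr log, not the job output.
    sys.stderr.write("DEBUG: processing a line\n")
    # Optional: increment a job counter (group "MyJob", counter "LinesSeen").
    sys.stderr.write("reporter:counter:MyJob,LinesSeen,1\n")
    for word in line.split():
        print(f"{word}\t1")   # the actual map output
```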

5

Solved

I am trying to write a function that will produce the factorial array of a provided integer and then reduce that array (by multiplying each array element). For example: factor(5) >>> [1, 2, 3,...
Sewel asked 3/3, 2016 at 19:5
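The excerpt doesn't show the language, but in Python the two steps map onto range and functools.reduce; a minimal sketch:

```python
from functools import reduce
from operator import mul

def factor(n):
    # The array from the question's example: factor(5) -> [1, 2, 3, 4, 5]
    return list(range(1, n + 1))

def factorial(n):
    # reduce(mul, ...) multiplies the elements; 1 is the initial value,
    # so factorial(0) == 1.
    return reduce(mul, factor(n), 1)

print(factor(5))      # [1, 2, 3, 4, 5]
print(factorial(5))   # 120
```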

2

Solved

As the question says: how do I get the taskID or mapperID (something like partitionID in Spark) in a Hive UDF?

3

Solved

Output files generated via the Spark SQL DataFrame.write() method begin with the "part" basename prefix. e.g. DataFrame sample_07 = hiveContext.table("sample_07"); sample_07.write().parquet("sampl...
Vitkun asked 19/3, 2016 at 21:46

2

Solved

I am trying to pass a variable (not a property) using the -D command line option in Hadoop, like -Dmapred.mapper.mystring=somexyz. I am able to set a conf property in the Driver program and read it back in ma...
Cardiomegaly asked 8/7, 2014 at 12:39
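If this were a Hadoop Streaming job with a Python mapper rather than a Java one, the -D job properties are exported to the task as environment variables with the dots replaced by underscores, so the value can be read back like this (a sketch under that assumption, not the asker's Java setup):

```python
import os
import sys

# -Dmapred.mapper.mystring=somexyz should surface (in a streaming task) as
# the environment variable mapred_mapper_mystring.
my_string = os.environ.get("mapred_mapper_mystring", "default-value")

for line in sys.stdin:
    print(f"{my_string}\t{line.rstrip()}")   # tag each record, for illustration
```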

3

Solved

I understand how MRv1 works. Now I am trying to understand MRv2: what's the difference between the Application Manager and the Application Master in YARN?
Kadiyevka asked 21/6, 2015 at 17:19

2

Solved

Question to all Cassandra experts out there. I have a column family with about a million records. I would like to query these records in such a way that I can perform a Not-Equal-To...
Independency asked 21/2, 2014 at 4:49

8

Solved

I'd like to find a good and robust MapReduce framework that can be used from Scala.
Osmo asked 7/6, 2009 at 15:14
