Just how much Java does one need to use Hadoop and Mahout effectively?
N

4

5

I'm a PHP developer. Let's just get that out of the way now. But Hadoop – and Mahout in particular – have piqued my interest. I'm ready to take the dive into Java in order to use them.

So, from people experienced enough to know: just how much Java will I need in order to use these effectively? From what I've seen, programming mappers/reducers doesn't take all that much. But with Mahout, I'm not at all sure what I'm looking at when I read the documentation.

Also, just how hard will it be to take data from my PHP application for processing in Java via Hadoop and Mahout? I can't imagine it'd be that difficult, but I'm not experienced enough to say.

Neoma answered 22/7, 2010 at 18:21 Comment(0)
S
7

It shouldn't be all that difficult to get data from PHP to Java for analysis using Mahout and Hadoop.

Easier still is to run Mahout and Hadoop offline in batch mode and store the results in a file system or database. PHP can then read those results, which is as easy as falling off a log.

For real-time use, the recommendations part of Mahout supports a variety of web-service interfaces that make it pretty easy to access from PHP. Hitting the model evaluation part of Mahout would require a bit more programming.

Scott answered 22/7, 2010 at 19:51 Comment(4)
Ted, do you mind pointing me to the place in the documentation where these web-service interfaces are mentioned? I'm not sure I've come across this so far myself. In the meantime, thanks for your answer! - Neoma
Never mind. I think I found it under the Taste documentation. For a noob like myself, though, would you mind expanding slightly on how PHP might be integrated to work with Mahout in a real-time application? I'd deeply appreciate it. - Neoma
Sorry to be slow answering... but PHP is easy to integrate via web-service calls from PHP to Mahout's Taste components. Another alternative would be to use Quercus to run PHP from a Java environment and call Apache Mahout components directly. - Scott
So cwiki.apache.org/MAHOUT/recommender-documentation.html is enough to get a (PHP-friendly) web service up and running, with nothing much more than a text file of ratings to configure. It barely scratches the surface of what's possible but is a good way to get started. - Telemann
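
As a rough illustration of the "text file of ratings" mentioned above: as far as I know, Mahout's Taste file-based data model reads plain comma-separated lines of the form `userID,itemID,preference`. The sketch below writes that layout from in-memory rows. It is in Python only to keep the example short; the same few lines translate directly to PHP. The function name `write_ratings` and the sample rows are made up for the example.

```python
import csv
import io


def write_ratings(rows, out):
    """Write (user_id, item_id, rating) rows as 'userID,itemID,preference' lines."""
    writer = csv.writer(out, lineterminator="\n")
    for user_id, item_id, rating in rows:
        # IDs are written as integers and the preference as a float.
        writer.writerow([int(user_id), int(item_id), float(rating)])


# Hypothetical sample data; in practice these rows would come from the
# application's own database.
sample = [(1, 101, 5.0), (1, 102, 3.0), (2, 101, 2.5)]
buffer = io.StringIO()
write_ratings(sample, buffer)
print(buffer.getvalue(), end="")
```

In a real setup `out` would be an open file rather than a `StringIO` buffer, and the resulting file would be handed to the recommender as its ratings input.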
B
1

A beginner's level of Java is sufficient. You can always dig deeper as the need arises.

Buonarroti answered 18/8, 2010 at 21:26 Comment(0)
R
1

I just did the same thing, and it's been years since I did anything Java-related. What I did was the following:

  1. Started off with simple Hadoop streaming examples
  2. Tried my own analysis with PHP streaming
  3. Started experimenting with Pig
  4. Started experimenting with PHP streaming inside Pig

All without any Java!
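
For anyone unfamiliar with Hadoop streaming, here is a minimal word-count sketch of the idea: the mapper and reducer are ordinary scripts that read lines from stdin and write tab-separated `key<TAB>value` lines to stdout. It is written in Python purely for illustration; a PHP script that reads `STDIN` and echoes lines follows the same contract. The function names and the `map`/`reduce` command-line argument are my own convention, not anything Hadoop requires.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word seen."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts for each key.

    Relies on the input being sorted by key, which Hadoop streaming
    guarantees between the map and reduce steps.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    # Hypothetical convention: run as `wordcount.py map` or `wordcount.py reduce`.
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

The script would be passed to the streaming jar through its `-mapper` and `-reducer` options. It can also be tried locally without Hadoop by piping a text file through the map step, then `sort`, then the reduce step.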

Roulette answered 8/12, 2010 at 18:26 Comment(0)
D
0

For real-time recommendations, you could also instantiate a Mahout recommender in a Java servlet class, then export that as a WAR file and serve it from a Tomcat server.

Derwon answered 19/8, 2011 at 23:47 Comment(0)
