Is it worth purchasing Mahout in Action to get up to speed with Mahout, or are there other better sources?

6

12

I'm currently a very casual user of Apache Mahout, and I'm considering purchasing the book Mahout in Action. Unfortunately, I'm having a really hard time getting an idea of how worth it this book is -- and seeing as it's a Manning Early Access Program book (and therefore only currently available as a beta-version e-book), I can't take a look myself in a bookstore.

Can anyone recommend this as a good (or less good) guide to getting up to speed with Mahout, and/or other sources that can supplement the Mahout website?

Rightward answered 22/12, 2010 at 15:3 Comment(0)
21

Speaking as a Mahout committer and co-author of the book, I think it is worth it. ;-)

But seriously, what are you working on? Maybe we can point you to some resources.

Some aspects of Mahout are just plain hard to figure out on your own. We work hard at answering questions on the mailing list, but it can really help to have sample code and a roadmap. Without some of that, it is hard to even ask a good question.

Volitive answered 22/12, 2010 at 23:38 Comment(2)
OK, thanks for your impartial feedback ;-) I'm currently working on clustering large quantities of data (largely with Mahout), and I straight-up enjoy reading computer books as well. The reason for my question is that I can't seem to find any feedback about the book, or more than the first chapter as an online sample, and checking those things out is an essential part of my pre-purchase ritual. - Rightward
FWIW, I just bought the e-book last night (talk about taking my time to think it over ;-) and after reading the first two chapters, I'm really, really pleased with it! - Rightward
11

Also a co-author here. Being "from the horse's mouth", it's probably by far the most complete write-up out there for Mahout itself. There are some good blog posts out there, and certainly plenty of good books on machine learning more generally (I like Collective Intelligence in Action as a broad, light intro). A few people on the mailing list say they like the book, FWIW, as do the book forums (http://www.manning-sandbox.com/forum.jspa?forumID=623). I think you can return the e-book if it's not quite what you wanted. It definitely has 6 chapters on clustering.

Edelmiraedelson answered 23/12, 2010 at 22:49 Comment(0)
3

Many parts of the book are out of date, a version or two behind what is current. In addition, there are several mistakes in the text, particularly in the examples. This may make things a bit tricky when trying to replicate the discussed results.

Additionally, you should be aware that the most mature part of Mahout, the recommender system (Taste), isn't distributed. I'm not really sure why it is packaged with the rest of Mahout. This is more a complaint about the software package than about the book itself.

Pilau answered 27/8, 2011 at 22:50 Comment(1)
The book has been updated to match version 0.5, although the fixed version isn't available yet. - Morbilli
3

Currently the best out there. Probably as mature as the product. Some aspects are better than others: the insight into the underlying implementation is good; the practical guidance for beginners on getting up and running on Linux, Mac OS X, etc., not so much. Defining a clear strategy for keeping a recommender updated is iffy. Production examples are pretty thin. Good as a starting point, but you need a lot more. The authors make their best attempt to help, but it is a pretty new product. All in all, yes, buy it.

Clarissa answered 25/3, 2013 at 4:1 Comment(0)
2

I got the book a few weeks ago. Highly recommended. The authors are very active on the mailing list, too, and there is a lot of cool energy in this project.

Dalury answered 27/8, 2012 at 0:23 Comment(0)
1

You might also consider reading through Paco Nathan's Enterprise Data Workflows in Cascading. You can run PMML exported from R or SAS on your cluster. That is not to say anything bad about Mahout in Action; the authors did a great job and clearly put good time and effort into making it instructive and interesting. This is more of a suggestion to look beyond Mahout, which is not currently getting the kind of traction it would if it were more user friendly.

As it stands, the Mahout user experience is kinda choppy and doesn't really give you a clear idea of how to develop and update intelligent systems and their life cycles, IMO. Mahout is not really acceptable for academics either; they are more likely to use Matlab or R. The random forest implementation barely works, the docs have erroneous examples, etc. That's frustrating, and the parallelism and scalability of the Mahout routines depend on the algorithm. I don't currently see Mahout going anywhere solid as it stands, again IMO. I hope I'm wrong!

http://shop.oreilly.com/product/0636920028536.do

Clarissa answered 22/8, 2013 at 23:55 Comment(0)
