The goal of the Java-based Mahout project is to build scalable machine learning libraries. Are there any equivalent libraries in Python?
Java's Mahout equivalent in Python
You could use Jython or JPype to integrate Mahout with your Python code. See my similar question: #7492453 –
Slayton
Python is not considered a good choice for large dataset computations since the performance gets prohibitively slow. –
Dubuffet
scikit-learn is highly recommended: http://scikit-learn.sourceforge.net/
Just a note: the current implementation of scikit-learn is not yet able to leverage a Hadoop cluster for distributed computing. It is, however, scalable enough to address medium-sized problems (e.g. hundreds of thousands of samples and features for linear models), especially if you use sparse representations and/or memmap'ed arrays. –
Osy
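To illustrate the point about sparse representations, here is a minimal sketch of fitting a linear model on a SciPy sparse matrix with scikit-learn. The choice of `SGDClassifier`, the synthetic data, and the deliberately small sizes are my assumptions for illustration; the comment above does not name a specific estimator.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Synthetic sparse data (assumption): 2,000 samples x 5,000 features,
# about 1% of entries non-zero, stored in CSR format.
X = sparse.random(2000, 5000, density=0.01, format="csr", random_state=0)

# Hypothetical labels derived from a random weight vector.
true_w = rng.normal(size=5000)
y = (X @ true_w > 0).astype(int)

# A linear model trained by stochastic gradient descent; it accepts the
# sparse matrix directly, so X is never converted to a dense array.
clf = SGDClassifier(max_iter=20, tol=None, random_state=0)
clf.fit(X, y)

train_accuracy = clf.score(X, y)
print("training accuracy:", train_accuracy)
```

Because only the non-zero entries are stored, the memory footprint grows with the number of non-zeros rather than with samples times features, which is what makes the medium-sized problems mentioned above tractable on a single machine.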
Spark MLlib is recommended. It is a scalable machine learning library that can read data from HDFS and, of course, runs on top of Spark.
You can access it via PySpark (see the Programming Guide's Python examples).
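As a rough sketch of what that looks like from Python, the function below trains a logistic regression with Spark's DataFrame-based `pyspark.ml` API. The function name, the tiny in-memory dataset, and the local `local[1]` master are assumptions made for illustration; running it requires a working PySpark and Java installation.

```python
def train_logistic_regression():
    """Train a tiny logistic regression with PySpark (illustrative sketch)."""
    # Imported here so the definition can be loaded without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.linalg import Vectors

    # Local single-threaded session (assumption); a real job would point
    # at a cluster master instead.
    spark = (
        SparkSession.builder.master("local[1]")
        .appName("mllib-sketch")
        .getOrCreate()
    )

    # Hypothetical toy data: (label, feature vector) rows.
    train = spark.createDataFrame(
        [
            (0.0, Vectors.dense([0.0, 1.1])),
            (1.0, Vectors.dense([2.0, 1.0])),
            (0.0, Vectors.dense([0.1, 1.2])),
            (1.0, Vectors.dense([2.2, 0.9])),
        ],
        ["label", "features"],
    )

    model = LogisticRegression(maxIter=10, regParam=0.01).fit(train)
    coefficients = model.coefficients.toArray().tolist()

    spark.stop()
    return coefficients
```

Calling `train_logistic_regression()` returns the fitted coefficients as a plain Python list. To work with data on HDFS, the in-memory rows would be replaced by a DataFrame loaded through `spark.read` from an `hdfs://` path.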
Orange is supposedly pretty decent, from what I've heard, but I've never used it personally. PyML might be worth taking a look at as well. Also, Monte.
Orange isn't even close to being scalable. Nearly all of its algorithms are slow batch processes, and they have no intention of making them otherwise due to the academic orientation of the project. Sadly, there really isn't any Python equivalent of Mahout. –
Traction
@Chris: scikit-learn is probably not there yet, but it aims to be scalable and to avoid the pitfalls of academic-oriented projects. Some of our implementations for standard problems already scale quite well. –
Woothen