Choice of Machine Learning Platform [closed]

I have a data set of users and their loan repayment metrics (how long they took, how many installments, etc.). Now I want to analyse a user's past loan history and say: "If we loan them X, they will most likely repay over Y installments, over Z days."

Here is my take:

  1. The algorithm should be a clustering algorithm that groups users according to their repayment habits.
  2. I want to use a SOM (self-organizing map) or K-Means.
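
A minimal sketch of the K-Means idea in plain Python, assuming each user is summarised by two hypothetical features: installments per loan and days to repay. The columns are standardized first because days run into the hundreds while installments stay in single digits, so unscaled distances would be dominated by days. Note that this only groups users; it does not by itself produce Y or Z.

```python
import math
import random
import statistics


def standardize(rows):
    """Z-score each column so large-valued features don't dominate distances."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [statistics.mean(col) for col in zip(*members)]
    return centroids, labels


# Hypothetical repayment history: (installments per loan, days to repay).
history = {
    "u1": (3, 30), "u2": (4, 35), "u3": (3, 28),
    "u4": (10, 300), "u5": (12, 340), "u6": (11, 320),
}
names = list(history)
_, labels = kmeans(standardize(list(history.values())), k=2)
groups = dict(zip(names, labels))
print(groups)  # cluster id per user; the ids themselves are arbitrary
```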

So my question is: what platforms are good for this? So far I have looked at Mahout.

Azral answered 27/1, 2011 at 14:6
It depends on how much data you need to process and how long you can wait for results. Sometimes it is faster to get results with KNIME or RapidMiner (open-source data mining applications with nice UIs) than to find enough machines for Mahout and then do the configuration and tuning. (comment by Winifred)

It's well worth taking a look at Weka: it's a reasonably mature open-source toolkit with lots of machine learning algorithms, clustering included.

Gravesend answered 27/1, 2011 at 17:29

RapidMiner: the community edition is available for free, it's easy to use, and it has nice visualizations.

http://rapid-i.com/content/view/181/190/

Vigilante answered 16/10, 2011 at 17:40

Another good library is scikits.learn (since renamed scikit-learn), a machine learning library for Python programmers.

Dolomites answered 8/2, 2011 at 8:34

There is an amazing book on this topic: "Programming Collective Intelligence" by Toby Segaran. It discusses different machine learning algorithms, clustering, etc., and also includes links to useful libraries and sample code.

Essence answered 16/10, 2011 at 17:57

Why clustering? This doesn't look like a clustering problem. You could run cluster analysis as a preprocessing phase to distinguish several groups of users (or skip this phase entirely), but after that you need some kind of numeric prediction: both targets, the number of installments and the number of days, are numbers, so how would clustering produce them?

I suggest using regression for this task. Linear regression should fit your needs. If the dependent variables (number of installments and days) depend on the other attributes non-linearly, you can try polynomial regression or even algorithms like M5' (M5P in Weka), which first builds a decision tree and then fits a linear regression model at each leaf of that tree.
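
A minimal sketch of the linear-regression route in plain Python, assuming two hypothetical numeric features per past loan (loan amount in thousands and the user's average installments on earlier loans) and one separate model per target. It solves the least-squares normal equations directly, which is adequate for a handful of features; the toy targets are invented to be exactly linear so the fitted weights are easy to check.

```python
def fit_linear(X, y):
    """Ordinary least squares: solve (A^T A) w = A^T y, where A is X with a
    leading column of 1s. Returns [intercept, w1, w2, ...]."""
    rows = [[1.0, *x] for x in X]
    n = len(rows[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def predict(w, x):
    """Intercept plus weighted sum of the features."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Hypothetical features per past loan: (amount in thousands, user's average
# installments on earlier loans). Targets are made up and exactly linear.
X = [(1, 4), (2, 6), (4, 4), (3, 10), (5, 8)]
installments = [5.5, 8.0, 10.0, 11.5, 13.5]
days = [50, 80, 110, 120, 150]

w_installments = fit_linear(X, installments)  # one model per target
w_days = fit_linear(X, days)
applicant = (3, 5)  # hypothetical: 3k loan, user averaged 5 installments
print(predict(w_installments, applicant), predict(w_days, applicant))
```

On real data the fit will not be exact; polynomial terms (e.g. adding `amount ** 2` as an extra column) can be fed to the same function if the relationship looks non-linear.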

If you have non-numeric attributes, you can also try classification. In this case you need to create the possible classes manually (e.g. number of installments: 3 to 5, 6 to 10, etc.) and then apply any classification algorithm (C4.5, SVM and Naive Bayes, to mention a few).
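
A sketch of the classification route, assuming hypothetical categorical attributes (employment type and whether the user defaulted before) and hand-made installment bands like the ones described. Naive Bayes is used only because it is the shortest of the listed algorithms to write from scratch; the band boundaries and the data are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def installment_band(n):
    """Hypothetical hand-made target classes; the boundaries are examples."""
    if n <= 5:
        return "up to 5"
    if n <= 10:
        return "6-10"
    return "11+"


class CategoricalNaiveBayes:
    """Naive Bayes for categorical attributes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        self.counts = Counter()          # (attribute index, value, label) -> count
        self.values = defaultdict(set)   # attribute index -> distinct values seen
        for row, label in zip(rows, labels):
            for i, v in enumerate(row):
                self.counts[(i, v, label)] += 1
                self.values[i].add(v)
        return self

    def predict(self, row):
        def log_score(label):
            n_c = self.class_counts[label]
            score = math.log(n_c / self.total)  # log prior P(label)
            for i, v in enumerate(row):
                k = len(self.values[i])
                # Smoothed log P(value | label); unseen values never give log(0).
                score += math.log((self.counts[(i, v, label)] + 1) / (n_c + k))
            return score
        return max(self.class_counts, key=log_score)


# Hypothetical non-numeric attributes: (employment type, defaulted before?).
rows = [("salaried", "no"), ("salaried", "no"), ("self-employed", "no"),
        ("self-employed", "yes"), ("unemployed", "yes"), ("unemployed", "yes")]
past_installments = [3, 4, 8, 9, 14, 12]
labels = [installment_band(n) for n in past_installments]

model = CategoricalNaiveBayes().fit(rows, labels)
print(model.predict(("salaried", "no")))
```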

Actually, I don't think you have tons of data. I'd guess it is less than 50 MB overall, so there's no need for monsters like Mahout, which are designed to process really, really big amounts of data. You can use Weka or RapidMiner for this purpose. Even if they can't handle your data with the default configuration, just increase the JVM's maximum heap size (the -Xmx option) and in 99% of cases they will be fine.

Thyestes answered 16/10, 2011 at 18:29
