I'd like to write a "smart monitor" in Java that sends out an alert any time it detects oncoming performance issues. My Java app is writing data in a structured format to a log file:
<datetime> | <java-method> | <seconds-to-execute>
So, for example, if I had a Widget#doSomething(String) method that took 812ms to execute, it would be logged as:
2013-03-24 11:39:21 | Widget#doSomething(String) | 812
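For reference, a minimal sketch of how one such line could be parsed in Java. The LogEntry class and its field names are hypothetical, not part of my app; the sketch assumes the three columns are always separated by a pipe character and that the method signature itself never contains one. The third column is kept as the raw number exactly as logged (812 in the example above).

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical value class for one parsed log line; names are illustrative only.
final class LogEntry {
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    final LocalDateTime timestamp;
    final String method;
    final long duration; // raw value of the third column, unit as logged

    LogEntry(LocalDateTime timestamp, String method, long duration) {
        this.timestamp = timestamp;
        this.method = method;
        this.duration = duration;
    }

    // Splits the line on '|' (ignoring surrounding whitespace) and converts each column.
    // Throws a runtime exception if the line does not have exactly three valid columns.
    static LogEntry parse(String line) {
        String[] parts = line.split("\\s*\\|\\s*");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 columns: " + line);
        }
        return new LogEntry(
                LocalDateTime.parse(parts[0].trim(), STAMP),
                parts[1].trim(),
                Long.parseLong(parts[2].trim()));
    }
}
```

Calling LogEntry.parse on the example line above would yield the method string Widget#doSomething(String) and a duration of 812.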
As performance starts to degrade (such as during a major collection, during peak loads, or if the system is just slowing to a crawl), method execution timings start to slow down, so the right-most column starts to show huge numbers (sometimes 20 - 40 seconds to execute a single method).
In college - for a machine learning exercise - I wrote what my professor called a linear dichotomizer that took simple test data (the height, weight and gender of a person) and "learned" how to categorize a person as male or female based on their height/weight. Then, once it had all its training data, we fed it new data to see how accurately it could determine gender.
I think the multivariate version of a linear dichotomizer is something called a support vector machine (SVM). If I'm wrong, then please clarify and I'll change the title of my question to something more appropriate. Regardless, I need this app to do the following things:
- Run in a "test mode" where I feed it the structured log file from my main Java app (the one I wish to monitor) and it takes each log entry (as shown above) and uses it for test data
  - Only the java-method and seconds-to-execute columns are important as inputs/test data; I don't care about the datetime
- Run in "monitor mode" where it is actively reading new log data from the log file, and using similar "machine learning" techniques to determine if a performance degradation is looming
It's important to note that the seconds-to-execute column is not the only important factor here, as I've seen horrible timings for certain methods during periods of awesome performance, and really great timings for other methods at times when the server seemed like it was about to die and push daisies. So obviously certain methods are "weighted"/more important to performance than others.
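To illustrate what I mean by raw timings not being comparable across methods, here is a rough sketch of one possible preprocessing step: tracking a running mean and standard deviation per method, and scoring each new timing relative to that method's own history. This is not an SVM and not necessarily the right approach; the MethodBaselines class and its method names are hypothetical, and the per-method z-score is just an assumption about what a useful input feature might look like.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: keeps per-method running statistics so a timing can be
// judged against that method's own baseline instead of a global threshold.
final class MethodBaselines {
    // Running statistics for one method, updated with Welford's online algorithm.
    private static final class Stats {
        long count;
        double mean;
        double m2; // running sum of squared deviations from the mean

        void add(double x) {
            count++;
            double delta = x - mean;
            mean += delta / count;
            m2 += delta * (x - mean);
        }

        // Sample standard deviation; 0.0 until at least two values were recorded.
        double stdDev() {
            return count > 1 ? Math.sqrt(m2 / (count - 1)) : 0.0;
        }
    }

    private final Map<String, Stats> byMethod = new HashMap<>();

    // Adds one observed timing to the baseline of the given method.
    void record(String method, double duration) {
        byMethod.computeIfAbsent(method, k -> new Stats()).add(duration);
    }

    // Number of standard deviations the timing lies from the method's mean.
    // Returns 0.0 when there is no usable baseline (unknown method, fewer than
    // two samples, or zero variance).
    double zScore(String method, double duration) {
        Stats s = byMethod.get(method);
        if (s == null || s.count < 2 || s.stdDev() == 0.0) {
            return 0.0;
        }
        return (duration - s.mean) / s.stdDev();
    }
}
```

With something like this, a method that normally takes around 20000 would score near zero at 20000, while a method that normally takes around 100 would score high at 130, even though 130 is the far smaller raw number.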
My question
- Googling for "linear dichotomizer" or "support vector machines" turns up some really scary, highly academic, ultra-cerebral white papers that I just don't have the mental energy (or time) to consume, unless they truly are my only options. So I ask: is there a layman's introduction to this stuff, or a great site/article/tutorial for building such a system in Java?
- Are there any solid/stable open source Java libraries? I was only able to find jlibsvm and svmlearn, but the former looks to be in a pure beta state and the latter seems to only support binary decisions (like my old linear dichotomizer). I know there's Mahout but that sits on top of Hadoop, and I don't think I have enough data to warrant the time and mental energy into setting up my own Hadoop cluster.