I'm starting to learn about big data, with a strong focus on predictive analysis, and I have a case study I would like to implement:
I have a dataset of server health information that is polled every 5 seconds. I want to display the retrieved data, but more importantly, I want to run a previously built machine learning model and show its results (alerts about servers that are likely to crash).
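To make the data flow concrete, here is a minimal Python sketch of one poll, score and alert loop. It assumes each 5-second poll yields one reading per server. `Sample`, `toy_crash_model`, `score_batch`, `run`, the feature names and the 0.8 threshold are all hypothetical. The toy model only stands in for whatever the specialist delivers. A packaged platform would presumably take over the loop, storage and dashboard, and the sketch only illustrates the shape of the pipeline being asked about.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Sample:
    """One polled health reading (hypothetical fields)."""
    server: str
    cpu: float        # CPU utilisation, 0..1
    mem: float        # memory utilisation, 0..1
    disk_errors: int  # disk errors seen since the last poll


def toy_crash_model(s: Sample) -> float:
    """Hypothetical stand-in for the specialist's model: crash risk in [0, 1]."""
    risk = 0.5 * s.cpu + 0.4 * s.mem + 0.1 * min(s.disk_errors, 10) / 10
    return min(risk, 1.0)


def score_batch(samples: List[Sample],
                model: Callable[[Sample], float],
                threshold: float = 0.8) -> Dict[str, float]:
    """Score one polling round; return servers whose risk reaches the threshold."""
    alerts: Dict[str, float] = {}
    for s in samples:
        risk = model(s)
        if risk >= threshold:
            alerts[s.server] = risk
    return alerts


def run(poll: Callable[[], List[Sample]],
        model: Callable[[Sample], float],
        on_alert: Callable[[str], None] = print,
        interval_s: float = 5.0,
        rounds: int = 1) -> None:
    """Poll, score and raise alerts every interval_s seconds for a fixed number of rounds."""
    for i in range(rounds):
        start = time.monotonic()
        for server, risk in score_batch(poll(), model).items():
            on_alert(f"ALERT {server}: crash risk {risk:.2f}")
        if i < rounds - 1:
            # Sleep only for what is left of the polling interval.
            time.sleep(max(0.0, interval_s - (time.monotonic() - start)))
```

For example, `run(poll_fn, toy_crash_model, rounds=3)` would report alerts for three consecutive polls. Here `poll_fn` is a hypothetical function that fetches the latest readings.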
The model itself will be built by a machine learning specialist, so that part is completely out of scope. My job is to integrate the model into a platform that runs it and shows the results in a nice dashboard.
My problem is the "big picture" architecture of this system: all the pieces seem to exist already (Cloudera + Mahout), but I can't find a simple integrated solution that covers all my needs, and I don't believe the state of the art is to write custom software...
So, can anyone shed some light on production systems like this (displaying data alongside predictive analysis)? Is there a reference architecture for this? Tutorials or documentation?
Notes:
I've investigated some related technologies: Cloudera/Hadoop, Pentaho, Mahout and Weka. I know that Pentaho, for example, can store big data and run ad hoc Weka analyses on it. With Cloudera and Impala, a data specialist can also run ad hoc queries and analyse the data, but that's not my goal. I want my system to run the ML model and show the results in a nice dashboard alongside the retrieved data, and I'm looking for a platform that already supports this rather than building it myself.
I'm focusing on Pentaho, as it seems to integrate machine learning well, but every tutorial I've read covers "ad hoc" ML analysis rather than real-time analysis. Any tutorial on that subject would be welcome.
I'm open to both open-source and commercial solutions (with a trial).
Depending on the specifics, maybe this isn't big data at all; more "traditional" solutions are also welcome.
Also, "real time" is a broad term here: if the ML model performs well, running it every 5 seconds is good enough.
The ML model is static (it isn't updated and doesn't change its behavior in real time).
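Because the model is static and "real time" here means "once per 5-second poll", the practical requirement is that scoring a whole poll's worth of readings finishes well inside the interval. Below is a minimal Python sketch of such a check. It assumes a plain per-reading scoring function. `fits_polling_budget`, `static_linear_score`, its weights and the 50% headroom are hypothetical choices, not part of any platform's API.

```python
import time
from typing import Callable, Sequence

Features = Sequence[float]


def fits_polling_budget(score: Callable[[Features], float],
                        batch: Sequence[Features],
                        interval_s: float = 5.0,
                        repeats: int = 5,
                        headroom: float = 0.5) -> bool:
    """Return True if scoring one full polling batch stays under headroom * interval_s.

    Uses the worst wall-clock time over `repeats` runs, so one slow run fails the check.
    """
    worst = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        for features in batch:
            score(features)
        worst = max(worst, time.perf_counter() - start)
    return worst < headroom * interval_s


# Hypothetical static model: weights are fixed once (e.g. loaded at start-up)
# and never updated while the system runs.
WEIGHTS = (0.5, 0.4, 0.1)


def static_linear_score(features: Features) -> float:
    """Weighted sum of the features using the fixed WEIGHTS."""
    return sum(w * x for w, x in zip(WEIGHTS, features))
```

For instance, calling `fits_polling_budget(static_linear_score, batch)` with one feature tuple per server indicates whether a model of that cost keeps up with the polling rate. The headroom leaves time for polling, storage and dashboard updates within each interval.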
I'm not looking for an application customized to my example, as my focus is on the big picture: generic platforms for big data with predictive analysis.