I'm working on a large text classification project and we have our text data (simple messages) stored in HBase.
We have two problems. First, we would like to use HBase as the data source for Mahout's classifiers, namely Bayes and Random Forests.
Second, we would like to store the generated model in HBase instead of using the in-memory approach (InMemoryBayesDatastore). As our data sets grow we are running into memory-utilization problems, and we would like to test HBase as a viable alternative.
There seems to be little material on using HBase with Mahout, or on whether HBase can serve as a data source at all. I'm using the Mahout 0.6 core API in Java, which provides the in-memory datastore.
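Without a built-in HBase source, the fallback would be a separate export step from HBase into Mahout's training input. Below is a minimal sketch of that step under stated assumptions. Rows are modeled as an in-memory map of row key to {label, message} that stands in for an HBase scan. The class and method names (HBaseBayesExport, toTrainingLine, toTrainingLines) are hypothetical. The target is assumed to be the "label<TAB>space-separated tokens" text format that, as far as I can tell, the Mahout 0.6 Bayes trainer (org.apache.mahout.classifier.bayes) reads. It uses only the JDK, so it does not call the HBase or Mahout APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical export step: turns (label, message) rows, as they might be read
 * from an HBase table, into "label<TAB>token token ..." lines. That line format
 * is an assumption about what the Mahout 0.6 Bayes trainer consumes.
 */
final class HBaseBayesExport {

    /** Lower-cases the message and splits it on runs of non-letter, non-digit characters. */
    static List<String> tokenize(String message) {
        return Arrays.stream(message.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    /** Formats one row as a single training line; whitespace inside the label becomes '_'. */
    static String toTrainingLine(String label, String message) {
        String safeLabel = label.trim().replaceAll("\\s+", "_");
        return safeLabel + "\t" + String.join(" ", tokenize(message));
    }

    /** Converts every row (row key -> {label, message}), skipping messages with no tokens. */
    static List<String> toTrainingLines(Map<String, String[]> rows) {
        List<String> lines = new ArrayList<>();
        for (String[] labelAndMessage : rows.values()) {
            if (tokenize(labelAndMessage[1]).isEmpty()) {
                continue; // nothing to train on
            }
            lines.add(toTrainingLine(labelAndMessage[0], labelAndMessage[1]));
        }
        return lines;
    }

    public static void main(String[] args) {
        // In-memory stand-in for an HBase scan: row key -> {label, message}.
        // In real code these values would come from the HBase client instead.
        Map<String, String[]> rows = new LinkedHashMap<>();
        rows.put("msg-001", new String[] {"spam", "Buy NOW, cheap!"});
        rows.put("msg-002", new String[] {"ham", "See you at 5pm?"});
        rows.put("msg-003", new String[] {"ham", "!!!"});
        toTrainingLines(rows).forEach(System.out::println);
    }
}
```

Running main prints "spam<TAB>buy now cheap" and "ham<TAB>see you at 5pm" and skips the punctuation-only message. Keeping the sketch free of HBase and Mahout dependencies is deliberate: the format assumption can be checked against the trainer before a real scan is wired in.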
After a bit of digging, I believe there was an HBase Bayes datastore component: org.apache.mahout.classifier.bayes.datastore.HBaseBayesDatastore
See the older (0.3) Javadoc here: http://www.jarvana.com/jarvana/view/org/apache/mahout/mahout-core/0.3/mahout-core-0.3-javadoc.jar!/org/apache/mahout/classifier/bayes/datastore/HBaseBayesDatastore.html
However, judging by the latest documentation, this class seems to have disappeared: https://builds.apache.org/job/Mahout-Quality/javadoc/
Is it still possible to use HBase as a data source for Bayes and Random Forests, and are there any existing use cases of this?
Thanks!