Java Support for PMML

I am new to PMML (Predictive Model Markup Language, www.dmg.org) and I was wondering whether there is any Java support (open source or commercial) for creating and parsing PMML files.

Initially I only have in mind creating and parsing PMML files programmatically from Java environments.

I have been "googling" and I have found several possibilities:

Open source:

From Java.

  • JDM (javax.datamining). It seems to be dead. Does anyone have more information?

Professional.

DIY

  • Use a Java XML library and build your own parser/writer for PMML files
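
As a rough idea of what the DIY route could look like, here is a minimal sketch that uses only the JDK's built-in DOM parser to list the DataField names of a PMML document. The class name, the method name and the sample document are hypothetical and written purely for illustration; a real PMML file would carry a full Header, MiningSchema and model element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class PmmlDataFieldReader {

    // Hypothetical, hand-written PMML fragment used only for illustration.
    static final String SAMPLE_PMML =
          "<PMML version=\"4.1\" xmlns=\"http://www.dmg.org/PMML-4_1\">"
        + "<Header/>"
        + "<DataDictionary numberOfFields=\"2\">"
        + "<DataField name=\"age\" optype=\"continuous\" dataType=\"double\"/>"
        + "<DataField name=\"income\" optype=\"continuous\" dataType=\"double\"/>"
        + "</DataDictionary>"
        + "</PMML>";

    // Returns the "name" attribute of every DataField element, in document order.
    static List<String> readDataFieldNames(String pmml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // PMML elements live in a DMG namespace
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(pmml.getBytes(StandardCharsets.UTF_8)));

            // "*" matches any namespace, so the PMML version does not matter here.
            NodeList fields = doc.getElementsByTagNameNS("*", "DataField");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < fields.getLength(); i++) {
                Element field = (Element) fields.item(i);
                names.add(field.getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse PMML document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readDataFieldNames(SAMPLE_PMML)); // prints [age, income]
    }
}
```

Writing PMML could follow the same pattern in reverse, building the Document with createElementNS and serializing it with a javax.xml.transform.Transformer.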

I appreciate all your opinions.

Thanks in advance

Oscar

Antrum answered 2/9, 2011 at 8:19 Comment(2)
Agreeing with nfechner here. On a higher level I would advise the use of jpmml or your own home-made tools if it's an exploration of JPMML in Java. If you (or your employer) plan to make use of this in some IT solution then a commercial library could be a better idea. - Welfarism
Thanks for your messages! nfechner, I just wanted an informal 'poll' (plus opinions) to figure out the possibilities for PMML parsing in a Java environment. That is, reading and writing PMML content programmatically using existing libraries, with the aim of not "reinventing the wheel". Basically, I'll follow Wivani's advice for the moment (jpmml + some DIY library). - Antrum

You should realize that the answer may depend on the MODEL-ELEMENT that you want to work with. It is also very likely that your best options for creating PMML and parsing PMML will come from different software packages. I am going to assume that by 'creation of PMML' you mean of the document and not of the model. I've never heard of anyone integrating automatic model fitting with execution but perhaps it exists already. Certainly a PMML model could be passed using SOAP.

I can't speak to the other projects but the product offered by Zementis, called Adapa, is used only for the execution of PMML. This product assumes that there is a model fitting application that will do the creating by exporting a fitted model into PMML. There are already a lot of well developed model fitting applications so I think this is a reasonable assumption.

The version I have used (3.6) was generally fast but it couldn't handle ensembles of typical random forest size (500+ trees) without an especially large heap. I think they may have fixed this in newer versions. Although this isn't advertised, Zementis doesn't appear to support a few of the model types, namely Text Models, Sequences, Baseline Models, or Time Series (for which the PMML standard currently only has Exponential Smoothing anyway). My version also doesn't have K-Nearest Neighbors but I hear that more recent versions do.

Unless you are considering integrated fitting and execution (in which case you should consider online learning), my advice would be to consider these questions in order:

  1. What is the model type that I am interested in using?
  2. What application/s do I prefer to build models in?
  3. Then lastly how will I execute this and what requirements do I have in this regard (web-services, cloud, performance etc)?

If you look at the list of DMG members you will find many commercial vendors that are either on the supply side (e.g. SAS, SPSS, Togaware, Rapid-I) or the demand side (too many to list).

Your list also doesn't mention Weka, which can execute some PMML models. There are also R/Java-based solutions, so you could import PMML into R (see fileToXMLNode) from a Java environment, although you could also just execute R directly.

Finally, if you have a very specific model in mind and you understand what it means mathematically to 'execute it' then it shouldn't be too difficult to build what you need yourself.
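
To give a feel for the build-it-yourself case, here is a minimal sketch for one specific model type: a linear regression stored as a PMML RegressionTable, evaluated as intercept + sum(coefficient * x^exponent) using only the JDK's DOM parser. The class name, the predict method and the sample fragment are hypothetical; the fragment omits the Header, DataDictionary and MiningSchema that a complete PMML file would have, and the sketch ignores categorical predictors, interaction terms and normalization.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class LinearRegressionPmmlSketch {

    // Hypothetical, abbreviated PMML fragment used only for illustration.
    static final String SAMPLE_PMML =
          "<PMML version=\"4.1\" xmlns=\"http://www.dmg.org/PMML-4_1\">"
        + "<RegressionModel functionName=\"regression\">"
        + "<RegressionTable intercept=\"1.5\">"
        + "<NumericPredictor name=\"age\" coefficient=\"0.5\"/>"
        + "<NumericPredictor name=\"income\" exponent=\"2\" coefficient=\"2.0\"/>"
        + "</RegressionTable>"
        + "</RegressionModel>"
        + "</PMML>";

    // Evaluates the first RegressionTable found: intercept + sum(coef * x^exponent).
    static double predict(String pmml, Map<String, Double> inputs) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(pmml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse PMML document", e);
        }

        NodeList tables = doc.getElementsByTagNameNS("*", "RegressionTable");
        if (tables.getLength() == 0) {
            throw new IllegalArgumentException("No RegressionTable element found");
        }
        Element table = (Element) tables.item(0);
        double result = Double.parseDouble(table.getAttribute("intercept"));

        NodeList predictors = table.getElementsByTagNameNS("*", "NumericPredictor");
        for (int i = 0; i < predictors.getLength(); i++) {
            Element p = (Element) predictors.item(i);
            String name = p.getAttribute("name");
            Double x = inputs.get(name);
            if (x == null) {
                throw new IllegalArgumentException("Missing input field: " + name);
            }
            double coefficient = Double.parseDouble(p.getAttribute("coefficient"));
            // The exponent attribute is optional in PMML and defaults to 1.
            String exp = p.getAttribute("exponent");
            int exponent = exp.isEmpty() ? 1 : Integer.parseInt(exp);
            result += coefficient * Math.pow(x, exponent);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> inputs = new HashMap<>();
        inputs.put("age", 10.0);
        inputs.put("income", 3.0);
        // 1.5 + 0.5 * 10 + 2.0 * 3^2 = 24.5
        System.out.println(predict(SAMPLE_PMML, inputs));
    }
}
```

Anything beyond a simple model like this (tree ensembles, neural networks, transformations) grows quickly in scope, which is where an existing evaluator is probably the better choice.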

Manumission answered 31/5, 2014 at 20:35 Comment(0)
