PMML Examples

PMML Sample Models (ARCHIVE):

Please Note: Not all models on this archive page are conformant. Please be careful while using them.

The PMML files provided below are examples of predictive models that use the PMML standard. These samples are not intended for performance or vendor comparisons; they are provided solely to help users gain a better understanding of PMML. No representation is made as to the accuracy or applicability of these models. Also included are the datasets used to train and validate these predictive models.

For a full list of our most current examples, please visit our current examples page.

PMML Version | Model Type | Vendor | Application | Dataset | PMML File
2.0 | Association | Oracle | Oracle 9i Data Mining, 9.2.0 | Iris | View
2.0 | Center-based Clustering | IBM | DB2 Intelligent Miner for Data V8.1 | Iris | View
2.0 | Distribution-based Clustering | IBM | DB2 Intelligent Miner for Data V8.1 | Iris | View
2.0 | Naïve Bayes | Oracle | Oracle 9i Data Mining, 9.2.0 | Iris | View
2.0 | Neural Network (Classification) | IBM | DB2 Intelligent Miner for Data V8.1 | Iris | View
2.0 | Neural Network (Regression) | IBM | DB2 Intelligent Miner for Data V8.1 | Iris | View
2.0 | Regression | IBM | DB2 Intelligent Miner for Data V8.1 | Iris | View
2.0 | Tree | IBM | DB2 Intelligent Miner for Data V8.1 | Iris | View
2.1 | Association | IBM | DB2 Intelligent Miner Modeling V8.2 | Voting | View
2.1 | Clustering | IBM | DB2 Intelligent Miner Modeling V8.2 | Robustness | View
2.1 | Tree | IBM | DB2 Intelligent Miner Modeling V8.2 | Robustness | View
3.0 | Association | IBM | DB2 Data Warehouse Edition V9.1 | Shopping | View
3.0 | Association | SPSS | Clementine, 10.0 | Shopping | View
3.0 | Distribution-based Clustering | IBM | DB2 Data Warehouse Edition V9.1 | Elnino | View
3.0 | Center-based Clustering | IBM | DB2 Data Warehouse Edition V9.1 | Elnino | View
3.0 | Clustering | SPSS | Clementine, 10.0 | Iris | View
3.0 | Model Composition | IBM | DB2 Data Warehouse Edition V9.1 | Elnino | View
3.0 | Neural Network | SPSS | Clementine, 10.0 | Iris | View
3.0 | Neural Network | SPSS | Clementine, 10.0 | Heart | View
3.0 | Neural Network | SPSS | Clementine, 10.0 | Iris | View
3.0 | Neural Network | SPSS | Clementine, 10.0 | Heart | View
3.0 | General Regression | SPSS | Clementine, 10.0 | Iris | View
3.0 | Regression | IBM | DB2 Data Warehouse Edition V9.1 | Elnino | View
3.0 | Regression | IBM | DB2 Data Warehouse Edition V9.1 | Elnino | View
3.0 | Regression | SPSS | Clementine, 10.0 | Elnino | View
3.0 | Regression | SPSS | Clementine, 10.0 | Elnino | View
3.0 | Regression | SPSS | Clementine, 10.0 | Heart | View
3.0 | Ruleset | SPSS | Clementine, 10.0 | Heart | View
3.0 | Sequence | SPSS | Clementine, 10.0 | Visits | View
3.0 | Tree | IBM | DB2 Data Warehouse Edition V9.1 | Heart | View
3.0 | Tree | SPSS | Clementine, 10.0 | Iris | View
3.0 | Tree | SPSS | Clementine, 10.0 | Heart | View
3.1 | Sequence | IBM | DB2 Data Warehouse Edition V9.1 | Visits | View
3.1 | Association | SAS | SAS 9.2 | Unknown | View
3.1 | Ann | SAS | SAS 9.2 | Iris | View
3.1 | Cluster | SAS | SAS 9.2 | Iris | View
3.1 | Logistic Reg. | SAS | SAS 9.2 | Iris | View
3.1 | Tree | SAS | SAS 9.2 | Iris | View
4.0 | Cluster | KNIME | KNIME 2.4 | Iris | View
4.0 | Regression | KNIME | KNIME 2.4 | Iris | View
4.0 | Neural Network | KNIME | KNIME 2.4 | Iris | View
4.0 | Support Vector Machine | KNIME | KNIME 2.4 | Iris | View
4.0 | Regression | KNIME | KNIME 2.4 | Elnino | View
4.0 | Regression | KNIME | KNIME 2.4 | Elnino | View
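
Because the samples span PMML versions 2.0 through 4.0 and several model types, it can help to check what a downloaded file declares before loading it into a tool. The following is a minimal sketch using only the Python standard library. The `SAMPLE` document is a hand-written stub for illustration (it is not one of the archived models and is not schema-valid), and the helper names `local_name` and `summarize_pmml` are hypothetical. The sketch reads the declared version and top-level model elements only; it does not check conformance.

```python
import xml.etree.ElementTree as ET

# Top-level PMML children that are not model elements.
NON_MODEL_ELEMENTS = {
    "Header", "MiningBuildTask", "DataDictionary",
    "TransformationDictionary", "Extension",
}


def local_name(tag):
    """Strip a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def summarize_pmml(xml_text):
    """Return (version, [model element names]) for a PMML document string."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "PMML":
        raise ValueError("root element is not PMML")
    version = root.get("version")
    models = [
        local_name(child.tag)
        for child in root
        if local_name(child.tag) not in NON_MODEL_ELEMENTS
    ]
    return version, models


# Illustrative stub only: a real model would carry a mining schema,
# data fields, and the model body.
SAMPLE = """<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0">
  <Header copyright="example"/>
  <DataDictionary numberOfFields="0"/>
  <TreeModel functionName="classification"/>
</PMML>"""

version, models = summarize_pmml(SAMPLE)
print(version, models)
```

Stripping the namespace rather than hard-coding it lets the same helper read files from different PMML versions, since older files may declare a different namespace or none at all.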

The Data Mining Group is always looking to increase the variety of these samples.  If you would like to submit samples, please see the instructions on our current examples page.

Datasets for PMML Sample Models

These datasets are used in conjunction with the sample PMML models.  While a high level description is provided here, more details can be found in the ReadMe text file associated with each dataset.  If you publish material based on these datasets, please note the source in your acknowledgements.

Dataset | Description | Source | Comma-Delimited File
Elnino | Contains oceanographic and surface meteorological readings taken from a series of buoys positioned throughout the equatorial Pacific. The "small" dataset is provided here; larger datasets are available via the UCI KDD Archive. The data is expected to aid in the understanding and prediction of El Nino/Southern Oscillation (ENSO) cycles (from the National Oceanic and Atmospheric Administration, donated by Dr. Di Cook of Iowa State University). | UCI KDD Archive | View
Heart | Data provided by the Cleveland Clinic Foundation on the diagnosis of heart disease. The data file consists of 13 potential predictors and a target field (num), which is assigned the value >50 for patients diagnosed with more than 50% diameter narrowing of the arteries and <50 otherwise. In the original file, categorical values were represented by numeric codes; these have been replaced with representative strings for ease of use. | UCI Machine Learning Repository | View
Iris | Perhaps the best-known database in the pattern recognition literature. R. A. Fisher's 1936 paper is a classic in the field and is referenced frequently to this day. The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other (from Fisher, R.A. "The use of multiple measurements in taxonomic problems," Annals of Eugenics, 7, Part II, 179-188, 1936). | UCI Machine Learning Repository | View
Robustness | This dataset is aimed at finding flaws in PMML export implementations. In terms of data mining, the data makes no sense at all, since the values are randomly distributed and in no way meant to be correlated. If you receive a meaningful model, you most probably did something wrong. | IBM | View Apply Data, View Train Data
Shopping | Contains data for SPSS SHOPPING_ASSOC.xml | SPSS | View
Visits | Describes the page visits of users who visited msnbc.com on September 28, 1999. Visits are recorded at the level of URL category and are recorded in time order (from David Heckerman of Microsoft Corporation). Visits_Small.csv contains about 65,000 visits; Visits_Large.csv contains over 880,000 visits. | UCI KDD Archive | View 65KB, View 880KB
Voting | Includes the votes of each U.S. House of Representatives Congressperson on 16 key votes (from Congressional Quarterly Almanac, 98th Congress, 2nd session 1984, Volume XL: Congressional Quarterly Inc. Washington, D.C., 1985. Donated by Jeff Schlimmer at Carnegie-Mellon University). | UCI Machine Learning Repository | View
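
Since the datasets are distributed as comma-delimited files, a quick sanity check after downloading one is to count the rows per class label; for the full Iris file, for instance, the description above implies three classes with 50 rows each. The sketch below assumes a header row and a label column named `class`. Both are assumptions, as are the column names in the inline sample, so consult the ReadMe for the actual layout. The four inline rows only stand in for a real file.

```python
import csv
import io
from collections import Counter


def class_counts(csv_file, label_column):
    """Count rows per distinct value of label_column in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# Stand-in for a downloaded file; the header names are hypothetical.
SAMPLE_CSV = """sepal_length,sepal_width,petal_length,petal_width,class
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

counts = class_counts(io.StringIO(SAMPLE_CSV), "class")
for label, n in sorted(counts.items()):
    print(label, n)
```

To run the same check on a downloaded file, pass an open file handle in place of the `io.StringIO` object, for example `open(path, newline="")`, where `path` is wherever the file was saved.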

Additional PMML Examples

These models are additional examples of PMML that are not based on the datasets listed above. Datasets marked * can be found by searching the UCI Machine Learning Repository; datasets marked N/A are not available. These models are included here to provide a wider range of PMML examples for inspection and understanding.

PMML Version | Model Type | Vendor | Application | Dataset | PMML File
3.0 | Regression | Salford Systems | MARS | N/A | View
2.0 | Tree | Weka | Weka 3-3-5 | Anneal* | View
2.0 | Tree | Weka | Weka 3-3-5 | Audiology* | View
2.0 | Tree | Weka | Weka 3-3-5 | Autos* | View
2.0 | Tree | Weka | Weka 3-3-5 | Balance Scale* | View
2.0 | Tree | Weka | Weka 3-3-5 | Breast Cancer* | View
2.0 | Tree | Weka | Weka 3-3-5 | Wisconsin Breast Cancer* | View

Acknowledgements:

The Data Mining Group thanks the UCI Repository of Machine Learning Databases for being a valuable resource:

Blake, C.L. & Merz, C.J. (1998). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.

e-mail info at dmg.org