PMML Examples

PMML Sample Files:

The files provided below are examples of predictive models exported in PMML. They are provided solely to help users gain a better understanding of the standard and are not intended for performance or vendor comparisons. No representation is made as to the accuracy and applicability of these models. Also included are the datasets used to train and validate these predictive models.

Earlier PMML samples can be found on our examples archive page. Please note that not all models on this page are conformant with the standard, so use them with care.

Last Update: 07/06/2015

PMML Version | Model Type | Vendor | Application | Dataset | PMML File
4.1 | Clustering | KNIME | KNIME 2.8 | Audit | View
4.1 | Clustering | KNIME | KNIME 2.8 | Iris | View
4.1 | Neural Network | KNIME | KNIME 2.8 | Audit | View
4.1 | Neural Network | KNIME | KNIME 2.8 | Iris | View
4.1 | Regression | KNIME | KNIME 2.8 | Audit | View
4.0 | Regression | KNIME | KNIME 2.6.2 | Elnino | View
4.0 | Regression | KNIME | KNIME 2.6.2 | Elnino | View
4.1 | Regression | KNIME | KNIME 2.8 | Iris | View
4.1 | Tree | KNIME | KNIME 2.8 | Audit | View
4.1 | Tree | KNIME | KNIME 2.8 | Iris | View
4.1 | Support Vector Machine | KNIME | KNIME 2.8 | Audit | View
4.1 | Support Vector Machine | KNIME | KNIME 2.8 | Iris | View
4.1 | Model Ensemble - Clustering | KNIME | KNIME 2.8 | Audit | View
4.1 | Model Ensemble - Neural Network | KNIME | KNIME 2.8 | Audit | View
4.1 | Model Ensemble - Neural Network | KNIME | KNIME 2.8 | Iris | View
4.1 | Model Ensemble - Regression | KNIME | KNIME 2.8 | Audit | View
4.1 | Model Ensemble - Regression | KNIME | KNIME 2.8 | Iris | View
4.1 | Model Ensemble - Tree | KNIME | KNIME 2.8 | Audit | View
4.1 | Model Ensemble - Tree | KNIME | KNIME 2.8 | Iris | View
4.1 | Model Ensemble - SVM | KNIME | KNIME 2.8 | Audit | View
4.1 | Model Ensemble - SVM | KNIME | KNIME 2.8 | Iris | View
3.2 | Clustering | R/Rattle | PMML Package 1.2.29 | Audit | View
3.2 | Clustering | R/Rattle | PMML Package 1.2.29 | Iris | View
3.2 | Clustering | R/Rattle | PMML Package 1.2.29 | Iris | View
3.2 | Tree | R/Rattle | PMML Package 1.2.29 | Audit | View
3.2 | Tree | R/Rattle | PMML Package 1.2.29 | Iris | View
3.2 | Regression | R/Rattle | PMML Package 1.2.29 | Audit | View
3.2 | Regression | R/Rattle | PMML Package 1.2.29 | Iris | View
3.2 | Regression | R/Rattle | PMML Package 1.2.29 | Iris | View
4.0 | Support Vector Machine | R/Rattle | PMML Package 1.2.30 | Audit | View
4.0 | Random Forest | R/Rattle | PMML Package 1.2.30 | Audit | View
4.0 | Random Forest | R/Rattle | PMML Package 1.2.30 | Iris | View
4.0 | General Regression | R/Rattle | PMML Package 1.2.30 | Iris | View
4.0 | Association Rules | R/Rattle | PMML Package 1.2.30 | Shopping | View
4.1 | Transformations | R/Rattle | PMML Package 1.3 | Audit | View
4.1 | Transformations | R/Rattle | PMML Package 1.3 | Iris | View
4.2 | Clustering | Apache Spark | Apache Spark MLlib 1.4 | Iris | View
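
The PMML version, application, and model type listed above are also recorded inside each PMML file, so they can be read back programmatically. The following is a minimal sketch using only the Python standard library. The embedded document is a stripped-down stand-in written for this illustration (it is not one of the sample files and is not schema-valid), and the function names are hypothetical. It assumes that any top-level element other than the listed non-model elements is a model.

```python
import xml.etree.ElementTree as ET

# Stand-in document for illustration only; real sample files are far larger
# and contain a full DataDictionary, MiningSchema, and model body.
SAMPLE_PMML = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header>
    <Application name="KNIME" version="2.8.0"/>
  </Header>
  <DataDictionary numberOfFields="0"/>
  <TreeModel modelName="example" functionName="classification"/>
</PMML>
"""

# Top-level children of <PMML> that are not models (assumed list).
NON_MODEL_ELEMENTS = {
    "Header", "MiningBuildTask", "DataDictionary",
    "TransformationDictionary", "Extension",
}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def summarize_pmml(xml_text):
    """Return the version, exporting application, and model element names."""
    root = ET.fromstring(xml_text)
    summary = {
        "version": root.get("version"),
        "application": None,
        "application_version": None,
        "models": [],
    }
    for child in root:
        name = local_name(child.tag)
        if name == "Header":
            for sub in child:
                if local_name(sub.tag) == "Application":
                    summary["application"] = sub.get("name")
                    summary["application_version"] = sub.get("version")
        elif name not in NON_MODEL_ELEMENTS:
            summary["models"].append(name)
    return summary


if __name__ == "__main__":
    for key, value in summarize_pmml(SAMPLE_PMML).items():
        print(f"{key}: {value}")
```

Matching on the local tag name rather than a fixed namespace lets the same sketch handle files from different PMML versions, since each version uses its own namespace URI.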

The Data Mining Group is always looking to increase the variety of these samples.  If you would like to submit samples, please see the instructions below.

Datasets for PMML Sample Files

We encourage contributors to generate their PMML files based on the datasets listed below. While a high-level description is provided here, more details can be found in the ReadMe text file associated with each dataset, when specified. If you publish material based on these datasets, please note the source in your acknowledgements.

Dataset | Description | Source | Comma-Delimited File
Audit | The audit dataset is supplied as part of the R Rattle package. It is an artificial dataset consisting of fictional clients who have been audited, perhaps for tax refund compliance. For each case an outcome is recorded (whether the taxpayer's claims had to be adjusted or not), along with the amount of any resulting adjustment. | Togaware | View
Elnino | Contains oceanographic and surface meteorological readings taken from a series of buoys positioned throughout the equatorial Pacific. The "small" dataset is provided here; larger datasets are available via the UCI KDD Archive. The data is expected to aid in the understanding and prediction of El Nino/Southern Oscillation (ENSO) cycles (from the National Oceanic and Atmospheric Administration, donated by Dr. Di Cook of Iowa State University). | UCI KDD Archive | View
Heart | Data provided by the Cleveland Clinic Foundation on the diagnosis of heart disease. The data file consists of 13 potential predictors and a target field (num) identifying patients diagnosed with > 50% diameter narrowing of arteries (value >50); otherwise the value <50 is assigned. In the original file, categorical values were represented by numeric codes; these have been replaced with representative strings for ease of use. | UCI Machine Learning Repository | View
Iris | Perhaps the best-known database in the pattern recognition literature. R. A. Fisher's 1936 paper is a classic in the field and is referenced frequently to this day. The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other (from Fisher, R.A. "The use of multiple measurements in taxonomic problems," Annals of Eugenics, 7, Part II, 179-188, 1936). | UCI Machine Learning Repository | View
Robustness | This dataset is aimed at finding flaws in PMML export implementations. In terms of data mining, the data makes no sense at all, since the values are randomly distributed and in no way meant to be correlated. If you obtain a meaningful model, you most probably did something wrong. | IBM | View Apply Data, View Train Data
Shopping | Contains shopping basket data for associations. | IBM | View
Visits | Describes the page visits of users who visited msnbc.com on September 28, 1999. Visits are recorded at the level of URL category and in time order (from David Heckerman of Microsoft Corporation). Visits_Small.csv contains about 65,000 visits; Visits_Large.csv contains over 880,000 visits. | UCI KDD Archive | View 65KB, View 880KB
Voting | Includes the votes of each U.S. House of Representatives Congressperson on 16 key votes (from Congressional Quarterly Almanac, 98th Congress, 2nd session 1984, Volume XL: Congressional Quarterly Inc. Washington, D.C., 1985. Donated by Jeff Schlimmer at Carnegie-Mellon University). | UCI Machine Learning Repository | View

How to Submit PMML Files:

If you wish to provide PMML files, please send the following to info@dmg.org.  In the body of your message please provide:

  • Text describing the model, including (* = unless included in the PMML Header element):
    PMML Version *
    Application *
    Application Version *
    Submitting Organization
    Any special characteristics
  • If you do not use one of the datasets already listed here, please provide text describing the dataset, including:
    Dataset Title
    Source(s), including any acknowledgements
    Any past usage
    Description and other relevant information
    Number of Records (items, occurrences, rows, etc.)
    Number of Variables (fields, columns, etc.)
    Variable Information, especially data types, categorical values, valid ranges, etc.
    Missing Values Descriptions
    Summary Statistics & Data Distributions
    Output/Scoring Information
  • If you do use one of the datasets already listed here, please provide the output of your model for inclusion with the existing dataset.
  • Contact information, and whether you want your information included on this webpage
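
Several of the dataset description items above (number of records, number of variables, data types, categorical values, valid ranges, and missing values) can be derived directly from the comma-delimited file. The sketch below shows one way to do so with the Python standard library. The miniature dataset, the column names, and the function names are invented for this illustration, and it assumes that an empty cell denotes a missing value and that a column is numeric only if every present value parses as a number.

```python
import csv
import io

# Hypothetical miniature dataset; the empty cell stands for a missing value.
CSV_TEXT = """sepal_length,sepal_width,species
5.1,3.5,setosa
4.9,,setosa
6.3,3.3,virginica
"""


def is_number(text):
    """Return True if the text can be parsed as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def describe_dataset(csv_text):
    """Summarize record count, variable count, and per-variable details."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = reader.fieldnames or []
    details = {}
    for field in fields:
        # Treat empty or absent cells as missing (an assumption of this sketch).
        present = [row[field] for row in rows if row[field] not in (None, "")]
        info = {"missing": len(rows) - len(present)}
        if present and all(is_number(value) for value in present):
            numbers = [float(value) for value in present]
            info["type"] = "numeric"
            info["range"] = (min(numbers), max(numbers))
        else:
            info["type"] = "categorical"
            info["values"] = sorted(set(present))
        details[field] = info
    return {"records": len(rows), "variables": len(fields), "details": details}


if __name__ == "__main__":
    description = describe_dataset(CSV_TEXT)
    print("Number of Records:", description["records"])
    print("Number of Variables:", description["variables"])
    for name, info in description["details"].items():
        print(name, info)
```

A summary like this does not replace the written description, but it gives the counts and value ranges that the description should state.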

Also, attach the following files:

  • The PMML file
  • The dataset used to train and validate the model. Include the output of the model in the dataset so that other users can verify their results. The first line (row) should contain the variable names (column headers).
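
As an illustration of the file layout described above, the following sketch writes a comma-delimited dataset whose first row holds the variable names and whose last column holds the model output. It uses only the Python standard library. The function name, the column names, and the predicted values are hypothetical placeholders, not output from any of the listed models.

```python
import csv
import io


def write_scored_dataset(fieldnames, rows, predictions, output_name, out):
    """Write rows plus one model-output column, with a header as the first line."""
    if len(rows) != len(predictions):
        raise ValueError("each record needs exactly one model output")
    writer = csv.writer(out, lineterminator="\n")
    # First line (row): variable names, followed by the output column name.
    writer.writerow(list(fieldnames) + [output_name])
    for row, prediction in zip(rows, predictions):
        writer.writerow(list(row) + [prediction])


if __name__ == "__main__":
    # Placeholder records and outputs, invented for this example.
    buffer = io.StringIO()
    write_scored_dataset(
        fieldnames=["sepal_length", "sepal_width"],
        rows=[[5.1, 3.5], [6.3, 3.3]],
        predictions=["setosa", "virginica"],
        output_name="predicted_species",
        out=buffer,
    )
    print(buffer.getvalue(), end="")
```

Passing an open file object instead of the in-memory buffer writes the same content to disk. When doing so, open the file with `newline=""` as the `csv` module documentation recommends.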

Acknowledgements:

The Data Mining Group thanks the UCI Repository of Machine Learning Databases for being a valuable resource:

Blake, C.L. & Merz, C.J. (1998). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.
