PMML Sample Files:
The files provided below are examples of predictive models exported in PMML. These sample files are not intended for
performance or vendor comparisons as they are provided solely for users to gain
a better understanding of the standard. No representation is made as to the
accuracy and applicability of these models. Also included are the
datasets used to train and validate these predictive models.
Past examples of PMML samples can be found on our examples archive page. Please note that not all models on this page are conformant. Please be careful while using them.
Last Update: 07/06/2015
PMML Version | Model Type | Vendor | Application | Dataset | PMML File
4.1 | Clustering | KNIME | KNIME 2.8 | Audit | View
4.1 | Clustering | KNIME | KNIME 2.8 | Iris | View
4.1 | Neural Network | KNIME | KNIME 2.8 | Audit | View
4.1 | Neural Network | KNIME | KNIME 2.8 | Iris | View
4.1 | Regression | KNIME | KNIME 2.8 | Audit | View
4.0 | Regression | KNIME | KNIME 2.6.2 | Elnino | View
4.0 | Regression | KNIME | KNIME 2.6.2 | Elnino | View
4.1 | Regression | KNIME | KNIME 2.8 | Iris | View
4.1 | Tree | KNIME | KNIME 2.8 | Audit | View
4.1 | Tree | KNIME | KNIME 2.8 | Iris | View
4.1 | Support Vector Machine | KNIME | KNIME 2.8 | Audit | View
4.1 | Support Vector Machine | KNIME | KNIME 2.8 | Iris | View
4.1 | Model Ensemble - Clustering | KNIME | KNIME 2.8 | Audit | View
4.1 | Model Ensemble - Neural Network | KNIME | KNIME 2.8 | Audit | View
4.1 | Model Ensemble - Neural Network | KNIME | KNIME 2.8 | Iris | View
4.1 | Model Ensemble - Regression | KNIME | KNIME 2.8 | Audit | View
4.1 | Model Ensemble - Regression | KNIME | KNIME 2.8 | Iris | View
4.1 | Model Ensemble - Tree | KNIME | KNIME 2.8 | Audit | View
4.1 | Model Ensemble - Tree | KNIME | KNIME 2.8 | Iris | View
4.1 | Model Ensemble - SVM | KNIME | KNIME 2.8 | Audit | View
4.1 | Model Ensemble - SVM | KNIME | KNIME 2.8 | Iris | View
3.2 | Clustering | R/Rattle | PMML Package 1.2.29 | Audit | View
3.2 | Clustering | R/Rattle | PMML Package 1.2.29 | Iris | View
3.2 | Clustering | R/Rattle | PMML Package 1.2.29 | Iris | View
3.2 | Tree | R/Rattle | PMML Package 1.2.29 | Audit | View
3.2 | Tree | R/Rattle | PMML Package 1.2.29 | Iris | View
3.2 | Regression | R/Rattle | PMML Package 1.2.29 | Audit | View
3.2 | Regression | R/Rattle | PMML Package 1.2.29 | Iris | View
3.2 | Regression | R/Rattle | PMML Package 1.2.29 | Iris | View
4.0 | Support Vector Machine | R/Rattle | PMML Package 1.2.30 | Audit | View
4.0 | Random Forest | R/Rattle | PMML Package 1.2.30 | Audit | View
4.0 | Random Forest | R/Rattle | PMML Package 1.2.30 | Iris | View
4.0 | General Regression | R/Rattle | PMML Package 1.2.30 | Iris | View
4.0 | Association Rules | R/Rattle | PMML Package 1.2.30 | Shopping | View
4.1 | Transformations | R/Rattle | PMML Package 1.3 | Audit | View
4.1 | Transformations | R/Rattle | PMML Package 1.3 | Iris | View
4.2 | Clustering | Apache Spark | Apache Spark MLlib 1.4 | Iris | View
The Data Mining Group is always looking to increase the
variety of these samples. If you would like to submit samples,
please see the instructions below.
Datasets for PMML Sample Files
We encourage contributors to generate their PMML files based on the
datasets listed below. While a high level description is provided here, more details
can be found in the ReadMe text file associated with each dataset, when specified. If you
publish material based on these datasets, please note the source in your
acknowledgements.
Dataset | Description | Source | Comma-Delimited File
Audit | The audit dataset is supplied as part of the R Rattle package. It is an artificial dataset consisting of fictional clients who have been audited, perhaps for tax refund compliance. For each case, an outcome is recorded (whether the taxpayer's claims had to be adjusted or not), along with the amount of any resulting adjustment. | Togaware | View
Elnino | Contains oceanographic and surface meteorological readings taken from a series of buoys positioned throughout the equatorial Pacific. The "small" dataset is provided here; larger datasets are available via the UCI KDD Archive. The data is expected to aid in the understanding and prediction of El Nino/Southern Oscillation (ENSO) cycles (from National Oceanic and Atmospheric Administration, donated by Dr. Di Cook of Iowa State University). | UCI KDD Archive | View
Heart | Data provided by the Cleveland Clinic Foundation on the diagnosis of heart disease. The data file consists of 13 potential predictors and a target field (num), which is set to >50 for patients diagnosed with more than 50% diameter narrowing of the arteries and to <50 otherwise. In the original file, categorical values were represented by numeric codes; these have been replaced with representative strings for ease of use. | UCI Machine Learning Repository | View
Iris | Perhaps the best known database to be found in the pattern recognition literature. R. A. Fisher's 1936 paper is a classic in the field and is referenced frequently to this day. The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other (from Fisher, R.A. "The use of multiple measurements in taxonomic problems," Annals of Eugenics, 7, Part II, 179-188, 1936). | UCI Machine Learning Repository | View
Robustness | This dataset is aimed at finding flaws in PMML export implementations. In terms of data mining, the data makes no sense at all, since the values are randomly distributed and in no way meant to be correlated. If you obtain a meaningful model, you most probably did something wrong. | IBM | View Apply Data, View Train Data
Shopping | Contains shopping basket data for association rules. | IBM | View
Visits | Describes the page visits of users who visited msnbc.com on September 28, 1999. Visits are recorded at the level of URL category and are recorded in time order (from David Heckerman of Microsoft Corporation). Visits_Small.csv contains about 65,000 visits; Visits_Large.csv contains over 880,000 visits. | UCI KDD Archive | View 65KB, View 880KB
Voting | Includes the votes of each U.S. House of Representatives Congressperson on 16 key votes (from Congressional Quarterly Almanac, 98th Congress, 2nd session 1984, Volume XL: Congressional Quarterly Inc. Washington, D.C., 1985. Donated by Jeff Schlimmer at Carnegie-Mellon University). | UCI Machine Learning Repository | View
How to Submit PMML Files:
If you wish to provide PMML files, please send the
following to info@dmg.org. In the body of your message please provide:
- Text describing the model, including (* = unless included in the PMML Header element):
    PMML Version *
    Application *
    Application Version *
    Submitting Organization
    Any special characteristics
- If you do not use one of the datasets already listed here, please provide text describing the dataset, including:
    Dataset Title
    Source(s), including any acknowledgements
    Any past usage
    Description and other relevant information
    Number of Records (items, occurrences, rows, etc.)
    Number of Variables (fields, columns, etc.)
    Variable Information, especially data types, categorical values, valid ranges, etc.
    Missing Values Descriptions
    Summary Statistics & Data Distributions
    Output/Scoring Information
- If you do use one of the datasets already listed here, please provide the output of your model for inclusion with the existing dataset.
- Contact information, and whether you want your information included on this webpage
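One way to check whether the starred fields are already carried by a PMML file is to read them from the document itself: the version is an attribute of the root PMML element, and the application name and version are attributes of the Application element inside Header. The following is a minimal sketch using only the Python standard library. The function name `read_pmml_header` and the inline `SAMPLE` document are illustrative assumptions, not part of any DMG tooling, and the sample is not one of the files listed on this page.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def read_pmml_header(pmml_text):
    """Return the PMML version plus the application name and version, if present."""
    root = ET.fromstring(pmml_text)
    info = {
        "pmml_version": root.get("version"),
        "application": None,
        "application_version": None,
    }
    # The namespace URI differs between PMML versions, so match local names only.
    for child in root:
        if local_name(child.tag) != "Header":
            continue
        for item in child:
            if local_name(item.tag) == "Application":
                info["application"] = item.get("name")
                info["application_version"] = item.get("version")
    return info


# Hypothetical, hand-written document used only to exercise the function.
SAMPLE = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header>
    <Application name="KNIME" version="2.8.0"/>
  </Header>
  <DataDictionary numberOfFields="0"/>
</PMML>"""

print(read_pmml_header(SAMPLE))
```

Any field that comes back as `None` is not in the file and would need to be stated in the message body instead.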
Also, attach the following files:
- The PMML file
- The dataset used to train and validate the model. Include the output of the model in the dataset so other users can verify their results. The first line (row) should contain the variable names (column headers).
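The dataset layout described above (variable names in the first row, model output included alongside the inputs) could be produced roughly as follows. This is only a sketch under stated assumptions: the helper `append_model_output`, the output column name `predicted`, and the two example records are hypothetical, and the predictions would in practice come from scoring your own model.

```python
import csv
import io


def append_model_output(rows, predictions, output_name="predicted"):
    """Return CSV text with a header row and the model output as the last column."""
    if not rows:
        raise ValueError("no records to write")
    if len(rows) != len(predictions):
        raise ValueError("need exactly one prediction per record")
    # Header row: the original variable names followed by the output column.
    fieldnames = list(rows[0].keys()) + [output_name]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row, prediction in zip(rows, predictions):
        writer.writerow({**row, output_name: prediction})
    return buffer.getvalue()


# Hypothetical records and predictions, for illustration only.
records = [
    {"sepal_length": 5.1, "species": "setosa"},
    {"sepal_length": 6.3, "species": "virginica"},
]
csv_text = append_model_output(records, ["setosa", "virginica"])
print(csv_text)
```

Keeping the output in the same file as the inputs lets other users score the PMML file on each row and compare their result with the submitted one directly.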
Acknowledgements:
The Data Mining Group thanks the UCI Repository of Machine
Learning Databases for being a valuable resource:
Blake, C.L. & Merz, C.J. (1998). UCI Repository of
machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html].
Irvine, CA: University of California, Department of Information and Computer
Science.