PMML 3.0 - Output fields

The output fields describe a set of result values that can be computed by the model. In particular, the output fields specify names and types and rules for selecting specific result features. This information can be used while writing an output table. The Output section in the model specifies default names for columns in an output table and describes how to compute the corresponding values.

Example:

 <Output>
  <OutputField name="P_responseYes" optype="continuous" dataType="xs:double"
               targetField="response" feature="probability" value="YES" />
  <OutputField name="P_responseNo" optype="continuous" dataType="xs:double"
               targetField="response" feature="probability" value="NO" />
  <OutputField name="I_response" optype="categorical" dataType="xs:string"
               targetField="response" feature="predictedValue" />
  <OutputField name="U_response" optype="categorical" dataType="xs:string"
               targetField="response" feature="predictedDisplayValue" />
 </Output>
If a model contains this Output element, a PMML consumer could map an input table to an output table with columns named "P_responseYes", "P_responseNo", etc. The values for "P_responseYes" are the probabilities that the target field named "response" has the value "YES".
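
As an illustration of this mapping, the following is a minimal Python sketch, not part of the specification. It assumes the scoring engine already supplies a predicted value and a per-category probability table for one input record; the function name build_output_row and the sample numbers are hypothetical. The optional attributes optype and dataType are left out because the sketch does not use them, and the XML is written without a namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Output element modeled on the example above (no XML namespace, for brevity).
OUTPUT_XML = """
<Output>
  <OutputField name="P_responseYes" targetField="response"
               feature="probability" value="YES" />
  <OutputField name="P_responseNo" targetField="response"
               feature="probability" value="NO" />
  <OutputField name="I_response" targetField="response"
               feature="predictedValue" />
</Output>
"""

def build_output_row(output_xml, predicted_value, probabilities):
    """Map one scoring result to a dict keyed by OutputField name."""
    row = {}
    for field in ET.fromstring(output_xml).findall("OutputField"):
        feature = field.get("feature")
        if feature == "probability":
            # The 'value' attribute selects which category's probability to report.
            row[field.get("name")] = probabilities[field.get("value")]
        elif feature == "predictedValue":
            row[field.get("name")] = predicted_value
    return row

# Hypothetical scoring result for a single input record.
row = build_output_row(OUTPUT_XML, "YES", {"YES": 0.8, "NO": 0.2})
print(row)
```

Each key of the resulting dictionary corresponds to one column of the output table.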

The schema is:


  <xs:element name="Output">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
        <xs:element ref="OutputField" minOccurs="1" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="OutputField">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute name="name" type="FIELD-NAME" use="required" />
      <xs:attribute name="displayName" type="xs:string" />
      <xs:attribute name="optype" type="OPTYPE" />
      <xs:attribute name="dataType" type="DATATYPE"/>
      <xs:attribute name="targetField" type="FIELD-NAME" use="required" />
      <xs:attribute name="feature" type="RESULT-FEATURE" />
      <xs:attribute name="value" type="xs:string" />
    </xs:complexType>
  </xs:element>

  <xs:simpleType name="RESULT-FEATURE">      
     <xs:restriction base="xs:string">
        <xs:enumeration value="predictedValue" />
        <xs:enumeration value="predictedDisplayValue" />
        <xs:enumeration value="probability" />
        <xs:enumeration value="residual" />
        <xs:enumeration value="standardError" />
        <xs:enumeration value="clusterId" />
        <xs:enumeration value="clusterAffinity" />
        <xs:enumeration value="warning" />
     </xs:restriction>
  </xs:simpleType>

The value of attribute name specifies the name of a new field in the output. It can be any string. The name itself does not define how the output values are computed.

The attribute targetField refers to an input field or to a derived field. If the field is a derived field that is used as a target for a prediction, then the attributes feature and value specify whether the output value is meant to be, e.g., the predicted value for the target or a probability of a certain target value. The attribute value contains a normalized value, if applicable.

If the attribute feature is not specified then the output value is a copy of the field value.

If the attribute feature is specified then targetField must refer to a target field, and the output value is computed from the mining result.
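
The schema constraints and the two rules above can be checked mechanically. The following Python sketch is one possible reading, not a normative validator: it takes the attributes of a single OutputField as a dictionary and the set of target field names as given, and the function name check_output_field is hypothetical.

```python
# Allowed values of the RESULT-FEATURE type, copied from the schema.
RESULT_FEATURES = {
    "predictedValue", "predictedDisplayValue", "probability", "residual",
    "standardError", "clusterId", "clusterAffinity", "warning",
}

def check_output_field(attrs, target_fields):
    """Return a list of problems found in one OutputField's attributes."""
    problems = []
    # 'name' and 'targetField' are the only attributes with use="required".
    for required in ("name", "targetField"):
        if required not in attrs:
            problems.append(f"missing required attribute '{required}'")
    feature = attrs.get("feature")
    if feature is not None:
        if feature not in RESULT_FEATURES:
            problems.append(f"unknown feature '{feature}'")
        # With a feature, targetField must name a target of the model.
        if "targetField" in attrs and attrs["targetField"] not in target_fields:
            problems.append("feature is set but targetField is not a target field")
    # Without a feature, the output is a plain copy, so any field is acceptable.
    return problems

print(check_output_field({"name": "age_copy", "targetField": "age"}, {"response"}))
```

An empty list means the field passed these particular checks; the sketch does not verify that targetField names an existing input or derived field.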

The meaning of the feature identifiers is:

predictedValue
Select the raw predicted value, aka target value.
predictedDisplayValue
Select the display value that corresponds to the raw predicted value. The display value can be specified in the element Target. If it is not specified explicitly, then the raw predicted value is used by default.
probability
Select the probability of the target value as given by the attribute value. The target value corresponds to, e.g., values in RegressionTable.targetCategory or in ScoreDistribution.value for tree classification. That is, these values can be normalized values. The corresponding original values can be found in the Target elements. Target.value matches OutputField.value and Target.displayValue is the original value.
residual
Select the residual of the target value. For numeric prediction this is the actual value minus the predicted value. For classification this is [actual value = target value] minus the predicted probability for the target value. The attribute 'value' specifies the raw target value. The term [actual value = target value] is defined as 1.0 if the actual value is the same as the target value, and 0.0 otherwise.
standardError
Select the standard error of the predicted numeric value. In a regression model this value is computed as the square root of x'Vx, where x is a vector of parameter coefficients based on the given predictors, x' is its transpose, and V is the parameter covariance matrix.
clusterId
Indicates that this field is the ID of the predicted cluster.
clusterAffinity
Select the distance or the similarity value, depending on the context of the clustering PMML document. Please note that a producer of clustering PMML documents may output the distance to the nearest center instead of the distance to the center of the assigned cluster. This specification supports only the distance to the center of the cluster given in clusterId, NOT the distance to the nearest center.
warning
Any warning message such as 'too many missing values'.
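
The standardError formula in the list above, the square root of x'Vx, can be sketched in plain Python as follows. This is an illustration under stated assumptions: x is treated as the vector built from the predictor values of the record being scored, V as the parameter covariance matrix, and all numbers are made up. The function name standard_error is hypothetical.

```python
import math

def standard_error(x, V):
    """Return sqrt(x' V x) for a vector x and a square matrix V."""
    # First the matrix-vector product V x, then the dot product with x.
    Vx = [sum(v_ij * x_j for v_ij, x_j in zip(row, x)) for row in V]
    return math.sqrt(sum(x_i * vx_i for x_i, vx_i in zip(x, Vx)))

# Made-up numbers: an intercept term plus one predictor with value 2.0.
x = [1.0, 2.0]
V = [[0.04, 0.0],
     [0.0, 0.01]]
print(standard_error(x, V))
```
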
Note that the feature identifier 'residual' is useful only if the model is used on test data that contains target values. It is straightforward to compute the residual on numeric data as the prediction error. The residual is based on differences of probability values in the case of categorical data. For example, let's assume a classification model predicts the labels "Y" and "N", and the output field specifies the target value "Y". For some row in the test data the actual value may be "Y" and the predicted value is "Y" with a probability of 0.8. The term [actual value = target value] maps to 1.0 and the residual is the difference between 1.0 and the probability, i.e., 1.0 - 0.8 = 0.2. For some other row the actual value may be "N". Assuming the predicted value and probability are the same as before, we have [actual value = target value] = 0.0 and residual = 0.0 - 0.8 = -0.8.
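
The two residual definitions can be written out as a short Python sketch. The function names numeric_residual and class_residual are hypothetical, and the calls reuse the numbers from the example above, with "Y" as the target value.

```python
def numeric_residual(actual, predicted):
    """Residual for numeric prediction: actual value minus predicted value."""
    return actual - predicted

def class_residual(actual, target_value, probability):
    """Residual for classification.

    [actual value = target value] is 1.0 on a match and 0.0 otherwise;
    the predicted probability of the target value is subtracted from it.
    """
    indicator = 1.0 if actual == target_value else 0.0
    return indicator - probability

# Row 1: the actual value is "Y"; "Y" is predicted with probability 0.8.
r1 = class_residual("Y", "Y", 0.8)
# Row 2: the actual value is "N"; prediction and probability are unchanged.
r2 = class_residual("N", "Y", 0.8)
print(round(r1, 10), round(r2, 10))
```

The results are rounded for printing because 1.0 - 0.8 is not represented exactly in binary floating point.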