PMML 4.3 - Output fields

The Output element describes a set of result values that can be returned from a model. In particular, OutputField elements specify names, types and rules for calculating specific result features. This information can be used while writing an output table: the Output section in the model specifies names for the columns of an output table and describes how to compute the corresponding values.

Example

<Output>
  <OutputField name="P_responseYes" optype="continuous" dataType="double" targetField="response" feature="probability" value="YES"/>

  <OutputField name="P_responseNo" optype="continuous" dataType="double" targetField="response" feature="probability" value="NO"/>

  <OutputField name="I_response" optype="categorical" dataType="string" targetField="response" feature="predictedValue"/>

  <OutputField name="U_response" optype="categorical" dataType="string" targetField="response" feature="predictedDisplayValue"/>
</Output>

If a model contains this Output element, a PMML consumer could map an input table to an output table with columns named P_responseYes, P_responseNo, etc. The values for P_responseYes are determined as the probability that the target field named response has the value YES.
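As a non-normative sketch of this mapping, the following Python fragment fills one output row from a scoring result. The `result` dictionary, its keys, the probabilities and the display strings are hypothetical stand-ins for whatever a consumer's scoring engine returns; none of them is defined by PMML.

```python
# Hypothetical scoring result for one input record (made-up numbers).
result = {
    "predicted_value": "YES",
    "probabilities": {"YES": 0.8, "NO": 0.2},
    "display_values": {"YES": "Yes, will respond", "NO": "No response"},
}

# (name, feature, value) triples mirroring the four OutputField elements above.
output_fields = [
    ("P_responseYes", "probability", "YES"),
    ("P_responseNo", "probability", "NO"),
    ("I_response", "predictedValue", None),
    ("U_response", "predictedDisplayValue", None),
]


def build_output_row(result, output_fields):
    """Compute one output-table row, one column per OutputField."""
    row = {}
    for name, feature, value in output_fields:
        if feature == "probability":
            row[name] = result["probabilities"][value]
        elif feature == "predictedValue":
            row[name] = result["predicted_value"]
        elif feature == "predictedDisplayValue":
            raw = result["predicted_value"]
            # Fall back to the raw value when no display value is defined.
            row[name] = result["display_values"].get(raw, raw)
        else:
            raise ValueError(f"feature not covered by this sketch: {feature}")
    return row


row = build_output_row(result, output_fields)
print(row)
```

Only the three result features used in the example are handled; a real consumer would dispatch on every value of RESULT-FEATURE.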

Schema:

<xs:element name="Output">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="OutputField" minOccurs="1" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="OutputField">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:sequence minOccurs="0" maxOccurs="1">
        <xs:element ref="Decisions" minOccurs="0" maxOccurs="1"/>
        <xs:group ref="EXPRESSION" minOccurs="1" maxOccurs="1"/>
      </xs:sequence>
    </xs:sequence>
    <xs:attribute name="name" type="FIELD-NAME" use="required"/>
    <xs:attribute name="displayName" type="xs:string"/>
    <xs:attribute name="optype" type="OPTYPE"/>
    <xs:attribute name="dataType" type="DATATYPE" use="required"/>
    <xs:attribute name="targetField" type="FIELD-NAME"/>
    <xs:attribute name="feature" type="RESULT-FEATURE" default="predictedValue"/>
    <xs:attribute name="value" type="xs:string"/>
    <xs:attribute name="ruleFeature" type="RULE-FEATURE" default="consequent"/>
    <xs:attribute name="algorithm" default="exclusiveRecommendation">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="recommendation"/>
          <xs:enumeration value="exclusiveRecommendation"/>
          <xs:enumeration value="ruleAssociation"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
    <xs:attribute name="rank" type="INT-NUMBER" default="1"/>
    <xs:attribute name="rankBasis" default="confidence">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="confidence"/>
          <xs:enumeration value="support"/>
          <xs:enumeration value="lift"/>
          <xs:enumeration value="leverage"/>
          <xs:enumeration value="affinity"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
    <xs:attribute name="rankOrder" default="descending">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="descending"/>
          <xs:enumeration value="ascending"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
    <xs:attribute name="isMultiValued" default="0"/>
    <xs:attribute name="segmentId" type="xs:string"/>
    <xs:attribute name="isFinalResult" type="xs:boolean" default="true"/>
  </xs:complexType>
</xs:element>

<xs:simpleType name="RESULT-FEATURE">
  <xs:restriction base="xs:string">
    <xs:enumeration value="predictedValue"/>  
    <xs:enumeration value="predictedDisplayValue"/>
    <xs:enumeration value="transformedValue"/> 
    <xs:enumeration value="decision"/>
    <xs:enumeration value="probability"/>
    <xs:enumeration value="affinity"/>
    <xs:enumeration value="residual"/>
    <xs:enumeration value="standardError"/>
    <xs:enumeration value="clusterId"/>
    <xs:enumeration value="clusterAffinity"/>
    <xs:enumeration value="entityId"/>
    <xs:enumeration value="entityAffinity"/>
    <xs:enumeration value="warning"/>
    <xs:enumeration value="ruleValue"/>
    <xs:enumeration value="reasonCode"/>
    <xs:enumeration value="antecedent"/>
    <xs:enumeration value="consequent"/>
    <xs:enumeration value="rule"/>
    <xs:enumeration value="ruleId"/>
    <xs:enumeration value="confidence"/>
    <xs:enumeration value="support"/>
    <xs:enumeration value="lift"/>
    <xs:enumeration value="leverage"/>
  </xs:restriction>
</xs:simpleType>

<xs:element name="Decisions">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Decision" minOccurs="1" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="businessProblem" type="xs:string"/>
    <xs:attribute name="description" type="xs:string"/>
  </xs:complexType>
</xs:element>

<xs:element name="Decision">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="value" type="xs:string" use="required"/>
    <xs:attribute name="displayValue" type="xs:string"/>
    <xs:attribute name="description" type="xs:string"/>
  </xs:complexType>
</xs:element>

<xs:simpleType name="RULE-FEATURE">
  <xs:restriction base="xs:string">
    <xs:enumeration value="antecedent"/>
    <xs:enumeration value="consequent"/>
    <xs:enumeration value="rule"/>
    <xs:enumeration value="ruleId"/>
    <xs:enumeration value="confidence"/>
    <xs:enumeration value="support"/>
    <xs:enumeration value="lift"/>
    <xs:enumeration value="leverage"/>
    <xs:enumeration value="affinity"/>
  </xs:restriction>
</xs:simpleType>

The attribute name specifies the name of the OutputField. The name itself does not define how the output values are computed. For information on the naming of OutputFields, see Scope of Fields.

The required attribute dataType of an OutputField element specifies the data type for the output column. The attribute optype can be used to indicate admissible operations on the values. A clusterId field, for example, can have integer as its dataType, but categorical as its optype. For details, see the description of DataDictionary.

If present, the attribute targetField must refer either to a MiningField of usage type target or to a field described in the Targets element. targetField is a required attribute if the model has multiple target fields.

The attribute feature specifies the value the output field takes from the computed mining result.

The attribute value is used in conjunction with result features referring to specific values. For example, when used with the feature probability, the attribute value indicates the category for which a probability is returned. When not specified, the probability of the predicted categorical value should be returned as an output.
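The fallback rule for the probability feature can be sketched as follows. This is an illustration only: the function name, its arguments and the probabilities are hypothetical.

```python
def probability_output(probabilities, predicted_value, value=None):
    """Return what an OutputField with feature="probability" would report.

    If the value attribute is absent (None here), the probability of the
    predicted category is returned.
    """
    category = predicted_value if value is None else value
    return probabilities[category]


probabilities = {"YES": 0.8, "NO": 0.2}  # made-up numbers
with_value = probability_output(probabilities, "YES", value="NO")
without_value = probability_output(probabilities, "YES")
print(with_value, without_value)
```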

The attribute isFinalResult was added in PMML 4.3 to indicate whether the result should be returned to the user or is only used as input to another OutputField that describes a transformed value. The default is true for consistency with previous PMML versions.
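A consumer might apply isFinalResult as a simple filter once all output fields have been computed. The sketch below assumes a hypothetical in-memory representation in which each OutputField is a dictionary and a missing key stands for an omitted attribute.

```python
def final_columns(output_fields, computed):
    """Keep only the fields whose isFinalResult is true (the default)."""
    return {
        field["name"]: computed[field["name"]]
        for field in output_fields
        if field.get("isFinalResult", True)
    }


output_fields = [
    {"name": "RawResult", "isFinalResult": False},
    {"name": "FinalResult"},  # attribute omitted, so the default (true) applies
]
computed = {"RawResult": 8.0, "FinalResult": 35.12}  # made-up values
visible = final_columns(output_fields, computed)
print(visible)
```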

An output field may contain an association rule or any of its properties. ruleFeature specifies which feature of an association rule to return. This attribute has been deprecated as of PMML 4.2. The rule feature values can now be specified in the feature attribute.

The attribute algorithm specifies which scoring algorithm to use when computing the output value. It applies only to Association Rules models.

The attribute rank is used to specify the rank of the feature value from the mining result that should be selected. It can be used with models that rank the possible outcomes (e.g. classification, clustering, score cards, association rules or kNN). When not specified, the winning outcome is returned. When specified, the outcome of the particular rank is returned instead. For example, scorecards can return multiple reason codes depending on the complexity of the scorecard. If the scorecard computes multiple reason codes, rank="1" returns the top reason code. To return other reason codes, rank must be set to the appropriate index. Note that the attribute rank cannot be used together with the attribute value.
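Rank selection can be illustrated with the following sketch. The reason codes and their scores are hypothetical, and returning `None` for a rank beyond the number of outcomes is an assumption of this sketch rather than something stated above.

```python
def outcome_at_rank(scores, rank=1):
    """Return the outcome at the given 1-based rank, highest score first."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    if not 1 <= rank <= len(ordered):
        return None  # assumption: an out-of-range rank yields a missing result
    return ordered[rank - 1]


# Hypothetical reason codes with the scores used to rank them.
reason_scores = {"RC_income": 12.5, "RC_age": 7.0, "RC_debt": 3.2}
top = outcome_at_rank(reason_scores)            # same as rank="1"
second = outcome_at_rank(reason_scores, rank=2)
print(top, second)
```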

The attribute rankOrder determines the sorting order when ranking the results. The default behavior (rankOrder="descending") indicates that the result with the highest rank will appear first on the sorted list.

The attribute rankBasis applies only to Association Rules and is used to specify which criterion is used to sort the output result. For instance, the result could be sorted by the confidence, support or lift of the rules.
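How rankBasis and rankOrder interact with rank may be sketched as below. The rules and their statistics are invented, and the list is assumed to contain only the rules already selected by the scoring algorithm.

```python
def select_rule(rules, rank=1, rank_basis="confidence", rank_order="descending"):
    """Sort candidate rules by rank_basis and return the one at the given rank."""
    ordered = sorted(
        rules,
        key=lambda rule: rule[rank_basis],
        reverse=(rank_order == "descending"),
    )
    if not 1 <= rank <= len(ordered):
        return None  # assumption: no rule at that rank means a missing result
    return ordered[rank - 1]


# Hypothetical candidate rules (made-up statistics).
rules = [
    {"id": "r1", "confidence": 0.9, "support": 0.10, "lift": 1.2},
    {"id": "r2", "confidence": 0.7, "support": 0.30, "lift": 2.5},
    {"id": "r3", "confidence": 0.8, "support": 0.20, "lift": 1.8},
]
best_by_confidence = select_rule(rules)["id"]
best_by_support = select_rule(rules, rank_basis="support")["id"]
lowest_lift = select_rule(rules, rank_basis="lift", rank_order="ascending")["id"]
print(best_by_confidence, best_by_support, lowest_lift)
```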

The attribute isMultiValued indicates that the output can represent multiple output values. This attribute has been deprecated as of PMML 4.2. If the value of the attribute is "1", then the rank value indicates the number of output values that should be returned - a positive value indicates the number of output values to be returned, based on the rankBasis and rankOrder, while a zero value indicates that all output values are to be returned. If the value of isMultiValued is "0" (default), then rank indicates a particular output, as defined above.
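The two readings of rank can be contrasted in a short sketch. It assumes the outcomes have already been sorted according to rankBasis and rankOrder; the function name and the sample identifiers are hypothetical.

```python
def select_outputs(ordered_outcomes, rank=1, is_multi_valued="0"):
    """Interpret rank as a count (isMultiValued="1") or as a position ("0")."""
    if is_multi_valued == "1":
        if rank == 0:
            return list(ordered_outcomes)    # zero: return all output values
        return list(ordered_outcomes[:rank])  # positive: return that many values
    if not 1 <= rank <= len(ordered_outcomes):
        return None  # assumption: out-of-range rank yields a missing result
    return ordered_outcomes[rank - 1]


ordered = ["r1", "r3", "r2"]  # hypothetical outcomes, already sorted
first_two = select_outputs(ordered, rank=2, is_multi_valued="1")
everything = select_outputs(ordered, rank=0, is_multi_valued="1")
second_only = select_outputs(ordered, rank=2)
print(first_two, everything, second_only)
```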

The attribute segmentId is applicable to MiningModels which utilize Segmentation. This attribute provides an approach to deliver results from Segments which avoids having to specify Outputs within each Segment. If the segmentId attribute matches the id attribute of a Segment within the scope of this model element (the model element containing this OutputField's parent Output element), and if the predicate of that segment is true, then this OutputField returns the specified result feature from the sub-model contained within the specified segment. If there is no Segment matching segmentId or if the predicate of the matching Segment evaluated to false, then by convention the result delivered by this OutputField is missing. The features decision or transformedValue may be used along with a segmentId, to refer to an output value of the particular segment. In this case, the attribute value must also be specified and it must refer to the name of the output field of the segment for the decision or transformed value of interest.
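The lookup rule for segmentId can be sketched as follows. Segments are represented here by hypothetical dictionaries holding an id, a predicate and a stand-in scoring function; `None` plays the role of a missing result.

```python
def segment_output(segments, segment_id, feature, record):
    """Return the requested feature from the matching segment, or None."""
    for segment in segments:
        if segment["id"] == segment_id:
            if not segment["predicate"](record):
                return None  # predicate evaluated to false: result is missing
            return segment["model"](record).get(feature)
    return None  # no Segment matches segmentId: result is missing


# Hypothetical segments; the lambdas stand in for predicates and sub-models.
segments = [
    {"id": "young", "predicate": lambda r: r["age"] < 30,
     "model": lambda r: {"predictedValue": 1.5 * r["age"]}},
    {"id": "other", "predicate": lambda r: r["age"] >= 30,
     "model": lambda r: {"predictedValue": 100.0}},
]
record = {"age": 20}
from_young = segment_output(segments, "young", "predictedValue", record)
from_other = segment_output(segments, "other", "predictedValue", record)
from_unknown = segment_output(segments, "nope", "predictedValue", record)
print(from_young, from_other, from_unknown)
```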

Result Features

The meaning of the feature identifier is:

predictedValue: Select the raw predicted value. More than one OutputField element can have the predictedValue feature only if the model predicts more than one field. Details can be found in the description of the individual models.
predictedDisplayValue: Select the display value that corresponds to the raw predicted value. For most models, the display value can be specified in the element Target. If it is not specified explicitly, then the raw predicted value is used by default.
transformedValue: Apply some transformation to the value of an output field. For this result feature, OutputField must contain an EXPRESSION, unless it is used to refer to a transformed value of a segment model through the segmentId attribute. The transformed value is the result of the evaluation of the EXPRESSION. Often, the transformed value represents a shift of the value into a different scale. Scaling is commonly used to transform a value into a value that is more friendly to humans, for example, transforming values 0 to 1 into 0 to 100. Another common use case is the re-transformation of a value that has been scaled by the algorithm before the calculations in order to present the output in the original scale rather than the transformed one. A typical example is a logarithmic transformation, where, instead of a target value x, its logarithm log(x) is used. When presenting predicted values v, they will have to be re-transformed as exp(v).
decision: Derive a decision from the output of a data mining model. For this result feature, OutputField must contain an EXPRESSION, unless it is used to refer to a decision of a segment model through the segmentId attribute. The decision is the result of the evaluation of the EXPRESSION. If the result feature is decision, all possible values resulting from the evaluation of EXPRESSION must be introduced using Decision elements.
probability: Select the probability of the target value as given by the attribute value. The value corresponds to one of the target categories of the model.
residual: Select the residual of the target value. For numeric prediction this is the actual value minus the predicted value. For classification this is [actual value = target value] minus the predicted probability for the target value. The attribute value specifies the raw target value. The term [actual value = target value] is defined as 1.0 if the actual value is the same as the target value, and 0.0 otherwise. Note that, in order for residual values to be computed, the input data must include target values.
standardError: Select the standard error of the predicted numeric value. In a regression model this value is computed as the square root of x^T V x, where x is a vector of parameter coefficients based on the given predictors and V is the parameters covariance matrix.
clusterId: Select a 1-based index, indicating the position of the predicted cluster in the model. As of PMML 4.1, this result feature is deprecated. It is suggested that entityId is used instead.
clusterAffinity: clusterAffinity is the value of the distance or the similarity depending on the context of the clustering PMML document. Please note that a clustering PMML document producer may output the distance to the nearest center, instead of the cluster center. This specification supports only the distance to the cluster center given in clusterId, NOT the distance to the nearest center. As of PMML 4.1, this result feature is deprecated. It is suggested that affinity is used instead.
entityId: Select the predicted entity defined for each model type: cluster, tree node, neuron, kNN or rule. For tree models, the id of the winning tree node is output; for Clustering models, the 1-based index (implicit identifier) of the winning cluster is output; for kNN models, the value of the case ID variable, if specified, is output; for Neural Networks, the id of the winning OutputNeuron is output; for RuleSet models, the id of the rule that fired is output; and for Association Rules models, the id of the winning rule(s) is output.
affinity: Select the value of the distance or the similarity of the provided record to the predicted entity as defined in each of the applicable model types.
entityAffinity: entityAffinity was originally reserved for possible use in future versions of PMML. As of PMML 4.1, it is deprecated as the general affinity feature is introduced instead.
warning: Any warning message such as too many missing values.
ruleValue: Select the rule value specified by the attribute ruleFeature. Due to the fact that a rule consists of many different entities (antecedent, consequent, confidence, etc.), this identifier is required to allow for the selection of the desired output. This feature has been deprecated as of PMML 4.2 and the rule features may be selected as regular features.
reasonCode: Select the top-ranked reason code or the reason code specified by the rank value.
antecedent: For Association Rules, select the antecedent of the winning rule (default), or the rule specified by the rank value.
consequent: For Association Rules, select the consequent of the winning rule (default), or the rule specified by the rank value. This output is identical to the predictedValue output.
rule: For Association Rules, select the winning rule (default), or the rule specified by the rank value. This output will return a description of the rule, formatted in the following way: {<antecedent>}->{<consequent>}.
ruleId: For Association Rules, select the id of the winning rule (default), or the rule specified by the rank value. If the selected rule does not provide an id, a 1-based index is returned. This option is identical to the entityId output and it has been deprecated as of PMML 4.2.
confidence: For Association Rules, select the confidence of the winning rule (default), or the rule specified by the rank value. This option is identical to the probability output.
support: For Association Rules, select the support of the winning rule (default), or the rule specified by the rank value.
lift: For Association Rules, select the lift of the winning rule (default), or the rule specified by the rank value.
leverage: For Association Rules, select the leverage of the winning rule (default), or the rule specified by the rank value.
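The residual and standardError definitions in the list above can be made concrete with a small sketch. The function names and all numbers are hypothetical, and the standard error is read here as the square root of the quadratic form of x with V.

```python
import math


def numeric_residual(actual, predicted):
    """Numeric prediction: actual value minus predicted value."""
    return actual - predicted


def classification_residual(actual, target_value, probabilities):
    """[actual value = target value] minus the probability of the target value."""
    indicator = 1.0 if actual == target_value else 0.0
    return indicator - probabilities[target_value]


def standard_error(x, V):
    """Square root of the quadratic form of vector x with covariance matrix V."""
    n = len(x)
    quad = sum(x[i] * V[i][j] * x[j] for i in range(n) for j in range(n))
    return math.sqrt(quad)


probabilities = {"YES": 0.8, "NO": 0.2}      # made-up probabilities
res_num = numeric_residual(10.0, 8.5)
res_hit = classification_residual("YES", "YES", probabilities)
res_miss = classification_residual("NO", "YES", probabilities)
se = standard_error([1.0, 2.0], [[0.09, 0.0], [0.0, 0.04]])
print(res_num, res_hit, res_miss, se)
```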

Transformed Values and Decisions

The Output element allows for post-processing of output fields, i.e., transforming (raw) predicted values into values that can be better used by humans or other downstream applications. The transformations can take a generic form through the use of an EXPRESSION (see Transformations for more information).

Similarly to transformed values, decisions support post-processing of output fields and are used to describe business problems and the related decisions. The Decisions element is used in conjunction with an EXPRESSION for output fields with result feature decision. Its attribute businessProblem names the problem or question for which a decision is proposed by application of the data mining model. The attribute description can describe the decision problem in more detail.

The Decisions element contains an element Decision for every possible value of the decision. The value is a decision value as returned by the EXPRESSION. The displayValue is a string, which may be used by applications to refer to that decision. The attribute description can describe the decision in more detail.

Examples

Below, a sequence of three examples shows how to use expressions together with OutputFields for post-processing of a predicted value, from simple rescaling to a business decision with the use of a threshold value.

Example 1

Suppose a regression model has the following OutputField elements:

<Output>
  <OutputField name="RawResult" optype="continuous" dataType="double" feature="predictedValue" isFinalResult="false"/>                  
  <OutputField name="FinalResult" optype="continuous" dataType="double" feature="transformedValue" isFinalResult="true">
    <NormContinuous field="RawResult">
      <LinearNorm orig="-100" norm="-304"/>
      <LinearNorm orig="100" norm="324"/>
    </NormContinuous>
  </OutputField> 
</Output>

In essence, this describes the rescaling function f(x) = 10 + 3.14*x in the range between -100 and 100. With a predicted value of 8, the final derived result would be 35.12.
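The arithmetic can be checked with a minimal sketch of the piecewise linear interpolation that NormContinuous performs between its LinearNorm points. The function name is hypothetical, and values outside the defined range are simply rejected because outlier treatment is not part of this example.

```python
def norm_continuous(x, points):
    """Linearly interpolate x between consecutive (orig, norm) points."""
    for (o1, n1), (o2, n2) in zip(points, points[1:]):
        if o1 <= x <= o2:
            return n1 + (x - o1) * (n2 - n1) / (o2 - o1)
    raise ValueError("x lies outside the range covered by the LinearNorm points")


# The two LinearNorm elements of Example 1 as (orig, norm) pairs.
points = [(-100, -304), (100, 324)]

slope = (324 - (-304)) / (100 - (-100))   # 3.14
intercept = -304 - slope * (-100)         # 10, up to floating-point rounding
final_result = norm_continuous(8, points)
print(slope, intercept, final_result)
```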

Example 2

<Output>
  <OutputField name="RawResult" optype="continuous" dataType="double" feature="predictedValue"/>  
  <OutputField name="FinalResult" optype="continuous" dataType="double" feature="transformedValue">
    <Apply function="round">
      <NormContinuous field="RawResult">
        <LinearNorm orig="-100" norm="-21.4"/>
        <LinearNorm orig="-10" norm="-21.4"/>
        <LinearNorm orig="10.5" norm="42.97"/>
        <LinearNorm orig="100" norm="42.97"/>
      </NormContinuous>    
    </Apply>
  </OutputField> 
</Output>

In addition to the rescale function from the previous example, we now have upper and lower limits as well as rounding. Suppose the model returns a value of 8. The limits have no effect, and after rescaling and rounding the final result is 35. If the predicted value were 12.97, the upper limit would take effect and the maximum value 10.5 would be used instead. After rescaling and rounding, the final derived result is 43.

Min and max can always be handled using a piecewise linear transformation (NormContinuous). For a lower limit of min and an upper limit of max and a linear behavior governed by f*x + c in between, the following example defines the transformation:

<NormContinuous field="X">
  <LinearNorm orig="VMIN" norm="f*min+c"/>
  <LinearNorm orig="min" norm="f*min+c"/>
  <LinearNorm orig="max" norm="f*max+c"/>
  <LinearNorm orig="VMAX" norm="f*max+c"/>
</NormContinuous>

Note that the following inequalities must hold: VMIN < min < max < VMAX.
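The construction above can be sketched in Python and checked against the numbers of Example 2 (f = 3.14, c = 10, min = -10, max = 10.5, VMIN = -100, VMAX = 100). The helper names are hypothetical, and Python's built-in `round` is used as a stand-in for the PMML round function; the two may treat exact halves differently, which does not matter for the values shown.

```python
def clamp_points(vmin, lo, hi, vmax, f, c):
    """Build the four (orig, norm) pairs for a clamped linear transformation."""
    if not vmin < lo < hi < vmax:
        raise ValueError("expected VMIN < min < max < VMAX")
    return [(vmin, f * lo + c), (lo, f * lo + c), (hi, f * hi + c), (vmax, f * hi + c)]


def norm_continuous(x, points):
    """Linearly interpolate x between consecutive (orig, norm) points."""
    for (o1, n1), (o2, n2) in zip(points, points[1:]):
        if o1 <= x <= o2:
            return n1 + (x - o1) * (n2 - n1) / (o2 - o1)
    raise ValueError("x lies outside the range covered by the LinearNorm points")


points = clamp_points(-100, -10, 10.5, 100, f=3.14, c=10)
inside = round(norm_continuous(8, points))      # limits have no effect
above = round(norm_continuous(12.97, points))   # upper limit applies
below = round(norm_continuous(-50, points))     # lower limit applies
print(inside, above, below)
```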

Example 3

Building on the previous example, the current one shows how the final result, obtained after the rescale function takes place, can be compared with a given threshold of value 30 to determine a business decision where values greater than 30 yield a positive response.

<Output>
  <OutputField name="RawResult" optype="continuous" dataType="double" feature="predictedValue"/>  
  <OutputField name="FinalResult" optype="continuous" dataType="double" feature="transformedValue">
    <Apply function="round">
      <NormContinuous field="RawResult">
        <LinearNorm orig="-100" norm="-21.4"/>
        <LinearNorm orig="-10" norm="-21.4"/>
        <LinearNorm orig="10.5" norm="42.97"/>
        <LinearNorm orig="100" norm="42.97"/>
      </NormContinuous>    
    </Apply>
  </OutputField>
  <OutputField name="BusinessDecision" optype="categorical" dataType="string" feature="decision">
    <Decisions businessProblem="Should the outstanding amount be collected?" description="The decision depends on the likelihood to get the money and the cost to try.">
      <Decision value="waive" description="Waive any existing conditions on case and approve."/>
      <Decision value="refer" description="Keep conditions and refer case for further scrutiny."/>  
    </Decisions>
    <Apply function="if">
      <Apply function="greaterThan">
        <FieldRef field="FinalResult"/>
        <Constant>30</Constant>
      </Apply>
      <!--THEN-->
      <Constant>waive</Constant>
      <!--ELSE-->     
      <Constant>refer</Constant>
    </Apply>    
  </OutputField>     
</Output>
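The decision logic of Example 3 amounts to a strict threshold comparison, sketched below. The function and constant names are hypothetical; the check against the declared values reflects the requirement that every value the EXPRESSION can return is introduced by a Decision element.

```python
DECISIONS = {"waive", "refer"}  # the Decision values declared in Example 3


def business_decision(final_result, threshold=30):
    """Mirror the if/greaterThan expression: strictly greater yields 'waive'."""
    decision = "waive" if final_result > threshold else "refer"
    if decision not in DECISIONS:
        raise ValueError(f"undeclared decision value: {decision}")
    return decision


print(business_decision(35))   # FinalResult for a raw prediction of 8
print(business_decision(30))   # not strictly greater than the threshold
print(business_decision(-21))  # lower limit of the rescaled range
```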

Outputs Per Model Type

This table shows which outputs are allowed for each type of model defined in PMML. Please note that, as new scoring procedures are added in future releases of PMML, this table can change:

Allowable Outputs based on Model Type (ok = Valid Output, X = Not Applicable)

Model Type	predicted value	transformed value	decision	predicted display value	probability	residual	standard error	entity id	affinity	warning	reason Code	antecedent, consequent, rule, confidence, support, lift, leverage
Association Rules	ok	ok	ok	X	ok	ok	X	X	ok	ok	X	ok
Baseline Model	ok	ok	ok	X	X	X	X	X	X	ok	X	X
Bayesian Network Model	ok	ok	ok	ok	ok	ok	X	X	X	ok	X	X
Clustering Model	ok	ok	ok	ok	X	X	X	ok	ok	ok	X	X
Gaussian Process Model	ok	ok	ok	X	X	ok	X	X	X	ok	X	X
GeneralRegression (regression)	ok	ok	ok	X	X	ok	ok	X	X	ok	X	X
GeneralRegression (classification)	ok	ok	ok	ok	ok	ok	X	X	X	ok	X	X
k-NN (regression)	ok	ok	ok	X	X	X	X	ok	ok	ok	X	X
k-NN (classification)	ok	ok	ok	ok	X	X	X	ok	ok	ok	X	X
k-NN (clustering)	ok	ok	ok	ok	X	X	X	ok	ok	ok	X	X
Naive Bayes	ok	ok	ok	ok	ok	ok	X	X	X	ok	X	X
Neural Network (regression)	ok	ok	ok	X	X	ok	ok	ok	ok	ok	X	X
Neural Network (classification)	ok	ok	ok	ok	ok	ok	X	ok	ok	ok	X	X
Regression (regression)	ok	ok	ok	X	X	ok	ok	X	X	ok	X	X
Regression (classification)	ok	ok	ok	ok	ok	ok	X	X	X	ok	X	X
RuleSet	ok	ok	ok	X	ok	ok	X	ok	ok	ok	X	X
Scorecard	ok	ok	ok	X	X	X	X	X	X	ok	ok	X
Sequence	X	X	X	X	X	X	X	X	X	ok	X	X
Support Vector Machine (regression)	ok	ok	ok	X	X	ok	ok	X	X	ok	X	X
Support Vector Machine (classification)	ok	ok	ok	ok	ok	ok	X	X	X	ok	X	X
Tree (regression)	ok	ok	ok	X	X	ok	X	ok	ok	ok	X	X
Tree (classification)	ok	ok	ok	ok	ok	ok	X	ok	ok	ok	X	X

For mining models using chaining (modelChain), the features specified may be used in accordance with the type of the last model in the chain, unless a segmentId value is provided referring to a specific segment.
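A consumer could encode the table as a lookup to validate OutputField elements before scoring. The sketch below copies three rows from the table; the column identifiers are the RESULT-FEATURE names, except `associationRuleFeatures`, which is a hypothetical label for the combined last column.

```python
FEATURE_COLUMNS = [
    "predictedValue", "transformedValue", "decision", "predictedDisplayValue",
    "probability", "residual", "standardError", "entityId", "affinity",
    "warning", "reasonCode", "associationRuleFeatures",
]

# Three rows copied from the table, in column order (ok = valid, X = not applicable).
ROWS = {
    "Clustering Model": "ok ok ok ok X X X ok ok ok X X",
    "Scorecard": "ok ok ok X X X X X X ok ok X",
    "Tree (classification)": "ok ok ok ok ok ok X ok ok ok X X",
}

ALLOWED = {
    model: {name for name, flag in zip(FEATURE_COLUMNS, cells.split()) if flag == "ok"}
    for model, cells in ROWS.items()
}


def is_allowed(model_type, feature):
    """Return True if the table marks the feature as valid for the model type."""
    return feature in ALLOWED[model_type]


print(is_allowed("Scorecard", "reasonCode"))
print(is_allowed("Scorecard", "probability"))
```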
