PMML 4.2 - Output fields
The Output element describes a set of result values that can be returned
from a model. In particular, OutputField elements specify names, types and
rules for calculating specific result features. This information can be used
while writing an output table. The Output section in the model specifies
names for columns in an output table and describes how to compute the
corresponding values.
Example
<Output>
  <OutputField name="P_responseYes" optype="continuous" dataType="double" targetField="response" feature="probability" value="YES"/>
  <OutputField name="P_responseNo" optype="continuous" dataType="double" targetField="response" feature="probability" value="NO"/>
  <OutputField name="I_response" optype="categorical" dataType="string" targetField="response" feature="predictedValue"/>
  <OutputField name="U_response" optype="categorical" dataType="string" targetField="response" feature="predictedDisplayValue"/>
</Output>
If a model contains this Output element, a PMML consumer could map an input
table to an output table with columns named P_responseYes, P_responseNo,
etc. The values for P_responseYes are determined as the probability that the
target field, with name response, has the value YES.
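As a rough illustration of that mapping, the following Python sketch fills one output row from a scored record. The layout of the `result` dictionary and the `output_row` helper are assumptions made for this example; they are not defined by PMML.

```python
# Hypothetical, simplified stand-ins for the four OutputField elements above.
OUTPUT_FIELDS = [
    {"name": "P_responseYes", "feature": "probability", "value": "YES"},
    {"name": "P_responseNo", "feature": "probability", "value": "NO"},
    {"name": "I_response", "feature": "predictedValue"},
    {"name": "U_response", "feature": "predictedDisplayValue"},
]


def output_row(result, fields=OUTPUT_FIELDS):
    """Build one output-table row from a scoring result (assumed layout)."""
    row = {}
    for field in fields:
        feature = field["feature"]
        if feature == "probability":
            # 'value' names the target category whose probability is wanted.
            row[field["name"]] = result["probabilities"][field["value"]]
        elif feature == "predictedValue":
            row[field["name"]] = result["predicted"]
        elif feature == "predictedDisplayValue":
            # Fall back to the raw value when no display value is defined.
            raw = result["predicted"]
            row[field["name"]] = result.get("display_values", {}).get(raw, raw)
    return row


# Made-up scoring result for a single input record.
result = {
    "predicted": "YES",
    "probabilities": {"YES": 0.8, "NO": 0.2},
    "display_values": {"YES": "Yes, will respond"},
}
row = output_row(result)
```

Each key of `row` corresponds to one column of the output table, in the order in which the OutputField elements are listed.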
Schema:
<xs:element name="Output">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="OutputField" minOccurs="1" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
<xs:element name="OutputField">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:sequence minOccurs="0" maxOccurs="1">
        <xs:element ref="Decisions" minOccurs="0" maxOccurs="1"/>
        <xs:group ref="EXPRESSION" minOccurs="1" maxOccurs="1"/>
      </xs:sequence>
    </xs:sequence>
    <xs:attribute name="name" type="FIELD-NAME" use="required"/>
    <xs:attribute name="displayName" type="xs:string"/>
    <xs:attribute name="optype" type="OPTYPE"/>
    <xs:attribute name="dataType" type="DATATYPE"/>
    <xs:attribute name="targetField" type="FIELD-NAME"/>
    <xs:attribute name="feature" type="RESULT-FEATURE" default="predictedValue"/>
    <xs:attribute name="value" type="xs:string"/>
    <xs:attribute name="ruleFeature" type="RULE-FEATURE" default="consequent"/>
    <xs:attribute name="algorithm" default="exclusiveRecommendation">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="recommendation"/>
          <xs:enumeration value="exclusiveRecommendation"/>
          <xs:enumeration value="ruleAssociation"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
    <xs:attribute name="rank" type="INT-NUMBER" default="1"/>
    <xs:attribute name="rankBasis" default="confidence">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="confidence"/>
          <xs:enumeration value="support"/>
          <xs:enumeration value="lift"/>
          <xs:enumeration value="leverage"/>
          <xs:enumeration value="affinity"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
    <xs:attribute name="rankOrder" default="descending">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="descending"/>
          <xs:enumeration value="ascending"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
    <xs:attribute name="isMultiValued" default="0"/>
    <xs:attribute name="segmentId" type="xs:string"/>
  </xs:complexType>
</xs:element>
<xs:simpleType name="RESULT-FEATURE">
  <xs:restriction base="xs:string">
    <xs:enumeration value="predictedValue"/>
    <xs:enumeration value="predictedDisplayValue"/>
    <xs:enumeration value="transformedValue"/>
    <xs:enumeration value="decision"/>
    <xs:enumeration value="probability"/>
    <xs:enumeration value="affinity"/>
    <xs:enumeration value="residual"/>
    <xs:enumeration value="standardError"/>
    <xs:enumeration value="clusterId"/>
    <xs:enumeration value="clusterAffinity"/>
    <xs:enumeration value="entityId"/>
    <xs:enumeration value="entityAffinity"/>
    <xs:enumeration value="warning"/>
    <xs:enumeration value="ruleValue"/>
    <xs:enumeration value="reasonCode"/>
    <xs:enumeration value="antecedent"/>
    <xs:enumeration value="consequent"/>
    <xs:enumeration value="rule"/>
    <xs:enumeration value="ruleId"/>
    <xs:enumeration value="confidence"/>
    <xs:enumeration value="support"/>
    <xs:enumeration value="lift"/>
    <xs:enumeration value="leverage"/>
  </xs:restriction>
</xs:simpleType>
<xs:element name="Decisions">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Decision" minOccurs="1" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="businessProblem" type="xs:string"/>
    <xs:attribute name="description" type="xs:string"/>
  </xs:complexType>
</xs:element>
<xs:element name="Decision">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="value" type="xs:string" use="required"/>
    <xs:attribute name="displayValue" type="xs:string"/>
    <xs:attribute name="description" type="xs:string"/>
  </xs:complexType>
</xs:element>
<xs:simpleType name="RULE-FEATURE">
  <xs:restriction base="xs:string">
    <xs:enumeration value="antecedent"/>
    <xs:enumeration value="consequent"/>
    <xs:enumeration value="rule"/>
    <xs:enumeration value="ruleId"/>
    <xs:enumeration value="confidence"/>
    <xs:enumeration value="support"/>
    <xs:enumeration value="lift"/>
    <xs:enumeration value="leverage"/>
    <xs:enumeration value="affinity"/>
  </xs:restriction>
</xs:simpleType>
The attribute name specifies the name of the OutputField. The name itself
does not define how the output values are computed. For information on the
naming of OutputFields, see Scope of Fields.
The dataType of an OutputField element specifies the default column type.
optype can be used to indicate admissible operations on the values. A
clusterId field, for example, can have integer as its dataType, but
categorical as its optype. For details, see the description of
DataDictionary.
If present, the attribute targetField must refer to a MiningField of usage
type target. targetField is a required attribute if the model has multiple
target fields.
The attribute feature specifies the value the output field takes from the
computed mining result.
The attribute value is used in conjunction with result features referring to
specific values. For example, when used with the feature probability, the
attribute value indicates the category for which a probability is returned.
When not specified, the probability of the predicted categorical value
should be returned as an output.
An output field may contain an association rule or any of its properties.
ruleFeature specifies which feature of an association rule to return. This
attribute has been deprecated as of PMML 4.2. The rule feature values can
now be specified in the feature attribute.
The attribute algorithm specifies which scoring algorithm to use when
computing the output value. It applies only to Association Rules models.
The attribute rank is used to specify the rank of the feature value from the
mining result that should be selected. It can be used with models that rank
the possible outcomes (e.g. classification, clustering, scorecards,
association rules). When not specified, the winning outcome is returned.
When specified, the outcome of the particular rank is returned instead. For
example, scorecards can return multiple reason codes depending on the
complexity of the scorecard. If the scorecard computes multiple reason
codes, rank="1" returns the top reason code. To return other reason codes,
rank must be set to the appropriate index. Note that the attribute rank
cannot be used together with the attribute value.
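To make the rank semantics concrete, here is a minimal Python sketch that picks the outcome at a given 1-based rank. The helper `select_by_rank`, the reason code names and their scores are hypothetical, and the sketch assumes that a higher score means a better rank; how a real consumer scores and orders outcomes is defined by the individual model types.

```python
def select_by_rank(scores, rank=1):
    """Return the (outcome, score) pair at a 1-based rank, highest score first.

    Returns None when fewer than `rank` outcomes exist (an assumption of
    this sketch, standing in for a missing result).
    """
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ordered[rank - 1] if rank <= len(ordered) else None


# Hypothetical reason codes with the score used to order them.
reason_codes = {"RC_income": 12.0, "RC_age": 30.0, "RC_tenure": 5.0}
top = select_by_rank(reason_codes)             # same as rank=1
second = select_by_rank(reason_codes, rank=2)  # next reason code
```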
The attribute rankOrder determines the sorting order when ranking the
results. The default behavior (rankOrder="descending") indicates that the
result with the highest rank will appear first on the sorted list.
The attribute rankBasis applies only to Association Rules and is used to
specify which criterion is used to sort the output result. For instance, the
result could be sorted by the confidence, support or lift of the rules.
The attribute isMultiValued indicates that the output can represent multiple
output values. This attribute has been deprecated as of PMML 4.2. If the
value of the attribute is "1", then the rank value indicates the number of
output values that should be returned: a positive value indicates the number
of output values to be returned, based on the rankBasis and rankOrder, while
a zero value indicates that all output values are to be returned. If the
value of isMultiValued is "0" (default), then rank indicates a particular
output, as defined above.
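The interplay of rank, rankBasis, rankOrder and isMultiValued can be sketched as follows. This is only an illustration under assumed data structures: the `ranked_rules` helper and the rule dictionaries are made up for the example, and rule matching against the input record is left out entirely.

```python
def ranked_rules(rules, rank=1, rank_basis="confidence",
                 rank_order="descending", is_multi_valued=False):
    """Select rules following the rank attributes (illustrative only)."""
    ordered = sorted(rules, key=lambda rule: rule[rank_basis],
                     reverse=(rank_order == "descending"))
    if is_multi_valued:
        # rank is a count here; zero means "return every output value".
        return ordered if rank == 0 else ordered[:rank]
    # rank is a 1-based position here; an out-of-range rank yields [].
    return ordered[rank - 1:rank]


# Hypothetical rules that are assumed to have matched the input already.
rules = [
    {"id": "r1", "confidence": 0.9, "support": 0.10, "lift": 1.2},
    {"id": "r2", "confidence": 0.6, "support": 0.30, "lift": 2.0},
    {"id": "r3", "confidence": 0.7, "support": 0.20, "lift": 1.5},
]
winner = ranked_rules(rules)  # defaults: top rule by confidence
top_two_by_support = ranked_rules(rules, rank=2, rank_basis="support",
                                  is_multi_valued=True)
```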
The attribute segmentId is applicable to MiningModels which utilize
Segmentation. This attribute provides an approach to deliver results from
Segments which avoids having to specify Output elements within each Segment.
If the segmentId attribute matches the id attribute of a Segment within the
scope of this model element (the model element containing this OutputField's
parent Output element), and if the predicate of that segment is true, then
this OutputField returns the specified result feature from the sub-model
contained within the specified segment. If there is no Segment matching
segmentId or if the predicate of the matching Segment evaluated to false,
then by convention the result delivered by this OutputField is missing. The
features decision or transformedValue may be used along with a segmentId to
refer to an output value of the particular segment. In this case, the
attribute value must also be specified and it must refer to the name of the
output field of the segment for the decision or transformed value of
interest. In addition, the output field of the mining model should not
specify an EXPRESSION.
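The lookup rule for segmentId can be summarized in a few lines of Python. The representation of segments as dictionaries with `predicate` and `score` callables is an assumption of this sketch, as is the use of None for a missing result.

```python
MISSING = None  # stand-in for a missing PMML result


def segment_output(segments, segment_id, feature, record):
    """Return `feature` from the sub-model of the matching segment.

    `segments` is an assumed list of dicts with an 'id', a 'predicate'
    callable and a 'score' callable returning a dict of result features.
    """
    for segment in segments:
        if segment["id"] == segment_id:
            if segment["predicate"](record):
                return segment["score"](record).get(feature, MISSING)
            return MISSING  # segment found, but its predicate is false
    return MISSING  # no segment with that id


# Two hypothetical segments with trivial sub-models.
segments = [
    {"id": "young", "predicate": lambda r: r["age"] < 30,
     "score": lambda r: {"predictedValue": 0.5 * r["age"]}},
    {"id": "other", "predicate": lambda r: True,
     "score": lambda r: {"predictedValue": 10.0}},
]
```

With these segments, a record with age 20 yields 10.0 for segment "young", while a record with age 40 yields a missing result for the same segment because its predicate is false.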
Result Features
The meaning of the feature identifier is:
predictedValue - Select the raw predicted value. More than one OutputField
element can have the predictedValue feature only if the model predicts more
than one field. Details can be found in the description of the individual
models.
predictedDisplayValue - Select the display value that corresponds to the raw
predicted value. For most models, the display value can be specified in the
element Target. If it is not specified explicitly, then the raw predicted
value is used by default.
transformedValue - Apply some transformation to the value of an output
field. For this result feature, OutputField must contain an EXPRESSION,
unless it is used to refer to a transformed value of a segment model through
the segmentId attribute. The transformed value is the result of the
evaluation of the EXPRESSION. Often, the transformed value represents a
shift of the value into a different scale. Scaling is commonly used to
transform a value into a value that is more friendly to humans, for example,
transforming values 0 to 1 into 0 to 100. Another common use case is the
re-transformation of a value that has been scaled by the algorithm before
the calculations in order to present the output in the original scale rather
than the transformed one. A typical example is a logarithmic transformation,
where, instead of a target value x, its logarithm log(x) is used. When
presenting predicted values v, they will have to be re-transformed as
exp(v).
decision - Derive a decision from the output of a data mining model. For
this result feature, OutputField must contain an EXPRESSION, unless it is
used to refer to a decision of a segment model through the segmentId
attribute. The decision is the result of the evaluation of the EXPRESSION.
If the result feature is decision, all possible values resulting from the
evaluation of EXPRESSION must be introduced using Decision elements.
probability - Select the probability of the target value as given by the
attribute value. The value corresponds to one of the target categories of
the model.
residual - Select the residual of the target value. For numeric prediction
this is the actual value minus the predicted value. For classification this
is [actual value = target value] minus the predicted probability for the
target value. The attribute value specifies the raw target value. The term
[actual value = target value] is defined as 1.0 if the actual value is the
same as the target value, and 0.0 otherwise. Note that residual values can
only be computed if the input data include target values.
standardError - Select the standard error of the predicted numeric value. In
a regression model this value is computed as the square root of x^T V x,
where x is a vector of parameter coefficients based on the given predictors
and V is the covariance matrix of the parameters.
clusterId - Select a 1-based index, indicating the position of the predicted
cluster in the model. As of PMML 4.1, this result feature is deprecated. It
is suggested that entityId is used instead.
clusterAffinity - Select the value of the distance or the similarity,
depending on the context of the clustering PMML document. Please note that a
clustering PMML document producer may output the distance to the nearest
center, instead of the distance to the cluster center. This specification
supports only the distance to the cluster center given in clusterId, NOT the
distance to the nearest center. As of PMML 4.1, this result feature is
deprecated. It is suggested that affinity is used instead.
entityId - Select the predicted entity defined for each model type: cluster,
tree node, neuron or rule. For tree models, the id of the winning tree node
is output; for Clustering models, the 1-based index (implicit identifier) of
the winning cluster is output; for Neural Networks, the id of the winning
OutputNeuron is output; for RuleSet models, the id of the rule that fired is
output; and for Association Rules models, the id of the winning rule(s) is
output.
affinity - Select the value of the distance or the similarity of the
provided record to the predicted entity as defined in each of the applicable
model types.
entityAffinity - Originally reserved for possible use in future versions of
PMML. As of PMML 4.1, it is deprecated, as the general affinity feature is
introduced instead.
warning - Any warning message, such as too many missing values.
ruleValue - Select the rule value specified by the attribute ruleFeature.
Due to the fact that a rule consists of many different entities (antecedent,
consequent, confidence, etc.), this identifier is required to allow for the
selection of the desired output. This feature has been deprecated as of PMML
4.2 and the rule features may be selected as regular features.
reasonCode - Select the top-ranked reason code or the reason code specified
by the rank value.
antecedent - For Association Rules, select the antecedent of the winning
rule (default), or the rule specified by the rank value.
consequent - For Association Rules, select the consequent of the winning
rule (default), or the rule specified by the rank value. This output is
identical to the predictedValue output.
rule - For Association Rules, select the winning rule (default), or the rule
specified by the rank value. This output will return a description of the
rule, formatted in the following way: {<antecedent>}->{<consequent>}.
ruleId - For Association Rules, select the id of the winning rule (default),
or the rule specified by the rank value. If the selected rule does not
provide an id, a 1-based index is returned. This option is identical to the
entityId output and it has been deprecated as of PMML 4.2.
confidence - For Association Rules, select the confidence of the winning
rule (default), or the rule specified by the rank value. This option is
identical to the probability output.
support - For Association Rules, select the support of the winning rule
(default), or the rule specified by the rank value.
lift - For Association Rules, select the lift of the winning rule (default),
or the rule specified by the rank value.
leverage - For Association Rules, select the leverage of the winning rule
(default), or the rule specified by the rank value.
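Of the features listed above, residual is the one defined by an explicit formula. The following Python sketch restates both variants; the function names and the probability dictionary are illustrative assumptions.

```python
def regression_residual(actual, predicted):
    """Residual for numeric prediction: actual value minus predicted value."""
    return actual - predicted


def classification_residual(actual, target_value, probabilities):
    """Residual for classification with respect to `target_value`.

    The term [actual value = target value] is 1.0 when the two match and
    0.0 otherwise; the predicted probability of the target value is
    subtracted from it.
    """
    indicator = 1.0 if actual == target_value else 0.0
    return indicator - probabilities[target_value]


# Hypothetical predicted probabilities for a two-category target.
probs = {"YES": 0.75, "NO": 0.25}
```

For an actual value of YES and the target value YES, the residual is 1.0 - 0.75 = 0.25; for an actual value of NO and the same target value it is 0.0 - 0.75 = -0.75.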
Transformed Values and Decisions
The Output element allows for post-processing of output fields, i.e.
transforming (raw) predicted values to values that can be better used by
humans or other downstream applications. The transformations can take a
generic form through the use of an EXPRESSION (see Transformations for more
information).
Similarly to transformed values, decisions support post-processing of output
fields and are used to describe business problems and the related decisions.
The Decisions element is used in conjunction with an EXPRESSION for output
fields with result feature decision. Its attribute businessProblem names the
problem or question for which a decision is proposed by application of the
data mining model. The attribute description can describe the decision
problem in more detail.
The Decisions element contains an element Decision for every possible value
of the decision. The value is a decision value as returned by the
EXPRESSION. The displayValue is a string, which may be used by applications
to refer to that decision. The attribute description can describe the
decision in more detail.
Examples
Below, a sequence of three examples shows how to use expressions together
with OutputFields for post-processing of a predicted value, from simple
rescaling to a business decision with the use of a threshold value.
Example 1
Suppose a regression model has the following OutputField elements:
<Output>
  <OutputField name="RawResult" optype="continuous" dataType="double" feature="predictedValue"/>
  <OutputField name="FinalResult" optype="continuous" dataType="double" feature="transformedValue">
    <NormContinuous field="RawResult">
      <LinearNorm orig="-100" norm="-304"/>
      <LinearNorm orig="100" norm="324"/>
    </NormContinuous>
  </OutputField>
</Output>
In essence, this describes the rescaling function f(x) = 10 +
3.14*x in the range between -100 and 100. With a predicted value of
8, the final derived result would be 35.12.
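The arithmetic can be checked with a short Python sketch that interpolates on the line through the two LinearNorm points. The helper name `linear_norm` is made up for this illustration.

```python
def linear_norm(x, p0, p1):
    """Interpolate x on the line through two (orig, norm) points."""
    (o0, n0), (o1, n1) = p0, p1
    return n0 + (x - o0) * (n1 - n0) / (o1 - o0)


# The two LinearNorm elements of Example 1 as (orig, norm) pairs.
final_result = linear_norm(8, (-100, -304), (100, 324))
```

The slope is (324 - (-304)) / (100 - (-100)) = 3.14, so the result agrees with 10 + 3.14*8 = 35.12 up to floating-point rounding.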
Example 2
<Output>
  <OutputField name="RawResult" optype="continuous" dataType="double" feature="predictedValue"/>
  <OutputField name="FinalResult" optype="continuous" dataType="double" feature="transformedValue">
    <Apply function="round">
      <NormContinuous field="RawResult">
        <LinearNorm orig="-100" norm="-21.4"/>
        <LinearNorm orig="-10" norm="-21.4"/>
        <LinearNorm orig="10.5" norm="42.97"/>
        <LinearNorm orig="100" norm="42.97"/>
      </NormContinuous>
    </Apply>
  </OutputField>
</Output>
In addition to the rescale function from the previous example, we now have
upper and lower limits as well as rounding. Suppose the model returns a value
of 8. The limits will not show effect, and after rescaling and
rounding the final result will be 35. If the predicted value was
12.97, the upper limit would take effect and the maximum value
10.5 would be taken instead. After rescaling and rounding, the final
derived result is 43.
Min and max can always be handled using a piecewise linear transformation
(NormContinuous). For a lower limit of min and an upper limit of max and a
linear behavior governed by f*x + c in between, the following example
defines the transformation:
<NormContinuous field="X">
  <LinearNorm orig="VMIN" norm="f*min+c"/>
  <LinearNorm orig="min" norm="f*min+c"/>
  <LinearNorm orig="max" norm="f*max+c"/>
  <LinearNorm orig="VMAX" norm="f*max+c"/>
</NormContinuous>
Note that the following inequalities must hold: VMIN < min < max < VMAX.
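A minimal Python sketch of this piecewise linear scheme, applied to the numbers of Example 2, is given below. The helper `norm_continuous` is hypothetical. Clamping inputs that fall outside the listed points is an assumption of the sketch, not a statement about how a PMML consumer treats outliers. Python's round() resolves exact .5 ties to the even integer, which may differ from a PMML consumer; neither value below is a tie.

```python
from bisect import bisect_right


def norm_continuous(x, points):
    """Piecewise linear map over (orig, norm) pairs sorted by orig.

    Inputs outside the listed range are clamped to the nearest end point;
    that choice is an assumption of this sketch.
    """
    origs = [orig for orig, _ in points]
    if x <= origs[0]:
        return points[0][1]
    if x >= origs[-1]:
        return points[-1][1]
    i = bisect_right(origs, x)  # index of the first point with orig > x
    (o0, n0), (o1, n1) = points[i - 1], points[i]
    return n0 + (x - o0) * (n1 - n0) / (o1 - o0)


# LinearNorm points of Example 2: flat, then slope 3.14, then flat again.
POINTS = [(-100, -21.4), (-10, -21.4), (10.5, 42.97), (100, 42.97)]

within_limits = round(norm_continuous(8, POINTS))     # 35.12 rounds to 35
above_limit = round(norm_continuous(12.97, POINTS))   # capped at 42.97, rounds to 43
```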
Example 3
Building on the previous example, the current one shows how the final
result, obtained after the rescale function takes place, can be compared with
a given threshold of value 30 to determine a business decision where
values greater than 30 yield a positive response.
<Output>
  <OutputField name="RawResult" optype="continuous" dataType="double" feature="predictedValue"/>
  <OutputField name="FinalResult" optype="continuous" dataType="double" feature="transformedValue">
    <Apply function="round">
      <NormContinuous field="RawResult">
        <LinearNorm orig="-100" norm="-21.4"/>
        <LinearNorm orig="-10" norm="-21.4"/>
        <LinearNorm orig="10.5" norm="42.97"/>
        <LinearNorm orig="100" norm="42.97"/>
      </NormContinuous>
    </Apply>
  </OutputField>
  <OutputField name="BusinessDecision" optype="categorical" dataType="string" feature="decision">
    <Decisions businessProblem="Should the outstanding amount be collected?" description="The decision depends on the likelihood to get the money and the cost to try.">
      <Decision value="waive" description="Waive any existing conditions on case and approve."/>
      <Decision value="refer" description="Keep conditions and refer case for further scrutiny."/>
    </Decisions>
    <Apply function="if">
      <Apply function="greaterThan">
        <FieldRef field="FinalResult"/>
        <Constant>30</Constant>
      </Apply>
      <!--THEN-->
      <Constant>waive</Constant>
      <!--ELSE-->
      <Constant>refer</Constant>
    </Apply>
  </OutputField>
</Output>
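The decision logic of the BusinessDecision field can be mirrored in Python as follows. The function name and the check against the declared values are illustrative assumptions; the input is the already rescaled and rounded FinalResult.

```python
DECISIONS = {"waive", "refer"}  # values declared by the Decision elements


def business_decision(final_result, threshold=30):
    """Mirror the if/greaterThan expression of the BusinessDecision field."""
    decision = "waive" if final_result > threshold else "refer"
    # Every value the expression can yield must be a declared Decision.
    assert decision in DECISIONS
    return decision
```

A FinalResult of 35 gives "waive". A FinalResult of exactly 30 gives "refer", because the comparison is strictly greater than.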
Outputs Per Model Type
This table shows which outputs are allowed for each type of model defined
in PMML. Please note that, as new scoring procedures are added in future
releases of PMML, this table can change:
Allowable Outputs based on Model Type (ok = Valid Output, X = Not
Applicable)
Model Type | predicted value | transformed value | decision | predicted display value | probability | residual | standard error | entity id | affinity | warning | reason code | antecedent, consequent, rule, confidence, support, lift, leverage
Association Rules | ok | ok | ok | X | ok | ok | X | X | ok | ok | X | ok
Baseline Model | ok | ok | ok | X | X | X | X | X | X | ok | X | X
Clustering Model | ok | ok | ok | ok | X | X | X | ok | ok | ok | X | X
GeneralRegression (regression) | ok | ok | ok | X | X | ok | ok | X | X | ok | X | X
GeneralRegression (classification) | ok | ok | ok | ok | ok | ok | X | X | X | ok | X | X
k-NN (regression) | ok | ok | ok | X | X | X | X | ok | ok | ok | X | X
k-NN (classification) | ok | ok | ok | ok | X | X | X | ok | ok | ok | X | X
k-NN (clustering) | ok | ok | ok | ok | X | X | X | ok | ok | ok | X | X
Naive Bayes | ok | ok | ok | ok | ok | ok | X | X | X | ok | X | X
Neural Network (regression) | ok | ok | ok | X | X | ok | ok | ok | ok | ok | X | X
Neural Network (classification) | ok | ok | ok | ok | ok | ok | X | ok | ok | ok | X | X
Regression (regression) | ok | ok | ok | X | X | ok | ok | X | X | ok | X | X
Regression (classification) | ok | ok | ok | ok | ok | ok | X | X | X | ok | X | X
RuleSet | ok | ok | ok | X | ok | ok | X | ok | ok | ok | X | X
Scorecard | ok | ok | ok | X | X | X | X | X | X | ok | ok | X
Sequence | X | X | X | X | X | X | X | X | X | ok | X | X
Support Vector Machine (regression) | ok | ok | ok | X | X | ok | ok | X | X | ok | X | X
Support Vector Machine (classification) | ok | ok | ok | ok | ok | ok | X | X | X | ok | X | X
Tree (regression) | ok | ok | ok | X | X | ok | X | ok | ok | ok | X | X
Tree (classification) | ok | ok | ok | ok | ok | ok | X | ok | ok | ok | X | X
For mining models using chaining (modelChain), the features specified may be
used in accordance with the type of the last model in the chain, unless a
segmentId value is provided referring to a specific segment.