PMML 4.0 - Output fields
The output fields describe a set of result values that can be computed by the model.
In particular, the output fields specify names, types and rules for selecting
specific result features.
This information can be used while writing an output table.
The Output section in the model specifies default names for columns in an output table
and describes how to compute the corresponding values.
Example
<Output>
  <OutputField name="P_responseYes" optype="continuous" dataType="double"
               targetField="response" feature="probability" value="YES" />
  <OutputField name="P_responseNo" optype="continuous" dataType="double"
               targetField="response" feature="probability" value="NO" />
  <OutputField name="I_response" optype="categorical" dataType="string"
               targetField="response" feature="predictedValue" />
  <OutputField name="U_response" optype="categorical" dataType="string"
               targetField="response" feature="predictedDisplayValue" />
</Output>
If a model contains this Output element, a PMML consumer could map an input table to
an output table with columns named P_responseYes, P_responseNo, etc.
The values for P_responseYes are determined as the probability that the target field
named response has the value YES.
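As an illustration of how a consumer might derive output column names from such an element, here is a minimal Python sketch. It assumes a shortened, namespace-free Output fragment (real PMML documents normally declare a namespace), and the helper name output_columns is hypothetical, not part of PMML.

```python
import xml.etree.ElementTree as ET

# Shortened fragment for illustration; optional attributes are omitted.
OUTPUT_XML = """
<Output>
  <OutputField name="P_responseYes" targetField="response"
               feature="probability" value="YES"/>
  <OutputField name="I_response" targetField="response"
               feature="predictedValue"/>
</Output>
"""


def output_columns(output_xml):
    """Return one attribute dict per OutputField, in document order."""
    root = ET.fromstring(output_xml)
    return [dict(field.attrib) for field in root.findall("OutputField")]


columns = output_columns(OUTPUT_XML)
for col in columns:
    # Each OutputField yields one column of the output table.
    print(col["name"], col.get("feature"), col.get("value"))
```

The name attribute gives the column name; how the column is filled depends on feature, targetField and value.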
The Schema is
<xs:element name="Output">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element ref="OutputField" minOccurs="1" maxOccurs="unbounded" />
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="OutputField">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="name" type="FIELD-NAME" use="required" />
<xs:attribute name="displayName" type="xs:string" />
<xs:attribute name="optype" type="OPTYPE" />
<xs:attribute name="dataType" type="DATATYPE"/>
<xs:attribute name="targetField" type="FIELD-NAME" />
<xs:attribute name="feature" type="RESULT-FEATURE" />
<xs:attribute name="value" type="xs:string" />
<xs:attribute name="ruleFeature" type="RULE-FEATURE" default="consequent"/>
<xs:attribute name="algorithm" default="exclusiveRecommendation">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="recommendation" />
<xs:enumeration value="exclusiveRecommendation" />
<xs:enumeration value="ruleAssociation" />
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="rank" type="INT-NUMBER" default="1" />
<xs:attribute name="rankBasis" default="confidence">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="confidence" />
<xs:enumeration value="support" />
<xs:enumeration value="lift" />
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="rankOrder" default="descending">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="descending" />
<xs:enumeration value="ascending" />
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="isMultiValued" default="0" />
</xs:complexType>
</xs:element>
<xs:simpleType name="RESULT-FEATURE">
<xs:restriction base="xs:string">
<xs:enumeration value="predictedValue" />
<xs:enumeration value="predictedDisplayValue" />
<xs:enumeration value="probability" />
<xs:enumeration value="residual" />
<xs:enumeration value="standardError" />
<xs:enumeration value="clusterId" />
<xs:enumeration value="clusterAffinity" />
<xs:enumeration value="entityId" />
<xs:enumeration value="entityAffinity" />
<xs:enumeration value="warning" />
<xs:enumeration value="ruleValue" />
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="RULE-FEATURE">
<xs:restriction base="xs:string">
<xs:enumeration value="antecedent" />
<xs:enumeration value="consequent" />
<xs:enumeration value="rule" />
<xs:enumeration value="ruleId" />
<xs:enumeration value="confidence" />
<xs:enumeration value="support" />
<xs:enumeration value="lift" />
</xs:restriction>
</xs:simpleType>
The value of attribute name specifies the name of a new field in the output.
It can be any string.
The name itself does not define how the output values are computed.
The attribute targetField must refer either to a MiningField with usageType
predicted or to a Target in the Targets section. targetField is required if the model predicts multiple fields.
The attribute value contains the displayValue, if applicable.
If the attribute feature is not specified then the output value is a copy of the field value.
If the attribute feature is specified then targetField must refer to a target field,
and the output value is computed from the mining result.
If the attribute value is empty and the value of the attribute feature is probability then the probability of the resulting categorical value should be returned as an output. Otherwise, value indicates the category for which a probability is returned.
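The rule for the probability feature can be sketched as follows. This is only an illustration: the shape of the scoring result (a dict with the keys predictedValue and probabilities) and the function name probability_output are assumptions made for this example, not something PMML defines.

```python
def probability_output(field, result):
    """Pick the probability reported by an OutputField with feature="probability".

    field  -- dict of OutputField attributes
    result -- hypothetical scoring result: the predicted category and a
              mapping from category to probability
    """
    # Empty or missing value: report the probability of the predicted category.
    category = field.get("value") or result["predictedValue"]
    return result["probabilities"][category]


result = {"predictedValue": "YES", "probabilities": {"YES": 0.8, "NO": 0.2}}
print(probability_output({"feature": "probability", "value": "NO"}, result))
print(probability_output({"feature": "probability"}, result))
```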
The attribute algorithm applies only to Association
Rules models, and specifies which scoring algorithm to use when computing the
output value.
The attribute rank is used to specify which item from a set of outputs should be selected.
Specifically, Association Rules models can return multiple result values as a result of the scoring process.
Therefore, if the result consists of three items, the default behavior (rank="1") is for the first item to be
returned as the result (assuming isMultiValued has a value of "0"; see below for more information about
isMultiValued). To return other items, rank must be set to the appropriate index. The attribute
rankBasis is used to specify which criterion is used to sort the multiple result values. For instance, the results
could be sorted by the confidence, support or lift of the rules. The sorting order is determined by the rankOrder
attribute. With the default (rankOrder="descending"), the rules with the highest value of the rankBasis criterion appear
first in the sorted list. In this case, setting rank="1" returns the first rule from the sorted list, that is,
the highest-ranked rule.
The attribute isMultiValued indicates that the output can represent multiple output values.
If the value of the attribute is "1", then the rank value indicates the number of output values that should
be returned - a positive value indicates the number of output values to be returned, based on the rankBasis and
rankOrder, while a zero value indicates that all output values are to be returned. If the value of isMultiValued
is "0" (default), then rank indicates a particular output, as defined above.
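The interplay of rank, rankBasis, rankOrder and isMultiValued described above can be sketched in Python. This is one possible reading, not a reference implementation: rules are represented as plain dicts, the algorithm attribute (which decides which rules are candidates at all) is not modeled, and returning None for an out-of-range rank is an assumption, since the text does not specify that case.

```python
def select_rules(rules, rank=1, rank_basis="confidence",
                 rank_order="descending", is_multi_valued="0"):
    """Select rule(s) from candidate rules according to the ranking attributes."""
    ordered = sorted(rules, key=lambda r: r[rank_basis],
                     reverse=(rank_order == "descending"))
    if is_multi_valued == "1":
        # rank is a count here; zero means "return all output values".
        return ordered if rank == 0 else ordered[:rank]
    # Single-valued: rank is a 1-based position in the sorted list.
    if 1 <= rank <= len(ordered):
        return ordered[rank - 1]
    return None  # assumption: behavior for an out-of-range rank is unspecified


rules = [
    {"id": "r1", "confidence": 0.6, "support": 0.30, "lift": 1.2},
    {"id": "r2", "confidence": 0.9, "support": 0.10, "lift": 1.5},
    {"id": "r3", "confidence": 0.7, "support": 0.20, "lift": 1.1},
]
print(select_rules(rules)["id"])          # defaults: best rule by confidence
print(select_rules(rules, rank=2)["id"])  # second-best rule by confidence
```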
The meaning of the feature identifier is:
predictedValue: Select the raw predicted value, aka target value. More than
one OutputField element can have the predictedValue feature only if the model
predicts more than one field.
predictedDisplayValue: Select the display value that corresponds to the raw predicted value.
The display value can be specified in the element Target.
If it is not specified explicitly, then the raw predicted value is used by default.
probability: Select the probability of the target value as given by the attribute value.
The target value corresponds to, e.g., values in attribute targetCategory in element
RegressionTable or values in attribute value in element ScoreDistribution
for tree classification.
That is, these values can be display values. The corresponding original
values can be found in the Target elements.
Attribute value in element Target matches value in OutputField,
displayValue in Target is the original value.
residual: Select the residual of the target value.
For numeric prediction this is the actual value minus the predicted value.
For classification this is [actual value = target value] minus the predicted probability for
the target value.
The attribute value specifies the raw target value.
The term [actual value = target value] is defined as 1.0 if
the actual value is the same as the target value, and 0.0 otherwise.
standardError: Select the standard error of the predicted numeric value. In a regression model this
value is computed as the square root of x'Vx, where x is a vector of parameter
coefficients based on the given predictors, x' is its transpose, and V is the parameter covariance matrix.
clusterId: Indicates that this field is the ID of the predicted cluster.
clusterAffinity: clusterAffinity is the value of the distance or the similarity depending on
the context of the clustering PMML document. Please note that a clustering
PMML document producer may output the distance to the nearest center, instead
of the cluster center. This specification supports only the distance to the
cluster center given in clusterId, NOT the distance to the nearest center.
entityId: Similar to clusterId, this indicates the ID of the predicted cluster, tree node, neuron or rule.
This is a more general feature than clusterId (which applies only to clustering models). For tree models,
the id of the winning tree node is output; for Neural Networks, the id of the winning OutputNeuron;
for RuleSet models, the id of the rule that fired; and for Association Rules models, the id of the
winning rule(s).
entityAffinity: entityAffinity is reserved for possible use in future versions of PMML.
warning: Any warning message such as too many missing values.
ruleValue: Select the rule value specified
by the attribute ruleFeature. Because a rule consists of
many different entities (antecedent, consequent, confidence, etc.), this identifier is required to select the desired
output.
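For the standardError feature, the computation can be sketched as the square root of a quadratic form. The sketch below assumes the expression is read as x transposed times V times x, which is the only reading for which the dimensions agree; the function name and the sample numbers are illustrative only.

```python
import math


def standard_error(x, V):
    """Square root of the quadratic form x' V x.

    x -- vector as a list of floats
    V -- covariance matrix as a list of rows, len(x) by len(x)
    """
    n = len(x)
    # First compute V x, then take the dot product with x.
    Vx = [sum(V[i][j] * x[j] for j in range(n)) for i in range(n)]
    return math.sqrt(sum(x[i] * Vx[i] for i in range(n)))


x = [1.0, 2.0]
V = [[0.04, 0.0],
     [0.0, 0.01]]
print(standard_error(x, V))  # sqrt(0.04 * 1 + 0.01 * 4) = sqrt(0.08)
```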
The meaning of the ruleFeature attribute, when feature
is set to ruleValue, is:
antecedent: Select the antecedent of the
winning rule(s) (default), or the rule specified by the rank value.
consequent: Select the consequent of the winning rule(s) (default), or the
rule specified by the rank value. This output is identical to the
predictedValue output.
rule: Select the winning rule(s) (default), or the rule specified by the
rank value. This output will return a description of the rule, formatted
in the following way: {<antecedent>}->{<consequent>}.
ruleId: Select the id of the winning rule(s) (default), or the rule
specified by the rank value. If the selected rule does not provide an
id, a 1-based index is returned. This option is identical to the
entityId output.
confidence: Select the confidence of the winning rule(s) (default), or the
rule specified by the rank value. This option is identical to the
probability output.
support: Select the support of the winning rule(s) (default), or the rule
specified by the rank value.
lift: Select the lift of the winning rule(s) (default), or the rule
specified by the rank value.
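The ruleFeature options listed above can be sketched as a small selector. Several details are assumptions made for this example: rules are plain dicts, items inside the braces are joined with a comma and a space (the text only fixes the overall {<antecedent>}->{<consequent>} shape), and the 1-based index used as a fallback for ruleId is passed in by the caller.

```python
def rule_value(rule, index, rule_feature="consequent"):
    """Return the part of a rule selected by ruleFeature.

    rule  -- dict with antecedent, consequent, confidence, support, lift
             and an optional id
    index -- 1-based position of the rule, used when the rule has no id
    """
    if rule_feature == "rule":
        return "{%s}->{%s}" % (", ".join(rule["antecedent"]),
                               ", ".join(rule["consequent"]))
    if rule_feature == "ruleId":
        return rule.get("id", index)
    # antecedent, consequent, confidence, support and lift are read directly.
    return rule[rule_feature]


rule = {"antecedent": ["bread", "butter"], "consequent": ["milk"],
        "confidence": 0.8, "support": 0.1, "lift": 1.6}
print(rule_value(rule, 3, "rule"))
print(rule_value(rule, 3, "ruleId"))
```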
This table shows which outputs are allowed for each type of model defined in PMML. Please note that, as new scoring procedures are added in future releases of PMML, this table can change:
Allowable Outputs based on Model Type
(ok = Valid Output, X = Not Applicable)
Model Type | predictedValue | predictedDisplayValue | probability | residual | standardError | clusterId | clusterAffinity | entityId | entityAffinity | warning | ruleValue
Association Rules | ok | X | ok | X | X | X | X | ok | X | ok | ok
Clustering Model | ok | ok | X | X | X | ok | ok | ok | ok | ok | X
GeneralRegression (regression) | ok | X | X | ok | ok | X | X | X | X | ok | X
GeneralRegression (classification) | ok | ok | ok | ok | ok | X | X | X | X | ok | X
Regression (regression) | ok | ok | X | ok | ok | X | X | X | X | ok | X
Regression (classification) | ok | ok | ok | ok | ok | X | X | X | X | ok | X
Naïve Bayes | ok | ok | ok | ok | X | X | X | X | X | ok | X
Neural Network (regression) | ok | ok | X | ok | ok | X | X | ok | ok | ok | X
Neural Network (classification) | ok | ok | ok | ok | ok | X | X | ok | ok | ok | X
RuleSet | ok | ok | ok | ok | X | X | X | ok | ok | ok | X
Sequence | X | X | X | X | X | X | X | X | X | ok | X
Support Vector Machine (regression) | ok | ok | X | ok | ok | X | X | X | X | ok | X
Support Vector Machine (classification) | ok | ok | ok | ok | ok | X | X | X | X | ok | X
Tree (regression) | ok | ok | X | ok | X | X | X | ok | ok | ok | X
Tree (classification) | ok | ok | ok | ok | X | X | X | ok | ok | ok | X
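A consumer could use this table to check an Output element against the model type. The sketch below encodes only three rows of the table as an excerpt. The dictionary layout and the function name unsupported_features are hypothetical, chosen for this example.

```python
# Excerpt of the table: model type -> set of valid feature identifiers.
ALLOWED_FEATURES = {
    "Association Rules": {"predictedValue", "probability", "entityId",
                          "warning", "ruleValue"},
    "Sequence": {"warning"},
    "Tree (classification)": {"predictedValue", "predictedDisplayValue",
                              "probability", "residual", "entityId",
                              "entityAffinity", "warning"},
}


def unsupported_features(model_type, features):
    """Return the requested features that the table marks as not applicable."""
    allowed = ALLOWED_FEATURES[model_type]
    return [f for f in features if f not in allowed]


print(unsupported_features("Sequence", ["predictedValue", "warning"]))
```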
For Model Composition, the Output values should reflect those of the last model in the calculation.
Note that the feature identifier residual is useful only if the model is used on test data that contains target values.
It is straightforward to compute the residual on numeric data as the prediction error.
The residual is based on differences of probability values in the case of categorical data.
For example, assume a classification model that predicts the labels Y and N.
For some row in the test data the actual value may be Y and the predicted value is Y with a probability of 0.8.
The term [actual value = target value] maps to 1.0 and the residual is the difference between 1.0 and the probability, i.e. 1.0 - 0.8 = 0.2.
For some other row the actual value may be N. Assuming the predicted value and probability are the same as before,
we have [actual value = target value] = 0.0 and residual = 0.0 - 0.8 = -0.8.
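The two residual definitions can be written as a short Python sketch that reproduces the worked example. The function names are illustrative and not part of PMML.

```python
def numeric_residual(actual, predicted):
    """Residual for numeric prediction: actual value minus predicted value."""
    return actual - predicted


def classification_residual(actual, target_value, probability):
    """Residual for classification.

    [actual value = target value] is 1.0 if the actual value equals the
    target value and 0.0 otherwise; the predicted probability for the
    target value is then subtracted from it.
    """
    indicator = 1.0 if actual == target_value else 0.0
    return indicator - probability


# Target value Y is predicted with probability 0.8 in both rows.
print(round(classification_residual("Y", "Y", 0.8), 10))  # actual value Y
print(round(classification_residual("N", "Y", 0.8), 10))  # actual value N
```

The results are rounded only for display, because 1.0 - 0.8 is not exactly 0.2 in binary floating point.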