PMML 4.2 - Target Fields and Values

The target values are derived from a variety of elements in the models. For example, the target categories in RegressionModel are specified in the RegressionTable elements, while the TreeModel defines them within Node elements and NaiveBayesModel specifies them in TargetValueCounts. The PMML element Target provides a common syntax for all models.

Example

<Targets>
  <Target field="response" optype="categorical">
    <TargetValue value="YES" displayValue="Yes" priorProbability="0.02"/>
    <TargetValue value="NO" displayValue="No" priorProbability="0.98"/>
  </Target>

  <!-- alternative for continuous field -->
  <Target field="amount" optype="continuous">
    <TargetValue defaultValue="432.21"/>
  </Target>
</Targets>

The example defines a categorical target field named response with two categories, YES and NO. These values are used in the mining expressions for regression tables, tree nodes, Bayes counts, etc.

Schema

<xs:element name="Targets">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Target" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="Target">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="TargetValue" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="field" type="FIELD-NAME" use="required"/>
    <xs:attribute name="optype" type="OPTYPE"/>
    <xs:attribute name="castInteger">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="round"/>
          <xs:enumeration value="ceiling"/>
          <xs:enumeration value="floor"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>

    <xs:attribute name="min" type="xs:double"/>
    <xs:attribute name="max" type="xs:double"/>
    <xs:attribute name="rescaleConstant" type="xs:double" default="0"/>
    <xs:attribute name="rescaleFactor" type="xs:double" default="1"/>

  </xs:complexType>
</xs:element>

<xs:element name="TargetValue">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Partition" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="value" type="xs:string"/>
    <xs:attribute name="displayValue" type="xs:string"/>
    <xs:attribute name="priorProbability" type="PROB-NUMBER"/>
    <xs:attribute name="defaultValue" type="NUMBER"/>
  </xs:complexType>
</xs:element>

The attribute field must refer to the name of a DataField or DerivedField.

When Target specifies optype then it overrides the optype attribute in a corresponding MiningField, if it exists. If the target does not specify optype then the MiningField is used as default. And, in turn, if the MiningField does not specify an optype, it is taken from the corresponding DataField. In other words, a MiningField overrides a DataField, and a Target overrides a MiningField.
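
The override chain can be sketched in Python as a lookup that returns the first optype that is set. This is only an illustrative sketch: the function name resolve_optype and the use of None for an absent element or attribute are assumptions, not part of PMML.

```python
from typing import Optional


def resolve_optype(target: Optional[str],
                   mining_field: Optional[str],
                   data_field: Optional[str]) -> Optional[str]:
    """Return the effective optype: Target, then MiningField, then DataField.

    Each argument is the optype attribute of the respective element, or
    None when the element or the attribute is absent (an assumption of
    this sketch).
    """
    for optype in (target, mining_field, data_field):
        if optype is not None:
            return optype
    return None
```

For instance, resolve_optype(None, "continuous", "categorical") returns "continuous", because the MiningField overrides the DataField when the Target is silent.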

If a regression model should predict integers, use the attribute castInteger to control how decimal places should be handled:

round: round to nearest integer, e.g., 2.718 becomes 3, -2.89 becomes -3

ceiling: smallest integer greater than or equal, e.g., 2.718 becomes 3, -1.2 becomes -1

floor: largest integer smaller than or equal, e.g., 2.718 becomes 2, -1.2 becomes -2

If min is present and the predicted value is smaller than min, the predicted value is replaced by min.

If max is present and the predicted value is larger than max, the predicted value is replaced by max.

rescaleFactor and rescaleConstant can be used for a simple rescaling of the predicted value: first, the predicted value is multiplied by rescaleFactor; then rescaleConstant is added to the result.

Note that castInteger, min, max, rescaleConstant and rescaleFactor apply only to regression models. Furthermore, they must be applied in the following order:

min and max
rescaleFactor
rescaleConstant
castInteger
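
The four steps in this list can be sketched as a single Python function. This is a hedged illustration rather than a reference implementation: the function name apply_target and its keyword arguments are hypothetical, and the handling of exact .5 ties under castInteger="round" is an assumption, since the text does not specify it.

```python
import math
from typing import Optional


def apply_target(predicted: float,
                 min_value: Optional[float] = None,
                 max_value: Optional[float] = None,
                 rescale_factor: float = 1.0,
                 rescale_constant: float = 0.0,
                 cast_integer: Optional[str] = None) -> float:
    """Post-process a regression prediction in the order listed above."""
    value = predicted
    # 1. min and max: clamp the raw prediction to the given limits.
    if min_value is not None and value < min_value:
        value = min_value
    if max_value is not None and value > max_value:
        value = max_value
    # 2. rescaleFactor, then 3. rescaleConstant.
    value = value * rescale_factor
    value = value + rescale_constant
    # 4. castInteger.
    if cast_integer == "round":
        # Assumption: exact .5 ties round up; the text does not say.
        value = math.floor(value + 0.5)
    elif cast_integer == "ceiling":
        value = math.ceil(value)
    elif cast_integer == "floor":
        value = math.floor(value)
    elif cast_integer is not None:
        raise ValueError(f"unknown castInteger: {cast_integer!r}")
    return value
```

The defaults mirror the schema defaults (rescaleFactor 1, rescaleConstant 0), so calling apply_target(2.718, cast_integer="floor") only applies the last step and returns 2.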

In classification models, TargetValue is required. For regression models, TargetValue is optional.

Partition is an optional element to provide distribution information for all records that were assigned to the respective class label.

The attribute value corresponds to the class labels in a classification model. This is, for example, equivalent to categories in RegressionTable, tree Node, neural network NeuralOutput or Bayes TargetValueCounts.
Hence, value holds the values as they were found in the original input data. These values are not normalized or formatted. The attribute displayValue may hold a transformed, usually more readable version which PMML consumers can use to display values in scoring results or other applications. A model might map different values to the same internal target value, e.g., yes and Yes may both be mapped to YES. In such cases displayValue is just one representative value, e.g., yes or Yes. Note that the displayValue attributes are not used for identifying a target category within the model.
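
To illustrate the split between value and displayValue, the following Python sketch looks up a display label for a predicted category. It rests on several assumptions: the helper name display_label is hypothetical, the XML fragment omits the PMML namespace for brevity, and falling back to value when displayValue is absent is a choice of this sketch, not a rule stated here.

```python
import xml.etree.ElementTree as ET

TARGET_XML = """
<Target field="response" optype="categorical">
  <TargetValue value="YES" displayValue="Yes" priorProbability="0.02"/>
  <TargetValue value="NO" displayValue="No" priorProbability="0.98"/>
</Target>
"""


def display_label(target: ET.Element, value: str) -> str:
    """Return the displayValue for a category, falling back to value itself."""
    for tv in target.findall("TargetValue"):
        # Matching uses the value attribute only, never displayValue.
        if tv.get("value") == value:
            return tv.get("displayValue", value)
    raise KeyError(value)


target = ET.fromstring(TARGET_XML)
print(display_label(target, "YES"))  # prints "Yes"
```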

The attribute priorProbability specifies a default probability for the corresponding target category. It is used if the prediction logic itself did not produce a result. This can happen, e.g., if an input value is missing and there is no other method for treating missing values. The exact rules for using the prior probability are defined in the particular models.

The attribute defaultValue is the counterpart of prior probabilities for continuous fields. Usually the value is the mean of the target values in the training data.

The attribute priorProbability is used only if the optype of the field is categorical or ordinal. The attribute defaultValue is used only if the optype of the field is continuous.
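
A consumer could collect these fallback values roughly as in the Python sketch below. It only shows which attribute is read for which optype; it does not implement the model-specific rules for when the fallback is used. The function name fallback_prediction is hypothetical, the optype is assumed to be present on the Target element itself, and the PMML namespace is assumed to be absent.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Union

Fallback = Union[Dict[str, float], Optional[float]]


def fallback_prediction(target: ET.Element) -> Fallback:
    """Return the default result a Target offers when scoring produced none."""
    optype = target.get("optype")
    target_values = target.findall("TargetValue")
    if optype in ("categorical", "ordinal"):
        # One prior probability per category that declares one.
        return {
            tv.get("value"): float(tv.get("priorProbability"))
            for tv in target_values
            if tv.get("priorProbability") is not None
        }
    if optype == "continuous":
        # The first TargetValue carrying defaultValue wins (assumption).
        for tv in target_values:
            if tv.get("defaultValue") is not None:
                return float(tv.get("defaultValue"))
        return None
    raise ValueError(f"unsupported or missing optype: {optype!r}")
```

When the Target does not carry an optype, a real consumer would first have to resolve it from the MiningField or DataField before choosing between the two attributes.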

Example
Suppose a regression model has the following Target element:

<Targets>
  <Target field="amount" rescaleConstant="10" rescaleFactor="3.14"/>
</Targets>

In essence, this describes the rescaling function f(x) = 10 + 3.14*x. With a predicted value of 8, the final result would be 35.12.

Example

<Targets>
  <Target field="amount" rescaleConstant="10" rescaleFactor="3.14" min="-10" max="10.5" castInteger="round"/>
</Targets>

In addition to the rescale function from the previous example, we now have upper and lower limits as well as rounding.

Suppose the model returns a value of 8. The limits will not show effect, and after rescaling and rounding the final result will be 35.

If the predicted value was 12.97, the upper limit would take effect and the maximum value 10.5 would be taken instead. After rescaling and rounding, the final result is 43.

Notes

Note that the Schema allows multiple target fields. It depends on the kind of the model whether prediction of multiple fields is supported.

Further notes:

The target categories may be different from the values that appear in the original training data.
The definition of target categories must be a subset of the list of valid values in the DataDictionary; it need not be a proper subset.
The same field can have different target specifications in different models; e.g., prior probabilities may be different.
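
The subset requirement in these notes can be checked with a small Python sketch. The function name undeclared_categories is hypothetical, and the caller is assumed to have already extracted the target categories and the valid values from the model and the DataDictionary.

```python
from typing import Iterable, Set


def undeclared_categories(target_values: Iterable[str],
                          valid_values: Iterable[str]) -> Set[str]:
    """Return target categories that are missing from the valid values.

    An empty result means the target categories form a subset of the
    valid values, whether or not the two lists are equal.
    """
    return set(target_values) - set(valid_values)
```

For example, undeclared_categories(["YES", "NO"], ["YES", "NO", "MAYBE"]) returns an empty set, while undeclared_categories(["YES", "UNKNOWN"], ["YES", "NO"]) returns {"UNKNOWN"}.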

TargetFields are usually declared with usageType="target" in MiningField.

The list of TargetValues within a TargetField is similar to the list of valid values in a DataField. However, the DataField defines the values that are allowed as input to the model, while the TargetValues describe properties of the predicted values in a mining result. The default probabilities and the defaultValues do not necessarily describe statistical properties of a TargetField as found in the training data. For example, the defaultValue can be the mean of the actual values in the training data but it could also be the median or any other value that was chosen during training. The same goes for the default probabilities. They are usually the prior probabilities of respective values in the training data. But they can also be any other adjusted probability.
