PMML 4.4 - Scorecard

A data mining model contains a set of input fields which are used to predict a certain target value. This prediction can be seen as an assessment about a prospect, a customer, or a scenario for which an outcome is predicted based on historical data. In a scorecard, input fields, also referred to as characteristics (for example, "age"), are broken down into attributes (for example, "19-29" and "30-39" age groups or ranges) with specific partial scores associated with them. These scores represent the influence of the input attributes on the target and are readily available for inspection. Partial scores are then summed up so that an overall score can be obtained for the target value.

Scorecards are very popular in the financial industry for their interpretability and ease of implementation, and because input attributes can be mapped to a series of reason codes which provide explanations of each individual's score. Usually, the lower the overall score produced by a scorecard, the higher the chances of it triggering an adverse decision, which usually involves the referral or denial of services. Reason codes, as the name suggests, allow for an explanation of scorecard behavior and any adverse decisions generated as a consequence of the overall score. They basically answer the question: "Why is the score low, given its input conditions?" (For inverted scoring ranges, this specification also provides for the option of returning reason codes for scores which are "too high". See section Scoring Procedure.)

The XML Schema for Scorecard

<xs:element name="Scorecard">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="MiningSchema"/>
      <xs:element ref="Output" minOccurs="0"/>
      <xs:element ref="ModelStats" minOccurs="0"/>
      <xs:element ref="ModelExplanation" minOccurs="0"/>
      <xs:element ref="Targets" minOccurs="0"/>
      <xs:element ref="LocalTransformations" minOccurs="0"/>
      <xs:element ref="Characteristics"/>
      <xs:element ref="ModelVerification" minOccurs="0"/>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="modelName" type="xs:string"/>
    <xs:attribute name="functionName" type="MINING-FUNCTION" use="required"/>
    <xs:attribute name="algorithmName" type="xs:string"/>
    <xs:attribute name="initialScore" type="NUMBER" default="0"/>
    <xs:attribute name="useReasonCodes" type="xs:boolean" default="true"/>
    <xs:attribute name="reasonCodeAlgorithm" default="pointsBelow"> 
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="pointsAbove"/>
          <xs:enumeration value="pointsBelow"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
    <xs:attribute name="baselineScore" type="NUMBER"/>
    <xs:attribute name="baselineMethod" default="other">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="max"/>
          <xs:enumeration value="min"/>
          <xs:enumeration value="mean"/>
          <xs:enumeration value="neutral"/>
          <xs:enumeration value="other"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
    <xs:attribute name="isScorable" type="xs:boolean" default="true"/>
  </xs:complexType>
</xs:element>

<xs:element name="Characteristics">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Characteristic" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
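
As an informal illustration of the attribute defaults declared in the schema above, the following Python sketch fills in any Scorecard attributes that a document omits. It relies only on the standard xml.etree.ElementTree module. The names SCORECARD_DEFAULTS and scorecard_settings are hypothetical, and the bare Scorecard fragment leaves out the required child elements and the PMML namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Defaults copied from the Scorecard schema; attributes without a declared
# default (for example baselineScore) are deliberately left out.
SCORECARD_DEFAULTS = {
    "initialScore": "0",
    "useReasonCodes": "true",
    "reasonCodeAlgorithm": "pointsBelow",
    "baselineMethod": "other",
    "isScorable": "true",
}


def scorecard_settings(scorecard):
    """Return the Scorecard attributes with schema defaults filled in."""
    settings = dict(SCORECARD_DEFAULTS)
    settings.update(scorecard.attrib)  # explicit attributes override defaults
    return settings


# Illustrative fragment only: a real Scorecard also needs MiningSchema
# and Characteristics children.
example = ET.fromstring('<Scorecard functionName="regression" initialScore="5"/>')
settings = scorecard_settings(example)
print(settings["initialScore"], settings["reasonCodeAlgorithm"])
```

Values are kept as strings here; a consumer would still convert them to numbers or booleans as required by the attribute types.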

Definitions

Note that if useReasonCodes is "true", then baselineScore must be defined at the Scorecard level or for each Characteristic, and reasonCode must be provided for each Characteristic or for each of its input Attributes. If useReasonCodes is "false", then baselineScore and reasonCode are not required.
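
The consistency rule in the note above can be checked mechanically. The following Python sketch is one possible reading of it, not a normative validator. The function name check_reason_code_setup is hypothetical, element names are assumed to carry no namespace prefix, and the Attribute predicates are omitted from the sample fragment for brevity.

```python
import xml.etree.ElementTree as ET


def check_reason_code_setup(scorecard):
    """Return a list of problems; an empty list means the rule is satisfied."""
    # useReasonCodes defaults to "true"; xs:boolean also allows "0" for false.
    if scorecard.get("useReasonCodes", "true") in ("false", "0"):
        return []  # nothing is required when reason codes are switched off
    problems = []
    model_baseline = scorecard.get("baselineScore")
    for characteristic in scorecard.iter("Characteristic"):
        name = characteristic.get("name", "?")
        # baselineScore must exist on the Scorecard or on the Characteristic.
        if model_baseline is None and characteristic.get("baselineScore") is None:
            problems.append(f"{name}: no baselineScore")
        # reasonCode must exist on the Characteristic or on every Attribute.
        attributes = characteristic.findall("Attribute")
        if characteristic.get("reasonCode") is None and any(
            a.get("reasonCode") is None for a in attributes
        ):
            problems.append(f"{name}: no reasonCode")
    return problems


doc = ET.fromstring("""
<Scorecard functionName="regression">
  <Characteristics>
    <Characteristic name="ageScore" reasonCode="RC2" baselineScore="18">
      <Attribute partialScore="-1"/>
    </Characteristic>
    <Characteristic name="incomeScore">
      <Attribute partialScore="3"/>
    </Characteristic>
  </Characteristics>
</Scorecard>
""")
print(check_reason_code_setup(doc))
```

In this fragment only "incomeScore" is reported, once for the missing baselineScore and once for the missing reasonCode.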

Example

The following sample scorecard is used to compute the overall score associated with three input characteristics: "department", "age", and "income".

Partial scores for categorical characteristic "department"

  Attribute              Partial Score
  if value is missing    -9
  marketing              19
  engineering            3
  business               6

Partial scores for numeric characteristic "age"

  Attribute              Partial Score
  if value is missing    -1
  0-18                   -3
  19-29                  0
  30-39                  12
  40-                    18

Partial scores for numeric characteristic "income"

  Attribute                                           Partial Score
  if value is missing                                 3
  less than or equal to 1000                          (0.03 * income) + 11
  greater than 1000 and less than or equal to 1500    5
  greater than 1500                                   (0.01 * income) - 18

The corresponding PMML model is:

<PMML xmlns="http://www.dmg.org/PMML-4_4" version="4.4">
  <Header copyright="www.dmg.org" description="Sample scorecard">
    <Timestamp>2010-11-10T08:17:10.8</Timestamp>
  </Header>
  <DataDictionary>
    <DataField name="department" dataType="string" optype="categorical"/>
    <DataField name="age" dataType="integer" optype="continuous"/>
    <DataField name="income" dataType="double" optype="continuous"/>
    <DataField name="overallScore" dataType="double" optype="continuous"/>
  </DataDictionary>
  <Scorecard modelName="SampleScorecard" functionName="regression" useReasonCodes="true" reasonCodeAlgorithm="pointsBelow" initialScore="0" baselineMethod="other">
    <MiningSchema>
      <MiningField name="department" usageType="active" invalidValueTreatment="asMissing"/>
      <MiningField name="age" usageType="active" invalidValueTreatment="asMissing"/>
      <MiningField name="income" usageType="active" invalidValueTreatment="asMissing"/>
      <MiningField name="overallScore" usageType="target"/>
    </MiningSchema>
    <Output>
      <OutputField name="Final Score" feature="predictedValue" dataType="double" optype="continuous"/>
      <OutputField name="Reason Code 1" rank="1" feature="reasonCode" dataType="string" optype="categorical"/>
      <OutputField name="Reason Code 2" rank="2" feature="reasonCode" dataType="string" optype="categorical"/>
      <OutputField name="Reason Code 3" rank="3" feature="reasonCode" dataType="string" optype="categorical"/>
    </Output>
    <Characteristics>
      <Characteristic name="departmentScore" reasonCode="RC1" baselineScore="19">
        <Attribute partialScore="-9">
          <SimplePredicate field="department" operator="isMissing"/>
        </Attribute>
        <Attribute partialScore="19">
          <SimplePredicate field="department" operator="equal" value="marketing"/>
        </Attribute>
        <Attribute partialScore="3">
          <SimplePredicate field="department" operator="equal" value="engineering"/> 
        </Attribute>
        <Attribute partialScore="6">
          <SimplePredicate field="department" operator="equal" value="business"/> 
        </Attribute> 
        <Attribute partialScore="0">
          <True/> 
        </Attribute> 
      </Characteristic>
      <Characteristic name="ageScore" reasonCode="RC2" baselineScore="18">
        <Attribute partialScore="-1">
          <SimplePredicate field="age" operator="isMissing"/>
        </Attribute>
        <Attribute partialScore="-3">
          <SimplePredicate field="age" operator="lessOrEqual" value="18"/>
        </Attribute>
        <Attribute partialScore="0">
          <CompoundPredicate booleanOperator="and">
            <SimplePredicate field="age" operator="greaterThan" value="18"/>
            <SimplePredicate field="age" operator="lessOrEqual" value="29"/>
          </CompoundPredicate>
        </Attribute>
        <Attribute partialScore="12">
          <CompoundPredicate booleanOperator="and">
            <SimplePredicate field="age" operator="greaterThan" value="29"/>
            <SimplePredicate field="age" operator="lessOrEqual" value="39"/>
          </CompoundPredicate>
        </Attribute> 
        <Attribute partialScore="18">
          <SimplePredicate field="age" operator="greaterThan" value="39"/>
        </Attribute>
      </Characteristic>
      <Characteristic name="incomeScore" reasonCode="RC3" baselineScore="10">
        <Attribute partialScore="3">
          <SimplePredicate field="income" operator="isMissing"/>
        </Attribute>
        <Attribute>             
          <SimplePredicate field="income" operator="lessOrEqual" value="1000"/>
          <ComplexPartialScore>
            <Apply function="+">
              <Apply function="*">
                <Constant>0.03</Constant>
                <FieldRef field="income"/>
              </Apply>
              <Constant>11</Constant>
            </Apply>
          </ComplexPartialScore>                  
        </Attribute>        
        <Attribute partialScore="5">
          <CompoundPredicate booleanOperator="and">
            <SimplePredicate field="income" operator="greaterThan" value="1000"/>
            <SimplePredicate field="income" operator="lessOrEqual" value="1500"/>
          </CompoundPredicate>
        </Attribute> 
        <Attribute>
          <SimplePredicate field="income" operator="greaterThan" value="1500"/>
          <ComplexPartialScore>
            <Apply function="-">
              <Apply function="*">
                <Constant>0.01</Constant>
                <FieldRef field="income"/>
              </Apply>
              <Constant>18</Constant>
            </Apply>
          </ComplexPartialScore>  
        </Attribute>
      </Characteristic>
    </Characteristics>
  </Scorecard>
</PMML>

Note that Characteristic "departmentScore" ends with an Attribute whose predicate is the element True, which evaluates to TRUE at all times. When used in the last Attribute in the list of attributes for a characteristic, PREDICATE True supplies a default partial score that is used whenever no other Attribute/PREDICATE for that characteristic evaluates to TRUE.

In addition, note that for this scorecard, a baselineScore and a single reasonCode are associated with each Characteristic. Reason code "RC1" is associated with Characteristic "departmentScore", while reason codes "RC2" and "RC3" are associated with numeric characteristics "ageScore" and "incomeScore", respectively.

However, if the scorecard requires specific reason codes to be used per range or category of a characteristic, attribute reasonCode of element Attribute should be used instead. In this case, it takes precedence over attribute reasonCode of element Characteristic. For example, the table below shows a different reason code ("RC2_1" through "RC2_5") for each age range.

Attribute-based reason codes for characteristic "age"

  Attribute              Partial Score    Reason Code
  if value is missing    -1               RC2_1
  0-18                   -3               RC2_2
  19-29                  0                RC2_3
  30-39                  12               RC2_4
  40-                    18               RC2_5

To represent such a requirement, specific reason codes per Attribute element are now defined for Characteristic "ageScore", as shown in the PMML code below.

<Scorecard modelName="SampleScorecard" functionName="regression" 
    useReasonCodes="true" reasonCodeAlgorithm="pointsBelow" 
    initialScore="0" baselineMethod="other">
   ...
   <Characteristic name="ageScore" baselineScore="18">
      <Attribute partialScore="-1" reasonCode="RC2_1">
         <SimplePredicate field="age" operator="isMissing"/>
      </Attribute>
      <Attribute partialScore="-3" reasonCode="RC2_2">
         <SimplePredicate field="age" operator="lessOrEqual" value="18"/>
      </Attribute>
      <Attribute partialScore="0" reasonCode="RC2_3">
         <CompoundPredicate booleanOperator="and">
            <SimplePredicate field="age" operator="greaterThan" value="18"/>
            <SimplePredicate field="age" operator="lessOrEqual" value="29"/>
         </CompoundPredicate>
      </Attribute>
      <Attribute partialScore="12" reasonCode="RC2_4">
         <CompoundPredicate booleanOperator="and">
            <SimplePredicate field="age" operator="greaterThan" value="29"/>
            <SimplePredicate field="age" operator="lessOrEqual" value="39"/>
         </CompoundPredicate>
      </Attribute> 
      <Attribute partialScore="18" reasonCode="RC2_5">
         <SimplePredicate field="age" operator="greaterThan" value="39"/>
      </Attribute>
   </Characteristic>
   ...
</Scorecard>
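
The precedence rule described above (an Attribute-level reasonCode overrides the one on its Characteristic) can be sketched in a few lines of Python. The helper name effective_reason_code is hypothetical, and the fragment omits predicates and the PMML namespace to stay short.

```python
import xml.etree.ElementTree as ET


def effective_reason_code(characteristic, attribute):
    """Attribute-level reasonCode wins; otherwise use the Characteristic's."""
    return attribute.get("reasonCode") or characteristic.get("reasonCode")


characteristic = ET.fromstring("""
<Characteristic name="ageScore" reasonCode="RC2" baselineScore="18">
  <Attribute partialScore="-3" reasonCode="RC2_2"/>
  <Attribute partialScore="18"/>
</Characteristic>
""")
first, second = characteristic.findall("Attribute")
# The first Attribute carries its own code; the second inherits "RC2".
print(effective_reason_code(characteristic, first))
print(effective_reason_code(characteristic, second))
```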

If the scorecard is intended to derive reason code calculations from the weighted average score of each characteristic (attribute baselineMethod equal to "mean"), the weighted average partial score should be entered as the baselineScore. For example, given the following distribution of attributes for characteristic "age" (as obtained from the training data):

Distribution of input attributes for characteristic "age"

  Attribute              Partial Score    Distribution
  if value is missing    -1               5%
  0-18                   -3               14%
  19-29                  0                22%
  30-39                  12               34%
  40-                    18               25%

The weighted average for characteristic "age" is (-1 * 0.05) + (-3 * 0.14) + (0 * 0.22) + (12 * 0.34) + (18 * 0.25) = 8.11 and the corresponding PMML code is:

<Scorecard modelName="SampleScorecard" functionName="regression"
    useReasonCodes="true" reasonCodeAlgorithm="pointsBelow"
    initialScore="0" baselineMethod="mean">
    ...
   <Characteristics>
      <Characteristic name="ageScore" baselineScore="8.11">
         <Attribute partialScore="-1" reasonCode="RC2_1">
            <SimplePredicate field="age" operator="isMissing"/>
         </Attribute>
         ...
      </Characteristic>
      ...
   </Characteristics>
</Scorecard>
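
The weighted average used as the baselineScore can be reproduced from the distribution table. The following Python sketch is only a convenience for checking the arithmetic; the names distribution and weighted_baseline are hypothetical, and the shares are the percentages from the table written as fractions.

```python
# (partial score, share of training records) for each "age" attribute,
# taken from the distribution table.
distribution = [(-1, 0.05), (-3, 0.14), (0, 0.22), (12, 0.34), (18, 0.25)]


def weighted_baseline(rows):
    """Average partial score, weighted by how often each attribute occurs."""
    total_share = sum(share for _, share in rows)
    return sum(score * share for score, share in rows) / total_share


print(round(weighted_baseline(distribution), 2))
```

Dividing by the total share keeps the result correct even if the shares are given as counts rather than fractions that sum to one.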

Scoring Procedure

The scoring procedure for a scorecard is simple. Partial scores are summed up to create an overall score, the result of the scorecard. And so, for the PMML example shown above, if the input data record consists of ("engineering","25","500"), meaning department is engineering, age is 25 and income is 500, the overall score will be: 3 + 0 + 26 = 29.
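
The arithmetic for this record can be traced with a short Python sketch. It hard-codes the partial scores from the tables in the Example section rather than reading the PMML, so it is an illustration of the summation only; all function names are hypothetical, and None stands in for a missing value.

```python
def department_score(department):
    if department is None:
        return -9
    # Unlisted departments fall back to the default partial score of 0.
    return {"marketing": 19, "engineering": 3, "business": 6}.get(department, 0)


def age_score(age):
    if age is None:
        return -1
    if age <= 18:
        return -3
    if age <= 29:
        return 0
    if age <= 39:
        return 12
    return 18


def income_score(income):
    if income is None:
        return 3
    if income <= 1000:
        return 0.03 * income + 11
    if income <= 1500:
        return 5
    return 0.01 * income - 18


def overall_score(department, age, income, initial_score=0):
    """Sum the initial score and the three partial scores."""
    return (initial_score + department_score(department)
            + age_score(age) + income_score(income))


# department 3 + age 0 + income (0.03 * 500 + 11 = 26)
total = overall_score("engineering", 25, 500)
print(total)
```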

In a scorecard, a single Attribute/PREDICATE per Characteristic should evaluate to TRUE. However, if more than one Attribute evaluates to TRUE, only the partial score associated with the first "true" Attribute is used to compute the overall score. The same rule applies to reason codes. On the other hand, if not even a single Attribute/PREDICATE evaluates to TRUE for a given Characteristic, the scorecard as a whole returns an invalid value.
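
A minimal Python sketch of this first-match rule follows. Predicates are modeled as plain functions and None stands in for the invalid result; both are simplifications chosen for illustration, and first_match is a hypothetical name rather than anything defined by PMML.

```python
def first_match(attributes, value):
    """Return the partial score of the first attribute whose predicate is true.

    `attributes` is an ordered list of (predicate, partial_score) pairs.
    None is returned when no predicate is true, standing in for "invalid".
    """
    for predicate, partial_score in attributes:
        if predicate(value):
            return partial_score
    return None


department = [
    (lambda v: v is None, -9),
    (lambda v: v == "marketing", 19),
    (lambda v: v == "engineering", 3),
    (lambda v: v == "business", 6),
    (lambda v: True, 0),  # plays the role of <True/>: default partial score
]

print(first_match(department, "engineering"))  # first true predicate wins
print(first_match(department, "sales"))        # falls through to the default
print(first_match(department[:-1], "sales"))   # no default: nothing matches
```

A scoring engine would propagate the None case so that the scorecard as a whole yields an invalid value, rather than treating it as a partial score of zero.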

Ranking Reason Codes

The ranking of reason codes can be calculated using differences either above or below the baselineScore of each characteristic. Differences below the baseline are typically used for scorecards where "higher is better", while differences above the baseline are used with scorecards where "lower is better".

To properly account for the possibility that individual reason codes can be cited by multiple characteristics, the following routine is recommended for ranking the reason codes:

  1. For each unique reason code R_1, R_2, ..., R_n, initialize the points missed to P_i = 0, for i = 1, ..., n.
  2. For each scorecard characteristic C_1, C_2, ..., C_m, compute the point differential d_j between the realized partial score and the characteristic's baselineScore. The direction of the difference is determined by the reasonCodeAlgorithm:
       pointsBelow: d_j = baselineScore_j - partialScore_j
       pointsAbove: d_j = partialScore_j - baselineScore_j
     Note that negative differences are possible and should be expected when the baselineMethod is other than min or max.
  3. Then, using the reason code corresponding to the scored Attribute (or to the whole Characteristic), find the appropriate index i and add d_j to that P_i.
  4. Rank the total points P_1, ..., P_n from largest to smallest, and return the corresponding reason codes R_1, ..., R_n in that same ordering.
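
The four steps above can be sketched in Python as follows. This is an illustrative reading of the routine, not a reference implementation. The function name rank_reason_codes and the sample data (reason codes "A" and "B" with made-up scores) are hypothetical, and the input is assumed to list the scored characteristics in document order.

```python
def rank_reason_codes(scored, algorithm="pointsBelow"):
    """Rank reason codes by accumulated point differences.

    `scored` holds one (reason_code, partial_score, baseline_score) tuple
    per characteristic, in document order.
    """
    points = {}
    for code, partial, baseline in scored:
        points.setdefault(code, 0.0)          # step 1: P_i starts at 0
        if algorithm == "pointsBelow":        # step 2: signed difference d_j
            diff = baseline - partial
        elif algorithm == "pointsAbove":
            diff = partial - baseline
        else:
            raise ValueError(f"unknown reasonCodeAlgorithm: {algorithm}")
        points[code] += diff                  # step 3: add d_j to P_i
    # Step 4: largest total first. sorted() is stable, so codes with equal
    # totals keep the order in which they were first encountered.
    return sorted(points.items(), key=lambda item: item[1], reverse=True)


# Hypothetical record in which reason code "A" is cited by two characteristics.
scored = [("A", 5, 20), ("B", 10, 12), ("A", 8, 10)]
print(rank_reason_codes(scored))
print(rank_reason_codes(scored, "pointsAbove"))
```

With pointsBelow, "A" accumulates 15 + 2 = 17 points and ranks ahead of "B" with 2 points; with pointsAbove the signs flip and the order reverses.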

In the PMML example above, reason codes would therefore be ranked in the following way: "RC2" would be the top reason code (with a difference of 18-0=18 points), followed by "RC1" (with 19-3=16 points). Note that the partial score associated with "RC3" is higher than its baseline score (26 > 10), so "RC3" is meaningless in explaining a possible adverse decision. Since only two partial scores are lower than their respective baselineScores, and given that three reason codes are to be returned in the PMML example, the second and third reason codes would be populated with the same code: "RC1".
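
The figures in this paragraph can be reproduced with the short Python sketch below. Treating only positive differences as explanatory, and padding the remaining ranks by repeating the last explanatory code, is one reading of the paragraph above rather than a rule stated elsewhere in this section; the names realized and fill_ranks are hypothetical.

```python
# (reason code, baselineScore, realized partial score) for the record
# ("engineering", 25, 500), in document order.
realized = [("RC1", 19, 3), ("RC2", 18, 0), ("RC3", 10, 26)]

# pointsBelow: difference = baselineScore - partialScore
diffs = [(code, baseline - partial) for code, baseline, partial in realized]
ranked = sorted(diffs, key=lambda item: item[1], reverse=True)

# Assumption: a code explains an adverse decision only if its score fell
# below the baseline, that is, if its difference is positive.
explanatory = [code for code, diff in ranked if diff > 0]


def fill_ranks(codes, wanted):
    """Pad the ranked codes to `wanted` entries by repeating the last one."""
    if not codes:
        return []
    return (codes + [codes[-1]] * wanted)[:wanted]


print(ranked)
print(fill_ranks(explanatory, 3))
```

The three requested output ranks come out as "RC2", "RC1", "RC1", matching the description above.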

Finally, if the difference between partial and baseline scores is the same for competing reason codes, the reason code to be output first will be the one associated with the Attribute or Characteristic that appears first in the PMML file, from top to bottom.

See the chapter on Outputs for details on the various types of outputs that can be returned by Scorecards.
