PMML 4.2 - Scorecard

A data mining model contains a set of input fields which are used to predict a certain target value. This prediction can be seen as an assessment about a prospect, a customer, or a scenario for which an outcome is predicted based on historical data.

In a scorecard, input fields, also referred to as characteristics (for example, "age"), are broken down into attributes (for example, "19-29" and "30-39" age groups or ranges) with specific partial scores associated with them. These scores represent the influence of the input attributes on the target and are readily available for inspection. Partial scores are then summed up so that an overall score can be obtained for the target value.

Scorecards are very popular in the financial industry for their interpretability and ease of implementation, and because input attributes can be mapped to a series of reason codes which provide explanations of each individual's score.

Usually, the lower the overall score produced by a scorecard, the higher the chances of it triggering an adverse decision, which usually involves the referral or denial of services. Reason codes, as the name suggests, allow for an explanation of scorecard behavior and any adverse decisions generated as a consequence of the overall score. They basically answer the question: "Why is the score low, given its input conditions?" (For inverted scoring ranges, this specification also provides for the option of returning reason codes for scores which are "too high". See section Scoring Procedure.)
The XML Schema for Scorecard

<xs:element name="Scorecard">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="MiningSchema"/>
      <xs:element ref="Output" minOccurs="0"/>
      <xs:element ref="ModelStats" minOccurs="0"/>
      <xs:element ref="ModelExplanation" minOccurs="0"/>
      <xs:element ref="Targets" minOccurs="0"/>
      <xs:element ref="LocalTransformations" minOccurs="0"/>
      <xs:element ref="Characteristics"/>
      <xs:element ref="ModelVerification" minOccurs="0"/>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="modelName" type="xs:string"/>
    <xs:attribute name="functionName" type="MINING-FUNCTION" use="required"/>
    <xs:attribute name="algorithmName" type="xs:string"/>
    <xs:attribute name="initialScore" type="NUMBER" default="0"/>
    <xs:attribute name="useReasonCodes" type="xs:boolean" default="true"/>
    <xs:attribute name="reasonCodeAlgorithm" default="pointsBelow">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="pointsAbove"/>
          <xs:enumeration value="pointsBelow"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
    <xs:attribute name="baselineScore" type="NUMBER"/>
    <xs:attribute name="baselineMethod" default="other">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="max"/>
          <xs:enumeration value="min"/>
          <xs:enumeration value="mean"/>
          <xs:enumeration value="neutral"/>
          <xs:enumeration value="other"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
    <xs:attribute name="isScorable" type="xs:boolean" default="true"/>
  </xs:complexType>
</xs:element>

<xs:element name="Characteristics">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Characteristic" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Definitions
Note that if useReasonCodes is "true", then baselineScore must be defined at the Scorecard level or for each Characteristic, and reasonCode must be provided for each Characteristic or for each of its input Attributes. If useReasonCodes is "false", then baselineScore and reasonCode are not required.

Example

The following sample scorecard is used to compute the overall score associated with three input characteristics: "department", "age", and "income".

Partial scores for categorical characteristic "department"
Partial scores for numeric characteristic "age"
Partial scores and reason codes for numeric characteristic "income"

The corresponding PMML model is:

<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <Header copyright="www.dmg.org" description="Sample scorecard">
    <Timestamp>2010-11-10T08:17:10.8</Timestamp>
  </Header>
  <DataDictionary>
    <DataField name="department" dataType="string" optype="categorical"/>
    <DataField name="age" dataType="integer" optype="continuous"/>
    <DataField name="income" dataType="double" optype="continuous"/>
    <DataField name="overallScore" dataType="double" optype="continuous"/>
  </DataDictionary>
  <Scorecard modelName="SampleScorecard" functionName="regression" useReasonCodes="true" reasonCodeAlgorithm="pointsBelow" initialScore="0" baselineMethod="other">
    <MiningSchema>
      <MiningField name="department" usageType="active" invalidValueTreatment="asMissing"/>
      <MiningField name="age" usageType="active" invalidValueTreatment="asMissing"/>
      <MiningField name="income" usageType="active" invalidValueTreatment="asMissing"/>
      <MiningField name="overallScore" usageType="target"/>
    </MiningSchema>
    <Output>
      <OutputField name="Final Score" feature="predictedValue" dataType="double" optype="continuous"/>
      <OutputField name="Reason Code 1" rank="1" feature="reasonCode" dataType="string" optype="categorical"/>
      <OutputField name="Reason Code 2" rank="2" feature="reasonCode" dataType="string" optype="categorical"/>
      <OutputField name="Reason Code 3" rank="3" feature="reasonCode" dataType="string" optype="categorical"/>
    </Output>
    <Characteristics>
      <Characteristic name="departmentScore" reasonCode="RC1" baselineScore="19">
        <Attribute partialScore="-9">
          <SimplePredicate field="department" operator="isMissing"/>
        </Attribute>
        <Attribute partialScore="19">
          <SimplePredicate field="department" operator="equal" value="marketing"/>
        </Attribute>
        <Attribute partialScore="3">
          <SimplePredicate field="department" operator="equal" value="engineering"/>
        </Attribute>
        <Attribute partialScore="6">
          <SimplePredicate field="department" operator="equal" value="business"/>
        </Attribute>
        <Attribute partialScore="0">
          <True/>
        </Attribute>
      </Characteristic>
      <Characteristic name="ageScore" reasonCode="RC2" baselineScore="18">
        <Attribute partialScore="-1">
          <SimplePredicate field="age" operator="isMissing"/>
        </Attribute>
        <Attribute partialScore="-3">
          <SimplePredicate field="age" operator="lessOrEqual" value="18"/>
        </Attribute>
        <Attribute partialScore="0">
          <CompoundPredicate booleanOperator="and">
            <SimplePredicate field="age" operator="greaterThan" value="18"/>
            <SimplePredicate field="age" operator="lessOrEqual" value="29"/>
          </CompoundPredicate>
        </Attribute>
        <Attribute partialScore="12">
          <CompoundPredicate booleanOperator="and">
            <SimplePredicate field="age" operator="greaterThan" value="29"/>
            <SimplePredicate field="age" operator="lessOrEqual" value="39"/>
          </CompoundPredicate>
        </Attribute>
        <Attribute partialScore="18">
          <SimplePredicate field="age" operator="greaterThan" value="39"/>
        </Attribute>
      </Characteristic>
      <Characteristic name="incomeScore" reasonCode="RC3" baselineScore="10">
        <Attribute partialScore="3">
          <SimplePredicate field="income" operator="isMissing"/>
        </Attribute>
        <Attribute partialScore="26">
          <SimplePredicate field="income" operator="lessOrEqual" value="1000"/>
        </Attribute>
        <Attribute>
          <SimplePredicate field="income" operator="lessOrEqual" value="1000"/>
          <ComplexPartialScore>
            <Apply function="+">
              <Apply function="*">
                <Constant>0.03</Constant>
                <FieldRef field="income"/>
              </Apply>
              <Constant>11</Constant>
            </Apply>
          </ComplexPartialScore>
        </Attribute>
        <Attribute partialScore="5">
          <CompoundPredicate booleanOperator="and">
            <SimplePredicate field="income" operator="greaterThan" value="1000"/>
            <SimplePredicate field="income" operator="lessOrEqual" value="2500"/>
          </CompoundPredicate>
        </Attribute>
        <Attribute>
          <SimplePredicate field="income" operator="greaterThan" value="2500"/>
          <ComplexPartialScore>
            <Apply function="-">
              <Apply function="*">
                <Constant>0.01</Constant>
                <FieldRef field="income"/>
              </Apply>
              <Constant>18</Constant>
            </Apply>
          </ComplexPartialScore>
        </Attribute>
      </Characteristic>
    </Characteristics>
  </Scorecard>
</PMML>

Note that Characteristic "departmentScore" contains an Attribute whose predicate is TRUE at all times. When used as part of the last Attribute in the list of attributes for a certain characteristic, PREDICATE True allows for a default partial score to be used in case no other Attribute/PREDICATE for that characteristic evaluates to TRUE.

In addition, note that for this scorecard, a baselineScore is associated with each Characteristic. Also note that a single reasonCode is associated with each Characteristic. While reason code "RC1" is associated with Characteristic "departmentScore", reason codes "RC2" and "RC3" are associated with numeric characteristics "ageScore" and "incomeScore", respectively.

However, if the scorecard requires specific reason codes to be used per range or category of a characteristic, attribute reasonCode of element Attribute should be used instead. In this case, it takes precedence over attribute reasonCode of element Characteristic. For example, the table below shows a different reason code ("RC2_1" through "RC2_5") for each age range.
Attribute-based reason codes for characteristic "age"

To represent such a requirement, specific reason codes per Attribute element are now defined for Characteristic "ageScore", as shown in the PMML code below.

<Scorecard modelName="SampleScorecard" functionName="regression" useReasonCodes="true" reasonCodeAlgorithm="pointsBelow" initialScore="0" baselineMethod="other">
  ...
  <Characteristic name="ageScore" baselineScore="18">
    <Attribute partialScore="-1" reasonCode="RC2_1">
      <SimplePredicate field="age" operator="isMissing"/>
    </Attribute>
    <Attribute partialScore="-3" reasonCode="RC2_2">
      <SimplePredicate field="age" operator="lessOrEqual" value="18"/>
    </Attribute>
    <Attribute partialScore="0" reasonCode="RC2_3">
      <CompoundPredicate booleanOperator="and">
        <SimplePredicate field="age" operator="greaterThan" value="18"/>
        <SimplePredicate field="age" operator="lessOrEqual" value="29"/>
      </CompoundPredicate>
    </Attribute>
    <Attribute partialScore="12" reasonCode="RC2_4">
      <CompoundPredicate booleanOperator="and">
        <SimplePredicate field="age" operator="greaterThan" value="29"/>
        <SimplePredicate field="age" operator="lessOrEqual" value="39"/>
      </CompoundPredicate>
    </Attribute>
    <Attribute partialScore="18" reasonCode="RC2_5">
      <SimplePredicate field="age" operator="greaterThan" value="39"/>
    </Attribute>
  </Characteristic>
  ...
</Scorecard>

If the scorecard is intended to derive reason code calculations from the weighted average score of each characteristic (attribute baselineMethod equal to "mean"), the average partial score should be entered as the baselineScore.
For example, given the following distribution of attributes for characteristic "age" (as obtained from the training data):

Distribution of input attributes for characteristic "age"

The weighted average for characteristic "age" is 8.39 and the corresponding PMML code is:

<Scorecard modelName="SampleScorecard" functionName="regression" useReasonCodes="true" reasonCodeAlgorithm="pointsBelow" initialScore="0" baselineMethod="mean">
  ...
  <Characteristics>
    <Characteristic name="ageScore" baselineScore="8.39">
      <Attribute partialScore="-1" reasonCode="RC2_1">
        <SimplePredicate field="age" operator="isMissing"/>
      </Attribute>
      ...
    </Characteristic>
    ...
  </Characteristics>
</Scorecard>

Scoring Procedure

The scoring procedure for a scorecard is simple. Partial scores are summed up to create an overall score, the result of the scorecard. And so, for the PMML example shown above, if the input data record consists of ("engineering","25","500"), meaning department is engineering, age is 25 and income is 500, the overall score will be: 3 + 0 + 26 = 29.

In a scorecard, a single Attribute/PREDICATE per Characteristic should evaluate to TRUE. However, if more than one Attribute evaluates to TRUE, only the partial score associated with the first "true" Attribute is used to compute the overall score. The same rule applies to reason codes. On the other hand, if not even a single Attribute/PREDICATE evaluates to TRUE for a given Characteristic, the scorecard as a whole returns an invalid value.

Ranking Reason Codes

The ranking of reason codes can be calculated using differences either above or below the baselineScore of each characteristic. Differences below the baseline are typically used for scorecards where "higher is better", while differences above the baseline are used with scorecards where "lower is better". To properly account for the possibility that individual reason codes can be cited by multiple characteristics, the following routines are recommended for ranking the reason codes:
In the PMML example above, reason codes would therefore be ranked in the following way: "RC2" would be the top reason code (with a difference of 18-0=18 points), followed by "RC1" (with 19-3=16 points). Note that the partial score associated with "RC3" is higher than the baseline score (26 > 10), so it is meaningless in explaining a possible adverse decision. Since only two partial scores are lower than their respective baselineScores and given that three reason codes are to be returned in the PMML example, the second and third reason codes would be populated with the same code: "RC1". Finally, if the difference between partial and baseline scores is the same for competing reason codes, the reason code to be output first will be the one associated with the Attribute or Characteristic that appears first in the PMML file, from top to bottom.

See the chapter on Outputs for details on the various types of outputs that can be returned by Scorecards.
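The scoring and ranking rules described in this chapter can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the specification and not a PMML parser: the sample scorecard is hand-translated into Python data, and the names `CHARACTERISTICS`, `missing`, `equals`, `between`, and `score` are hypothetical. The income characteristic is reduced to the attributes that can be the first match. The sketch also assumes that reason codes whose summed difference is not positive are dropped, and it does not pad the result up to the number of requested reason code outputs.

```python
from collections import defaultdict

def missing(field):
    return lambda rec: rec.get(field) is None

def equals(field, value):
    return lambda rec: rec[field] == value

def between(field, low, high):
    # True when low < value <= high; a bound of None means "unbounded".
    def pred(rec):
        v = rec[field]
        return (low is None or v > low) and (high is None or v <= high)
    return pred

# (name, characteristic reasonCode, baselineScore, attributes)
# Each attribute: (predicate, partial score or callable, attribute reasonCode or None)
CHARACTERISTICS = [
    ("departmentScore", "RC1", 19, [
        (missing("department"), -9, None),
        (equals("department", "marketing"), 19, None),
        (equals("department", "engineering"), 3, None),
        (equals("department", "business"), 6, None),
        (lambda rec: True, 0, None),  # <True/>: default partial score
    ]),
    ("ageScore", "RC2", 18, [
        (missing("age"), -1, None),
        (between("age", None, 18), -3, None),
        (between("age", 18, 29), 0, None),
        (between("age", 29, 39), 12, None),
        (between("age", 39, None), 18, None),
    ]),
    ("incomeScore", "RC3", 10, [
        (missing("income"), 3, None),
        (between("income", None, 1000), 26, None),
        (between("income", 1000, 2500), 5, None),
        # Stands in for a ComplexPartialScore expression.
        (between("income", 2500, None), lambda rec: 0.01 * rec["income"] - 18, None),
    ]),
]

def score(record, initial_score=0, algorithm="pointsBelow"):
    """Return (overall score, ranked reason codes), or (None, []) if invalid."""
    total = initial_score
    diffs = defaultdict(float)  # summed difference per reason code
    for _name, char_code, baseline, attributes in CHARACTERISTICS:
        for predicate, partial, attr_code in attributes:
            if predicate(record):
                break  # only the first "true" Attribute counts
        else:
            return None, []  # no Attribute matched: invalid result
        value = partial(record) if callable(partial) else partial
        total += value
        code = attr_code or char_code  # Attribute reasonCode takes precedence
        if algorithm == "pointsBelow":
            diffs[code] += baseline - value
        else:  # "pointsAbove"
            diffs[code] += value - baseline
    # sorted() is stable, so ties keep their document order.
    ranked = sorted((c for c in diffs if diffs[c] > 0), key=lambda c: -diffs[c])
    return total, ranked

print(score({"department": "engineering", "age": 25, "income": 500}))
# (29, ['RC2', 'RC1'])
```

For the record ("engineering", 25, 500) this reproduces the overall score of 29 and the ranking "RC2" (18 points) ahead of "RC1" (16 points), with "RC3" excluded because its partial score lies above its baseline.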