
## PMML 4.3 - RuleSet

Ruleset models can be thought of as flattened decision tree models. A ruleset consists of a number of rules. Each rule contains a predicate and a predicted class value, plus some information collected at training or testing time on the performance of the rule.

For example, the following text describes a rule:

```
PREDICATE: BP="HIGH" AND K > 0.045804001 AND Age <= 50 AND Na <= 0.77240998
PREDICTION: "drugB"
CONFIDENCE: 0.9
```

Rulesets can be applied to new instances to derive predictions and associated confidences (scoring). When a case is scored, a rule is said to fire if its predicate evaluates to TRUE on that case. The ruleset can also have an optional default prediction and associated confidence that are used to score a case if no rules fire.

If missing values in fields mentioned in a rule's predicate cause the predicate to evaluate to UNKNOWN, the rule does not fire.
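As a minimal sketch of this behaviour, the following Python models a predicate result as `True`, `False` or `None`, with `None` standing in for UNKNOWN. The function names are illustrative and not part of PMML, and the conjunction uses ordinary three-valued logic, which is assumed here to match the predicate rules described in TreeModel.

```python
from typing import Optional

def greater_than(value: Optional[float], threshold: float) -> Optional[bool]:
    """Compare a possibly missing value; a missing value yields UNKNOWN (None)."""
    return None if value is None else value > threshold

def and3(*results: Optional[bool]) -> Optional[bool]:
    """Three-valued AND: FALSE dominates, then UNKNOWN, otherwise TRUE."""
    if any(r is False for r in results):
        return False
    if any(r is None for r in results):
        return None
    return True

def fires(result: Optional[bool]) -> bool:
    """A rule fires only when its predicate is definitely TRUE."""
    return result is True

# K is missing, so the conjunction is UNKNOWN and the rule does not fire.
missing_k = and3(True, greater_than(None, 0.045804001))
# K is present and above the threshold, so the rule fires.
present_k = and3(True, greater_than(0.0621, 0.045804001))
```

Under these assumptions `fires(missing_k)` is `False` while `fires(present_k)` is `True`.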

One important question is then how to resolve conflicting predictions when multiple rules "fire". Useful strategies include:

• first hit (just pick the first rule that fires).

• weighted maximum (pick the rule with the highest weight).

• weighted sum (pick the best prediction by combining the weights of all firing rules).

Each rule can have a confidence and a weight that are set at model build time, likely by considering each rule's performance on the training data. The method used to compute confidence and weight is chosen by the application that authors the PMML and lies outside the scope of the PMML model description.

```
<xs:element name="RuleSetModel">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="MiningSchema"/>
<xs:element ref="Output" minOccurs="0"/>
<xs:element ref="ModelStats" minOccurs="0"/>
<xs:element ref="ModelExplanation" minOccurs="0"/>
<xs:element ref="Targets" minOccurs="0"/>
<xs:element ref="LocalTransformations" minOccurs="0"/>
<xs:element ref="RuleSet"/>
<xs:element ref="ModelVerification" minOccurs="0"/>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="modelName" type="xs:string" use="optional"/>
<xs:attribute name="functionName" type="MINING-FUNCTION" use="required"/>
<xs:attribute name="algorithmName" type="xs:string" use="optional"/>
<xs:attribute name="isScorable" type="xs:boolean" default="true"/>
</xs:complexType>
</xs:element>
```

Definitions:

RuleSetModel: starts the definition for a ruleset model.
RuleSet: this element describes a list of rules that make up a ruleset model. The order of rules in the list is important when considering how to score the ruleset.
modelName: the value of modelName in a RuleSetModel element identifies the model with a unique name in the context of the PMML file. See General Structure of PMML models.
isScorable: This attribute indicates if the model is valid for scoring. If this attribute is true or if it is missing, then the model should be processed normally. However, if the attribute is false, then the model producer has indicated that this model is intended for information purposes only and should not be used to generate results. In order to be valid PMML, all required elements and attributes must be present, even for non-scoring models. For more details, see General Structure.
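A consumer might check isScorable before scoring, along the lines of the sketch below. The helper name `is_scorable` is hypothetical, and the XML fragments omit the PMML namespace and required child elements for brevity. The only fact relied on is that `xs:boolean` accepts the lexical forms `true`, `false`, `1` and `0`.

```python
import xml.etree.ElementTree as ET

def is_scorable(model: ET.Element) -> bool:
    """Return False only when isScorable is explicitly false.

    A missing attribute falls back to the schema default, "true".
    """
    return model.get("isScorable", "true").strip() in ("true", "1")

# Illustrative fragments, not complete models.
info_only = ET.fromstring('<RuleSetModel functionName="classification" isScorable="false"/>')
normal = ET.fromstring('<RuleSetModel functionName="classification"/>')
```

Here `is_scorable(info_only)` is `False` and `is_scorable(normal)` is `True`.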

A RuleSet consists of:

```
<xs:element name="RuleSet">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="RuleSelectionMethod" minOccurs="1" maxOccurs="unbounded"/>
<xs:element ref="ScoreDistribution" minOccurs="0" maxOccurs="unbounded"/>
<xs:group ref="Rule" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="recordCount" type="NUMBER" use="optional"/>
<xs:attribute name="nbCorrect" type="NUMBER" use="optional"/>
<xs:attribute name="defaultScore" type="xs:string" use="optional"/>
<xs:attribute name="defaultConfidence" type="NUMBER" use="optional"/>
</xs:complexType>
</xs:element>
```

Definitions:

recordCount: The number of training/test cases to which the ruleset was applied to generate support and confidence measures for individual rules.
nbCorrect: indicates the number of training/test instances for which the default score is correct.
defaultScore: The value of defaultScore in a RuleSet serves as the default predicted value when scoring a case and no rules in the ruleset fire.
defaultConfidence: provides a confidence to be returned with the default score (when scoring a case and no rules in the ruleset fire).
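The fallback to the default values can be sketched as follows. The function name and the list-of-pairs representation are assumptions made for illustration; since both attributes are optional, either default may be absent, which is modelled here as `None`.

```python
def score_with_default(fired, default_score=None, default_confidence=None):
    """Return (score, confidence), falling back to the defaults when nothing fired.

    `fired` is a list of (score, confidence) pairs for the rules that fired,
    already ordered by whichever selection method is in use.
    """
    if fired:
        return fired[0]
    # No rule fired: use defaultScore and defaultConfidence, which may be None.
    return default_score, default_confidence

no_rule_fired = score_with_default([], "drugY", 0.0)
one_rule_fired = score_with_default([("drugB", 0.9)], "drugY", 0.0)
```

With these inputs `no_rule_fired` is `("drugY", 0.0)` and `one_rule_fired` is `("drugB", 0.9)`.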

RuleSelectionMethod: specifies how to select rules from the ruleset to score a new case. If more than one method is included, the first method is used as the default method for scoring, but the other methods included may be selected by the application wishing to perform scoring as valid alternative methods.
ScoreDistribution: describes the distribution of the predicted value in the test/training data.
Rule: contains 0 or more rules which comprise the ruleset.
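One way a scoring application might pick the selection method is sketched below: the first RuleSelectionMethod is taken as the default, and another method is accepted only if the ruleset lists it. The helper names are hypothetical, and the fragment omits the PMML namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace-free fragment for brevity; a real PMML file declares the PMML namespace.
RULESET_XML = """
<RuleSet defaultScore="drugY" defaultConfidence="0.0">
  <RuleSelectionMethod criterion="weightedSum"/>
  <RuleSelectionMethod criterion="weightedMax"/>
  <RuleSelectionMethod criterion="firstHit"/>
</RuleSet>
"""

def available_criteria(ruleset):
    """List the criterion values in document order."""
    return [m.get("criterion") for m in ruleset.findall("RuleSelectionMethod")]

def choose_criterion(ruleset, requested=None):
    """Return the first listed criterion unless a listed alternative is requested."""
    criteria = available_criteria(ruleset)
    if requested is None:
        return criteria[0]
    if requested not in criteria:
        raise ValueError(f"{requested!r} is not offered by this ruleset")
    return requested

ruleset = ET.fromstring(RULESET_XML)
default_criterion = choose_criterion(ruleset)         # first method listed
alternative = choose_criterion(ruleset, "firstHit")   # explicitly requested
```

Rejecting a criterion that the ruleset does not list is a design choice of this sketch, not a requirement stated above.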

The RuleSelectionMethod describes how rules are selected to apply the model to a new case, and consists of:

```
<xs:element name="RuleSelectionMethod">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="criterion" use="required">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="weightedSum"/>
<xs:enumeration value="weightedMax"/>
<xs:enumeration value="firstHit"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
</xs:element>
```

Definitions:

criterion: explains how to determine and rank predictions and their associated confidences from the ruleset in case multiple rules fire. There are many possible ways of applying rulesets, but three useful approaches are covered:
• firstHit: First firing rule is chosen as the predicted class, and the confidence is the confidence of that rule. If further predictions and confidences are required, a search for the next firing rule that chooses a different predicted class is made, and so on.
• weightedSum: Calculate the total weight for each class by summing the weights for each firing rule which predicts that class. The prediction with the highest total weight is then selected. The confidence is the total confidence of the winning class divided by the number of firing rules. If further predictions and confidences are required, the process is repeated to find the class with the second highest total weight, and so on. Note that if two or more classes are assigned the same weight, the winning class is the one that appears first in the data dictionary values.
• weightedMax: Select the firing rule with the highest weight. The confidence returned is the confidence of the selected rule. Note that if two firing rules have the same weight, the rule that occurs first in the ruleset is chosen.
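The three criteria can be sketched in Python as shown below. This is an illustration rather than a reference implementation: rules are represented as plain dictionaries, `firing` is assumed to hold only the rules that fired, in ruleset order, and `class_order` is assumed to list the target values in data dictionary order. Only the top prediction is returned.

```python
from collections import defaultdict

def first_hit(firing):
    """Take the first firing rule and report its own confidence."""
    rule = firing[0]
    return rule["score"], rule["confidence"]

def weighted_max(firing):
    """Take the firing rule with the highest weight.

    max() returns the first maximal item, so ties go to the earlier rule.
    """
    rule = max(firing, key=lambda r: r["weight"])
    return rule["score"], rule["confidence"]

def weighted_sum(firing, class_order):
    """Take the class with the highest total weight.

    Ties go to the class listed first in class_order. The confidence is the
    summed confidence of the winning class divided by the number of firing rules.
    """
    weight_totals = defaultdict(float)
    confidence_totals = defaultdict(float)
    for rule in firing:
        weight_totals[rule["score"]] += rule["weight"]
        confidence_totals[rule["score"]] += rule["confidence"]
    candidates = [c for c in class_order if c in weight_totals]
    winner = max(candidates, key=lambda c: weight_totals[c])
    return winner, confidence_totals[winner] / len(firing)

firing = [
    {"score": "drugB", "confidence": 0.9, "weight": 0.9},
    {"score": "drugA", "confidence": 0.6, "weight": 0.6},
    {"score": "drugA", "confidence": 0.36, "weight": 0.36},
]
classes = ["drugA", "drugB", "drugC", "drugX", "drugY"]
```

With this input, `first_hit` and `weighted_max` both select drugB, while `weighted_sum` selects drugA because its two rules together outweigh the single drugB rule.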

Each Rule can be either a SimpleRule or a CompoundRule.

```
<xs:group name="Rule">
<xs:choice>
<xs:element ref="SimpleRule"/>
<xs:element ref="CompoundRule"/>
</xs:choice>
</xs:group>
```

Each SimpleRule consists of an identifier, a predicate, a score and information on rule performance.

```
<xs:element name="SimpleRule">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
<xs:group ref="PREDICATE"/>
<xs:element ref="ScoreDistribution" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="id" type="xs:string" use="optional"/>
<xs:attribute name="score" type="xs:string" use="required"/>
<xs:attribute name="recordCount" type="NUMBER" use="optional"/>
<xs:attribute name="nbCorrect" type="NUMBER" use="optional"/>
<xs:attribute name="confidence" type="NUMBER" use="optional" default="1"/>
<xs:attribute name="weight" type="NUMBER" use="optional" default="1"/>
</xs:complexType>
</xs:element>
```

Definitions:

PREDICATE: the condition upon which the rule fires. For more details on PREDICATE see the section on predicates in TreeModel. This explains how predicates are described and evaluated and how missing values are handled.

ScoreDistribution: Describes the distribution of the predicted value for instances where the rule fires in the training/test data.

id: The value of id serves as a unique identifier for the rule. Must be unique within the ruleset.

score: The predicted value when the rule fires.

recordCount: The number of training/test instances on which the rule fired.

nbCorrect: Indicates the number of training/test instances on which the rule fired and the prediction was correct.

confidence: Indicates the confidence of the rule.

weight: Indicates the relative importance of the rule. May or may not be equal to the confidence.
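A reader of a SimpleRule has to apply the schema defaults for confidence and weight when the attributes are absent. The sketch below shows one way to do that; the helper name `rule_attributes` is hypothetical, and the fragment omits the PMML namespace and uses the `<True/>` predicate purely as a placeholder.

```python
import xml.etree.ElementTree as ET

def rule_attributes(rule: ET.Element) -> dict:
    """Read SimpleRule attributes, applying the schema defaults of 1."""
    return {
        "id": rule.get("id"),            # optional, None when absent
        "score": rule.attrib["score"],   # required, KeyError when absent
        "confidence": float(rule.get("confidence", "1")),
        "weight": float(rule.get("weight", "1")),
    }

# Illustrative fragment: weight is given, confidence is left to its default.
rule = ET.fromstring('<SimpleRule id="R" score="drugA" weight="0.6"><True/></SimpleRule>')
attrs = rule_attributes(rule)
```

For this fragment `attrs["confidence"]` is `1.0` (the default) and `attrs["weight"]` is `0.6`.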

Each CompoundRule consists of a predicate and one or more rules. CompoundRules offer a shorthand for a more compact representation of rulesets and suggest a more efficient execution mechanism.

```
<xs:element name="CompoundRule">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
<xs:group ref="PREDICATE"/>
<xs:group ref="Rule" minOccurs="1" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
</xs:element>
```

Definitions:

PREDICATE: For more details on PREDICATE see TreeModel.

Rule: One or more rules that are contained within the CompoundRule. Each of these rules may be a SimpleRule or a CompoundRule.

A ruleset containing both compound rules and simple rules has the same meaning as an equivalent ruleset containing only simple rules. It is possible to derive a ruleset containing simple rules by repeating the following transformation:

The original rule

```
<CompoundRule>
<PREDICATE1/>
<SimpleRule id="1" ...>
<PREDICATE2/>
... contents of simple rule 1 ...
</SimpleRule>
... further rules ...
</CompoundRule>
```

transforms to

```
<SimpleRule id="1" ...>
<CompoundPredicate booleanOperator="and">
<PREDICATE1/>
<PREDICATE2/>
</CompoundPredicate>
... contents of simple rule 1 ...
</SimpleRule>
<CompoundRule>
<PREDICATE1/>
... further rules ...
</CompoundRule>
```

Or in other words, a simple rule is said to fire if its predicate evaluates to TRUE, and the predicates of all compound rules that contain the simple rule also evaluate to TRUE.
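The flattening can be sketched as a short recursive function. The dictionary representation and the name `flatten` are assumptions made for illustration: a compound rule has a `"rules"` list, a simple rule has an `"id"`, and predicates are kept as opaque strings that are to be combined with AND.

```python
def flatten(rules, inherited=()):
    """Yield (rule id, predicates to AND together) for each simple rule, in order."""
    for rule in rules:
        predicates = inherited + (rule["predicate"],)
        if "rules" in rule:
            # Compound rule: push its predicate down to every contained rule.
            yield from flatten(rule["rules"], predicates)
        else:
            yield rule["id"], predicates

nested = [
    {"predicate": "BP = HIGH", "rules": [
        {"predicate": "Age <= 50", "rules": [
            {"id": "RULE1", "predicate": "K > 0.045804001 AND Na <= 0.77240998"},
            {"id": "RULE2", "predicate": "K > 0.057789002"},
        ]},
        {"id": "RULE3", "predicate": "Na > 0.21"},
    ]},
]
flat = list(flatten(nested))
```

Each entry of `flat` pairs a simple rule with every predicate that must evaluate to TRUE for it to fire, and the original rule order is preserved.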

### A Complete RuleSet Example

Consider a ruleset with three rules:

```
RULE1:
PREDICATE: BP="HIGH" AND K > 0.045804001 AND Age <= 50 AND Na <= 0.77240998
PREDICTION: drugB
Training/test measures:
recordCount     79
nbCorrect       76
confidence      0.9
weight          0.9
RULE2:
PREDICATE: K > 0.057789002 AND BP="HIGH" AND Age <= 50
PREDICTION: drugA
Training/test measures:
recordCount     278
nbCorrect       168
confidence      0.6
weight          0.6
RULE3:
PREDICATE: BP="HIGH" AND Na > 0.21
PREDICTION: drugA
Training/test measures:
recordCount     100
nbCorrect       50
confidence      0.36
weight          0.36
```

PMML for the example (using only simple rules)

```
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
<Header>
<Application name="MyApplication" version="1.0"/>
</Header>
<DataDictionary>
<DataField name="BP" displayName="BP" optype="categorical" dataType="string">
<Value value="HIGH" property="valid"/>
<Value value="LOW" property="valid"/>
<Value value="NORMAL" property="valid"/>
</DataField>
<DataField name="K" displayName="K" optype="continuous" dataType="double">
<Interval closure="closedClosed" leftMargin="0.020152" rightMargin="0.079925"/>
</DataField>
<DataField name="Age" displayName="Age" optype="continuous" dataType="integer"/>
<DataField name="Na" displayName="Na" optype="continuous" dataType="double"/>
<DataField name="Cholesterol" displayName="Cholesterol" optype="categorical" dataType="string">
<Value value="HIGH" property="valid"/>
<Value value="NORMAL" property="valid"/>
</DataField>
<DataField name="$C-Drug" displayName="$C-Drug" optype="categorical" dataType="string">
<Value value="drugA" property="valid"/>
<Value value="drugB" property="valid"/>
<Value value="drugC" property="valid"/>
<Value value="drugX" property="valid"/>
<Value value="drugY" property="valid"/>
</DataField>
<DataField name="$CC-Drug" displayName="$CC-Drug" optype="continuous" dataType="double"/>
</DataDictionary>
<RuleSetModel modelName="NestedDrug" functionName="classification" algorithmName="RuleSet">
<MiningSchema>
<MiningField name="BP" usageType="active"/>
<MiningField name="K" usageType="active"/>
<MiningField name="Age" usageType="active"/>
<MiningField name="Na" usageType="active"/>
<MiningField name="Cholesterol" usageType="active"/>
<MiningField name="$C-Drug" usageType="target"/>
<MiningField name="$CC-Drug" usageType="supplementary"/>
</MiningSchema>
<RuleSet defaultScore="drugY" recordCount="1000" nbCorrect="149" defaultConfidence="0.0">
<RuleSelectionMethod criterion="weightedSum"/>
<RuleSelectionMethod criterion="weightedMax"/>
<RuleSelectionMethod criterion="firstHit"/>
<SimpleRule id="RULE1" score="drugB" recordCount="79" nbCorrect="76" confidence="0.9" weight="0.9">
<CompoundPredicate booleanOperator="and">
<SimplePredicate field="BP" operator="equal" value="HIGH"/>
<SimplePredicate field="K" operator="greaterThan" value="0.045804001"/>
<SimplePredicate field="Age" operator="lessOrEqual" value="50"/>
<SimplePredicate field="Na" operator="lessOrEqual" value="0.77240998"/>
</CompoundPredicate>
<ScoreDistribution value="drugA" recordCount="2"/>
<ScoreDistribution value="drugB" recordCount="76"/>
<ScoreDistribution value="drugC" recordCount="1"/>
<ScoreDistribution value="drugX" recordCount="0"/>
<ScoreDistribution value="drugY" recordCount="0"/>
</SimpleRule>
<SimpleRule id="RULE2" score="drugA" recordCount="278" nbCorrect="168" confidence="0.6" weight="0.6">
<CompoundPredicate booleanOperator="and">
<SimplePredicate field="K" operator="greaterThan" value="0.057789002"/>
<SimplePredicate field="BP" operator="equal" value="HIGH"/>
<SimplePredicate field="Age" operator="lessOrEqual" value="50"/>
</CompoundPredicate>
<ScoreDistribution value="drugA" recordCount="168"/>
<ScoreDistribution value="drugB" recordCount="40"/>
<ScoreDistribution value="drugC" recordCount="12"/>
<ScoreDistribution value="drugX" recordCount="14"/>
<ScoreDistribution value="drugY" recordCount="24"/>
</SimpleRule>
<SimpleRule id="RULE3" score="drugA" recordCount="100" nbCorrect="50" confidence="0.36" weight="0.36">
<CompoundPredicate booleanOperator="and">
<SimplePredicate field="BP" operator="equal" value="HIGH"/>
<SimplePredicate field="Na" operator="greaterThan" value="0.21"/>
</CompoundPredicate>
<ScoreDistribution value="drugA" recordCount="50"/>
<ScoreDistribution value="drugB" recordCount="10"/>
<ScoreDistribution value="drugC" recordCount="12"/>
<ScoreDistribution value="drugX" recordCount="7"/>
<ScoreDistribution value="drugY" recordCount="11"/>
</SimpleRule>
</RuleSet>
</RuleSetModel>
</PMML>
```

Scoring Procedure for the Example

We will use the above example to illustrate the steps that should be followed in the scoring process.

Suppose we wish to score an instance where:

BP="HIGH", K=0.0621, Age = 36, Na = 0.5023

criterion="firstHit" scoring
If the criterion attribute in the RuleSelectionMethod is set to "firstHit" then RULE1 "fires" first and the prediction is "drugB". The confidence is the confidence of RULE1, 0.9.

criterion="weightedSum" scoring
RULE1, RULE2 and RULE3 all fire. To choose the winner, for each prediction, sum the weights of the firing rules to produce a total weight for that prediction:

• drugA: total weight = weight(RULE2) + weight(RULE3) = 0.6 + 0.36 = 0.96

• drugB: total weight = weight(RULE1) = 0.9

The winning prediction with the highest total weight is drugA. The confidence for this prediction is the total weight for firing rules that predict drugA divided by the number of rules that fired:

confidence(drugA) = total_weight(drugA) / number_of_firing_rules = 0.96 / 3 = 0.32

criterion="weightedMax" scoring
RULE1 has the highest weight of the firing rules and the prediction is "drugB". The confidence is the confidence of RULE1, 0.9.
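The arithmetic of the weightedSum case can be checked with a short script. This is a sketch only: the three predicates are transcribed by hand as Python lambdas rather than read from the PMML, and the tuple layout is an assumption of the example.

```python
# The instance to be scored.
instance = {"BP": "HIGH", "K": 0.0621, "Age": 36, "Na": 0.5023}

# (id, prediction, weight, predicate) for the three rules of the example.
rules = [
    ("RULE1", "drugB", 0.9,
     lambda r: r["BP"] == "HIGH" and r["K"] > 0.045804001
     and r["Age"] <= 50 and r["Na"] <= 0.77240998),
    ("RULE2", "drugA", 0.6,
     lambda r: r["K"] > 0.057789002 and r["BP"] == "HIGH" and r["Age"] <= 50),
    ("RULE3", "drugA", 0.36,
     lambda r: r["BP"] == "HIGH" and r["Na"] > 0.21),
]

# Keep the rules whose predicate is TRUE for the instance, in ruleset order.
firing = [(rid, score, weight) for rid, score, weight, pred in rules if pred(instance)]

# Sum the weights per predicted class.
totals = {}
for _, score, weight in firing:
    totals[score] = totals.get(score, 0.0) + weight

winner = max(totals, key=totals.get)
confidence = totals[winner] / len(firing)
print(winner, round(totals[winner], 2), round(confidence, 2))
```

All three rules fire for this instance, so the total for drugA is divided by 3, giving the confidence of about 0.32 stated above.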

PMML for the example (using compound rules)

The following PMML shows how the example model can be described using compound rules.

```
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
<Header>
<Application name="MyApplication" version="1.0"/>
</Header>
<DataDictionary>
<DataField name="BP" displayName="BP" optype="categorical" dataType="string">
<Value value="HIGH" property="valid"/>
<Value value="LOW" property="valid"/>
<Value value="NORMAL" property="valid"/>
</DataField>
<DataField name="K" displayName="K" optype="continuous" dataType="double">
<Interval closure="closedClosed" leftMargin="0.020152" rightMargin="0.079925"/>
</DataField>
<DataField name="Age" displayName="Age" optype="continuous" dataType="integer">
<Interval closure="closedClosed" leftMargin="15" rightMargin="74"/>
</DataField>
<DataField name="Na" displayName="Na" optype="continuous" dataType="double">
<Interval closure="closedClosed" leftMargin="0.500517" rightMargin="0.899774"/>
</DataField>
<DataField name="Cholesterol" displayName="Cholesterol" optype="categorical" dataType="string">
<Value value="HIGH" property="valid"/>
<Value value="NORMAL" property="valid"/>
</DataField>
<DataField name="$C-Drug" displayName="$C-Drug" optype="categorical" dataType="string">
<Value value="drugA" property="valid"/>
<Value value="drugB" property="valid"/>
<Value value="drugC" property="valid"/>
<Value value="drugX" property="valid"/>
<Value value="drugY" property="valid"/>
</DataField>
<DataField name="$CC-Drug" displayName="$CC-Drug" optype="continuous" dataType="double">
<Interval closure="closedClosed" leftMargin="0" rightMargin="1"/>
</DataField>
</DataDictionary>
<RuleSetModel modelName="Drug" functionName="classification" algorithmName="RuleSet">
<MiningSchema>
<MiningField name="BP" usageType="active"/>
<MiningField name="K" usageType="active"/>
<MiningField name="Age" usageType="active"/>
<MiningField name="Na" usageType="active"/>
<MiningField name="Cholesterol" usageType="active"/>
<MiningField name="$C-Drug" usageType="target"/>
<MiningField name="$CC-Drug" usageType="supplementary"/>
</MiningSchema>
<RuleSet defaultScore="drugY" recordCount="1000" nbCorrect="149" defaultConfidence="0.0">
<RuleSelectionMethod criterion="weightedSum"/>
<RuleSelectionMethod criterion="weightedMax"/>
<RuleSelectionMethod criterion="firstHit"/>
<CompoundRule>
<SimplePredicate field="BP" operator="equal" value="HIGH"/>
<CompoundRule>
<SimplePredicate field="Age" operator="lessOrEqual" value="50"/>
<SimpleRule id="RULE1" score="drugB" recordCount="79" nbCorrect="76" confidence="0.9" weight="0.9">
<CompoundPredicate booleanOperator="and">
<SimplePredicate field="K" operator="greaterThan" value="0.045804001"/>
<SimplePredicate field="Na" operator="lessOrEqual" value="0.77240998"/>
</CompoundPredicate>
<ScoreDistribution value="drugA" recordCount="2"/>
<ScoreDistribution value="drugB" recordCount="76"/>
<ScoreDistribution value="drugC" recordCount="1"/>
<ScoreDistribution value="drugX" recordCount="0"/>
<ScoreDistribution value="drugY" recordCount="0"/>
</SimpleRule>
<SimpleRule id="RULE2" score="drugA" recordCount="278" nbCorrect="168" confidence="0.6" weight="0.6">
<SimplePredicate field="K" operator="greaterThan" value="0.057789002"/>
<ScoreDistribution value="drugA" recordCount="168"/>
<ScoreDistribution value="drugB" recordCount="40"/>
<ScoreDistribution value="drugC" recordCount="12"/>
<ScoreDistribution value="drugX" recordCount="14"/>
<ScoreDistribution value="drugY" recordCount="24"/>
</SimpleRule>
</CompoundRule>
<SimpleRule id="RULE3" score="drugA" recordCount="100" nbCorrect="50" confidence="0.36" weight="0.36">
<SimplePredicate field="Na" operator="greaterThan" value="0.21"/>
<ScoreDistribution value="drugA" recordCount="50"/>
<ScoreDistribution value="drugB" recordCount="10"/>
<ScoreDistribution value="drugC" recordCount="12"/>
<ScoreDistribution value="drugX" recordCount="7"/>
<ScoreDistribution value="drugY" recordCount="11"/>
</SimpleRule>
</CompoundRule>
</RuleSet>
</RuleSetModel>
</PMML>
```