PMML 3.2 - Model Composition: Sequences of Models and Model Selection
Model Composition allows the combination of simple models into a single composite PMML model. There are two main variants:
- Model sequencing: two or more models are combined into a sequence where the results of one model are used as input in another model.
- Model selection: one of many models can be selected based on decision rules.
Sample scenarios
Model sequencing and selection in PMML cover a variety of scenarios, such as the following examples:
- A logistic regression model may require non-trivial rules for replacing missing values, like
  if Age is missing
    if Occupation is "Student" then Age := 20
    else if Occupation is "Retired" then Age := 70
    else Age := 40
  These preprocessing rules can be defined by a simple decision tree model that is put into a sequence with the regression model.
- A common method for optimizing prediction models is the combination of segmentation and regression. Data are grouped into segments, and each segment may have its own regression equation. If the segmentation can be expressed by decision rules, then this kind of segment-based regression can be implemented by a decision tree where any leaf node in the tree can contain an embedded regression model.
- Prediction results may have to be combined with a cost or profit matrix before a decision can be derived. A mailing campaign model may use tree classification to determine response probabilities per customer and channel. The cost matrix can be appended as a regression model that applies cost weighting factors to different channels, e.g., high cost for phone and low cost for email. The final decision is then based on the outcome of the regression model.
- A voting scheme that merges results from multiple models can also be implemented by model composition in PMML. For example, there may be an ensemble of four classification models A, B, C, and D for the same target with values "yes" and "no". The final classification result may be defined as the average of the results from A, B, C, and D. The average can be computed by a regression model with equations
  pyes = 0.25*pAyes + 0.25*pByes + 0.25*pCyes + 0.25*pDyes
  pno = 0.25*pAno + 0.25*pBno + 0.25*pCno + 0.25*pDno
  where pXyes stands for the probability of class "yes" and pXno stands for the probability of class "no" in the model X.
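The missing-value rule and the voting equations from the scenarios above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not PMML and not part of the specification; the function names `impute_age` and `average_vote` and the sample probabilities are assumptions made for the example.

```python
def impute_age(age, occupation):
    """Apply the missing-value rule from the first scenario.

    A missing Age is represented here by None (an assumption of this sketch).
    """
    if age is not None:
        return age
    if occupation == "Student":
        return 20
    if occupation == "Retired":
        return 70
    return 40


def average_vote(prob_yes_by_model):
    """Average the "yes" probabilities of all models with equal weights.

    For four models the weight is 0.25, as in the equations above.
    """
    weight = 1.0 / len(prob_yes_by_model)
    p_yes = sum(weight * p for p in prob_yes_by_model.values())
    # With only two classes, pXno = 1 - pXyes for every model X,
    # so the averaged pno equals 1 - pyes.
    return {"yes": p_yes, "no": 1.0 - p_yes}


if __name__ == "__main__":
    print(impute_age(None, "Student"))
    print(average_vote({"A": 0.9, "B": 0.6, "C": 0.7, "D": 0.2}))
```

In a composite PMML model the same logic would be expressed declaratively: the rule as a decision tree placed before the regression, and the average as regression equations placed after the four classifiers.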
XML Schema
Model composition uses three syntactic concepts:
- The essential elements of a predictive model are captured
in elements that can be included in other models.
- Embedded models can define new fields, similar to derived fields.
- The leaf nodes in a decision tree can contain another predictive model.
For example, using a sequence of models,
a field could be defined by a regression equation. This field is then
used as an ordinary input field in a decision tree. The basic idea is that
we capture the essential elements of a model, in this example from a
regression model, and use them to define new fields.
That is similar to defining a derived field.
Mining models and their corresponding embedded elements
The first step in making models reusable in other models is
the definition of 'model expression' elements
that can be embedded in another model.
PMML defines the two elements Regression and DecisionTree.
Standalone model element | Embedded model element | Main content |
---|---|---|
RegressionModel | Regression | RegressionTable(s) |
TreeModel | DecisionTree | Node(s) |
An EmbeddedModel does not contain a MiningSchema of its own. There is only one
MiningSchema, at the top level of the enclosing model.
<xs:group name="EmbeddedModel">
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:choice>
<xs:element ref="Regression" />
<xs:element ref="DecisionTree" />
</xs:choice>
</xs:sequence>
</xs:group>
The element Regression contains the essential
elements of a RegressionModel:
<xs:element name="Regression">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element ref="LocalTransformations" minOccurs="0" />
<xs:element ref="ResultField" minOccurs="0" maxOccurs="unbounded" />
<xs:element ref="RegressionTable" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="modelName" type="xs:string" />
<xs:attribute name="functionName" type="MINING-FUNCTION" use="required" />
<xs:attribute name="algorithmName" type="xs:string" />
<xs:attribute name="normalizationMethod" type="REGRESSIONNORMALIZATIONMETHOD" default="none" />
</xs:complexType>
</xs:element>
The element DecisionTree contains the essential
elements of a TreeModel:
<xs:element name="DecisionTree">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element ref="LocalTransformations" minOccurs="0" />
<xs:element ref="ResultField" minOccurs="0" maxOccurs="unbounded" />
<xs:element ref="Node" />
</xs:sequence>
<xs:attribute name="modelName" type="xs:string" />
<xs:attribute name="functionName" type="MINING-FUNCTION" use="required" />
<xs:attribute name="algorithmName" type="xs:string" />
<xs:attribute name="missingValueStrategy" type="MISSING-VALUE-STRATEGY" default="none"/>
<xs:attribute name="missingValuePenalty" type="PROB-NUMBER" default="1.0"/>
<xs:attribute name="noTrueChildStrategy" type="NO-TRUE-CHILD-STRATEGY" default="returnNullPrediction" />
<xs:attribute name="splitCharacteristic" default="multiSplit">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="binarySplit" />
<xs:enumeration value="multiSplit" />
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
</xs:element>
Regression and DecisionTree can be used only inside
a model of type MiningModel:
<xs:element name="MiningModel">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="MiningSchema"/>
<xs:element ref="Output" minOccurs="0"/>
<xs:element ref="ModelStats" minOccurs="0"/>
<xs:element ref="Targets" minOccurs="0"/>
<xs:element ref="LocalTransformations" minOccurs="0" />
<xs:choice maxOccurs="unbounded">
<xs:element ref="Regression"/>
<xs:element ref="DecisionTree"/>
</xs:choice>
<xs:element ref="ModelVerification" minOccurs="0"/>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="modelName" type="xs:string" use="optional"/>
<xs:attribute name="functionName" type="MINING-FUNCTION" use="required"/>
<xs:attribute name="algorithmName" type="xs:string" use="optional"/>
</xs:complexType>
</xs:element>
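Because the order of the Regression and DecisionTree children matters when models are sequenced, a consumer typically needs to read them in document order. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper names `local_name` and `embedded_models` are hypothetical, and the sketch ignores namespaces on purpose so that it does not depend on a particular namespace URI.

```python
import xml.etree.ElementTree as ET


def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return element.tag.rsplit("}", 1)[-1]


def embedded_models(pmml_text):
    """Return (element name, functionName) for each embedded model.

    Only direct children of a top-level MiningModel are listed, in document order.
    """
    root = ET.fromstring(pmml_text)
    result = []
    for model in root:
        if local_name(model) != "MiningModel":
            continue
        for child in model:
            if local_name(child) in ("Regression", "DecisionTree"):
                result.append((local_name(child), child.get("functionName")))
    return result


if __name__ == "__main__":
    # A deliberately stripped-down document, just enough to exercise the helper.
    sample = """
    <PMML version="3.2">
      <MiningModel functionName="regression">
        <MiningSchema/>
        <Regression functionName="regression"/>
        <DecisionTree functionName="regression"/>
      </MiningModel>
    </PMML>
    """
    print(embedded_models(sample))
```

The sample string is not a valid PMML document (it omits required content such as RegressionTable and Node); it only shows how the children are enumerated.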
The element ResultField is very similar to OutputField and DerivedField:
<xs:element name="ResultField">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="name" type="FIELD-NAME" use="required" />
<xs:attribute name="displayName" type="xs:string" />
<xs:attribute name="optype" type="OPTYPE" />
<xs:attribute name="dataType" type="DATATYPE"/>
<xs:attribute name="feature" type="RESULT-FEATURE" />
<xs:attribute name="value" type="xs:string" />
</xs:complexType>
</xs:element>
Model Sequencing for Input Transformations
The following example demonstrates how a regression equation can be used to define an input transformation in another model, which here is a DecisionTree:
<?xml version="1.0" ?>
<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header copyright="DMG.org"/>
<DataDictionary numberOfFields="4">
<DataField name="age" optype="continuous" dataType="double"/>
<DataField name="income" optype="continuous" dataType="double"/>
<DataField name="gender" optype="categorical" dataType="string">
<Value value=" female"/>
<Value value=" male"/>
</DataField>
<DataField name="weight" optype="continuous" dataType="double"/>
</DataDictionary>
<MiningModel functionName="regression">
<MiningSchema>
<MiningField name="age"/>
<MiningField name="income"/>
<MiningField name="gender"/>
<MiningField name="weight" usageType="predicted"/>
</MiningSchema>
<LocalTransformations>
<DerivedField name="mc" optype="continuous" dataType="double">
<MapValues outputColumn="mapped" mapMissingTo="-1">
<FieldColumnPair field="gender" column="sourceval"/>
<InlineTable>
<row><sourceval> female</sourceval><mapped>1</mapped></row>
<row><sourceval> male</sourceval><mapped>0</mapped></row>
</InlineTable>
</MapValues>
</DerivedField>
</LocalTransformations>
<Regression functionName="regression">
<ResultField name="term" feature="predictedValue"/>
<RegressionTable intercept="2.34">
<NumericPredictor name="income" coefficient="0.03"/>
<PredictorTerm coefficient="1.23">
<FieldRef field="age"/>
<FieldRef field="mc"/>
</PredictorTerm>
</RegressionTable>
</Regression>
<DecisionTree functionName="regression">
<Node score="0.0">
<True/>
<Node score="32.32">
<SimplePredicate field="term" operator="lessThan" value="42"/>
</Node>
<Node score="78.91">
<SimplePredicate field="term" operator="greaterOrEqual" value="42"/>
</Node>
</Node>
</DecisionTree>
</MiningModel>
</PMML>
Remarks:
- The submodels comprising a sequence are ordered in such a way that each is defined after any other submodels on which it depends.
- The prediction from the last submodel defined is taken as the prediction for the composite model.
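To make the data flow of the sequence concrete, the scoring of the example can be traced by hand in Python. This is a sketch under stated assumptions, not a PMML engine: the function names are hypothetical, a missing gender is represented by None, surrounding whitespace in the gender value is stripped before lookup, and gender values outside the mapping table are not modeled.

```python
# Mapping taken from the InlineTable of the derived field "mc".
GENDER_CODE = {"female": 1.0, "male": 0.0}


def map_gender(gender):
    """Derived field 'mc': map gender to a number, missing values to -1."""
    if gender is None:
        return -1.0  # mapMissingTo="-1"
    # Any value not in the table raises KeyError; this sketch does not
    # model how PMML treats unmapped values.
    return GENDER_CODE[gender.strip()]


def regression_term(age, income, gender):
    """First submodel: the Regression that defines the result field 'term'."""
    mc = map_gender(gender)
    return 2.34 + 0.03 * income + 1.23 * age * mc


def predict_weight(age, income, gender):
    """Second submodel: the DecisionTree that uses 'term' as an ordinary input."""
    term = regression_term(age, income, gender)
    return 32.32 if term < 42 else 78.91


if __name__ == "__main__":
    print(regression_term(30, 1000, "male"), predict_weight(30, 1000, "male"))
    print(regression_term(30, 1000, "female"), predict_weight(30, 1000, "female"))
```

The return value of `predict_weight` corresponds to the second remark: the prediction of the last submodel, the tree, is the prediction of the composite model.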
Model selection
Model selection in PMML combines multiple 'embedded models', also known as model expressions, with decision logic that selects one of them depending on the current input values.
The following example shows how regression elements are used within
the nodes of a decision tree:
<?xml version="1.0" ?>
<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header copyright="DMG.org"/>
<DataDictionary numberOfFields="4">
<DataField name="age" optype="continuous" dataType="double"/>
<DataField name="income" optype="continuous" dataType="double"/>
<DataField name="gender" optype="categorical" dataType="string">
<Value value=" female"/>
<Value value=" male"/>
</DataField>
<DataField name="weight" optype="continuous" dataType="double"/>
</DataDictionary>
<MiningModel functionName="regression">
<MiningSchema>
<MiningField name="age"/>
<MiningField name="income"/>
<MiningField name="gender"/>
<MiningField name="weight" usageType="predicted"/>
</MiningSchema>
<LocalTransformations>
<DerivedField name="mc" optype="continuous" dataType="double">
<MapValues outputColumn="mapped" mapMissingTo="-1">
<FieldColumnPair field="gender" column="sourceval"/>
<InlineTable>
<row><sourceval> female</sourceval><mapped>1</mapped></row>
<row><sourceval> male</sourceval><mapped>0</mapped></row>
</InlineTable>
</MapValues>
</DerivedField>
</LocalTransformations>
<DecisionTree functionName="regression">
<Node score="0.0">
<True/>
<Node score="0.0">
<SimplePredicate field="age" operator="lessOrEqual" value="50"/>
<Regression functionName="regression">
<RegressionTable intercept="0.0">
<NumericPredictor name="income" coefficient="0.03"/>
<PredictorTerm coefficient="1.23">
<FieldRef field="age"/>
<FieldRef field="mc"/>
</PredictorTerm>
</RegressionTable>
</Regression>
</Node>
<Node score="0.0">
<SimplePredicate field="age" operator="greaterThan" value="50"/>
<Regression functionName="regression">
<RegressionTable intercept="2.22">
<NumericPredictor name="income" coefficient="0.01"/>
<PredictorTerm coefficient="-0.11">
<FieldRef field="age"/>
<FieldRef field="mc"/>
</PredictorTerm>
</RegressionTable>
</Regression>
</Node>
</Node>
</DecisionTree>
</MiningModel>
</PMML>
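The selection logic of this example can likewise be traced in Python. Again this is only a sketch and not a PMML engine: the function names are hypothetical, a missing gender is represented by None, whitespace around the gender value is stripped, and missing values of age are not handled.

```python
# Mapping taken from the InlineTable of the derived field "mc".
GENDER_CODE = {"female": 1.0, "male": 0.0}


def map_gender(gender):
    """Derived field 'mc': map gender to a number, missing values to -1."""
    if gender is None:
        return -1.0  # mapMissingTo="-1"
    return GENDER_CODE[gender.strip()]


def predict_weight_by_segment(age, income, gender):
    """Select a regression by the tree predicate on age, then evaluate it."""
    mc = map_gender(gender)
    if age <= 50:
        # Leaf with predicate age lessOrEqual 50.
        return 0.0 + 0.03 * income + 1.23 * age * mc
    # Leaf with predicate age greaterThan 50.
    return 2.22 + 0.01 * income - 0.11 * age * mc


if __name__ == "__main__":
    print(predict_weight_by_segment(40, 1000, "female"))
    print(predict_weight_by_segment(60, 1000, "female"))
```

Unlike the sequencing example, only one of the two regressions is evaluated for a given record; the tree acts purely as the selector.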