PMML 4.4.1 - Multiple Models: Model Composition, Ensembles, and Segmentation

The PMML standard provides several ways to represent multiple models within one PMML file. The simplest way is to put several models in one PMML element, but then it is not clear how the models should be used. The MiningModel element allows precise specification of how multiple models within one PMML file are used. The two main approaches are Model Composition and Segmentation. Model Composition includes model sequencing and model selection but is only applicable to Tree and Regression models. Segmentation allows different models to be represented for different data segments and can also be used for model ensembles and model sequences. Scoring a case using a model ensemble consists of scoring it with each model separately, then combining the results into a single scoring result using one of the predefined combination methods. Scoring a case using a sequence, or chain, of models allows the output of one model to be passed as input to subsequent models.

Model Composition uses "embedded model elements" that are defeatured copies of "standalone model elements": specifically, Regression for RegressionModel and DecisionTree for TreeModel. Besides being limited to Regression and Tree models, these embedded model elements lack key features such as a MiningSchema, which is essential for managing scope across multiple model elements. Therefore, the Model Composition approach was deprecated in PMML 4.2, since the Segmentation approach allows a wider range of models to be used more reliably. For more on deprecation, see Conformance.

Segmentation is accomplished by placing any PMML model element inside a Segment element, which also contains a PREDICATE and an optional weight. The MiningModel then contains a Segmentation element, which holds a number of Segment elements as well as the attribute multipleModelMethod specifying how all the models applicable to a record should be combined.
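The following Python sketch shows the difference between scoring with an ensemble and scoring with a chain. It assumes each segment's model can be reduced to a plain callable that maps a record (a dict of field values) to a dict of named outputs. `score_ensemble`, `score_chain` and the toy models are hypothetical illustrations, not part of PMML or of any scoring library. The ensemble side uses a plain average as a stand-in for whatever multipleModelMethod a real file specifies.

```python
from statistics import mean
from typing import Callable, Dict, List

# A record maps field names to values; a "model" is a hypothetical stand-in for a
# Segment's model element: it reads the record and returns named outputs.
Record = Dict[str, float]
Model = Callable[[Record], Dict[str, float]]


def score_ensemble(models: List[Model], record: Record, target: str) -> float:
    """Ensemble: every model scores the same input independently, then the
    results are combined (a plain average stands in for multipleModelMethod)."""
    return mean(model(record)[target] for model in models)


def score_chain(models: List[Model], record: Record) -> Record:
    """Chain: models run in order, and each model's outputs are added to the
    fields visible to the models that come after it."""
    fields = dict(record)
    for model in models:
        fields.update(model(fields))
    return fields


# Toy models with made-up formulas, for illustration only.
def plus_one(r: Record) -> Dict[str, float]:
    return {"score": r["x"] + 1.0}


def doubled(r: Record) -> Dict[str, float]:
    return {"score": r["x"] * 2.0}


def halve(r: Record) -> Dict[str, float]:
    return {"p": r["x"] / 2.0}


def uses_p(r: Record) -> Dict[str, float]:
    # Reads "p", which only the previous model in the chain produces.
    return {"y": 3.0 * r["p"] + 1.0}


print(score_ensemble([plus_one, doubled], {"x": 3.0}, "score"))  # 5.0
print(score_chain([halve, uses_p], {"x": 4.0}))  # {'x': 4.0, 'p': 2.0, 'y': 7.0}
```

In a PMML modelChain, the values passed along are the OutputField values declared by earlier segments. In the fourth example, the class probabilities of the first Segment become MiningFields of the second.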
It is also possible to use a combination of model composition and segmentation approaches, using simple regression or decision trees for data preprocessing before segmentation.

Sample scenarios

Treatment of multiple models in PMML covers a variety of scenarios such as the following examples:
XML Schema

All variations on support for multiple models rely on the MiningModel model type:

<xs:element name="MiningModel">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="MiningSchema"/>
      <xs:element ref="Output" minOccurs="0"/>
      <xs:element ref="ModelStats" minOccurs="0"/>
      <xs:element ref="ModelExplanation" minOccurs="0"/>
      <xs:element ref="Targets" minOccurs="0"/>
      <xs:element ref="LocalTransformations" minOccurs="0"/>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element ref="Regression"/>
        <xs:element ref="DecisionTree"/>
      </xs:choice>
      <xs:element ref="Segmentation" minOccurs="0"/>
      <xs:element ref="ModelVerification" minOccurs="0"/>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="modelName" type="xs:string" use="optional"/>
    <xs:attribute name="functionName" type="MINING-FUNCTION" use="required"/>
    <xs:attribute name="algorithmName" type="xs:string" use="optional"/>
    <xs:attribute name="isScorable" type="xs:boolean" default="true"/>
  </xs:complexType>
</xs:element>

The isScorable attribute indicates whether the model is valid for scoring. If this attribute is true or missing, the model should be processed normally. If the attribute is false, the model producer has indicated that this model is intended for information purposes only and should not be used to generate results. To be valid PMML, all required elements and attributes must be present, even for non-scoring models. For more details, see General Structure.

A Segmentation element contains several Segments and a model combination method. Each Segment includes a PREDICATE element specifying the conditions under which that segment is to be used. For more details on PREDICATE, see the section on predicates in TreeModel, which explains how predicates are described and evaluated and how missing values are handled.
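To make the Segment predicate and the isScorable default concrete, here is a minimal Python sketch built on the standard library's `xml.etree.ElementTree`. It rests on several simplifying assumptions:

- Every referenced field is present in the record. The three-valued handling of missing values described for TreeModel predicates is not implemented.
- Only `True`, `False`, `SimplePredicate` with the six comparison operators, and `CompoundPredicate` with `and`/`or`/`xor` are supported.
- It applies the isScorable rules this section states for the top-level MiningModel and for each Segment's model.

The function names are hypothetical.

```python
import operator
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

Value = Union[float, str]

# SimplePredicate comparison operators supported by this sketch
# (isMissing/isNotMissing are deliberately left out).
OPS = {
    "equal": operator.eq, "notEqual": operator.ne,
    "lessThan": operator.lt, "lessOrEqual": operator.le,
    "greaterThan": operator.gt, "greaterOrEqual": operator.ge,
}


def local(tag: str) -> str:
    """Strip a '{namespace}' prefix so files that declare the PMML namespace also work."""
    return tag.rsplit("}", 1)[-1]


def is_scorable(element: ET.Element) -> bool:
    """isScorable is an xs:boolean that defaults to true when absent."""
    return element.get("isScorable", "true") in ("true", "1")


def evaluate(pred: ET.Element, record: Dict[str, Value]) -> bool:
    """Evaluate a predicate, assuming every referenced field is present in the record."""
    kind = local(pred.tag)
    if kind == "True":
        return True
    if kind == "False":
        return False
    if kind == "SimplePredicate":
        actual = record[pred.get("field")]
        raw = pred.get("value")
        # Compare numerically when the input value is numeric, otherwise as strings.
        expected = float(raw) if isinstance(actual, (int, float)) else raw
        return OPS[pred.get("operator")](actual, expected)
    if kind == "CompoundPredicate":
        results = [evaluate(c, record) for c in pred if local(c.tag) != "Extension"]
        op = pred.get("booleanOperator")
        if op == "and":
            return all(results)
        if op == "or":
            return any(results)
        if op == "xor":
            return sum(results) % 2 == 1
    raise NotImplementedError(f"unsupported predicate: {kind} {pred.attrib}")


def applicable_segments(mining_model: ET.Element, record: Dict[str, Value]) -> List[str]:
    """Ids of the Segments whose predicate is true and whose model is scorable."""
    if not is_scorable(mining_model):
        raise ValueError("the top-level MiningModel has isScorable=false")
    segmentation = next(c for c in mining_model if local(c.tag) == "Segmentation")
    segments = [c for c in segmentation if local(c.tag) == "Segment"]
    ids = []
    for index, segment in enumerate(segments, start=1):
        # Schema order inside Segment: Extension*, PREDICATE, MODEL-ELEMENT, VariableWeight?
        predicate, model = [c for c in segment if local(c.tag) != "Extension"][:2]
        if evaluate(predicate, record) and is_scorable(model):
            ids.append(segment.get("id", str(index)))  # implicit 1-based index if no id
    return ids


# Trimmed toy document: MiningSchema and tree bodies are omitted, so it is not valid PMML.
doc = ET.fromstring("""<MiningModel functionName="classification">
  <Segmentation multipleModelMethod="selectFirst">
    <Segment id="1">
      <CompoundPredicate booleanOperator="and">
        <SimplePredicate field="continent" operator="equal" value="asia"/>
        <SimplePredicate field="day" operator="lessThan" value="60.0"/>
      </CompoundPredicate>
      <TreeModel functionName="classification"/>
    </Segment>
    <Segment>
      <SimplePredicate field="continent" operator="equal" value="africa"/>
      <TreeModel functionName="classification" isScorable="false"/>
    </Segment>
    <Segment>
      <True/>
      <TreeModel functionName="classification"/>
    </Segment>
  </Segmentation>
</MiningModel>""")

print(applicable_segments(doc, {"continent": "asia", "day": 10.0}))    # ['1', '3']
print(applicable_segments(doc, {"continent": "africa", "day": 10.0}))  # ['3']
```

With selectFirst, a consumer would use only the first id returned. The second Segment has no id, so it would be identified by its position, 2. Here it is skipped because its model is marked non-scorable.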
<xs:element name="Segmentation">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Segment" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="multipleModelMethod" type="MULTIPLE-MODEL-METHOD" use="required"/>
    <xs:attribute name="missingPredictionTreatment" type="MISSING-PREDICTION-TREATMENT" default="continue"/>
    <xs:attribute name="missingThreshold" type="PROB-NUMBER" default="1"/>
  </xs:complexType>
</xs:element>

<xs:element name="Segment">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:group ref="PREDICATE"/>
      <xs:group ref="MODEL-ELEMENT"/>
      <xs:element ref="VariableWeight" minOccurs="0" maxOccurs="1"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:string" use="optional"/>
    <xs:attribute name="weight" type="NUMBER" use="optional" default="1"/>
  </xs:complexType>
</xs:element>

The Segment element is used to tag each model that can be combined as part of an ensemble or associated with a population segment. A multiple model combination method must be specified using the multipleModelMethod attribute of the Segmentation element.

<xs:simpleType name="MULTIPLE-MODEL-METHOD">
  <xs:restriction base="xs:string">
    <xs:enumeration value="majorityVote"/>
    <xs:enumeration value="weightedMajorityVote"/>
    <xs:enumeration value="average"/>
    <xs:enumeration value="weightedAverage"/>
    <xs:enumeration value="median"/>
    <xs:enumeration value="weightedMedian"/>
    <xs:enumeration value="max"/>
    <xs:enumeration value="sum"/>
    <xs:enumeration value="weightedSum"/>
    <xs:enumeration value="selectFirst"/>
    <xs:enumeration value="selectAll"/>
    <xs:enumeration value="modelChain"/>
  </xs:restriction>
</xs:simpleType>

With the exception of modelChain models, all model elements used inside Segment elements in one MiningModel must have the same MINING-FUNCTION.
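The combination step can be sketched in Python for the simple cases: class labels for the vote methods, and numeric predictions (as in a regression ensemble) for the averaging, median, max and sum methods. The sketch rests on these assumptions:

- Each applicable segment has already produced one prediction and an overall weight. Per this section's weighting rules, that weight is the weight attribute times any VariableWeight value, and 1 by default.
- weightedMedian uses the lower weighted median, following the Wikipedia definition this section links.

It does not cover how classification ensembles combine probabilities, the missing-prediction treatments, or modelChain, whose result is that of the last segment executed. `combine`, `weighted_median` and `segment_weight` are hypothetical helper names.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Optional, Sequence, Tuple, Union

Prediction = Union[float, str]


def segment_weight(weight: float = 1.0, variable_weight: Optional[float] = None) -> float:
    """Overall segment weight: the fixed weight times the VariableWeight field value, if any."""
    return weight if variable_weight is None else weight * variable_weight


def weighted_median(values: Sequence[float], weights: Sequence[float]) -> float:
    """Lower weighted median: the first value, in sorted order, whose cumulative
    weight reaches half of the total weight."""
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("no values to combine")


def combine(method: str, results: List[Tuple[Prediction, float]]):
    """Combine (prediction, weight) pairs from the applicable segments, in document order."""
    if not results:
        return None  # no applicable segment: the result is MISSING
    preds = [p for p, _ in results]
    weights = [w for _, w in results]
    if method == "selectFirst":
        return preds[0]
    if method == "selectAll":
        return preds
    if method in ("majorityVote", "weightedMajorityVote"):
        votes: Dict[Prediction, float] = defaultdict(float)
        for pred, weight in results:
            votes[pred] += weight if method == "weightedMajorityVote" else 1.0
        # max() keeps the first label that reaches the top count (a tie-breaking assumption).
        return max(votes, key=votes.get)
    if method == "average":
        return sum(preds) / len(preds)
    if method == "weightedAverage":
        return sum(p * w for p, w in results) / sum(weights)
    if method == "median":
        return median(preds)
    if method == "weightedMedian":
        return weighted_median(preds, weights)
    if method == "max":
        return max(preds)
    if method == "sum":
        return sum(preds)
    if method == "weightedSum":
        return sum(p * w for p, w in results)
    raise ValueError(f"unsupported multipleModelMethod: {method}")


# Made-up predictions; the weights mirror the segment weights of the second example below.
weights = [segment_weight(0.25), segment_weight(0.25), segment_weight(0.5)]
print(combine("weightedAverage", list(zip([5.0, 6.0, 7.0], weights))))  # 6.25
print(combine("majorityVote", [("Iris-setosa", 1.0), ("Iris-virginica", 1.0), ("Iris-setosa", 1.0)]))  # Iris-setosa
print(combine("weightedMedian", [(1.0, 1.0), (2.0, 1.0), (3.0, 5.0)]))  # 3.0
```

Returning `None` for an empty result list mirrors the rule that the output is MISSING when no segment predicate is true. For selectAll, an empty list would arguably be the closer analogue.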
For modelChain models, the MINING-FUNCTION of the last Segment executed (i.e., the last Segment whose predicate evaluates to true) must match the MINING-FUNCTION of the parent MiningModel; otherwise, by convention, the result is invalid.

Note that weightedMedian is defined as follows (as found at https://en.wikipedia.org/wiki/Weighted_median): for $n$ distinct ordered elements $x_1, x_2, \ldots, x_n$ with positive weights $w_1, w_2, \ldots, w_n$ whose sum is 1, the weighted median is the element $x_k$ satisfying

$$\sum_{i=1}^{k-1} w_i < \frac{1}{2} \quad \text{and} \quad \sum_{i=k+1}^{n} w_i \le \frac{1}{2}$$

Segment Weighting

<xs:element name="VariableWeight">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="field" type="FIELD-NAME" use="required"/>
  </xs:complexType>
</xs:element>

Two mechanisms are provided for weighting segments. The weight attribute assigns a fixed numeric weight. The VariableWeight element identifies a field that will contain a variable weight. This may be an input field, an output field from the model defined in the segment, or an output field from the model defined in a previous segment. By default, all segments have a weight of 1. If both types of weights are defined for a given segment, then the overall weight of the segment will be the product of the two. Segment weights are only meaningful if
The model combination methods listed above are applicable as follows:
If the top-level MiningModel's isScorable attribute is false, then the entire model is not scorable, no matter what individual segments indicate. If the top-level model has isScorable set to true (or it defaults to true), then the isScorable attribute of each Segment's model determines whether that segment is used for scoring:
If no segments have predicates that resolve to true, the output should be MISSING for all multipleModelMethods other than "selectAll". As discussed above, "selectAll" can produce multiple values, and PMML does not specify a mechanism for returning multiple values. Whatever mechanism a PMML consumer uses to return multiple values, it should return an empty set of values when no predicates match.

<xs:simpleType name="MISSING-PREDICTION-TREATMENT">
  <xs:restriction base="xs:string">
    <xs:enumeration value="returnMissing"/>
    <xs:enumeration value="skipSegment"/>
    <xs:enumeration value="continue"/>
  </xs:restriction>
</xs:simpleType>

The missing prediction treatment options are used when at least one model whose Segment predicate evaluates to true has a missing result. The closely related attribute missingThreshold has a default value of 1. The options are defined as follows:
When calculating the fraction of missing results for the purpose of applying the missingThreshold, only segments for which the predicate evaluates to true are considered.

OutputFields contained in the top-level MiningModel element apply to the winning Segment selected by the multipleModelMethod attribute (selectFirst, selectAll, majorityVote, modelChain, etc.), and the RESULT-FEATURE entityId returns the ID of the winning segment; output fields from other segments can always be included by specifying the segmentId attribute. OutputFields within Segments allow results specific to that segment to be returned. Because the Segment id attribute is optional, Segments without an id are identified by an implicit 1-based index indicating the position in which each segment appears in the model.

A PMML-compliant scoring engine need only return the output fields specified for the topmost MiningModel element, but may additionally return output fields from subsidiary model elements. In the event of a conflict between output fields specified in a higher-level model and one or more of its subsidiary models, the highest-level specification prevails. Since identical OutputField elements can be duplicated across different segments, the OutputField used to return results is the one from the Segment selected by the multipleModelMethod attribute (selectFirst, selectAll, majorityVote, modelChain, etc.).

A MiningModel may contain Segments that themselves contain a MiningModel element. For example, the Model Composition approach allows Regression models to be selected using a DecisionTree. When the DecisionTree cannot or should not be implemented using Segment predicates, the equivalent implementation using Segmentation would have a top-level MiningModel with two segments in a chain: the first Segment would implement the TreeModel, and its result would be passed to the second Segment, which contains a MiningModel that uses the TreeModel output to select one of its Regression model Segments.
This is the fifth example below, which shows how to pass the output of a Segment as an input to a Segment that contains a MiningModel. A more efficient way to implement the Model Composition approach using Segmentation is shown in the sixth example, which does not require a MiningModel within a MiningModel.

The following seven examples demonstrate the use of the Segment element to accomplish tree ensembles and segmentation. The first example demonstrates an ensemble of classification trees whose results are combined by majority vote:

<MiningModel functionName="classification">
  <MiningSchema>
    <MiningField name="petal_length" usageType="active"/>
    <MiningField name="petal_width" usageType="active"/>
    <MiningField name="day" usageType="active"/>
    <MiningField name="continent" usageType="active"/>
    <MiningField name="sepal_length" usageType="supplementary"/>
    <MiningField name="sepal_width" usageType="supplementary"/>
    <MiningField name="Class" usageType="target"/>
  </MiningSchema>
  <Output>
    <OutputField name="PredictedClass" optype="categorical" dataType="string" feature="predictedValue"/>
    <OutputField name="ProbSetosa" optype="continuous" dataType="double" feature="probability" value="Iris-setosa"/>
    <OutputField name="ProbVersicolor" optype="continuous" dataType="double" feature="probability" value="Iris-versicolor"/>
    <OutputField name="ProbVirginica" optype="continuous" dataType="double" feature="probability" value="Iris-virginica"/>
  </Output>
  <Segmentation multipleModelMethod="majorityVote">
    <Segment id="1">
      <True/>
      <TreeModel modelName="Iris" functionName="classification" splitCharacteristic="binarySplit">
        <MiningSchema>
          <MiningField name="petal_length" usageType="active"/>
          <MiningField name="petal_width" usageType="active"/>
          <MiningField name="day" usageType="active"/>
          <MiningField name="continent" usageType="active"/>
          <MiningField name="sepal_length" usageType="supplementary"/>
          <MiningField name="sepal_width" usageType="supplementary"/>
          <MiningField name="Class" usageType="target"/>
        </MiningSchema>
        <Node score="Iris-setosa" recordCount="150">
          <True/>
          <ScoreDistribution value="Iris-setosa" recordCount="50"/>
          <ScoreDistribution value="Iris-versicolor" recordCount="50"/>
          <ScoreDistribution value="Iris-virginica" recordCount="50"/>
          <Node score="Iris-setosa" recordCount="50">
            <SimplePredicate field="petal_length" operator="lessThan" value="2.45"/>
            <ScoreDistribution value="Iris-setosa" recordCount="50"/>
            <ScoreDistribution value="Iris-versicolor" recordCount="0"/>
            <ScoreDistribution value="Iris-virginica" recordCount="0"/>
          </Node>
          <Node score="Iris-versicolor" recordCount="100">
            <True/>
            <ScoreDistribution value="Iris-setosa" recordCount="0"/>
            <ScoreDistribution value="Iris-versicolor" recordCount="50"/>
            <ScoreDistribution value="Iris-virginica" recordCount="50"/>
            <Node score="Iris-versicolor" recordCount="54">
              <SimplePredicate field="petal_width" operator="lessThan" value="1.75"/>
              <ScoreDistribution value="Iris-setosa" recordCount="0"/>
              <ScoreDistribution value="Iris-versicolor" recordCount="49"/>
              <ScoreDistribution value="Iris-virginica" recordCount="5"/>
            </Node>
            <Node score="Iris-virginica" recordCount="46">
              <True/>
              <ScoreDistribution value="Iris-setosa" recordCount="0"/>
              <ScoreDistribution value="Iris-versicolor" recordCount="1"/>
              <ScoreDistribution value="Iris-virginica" recordCount="45"/>
            </Node>
          </Node>
        </Node>
      </TreeModel>
    </Segment>
    <Segment id="2">
      <True/>
      <TreeModel modelName="Iris" functionName="classification" splitCharacteristic="binarySplit">
        <MiningSchema>
          <MiningField name="petal_length" usageType="active"/>
          <MiningField name="petal_width" usageType="active"/>
          <MiningField name="day" usageType="active"/>
          <MiningField name="continent" usageType="active"/>
          <MiningField name="sepal_length" usageType="supplementary"/>
          <MiningField name="sepal_width" usageType="supplementary"/>
          <MiningField name="Class" usageType="target"/>
        </MiningSchema>
        <Node score="Iris-setosa" recordCount="150">
          <True/>
          <ScoreDistribution value="Iris-setosa" recordCount="50"/>
          <ScoreDistribution value="Iris-versicolor" recordCount="50"/>
          <ScoreDistribution value="Iris-virginica" recordCount="50"/>
          <Node score="Iris-setosa" recordCount="50">
            <SimplePredicate field="petal_length" operator="lessThan" value="2.15"/>
            <ScoreDistribution value="Iris-setosa" recordCount="50"/>
            <ScoreDistribution value="Iris-versicolor" recordCount="0"/>
            <ScoreDistribution value="Iris-virginica" recordCount="0"/>
          </Node>
          <Node score="Iris-versicolor" recordCount="100">
            <True/>
            <ScoreDistribution value="Iris-setosa" recordCount="0"/>
            <ScoreDistribution value="Iris-versicolor" recordCount="50"/>
            <ScoreDistribution value="Iris-virginica" recordCount="50"/>
            <Node score="Iris-versicolor" recordCount="54">
              <SimplePredicate field="petal_width" operator="lessThan" value="1.93"/>
              <ScoreDistribution value="Iris-setosa" recordCount="0"/>
              <ScoreDistribution value="Iris-versicolor" recordCount="49"/>
              <ScoreDistribution value="Iris-virginica" recordCount="5"/>
              <Node score="Iris-versicolor" recordCount="48">
                <SimplePredicate field="continent" operator="equal" value="africa"/>
              </Node>
              <Node score="Iris-virginica" recordCount="6">
                <SimplePredicate field="continent" operator="notEqual" value="africa"/>
              </Node>
            </Node>
            <Node score="Iris-virginica" recordCount="46">
              <True/>
              <ScoreDistribution value="Iris-setosa" recordCount="0"/>
              <ScoreDistribution value="Iris-versicolor" recordCount="1"/>
              <ScoreDistribution value="Iris-virginica" recordCount="45"/>
            </Node>
          </Node>
        </Node>
      </TreeModel>
    </Segment>
    <Segment id="3">
      <True/>
      <TreeModel modelName="Iris" functionName="classification" splitCharacteristic="binarySplit">
        <MiningSchema>
          <MiningField name="petal_length" usageType="active"/>
          <MiningField name="petal_width" usageType="active"/>
          <MiningField name="day" usageType="active"/>
          <MiningField name="continent" usageType="active"/>
          <MiningField name="sepal_length" usageType="supplementary"/>
          <MiningField name="sepal_width" usageType="supplementary"/>
          <MiningField name="Class" usageType="target"/>
        </MiningSchema>
        <Node score="Iris-setosa" recordCount="150">
          <True/>
          <ScoreDistribution value="Iris-setosa" recordCount="50"/>
          <ScoreDistribution value="Iris-versicolor" recordCount="50"/>
          <ScoreDistribution value="Iris-virginica" recordCount="50"/>
          <Node score="Iris-setosa" recordCount="50">
            <SimplePredicate field="petal_width" operator="lessThan" value="2.85"/>
            <ScoreDistribution value="Iris-setosa" recordCount="50"/>
            <ScoreDistribution value="Iris-versicolor" recordCount="0"/>
            <ScoreDistribution value="Iris-virginica" recordCount="0"/>
          </Node>
          <Node score="Iris-versicolor" recordCount="100">
            <True/>
            <ScoreDistribution value="Iris-setosa" recordCount="0"/>
            <ScoreDistribution value="Iris-versicolor" recordCount="50"/>
            <ScoreDistribution value="Iris-virginica" recordCount="50"/>
            <Node score="Iris-versicolor" recordCount="54">
              <SimplePredicate field="continent" operator="equal" value="asia"/>
              <ScoreDistribution value="Iris-setosa" recordCount="0"/>
              <ScoreDistribution value="Iris-versicolor" recordCount="49"/>
              <ScoreDistribution value="Iris-virginica" recordCount="5"/>
            </Node>
            <Node score="Iris-virginica" recordCount="46">
              <SimplePredicate field="continent" operator="notEqual" value="asia"/>
              <ScoreDistribution value="Iris-setosa" recordCount="0"/>
              <ScoreDistribution value="Iris-versicolor" recordCount="1"/>
              <ScoreDistribution value="Iris-virginica" recordCount="45"/>
            </Node>
          </Node>
        </Node>
      </TreeModel>
    </Segment>
  </Segmentation>
</MiningModel>

The second example shows an ensemble of regression trees whose results are combined by weighted averaging:

<MiningModel functionName="regression">
  <MiningSchema>
    <MiningField name="petal_length" usageType="active"/>
    <MiningField name="petal_width" usageType="active"/>
    <MiningField name="day" usageType="active"/>
    <MiningField name="continent" usageType="active"/>
    <MiningField name="sepal_length" usageType="target"/>
    <MiningField name="sepal_width" usageType="active"/>
  </MiningSchema>
  <Output>
    <OutputField name="PredictedSepalLength" optype="continuous" dataType="double" feature="predictedValue"/>
  </Output>
  <Segmentation multipleModelMethod="weightedAverage">
    <Segment id="1" weight="0.25">
      <True/>
      <TreeModel modelName="Iris" functionName="regression" splitCharacteristic="multiSplit">
        <MiningSchema>
          <MiningField name="petal_length" usageType="active"/>
          <MiningField name="petal_width" usageType="active"/>
          <MiningField name="day" usageType="active"/>
          <MiningField name="continent" usageType="active"/>
          <MiningField name="sepal_length" usageType="target"/>
          <MiningField name="sepal_width" usageType="active"/>
        </MiningSchema>
        <Node score="5.843333" recordCount="150">
          <True/>
          <Node score="5.179452" recordCount="73">
            <SimplePredicate field="petal_length" operator="lessThan" value="4.25"/>
            <Node score="5.005660" recordCount="53">
              <SimplePredicate field="petal_length" operator="lessThan" value="3.40"/>
              <Node score="4.735000" recordCount="20">
                <SimplePredicate field="sepal_width" operator="lessThan" value="3.25"/>
              </Node>
              <Node score="5.169697" recordCount="33">
                <SimplePredicate field="sepal_width" operator="greaterThan" value="3.25"/>
              </Node>
            </Node>
            <Node score="5.640000" recordCount="20">
              <SimplePredicate field="petal_length" operator="greaterThan" value="3.40"/>
            </Node>
          </Node>
          <Node score="6.472727" recordCount="77">
            <SimplePredicate field="petal_length" operator="greaterThan" value="4.25"/>
            <Node score="6.326471" recordCount="68">
              <SimplePredicate field="petal_length" operator="lessThan" value="6.05"/>
              <Node score="6.165116" recordCount="43">
                <SimplePredicate field="petal_length" operator="lessThan" value="5.15"/>
                <Node score="6.054545" recordCount="33">
                  <SimplePredicate field="sepal_width" operator="lessThan" value="3.05"/>
                </Node>
                <Node score="6.530000" recordCount="10">
                  <SimplePredicate field="sepal_width" operator="greaterThan" value="3.05"/>
                </Node>
              </Node>
              <Node score="6.604000" recordCount="25">
                <SimplePredicate field="petal_length" operator="greaterThan" value="5.15"/>
              </Node>
            </Node>
            <Node score="7.577778" recordCount="9">
              <SimplePredicate field="petal_length" operator="greaterThan" value="6.05"/>
            </Node>
          </Node>
        </Node>
      </TreeModel>
    </Segment>
    <Segment id="2" weight="0.25">
      <True/>
      <TreeModel modelName="Iris" functionName="regression" splitCharacteristic="multiSplit">
        <MiningSchema>
          <MiningField name="petal_length" usageType="active"/>
          <MiningField name="petal_width" usageType="active"/>
          <MiningField name="day" usageType="active"/>
          <MiningField name="continent" usageType="active"/>
          <MiningField name="sepal_length" usageType="target"/>
          <MiningField name="sepal_width" usageType="active"/>
        </MiningSchema>
        <Node score="5.843333" recordCount="150">
          <True/>
          <Node score="5.073333" recordCount="60">
            <SimplePredicate field="petal_width" operator="lessThan" value="1.15"/>
            <Node score="4.953659" recordCount="41">
              <SimplePredicate field="petal_width" operator="lessThan" value="0.35"/>
              <Node score="4.688235" recordCount="17">
                <SimplePredicate field="sepal_width" operator="lessThan" value="3.25"/>
              </Node>
              <Node score="5.141667" recordCount="24">
                <SimplePredicate field="sepal_width" operator="greaterThan" value="3.25"/>
              </Node>
            </Node>
            <Node score="5.331579" recordCount="19">
              <SimplePredicate field="petal_width" operator="greaterThan" value="0.35"/>
            </Node>
          </Node>
          <Node score="6.356667" recordCount="90">
            <SimplePredicate field="petal_width" operator="greaterThan" value="1.15"/>
            <Node score="6.160656" recordCount="61">
              <SimplePredicate field="petal_width" operator="lessThan" value="1.95"/>
              <Node score="5.855556" recordCount="18">
                <SimplePredicate field="petal_width" operator="lessThan" value="1.35"/>
              </Node>
              <Node score="6.288372" recordCount="43">
                <SimplePredicate field="petal_width" operator="greaterThan" value="1.35"/>
                <Node score="6.000000" recordCount="13">
                  <SimplePredicate field="sepal_width" operator="lessThan" value="2.75"/>
                </Node>
                <Node score="6.413333" recordCount="30">
                  <SimplePredicate field="sepal_width" operator="greaterThan" value="2.75"/>
                </Node>
              </Node>
            </Node>
            <Node score="6.768966" recordCount="29">
              <SimplePredicate field="petal_width" operator="greaterThan" value="1.95"/>
            </Node>
          </Node>
        </Node>
      </TreeModel>
    </Segment>
    <Segment id="3" weight="0.5">
      <True/>
      <TreeModel modelName="Iris" functionName="regression" splitCharacteristic="multiSplit">
        <MiningSchema>
          <MiningField name="petal_length" usageType="active"/>
          <MiningField name="petal_width" usageType="active"/>
          <MiningField name="day" usageType="active"/>
          <MiningField name="continent" usageType="active"/>
          <MiningField name="sepal_length" usageType="target"/>
          <MiningField name="sepal_width" usageType="active"/>
        </MiningSchema>
        <Node score="5.843333" recordCount="150">
          <True/>
          <Node score="5.179452" recordCount="73">
            <SimplePredicate field="petal_length" operator="lessThan" value="4.25"/>
            <Node score="5.005660" recordCount="53">
              <SimplePredicate field="petal_length" operator="lessThan" value="3.40"/>
            </Node>
            <Node score="5.640000" recordCount="20">
              <SimplePredicate field="petal_length" operator="greaterThan" value="3.40"/>
            </Node>
          </Node>
          <Node score="6.472727" recordCount="77">
            <SimplePredicate field="petal_length" operator="greaterThan" value="4.25"/>
            <Node score="6.326471" recordCount="68">
              <SimplePredicate field="petal_length" operator="lessThan" value="6.05"/>
              <Node score="6.165116" recordCount="43">
                <SimplePredicate field="petal_length" operator="lessThan" value="5.15"/>
              </Node>
              <Node score="6.604000" recordCount="25">
                <SimplePredicate field="petal_length" operator="greaterThan" value="5.15"/>
              </Node>
            </Node>
            <Node score="7.577778" recordCount="9">
              <SimplePredicate field="petal_length" operator="greaterThan" value="6.05"/>
            </Node>
          </Node>
        </Node>
      </TreeModel>
    </Segment>
  </Segmentation>
</MiningModel>

The third example shows segmentation in which the model employed is the first one whose Segment predicate is satisfied:

<MiningModel functionName="classification">
  <MiningSchema>
    <MiningField name="petal_length" usageType="active"/>
    <MiningField name="petal_width" usageType="active"/>
    <MiningField name="day" usageType="active"/>
    <MiningField name="continent" usageType="active"/>
    <MiningField name="sepal_length" usageType="supplementary"/>
    <MiningField name="sepal_width" usageType="supplementary"/>
    <MiningField name="Class" usageType="target"/>
  </MiningSchema>
  <Output>
    <OutputField name="PredictedClass" optype="categorical" dataType="string" feature="predictedValue"/>
    <OutputField name="ProbSetosa" optype="continuous" dataType="double" feature="probability" value="Iris-setosa"/>
    <OutputField name="ProbVersicolor" optype="continuous" dataType="double" feature="probability" value="Iris-versicolor"/>
    <OutputField name="ProbVirginica" optype="continuous" dataType="double" feature="probability" value="Iris-virginica"/>
  </Output>
  <Segmentation multipleModelMethod="selectFirst">
    <Segment id="1">
      <CompoundPredicate booleanOperator="and">
        <SimplePredicate field="continent" operator="equal" value="asia"/>
        <SimplePredicate field="day" operator="lessThan" value="60.0"/>
        <SimplePredicate field="day" operator="greaterThan" value="0.0"/>
      </CompoundPredicate>
      <TreeModel modelName="Iris" functionName="classification" splitCharacteristic="binarySplit">
        <MiningSchema>
          <MiningField name="petal_length" usageType="active"/>
          <MiningField name="petal_width" usageType="active"/>
          <MiningField name="day" usageType="active"/>
          <MiningField name="continent" usageType="active"/>
          <MiningField name="sepal_length" usageType="supplementary"/>
          <MiningField name="sepal_width" usageType="supplementary"/>
          <MiningField name="Class" usageType="target"/>
        </MiningSchema>
        <Node score="Iris-setosa" recordCount="150">
          <True/>
          <ScoreDistribution value="Iris-setosa" recordCount="50"/>
          <ScoreDistribution value="Iris-versicolor" recordCount="50"/>
          <ScoreDistribution value="Iris-virginica" recordCount="50"/>
          <Node score="Iris-setosa" recordCount="50">
            <SimplePredicate field="petal_length" operator="lessThan" value="2.45"/>
            <ScoreDistribution value="Iris-setosa" recordCount="50"/>
            <ScoreDistribution value="Iris-versicolor" recordCount="0"/>
            <ScoreDistribution value="Iris-virginica" recordCount="0"/>
          </Node>
          <Node score="Iris-versicolor" recordCount="100">
            <SimplePredicate field="petal_length" operator="greaterThan" value="2.45"/>
            <ScoreDistribution value="Iris-setosa" recordCount="0"/>
            <ScoreDistribution value="Iris-versicolor" recordCount="50"/>
            <ScoreDistribution value="Iris-virginica" recordCount="50"/>
            <Node score="Iris-versicolor" recordCount="54">
              <SimplePredicate field="petal_width" operator="lessThan" value="1.75"/>
              <ScoreDistribution value="Iris-setosa" recordCount="0"/>
              <ScoreDistribution value="Iris-versicolor" recordCount="49"/>
              <ScoreDistribution value="Iris-virginica" recordCount="5"/>
            </Node>
            <Node score="Iris-virginica" recordCount="46">
              <SimplePredicate field="petal_width" operator="greaterThan" value="1.75"/>
              <ScoreDistribution value="Iris-setosa" recordCount="0"/>
              <ScoreDistribution value="Iris-versicolor" recordCount="1"/>
              <ScoreDistribution value="Iris-virginica" recordCount="45"/>
            </Node>
          </Node>
        </Node>
      </TreeModel>
    </Segment>
    <Segment id="2">
      <CompoundPredicate booleanOperator="and">
        <SimplePredicate field="continent" operator="equal" value="africa"/>
        <SimplePredicate field="day" operator="lessThan" value="60.0"/>
        <SimplePredicate field="day" operator="greaterThan" value="0.0"/>
      </CompoundPredicate>
      <TreeModel modelName="Iris" functionName="classification" splitCharacteristic="binarySplit">
        <MiningSchema>
          <MiningField name="petal_length" usageType="active"/>
          <MiningField name="petal_width" usageType="active"/>
          <MiningField name="day" usageType="active"/>
          <MiningField name="continent" usageType="active"/>
          <MiningField name="sepal_length" usageType="supplementary"/>
          <MiningField name="sepal_width" usageType="supplementary"/>
          <MiningField name="Class" usageType="target"/>
        </MiningSchema>
        <Node score="Iris-setosa" recordCount="150">
          <True/>
          <ScoreDistribution value="Iris-setosa" recordCount="50"/>
          <ScoreDistribution value="Iris-versicolor" recordCount="50"/>
          <ScoreDistribution value="Iris-virginica" recordCount="50"/>
          <Node score="Iris-setosa" recordCount="50">
            <SimplePredicate field="petal_length" operator="lessThan" value="2.15"/>
            <ScoreDistribution value="Iris-setosa" recordCount="50"/>
            <ScoreDistribution value="Iris-versicolor" recordCount="0"/>
            <ScoreDistribution value="Iris-virginica" recordCount="0"/>
          </Node>
          <Node score="Iris-versicolor" recordCount="100">
            <SimplePredicate field="petal_length" operator="greaterThan" value="2.15"/>
            <ScoreDistribution value="Iris-setosa" recordCount="0"/>
            <ScoreDistribution value="Iris-versicolor" recordCount="50"/>
            <ScoreDistribution value="Iris-virginica" recordCount="50"/>
            <Node score="Iris-versicolor" recordCount="54">
              <SimplePredicate field="petal_width" operator="lessThan" value="1.93"/>
              <ScoreDistribution value="Iris-setosa" recordCount="0"/>
              <ScoreDistribution value="Iris-versicolor" recordCount="49"/>
              <ScoreDistribution value="Iris-virginica" recordCount="5"/>
            </Node>
            <Node score="Iris-virginica" recordCount="46">
              <SimplePredicate field="petal_width" operator="greaterThan" value="1.93"/>
              <ScoreDistribution value="Iris-setosa" recordCount="0"/>
              <ScoreDistribution value="Iris-versicolor" recordCount="1"/>
              <ScoreDistribution value="Iris-virginica" recordCount="45"/>
            </Node>
          </Node>
        </Node>
      </TreeModel>
    </Segment>
    <Segment id="3">
      <SimplePredicate field="continent" operator="equal" value="africa"/>
      <TreeModel modelName="Iris" functionName="classification" splitCharacteristic="binarySplit">
        <MiningSchema>
          <MiningField name="petal_length" usageType="active"/>
          <MiningField name="petal_width" usageType="active"/>
          <MiningField name="day" usageType="active"/>
          <MiningField name="continent" usageType="active"/>
          <MiningField name="sepal_length" usageType="supplementary"/>
          <MiningField name="sepal_width" usageType="supplementary"/>
          <MiningField name="Class" usageType="target"/>
        </MiningSchema>
        <Node score="Iris-setosa" recordCount="150">
          <True/>
          <ScoreDistribution value="Iris-setosa" recordCount="50"/>
          <ScoreDistribution value="Iris-versicolor" recordCount="50"/>
          <ScoreDistribution value="Iris-virginica" recordCount="50"/>
          <Node score="Iris-setosa" recordCount="50">
            <SimplePredicate field="petal_width" operator="lessThan" value="2.85"/>
            <ScoreDistribution value="Iris-setosa" recordCount="50"/>
            <ScoreDistribution value="Iris-versicolor" recordCount="0"/>
            <ScoreDistribution value="Iris-virginica" recordCount="0"/>
          </Node>
          <Node score="Iris-versicolor" recordCount="100">
            <SimplePredicate field="petal_width" operator="greaterThan" value="2.85"/>
            <ScoreDistribution value="Iris-setosa" recordCount="0"/>
            <ScoreDistribution value="Iris-versicolor" recordCount="50"/>
            <ScoreDistribution value="Iris-virginica" recordCount="50"/>
          </Node>
        </Node>
      </TreeModel>
    </Segment>
  </Segmentation>
</MiningModel>

The fourth example shows the implementation of a chain of models that outputs a "Pollen Index". The predicted class and class probabilities are returned as additional outputs.
<MiningModel functionName="regression">
  <MiningSchema>
    <MiningField name="petal_length" usageType="active"/>
    <MiningField name="petal_width" usageType="active"/>
    <MiningField name="temperature" usageType="active"/>
    <MiningField name="cloudiness" usageType="active"/>
    <MiningField name="sepal_length" usageType="supplementary"/>
    <MiningField name="sepal_width" usageType="supplementary"/>
    <MiningField name="Class" usageType="target"/>
    <MiningField name="PollenIndex" usageType="target"/>
  </MiningSchema>
  <Output>
    <OutputField dataType="string" feature="predictedValue" name="PredictedClass" optype="categorical" targetField="Class" segmentId="1"/>
    <OutputField dataType="double" feature="probability" name="Probability_setosa" optype="continuous" targetField="Class" value="Iris-setosa" segmentId="1"/>
    <OutputField dataType="double" feature="probability" name="Probability_versicolor" optype="continuous" targetField="Class" value="Iris-versicolor" segmentId="1"/>
    <OutputField dataType="double" feature="probability" name="Probability_virginica" optype="continuous" targetField="Class" value="Iris-virginica" segmentId="1"/>
    <OutputField dataType="double" feature="predictedValue" name="Pollen Index" optype="continuous" targetField="PollenIndex"/>
  </Output>
  <Segmentation multipleModelMethod="modelChain">
    <Segment id="1">
      <True/>
      <TreeModel modelName="Iris" functionName="classification" splitCharacteristic="binarySplit">
        <MiningSchema>
          <MiningField name="petal_length" usageType="active"/>
          <MiningField name="petal_width" usageType="active"/>
          <MiningField name="Class" usageType="target"/>
        </MiningSchema>
        <Output>
          <OutputField dataType="string" feature="predictedValue" name="PredictedClass" optype="categorical"/>
          <OutputField dataType="double" feature="probability" name="Probability_setosa" optype="continuous" value="Iris-setosa"/>
          <OutputField dataType="double" feature="probability" name="Probability_versicolor" optype="continuous" value="Iris-versicolor"/>
          <OutputField dataType="double" feature="probability" name="Probability_virginica" optype="continuous" value="Iris-virginica"/>
        </Output>
        <Node score="Iris-setosa" recordCount="150">
          <True/>
          <ScoreDistribution value="Iris-setosa" recordCount="50"/>
          <ScoreDistribution value="Iris-versicolor" recordCount="50"/>
          <ScoreDistribution value="Iris-virginica" recordCount="50"/>
          <Node score="Iris-setosa" recordCount="50">
            <SimplePredicate field="petal_length" operator="lessThan" value="2.45"/>
            <ScoreDistribution value="Iris-setosa" recordCount="50"/>
            <ScoreDistribution value="Iris-versicolor" recordCount="0"/>
            <ScoreDistribution value="Iris-virginica" recordCount="0"/>
          </Node>
          <Node score="Iris-versicolor" recordCount="100">
            <SimplePredicate field="petal_length" operator="greaterThan" value="2.45"/>
            <ScoreDistribution value="Iris-setosa" recordCount="0"/>
            <ScoreDistribution value="Iris-versicolor" recordCount="50"/>
            <ScoreDistribution value="Iris-virginica" recordCount="50"/>
            <Node score="Iris-versicolor" recordCount="54">
              <SimplePredicate field="petal_width" operator="lessThan" value="1.75"/>
              <ScoreDistribution value="Iris-setosa" recordCount="0"/>
              <ScoreDistribution value="Iris-versicolor" recordCount="49"/>
              <ScoreDistribution value="Iris-virginica" recordCount="5"/>
            </Node>
            <Node score="Iris-virginica" recordCount="46">
              <SimplePredicate field="petal_width" operator="greaterThan" value="1.75"/>
              <ScoreDistribution value="Iris-setosa" recordCount="0"/>
              <ScoreDistribution value="Iris-versicolor" recordCount="1"/>
              <ScoreDistribution value="Iris-virginica" recordCount="45"/>
            </Node>
          </Node>
        </Node>
      </TreeModel>
    </Segment>
    <Segment id="2">
      <True/>
      <RegressionModel modelName="PollenIndex" functionName="regression">
        <MiningSchema>
          <MiningField name="Probability_setosa" usageType="active"/>
          <MiningField name="Probability_versicolor" usageType="active"/>
          <MiningField name="Probability_virginica" usageType="active"/>
          <MiningField name="temperature" usageType="active"/>
          <MiningField name="cloudiness" usageType="active"/>
          <MiningField name="PollenIndex" usageType="target"/>
        </MiningSchema>
        <Output>
          <OutputField dataType="double" feature="predictedValue" name="Pollen Index" optype="continuous"/>
        </Output>
        <RegressionTable intercept="0.3">
          <NumericPredictor coefficient="0.8" exponent="1" name="Probability_setosa"/>
          <NumericPredictor coefficient="0.7" exponent="1" name="Probability_versicolor"/>
          <NumericPredictor coefficient="0.9" exponent="1" name="Probability_virginica"/>
          <NumericPredictor coefficient="0.02" exponent="1" name="temperature"/>
          <NumericPredictor coefficient="-0.1" exponent="1" name="cloudiness"/>
        </RegressionTable>
      </RegressionModel>
    </Segment>
  </Segmentation>
</MiningModel>

The fifth example is a MiningModel containing a Segment that itself contains a MiningModel. It implements the model composition approach, in which a decision tree model is used to select a regression model. The sixth example shows a more efficient implementation that does not require a nested MiningModel. The predicted class is produced as an additional output field.
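The inner MiningModel "PollenIndex" of the fifth example relies on `multipleModelMethod="selectFirst"`. The minimal Python sketch below shows that rule: the first segment whose predicate is true scores the record, and later segments are ignored. The helpers `select_first`, `class_is` and `linear` are hypothetical illustrations. Returning `None` when no segment matches is an assumption of this sketch.

```python
# Hypothetical sketch of multipleModelMethod="selectFirst" for the inner
# MiningModel "PollenIndex" of the fifth example. Names are illustrative.

def select_first(record, segments):
    """Return the result of the first segment whose predicate is true."""
    for predicate, model in segments:
        if predicate(record):
            return model(record)
    return None  # assumption: no matching segment means no prediction


def class_is(name):
    # SimplePredicate field="PredictedClass" operator="equal" value=name
    return lambda record: record["PredictedClass"] == name


def linear(intercept, temperature_coef, cloudiness_coef):
    # RegressionTable with two NumericPredictors (exponent="1")
    def model(record):
        return (intercept
                + temperature_coef * record["temperature"]
                + cloudiness_coef * record["cloudiness"])
    return model


POLLEN_SEGMENTS = [
    (class_is("Iris-setosa"), linear(0.3, 0.02, -0.1)),      # Segment 2.1
    (class_is("Iris-versicolor"), linear(0.2, -0.02, 0.1)),  # Segment 2.2
    (class_is("Iris-virginica"), linear(0.1, 0.01, -0.2)),   # Segment 2.3
]

record = {"PredictedClass": "Iris-virginica", "temperature": 20.0, "cloudiness": 0.5}
print(round(select_first(record, POLLEN_SEGMENTS), 4))  # prints 0.2
```

In this example the three predicates are mutually exclusive, so the order of segments 2.1 to 2.3 does not affect the result. With overlapping predicates, selectFirst makes document order decisive.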
<MiningModel functionName="regression">
  <MiningSchema>
    <MiningField name="petal_length" usageType="active"/>
    <MiningField name="petal_width" usageType="active"/>
    <MiningField name="temperature" usageType="active"/>
    <MiningField name="cloudiness" usageType="active"/>
    <MiningField name="sepal_length" usageType="supplementary"/>
    <MiningField name="sepal_width" usageType="supplementary"/>
    <MiningField name="Class" usageType="target"/>
    <MiningField name="PollenIndex" usageType="target"/>
  </MiningSchema>
  <Output>
    <OutputField dataType="string" feature="predictedValue" name="PredictedClass" targetField="Class" optype="categorical" segmentId="1"/>
    <OutputField dataType="double" feature="predictedValue" name="Pollen Index" targetField="PollenIndex" optype="continuous"/>
  </Output>
  <Segmentation multipleModelMethod="modelChain">
    <Segment id="1">
      <True/>
      <TreeModel modelName="Iris" functionName="classification" splitCharacteristic="binarySplit">
        <MiningSchema>
          <MiningField name="petal_length" usageType="active"/>
          <MiningField name="petal_width" usageType="active"/>
          <MiningField name="Class" usageType="target"/>
        </MiningSchema>
        <Output>
          <OutputField dataType="string" feature="predictedValue" name="PredictedClass" optype="categorical"/>
        </Output>
        <Node score="Iris-setosa" recordCount="150">
          <True/>
          <ScoreDistribution value="Iris-setosa" recordCount="50"/>
          <ScoreDistribution value="Iris-versicolor" recordCount="50"/>
          <ScoreDistribution value="Iris-virginica" recordCount="50"/>
          <Node score="Iris-setosa" recordCount="50">
            <SimplePredicate field="petal_length" operator="lessThan" value="2.45"/>
            <ScoreDistribution value="Iris-setosa" recordCount="50"/>
            <ScoreDistribution value="Iris-versicolor" recordCount="0"/>
            <ScoreDistribution value="Iris-virginica" recordCount="0"/>
          </Node>
          <Node score="Iris-versicolor" recordCount="100">
            <SimplePredicate field="petal_length" operator="greaterThan" value="2.45"/>
            <ScoreDistribution value="Iris-setosa" recordCount="0"/>
            <ScoreDistribution value="Iris-versicolor" recordCount="50"/>
            <ScoreDistribution value="Iris-virginica" recordCount="50"/>
            <Node score="Iris-versicolor" recordCount="54">
              <SimplePredicate field="petal_width" operator="lessThan" value="1.75"/>
              <ScoreDistribution value="Iris-setosa" recordCount="0"/>
              <ScoreDistribution value="Iris-versicolor" recordCount="49"/>
              <ScoreDistribution value="Iris-virginica" recordCount="5"/>
            </Node>
            <Node score="Iris-virginica" recordCount="46">
              <SimplePredicate field="petal_width" operator="greaterThan" value="1.75"/>
              <ScoreDistribution value="Iris-setosa" recordCount="0"/>
              <ScoreDistribution value="Iris-versicolor" recordCount="1"/>
              <ScoreDistribution value="Iris-virginica" recordCount="45"/>
            </Node>
          </Node>
        </Node>
      </TreeModel>
    </Segment>
    <Segment id="2">
      <True/>
      <MiningModel modelName="PollenIndex" functionName="regression">
        <MiningSchema>
          <MiningField name="temperature" usageType="active"/>
          <MiningField name="cloudiness" usageType="active"/>
          <MiningField name="PredictedClass" usageType="active"/>
          <MiningField name="PollenIndex" usageType="target"/>
        </MiningSchema>
        <Output>
          <OutputField dataType="double" feature="predictedValue" name="Pollen Index" optype="continuous"/>
        </Output>
        <Segmentation multipleModelMethod="selectFirst">
          <Segment id="2.1">
            <SimplePredicate field="PredictedClass" operator="equal" value="Iris-setosa"/>
            <RegressionModel modelName="Setosa_PollenIndex" functionName="regression">
              <MiningSchema>
                <MiningField name="temperature" usageType="active"/>
                <MiningField name="cloudiness" usageType="active"/>
                <MiningField name="PollenIndex" usageType="target"/>
              </MiningSchema>
              <Output>
                <OutputField dataType="double" feature="predictedValue" name="Setosa Pollen Index" optype="continuous"/>
              </Output>
              <RegressionTable intercept="0.3">
                <NumericPredictor coefficient="0.02" exponent="1" name="temperature"/>
                <NumericPredictor coefficient="-0.1" exponent="1" name="cloudiness"/>
              </RegressionTable>
            </RegressionModel>
          </Segment>
          <Segment id="2.2">
            <SimplePredicate field="PredictedClass" operator="equal" value="Iris-versicolor"/>
            <RegressionModel modelName="Versicolor_PollenIndex" functionName="regression">
              <MiningSchema>
                <MiningField name="temperature" usageType="active"/>
                <MiningField name="cloudiness" usageType="active"/>
                <MiningField name="PollenIndex" usageType="target"/>
              </MiningSchema>
              <Output>
                <OutputField dataType="double" feature="predictedValue" name="Versicolor Pollen Index" optype="continuous"/>
              </Output>
              <RegressionTable intercept="0.2">
                <NumericPredictor coefficient="-0.02" exponent="1" name="temperature"/>
                <NumericPredictor coefficient="0.1" exponent="1" name="cloudiness"/>
              </RegressionTable>
            </RegressionModel>
          </Segment>
          <Segment id="2.3">
            <SimplePredicate field="PredictedClass" operator="equal" value="Iris-virginica"/>
            <RegressionModel modelName="Virginica_PollenIndex" functionName="regression">
              <MiningSchema>
                <MiningField name="temperature" usageType="active"/>
                <MiningField name="cloudiness" usageType="active"/>
                <MiningField name="PollenIndex" usageType="target"/>
              </MiningSchema>
              <Output>
                <OutputField dataType="double" feature="predictedValue" name="Virginica Pollen Index" optype="continuous"/>
              </Output>
              <RegressionTable intercept="0.1">
                <NumericPredictor coefficient="0.01" exponent="1" name="temperature"/>
                <NumericPredictor coefficient="-0.2" exponent="1" name="cloudiness"/>
              </RegressionTable>
            </RegressionModel>
          </Segment>
        </Segmentation>
      </MiningModel>
    </Segment>
  </Segmentation>
</MiningModel>

The sixth example is a more efficient implementation of the same model composition approach, in which a decision tree model selects a regression model. The first segment contains a TreeModel whose OutputField PredictedClass is used in the predicates of the subsequent segments.
<MiningModel functionName="regression">
  <MiningSchema>
    <MiningField name="petal_length" usageType="active"/>
    <MiningField name="petal_width" usageType="active"/>
    <MiningField name="temperature" usageType="active"/>
    <MiningField name="cloudiness" usageType="active"/>
    <MiningField name="sepal_length" usageType="supplementary"/>
    <MiningField name="sepal_width" usageType="supplementary"/>
    <MiningField name="PollenIndex" usageType="target"/>
  </MiningSchema>
  <Output>
    <OutputField name="Predicted_PollenIndex" dataType="double" feature="predictedValue" optype="continuous"/>
  </Output>
  <Segmentation multipleModelMethod="modelChain">
    <Segment id="1">
      <True/>
      <TreeModel modelName="Iris" functionName="classification" splitCharacteristic="binarySplit">
        <MiningSchema>
          <MiningField name="petal_length" usageType="active"/>
          <MiningField name="petal_width" usageType="active"/>
        </MiningSchema>
        <Output>
          <OutputField dataType="string" feature="predictedValue" name="PredictedClass" optype="categorical"/>
        </Output>
        <Node score="Iris-setosa" recordCount="150">
          <True/>
          <ScoreDistribution value="Iris-setosa" recordCount="50"/>
          <ScoreDistribution value="Iris-versicolor" recordCount="50"/>
          <ScoreDistribution value="Iris-virginica" recordCount="50"/>
          <Node score="Iris-setosa" recordCount="50">
            <SimplePredicate field="petal_length" operator="lessThan" value="2.45"/>
            <ScoreDistribution value="Iris-setosa" recordCount="50"/>
            <ScoreDistribution value="Iris-versicolor" recordCount="0"/>
            <ScoreDistribution value="Iris-virginica" recordCount="0"/>
          </Node>
          <Node score="Iris-versicolor" recordCount="100">
            <SimplePredicate field="petal_length" operator="greaterThan" value="2.45"/>
            <ScoreDistribution value="Iris-setosa" recordCount="0"/>
            <ScoreDistribution value="Iris-versicolor" recordCount="50"/>
            <ScoreDistribution value="Iris-virginica" recordCount="50"/>
            <Node score="Iris-versicolor" recordCount="54">
              <SimplePredicate field="petal_width" operator="lessThan" value="1.75"/>
              <ScoreDistribution value="Iris-setosa" recordCount="0"/>
              <ScoreDistribution value="Iris-versicolor" recordCount="49"/>
              <ScoreDistribution value="Iris-virginica" recordCount="5"/>
            </Node>
            <Node score="Iris-virginica" recordCount="46">
              <SimplePredicate field="petal_width" operator="greaterThan" value="1.75"/>
              <ScoreDistribution value="Iris-setosa" recordCount="0"/>
              <ScoreDistribution value="Iris-versicolor" recordCount="1"/>
              <ScoreDistribution value="Iris-virginica" recordCount="45"/>
            </Node>
          </Node>
        </Node>
      </TreeModel>
    </Segment>
    <Segment id="2.1">
      <SimplePredicate field="PredictedClass" operator="equal" value="Iris-setosa"/>
      <RegressionModel modelName="Setosa_PollenIndex" functionName="regression">
        <MiningSchema>
          <MiningField name="temperature" usageType="active"/>
          <MiningField name="cloudiness" usageType="active"/>
          <MiningField name="PollenIndex" usageType="target"/>
        </MiningSchema>
        <Output>
          <OutputField dataType="double" feature="predictedValue" name="Setosa Pollen Index" optype="continuous"/>
        </Output>
        <RegressionTable intercept="0.3">
          <NumericPredictor coefficient="0.02" exponent="1" name="temperature"/>
          <NumericPredictor coefficient="-0.1" exponent="1" name="cloudiness"/>
        </RegressionTable>
      </RegressionModel>
    </Segment>
    <Segment id="2.2">
      <SimplePredicate field="PredictedClass" operator="equal" value="Iris-versicolor"/>
      <RegressionModel modelName="Versicolor_PollenIndex" functionName="regression">
        <MiningSchema>
          <MiningField name="temperature" usageType="active"/>
          <MiningField name="cloudiness" usageType="active"/>
          <MiningField name="PollenIndex" usageType="target"/>
        </MiningSchema>
        <Output>
          <OutputField dataType="double" feature="predictedValue" name="Versicolor Pollen Index" optype="continuous"/>
        </Output>
        <RegressionTable intercept="0.2">
          <NumericPredictor coefficient="-0.02" exponent="1" name="temperature"/>
          <NumericPredictor coefficient="0.1" exponent="1" name="cloudiness"/>
        </RegressionTable>
      </RegressionModel>
    </Segment>
    <Segment id="2.3">
      <SimplePredicate field="PredictedClass" operator="equal" value="Iris-virginica"/>
      <RegressionModel modelName="Virginica_PollenIndex" functionName="regression">
        <MiningSchema>
          <MiningField name="temperature" usageType="active"/>
          <MiningField name="cloudiness" usageType="active"/>
          <MiningField name="PollenIndex" usageType="target"/>
        </MiningSchema>
        <Output>
          <OutputField dataType="double" feature="predictedValue" name="Virginica Pollen Index" optype="continuous"/>
        </Output>
        <RegressionTable intercept="0.1">
          <NumericPredictor coefficient="0.01" exponent="1" name="temperature"/>
          <NumericPredictor coefficient="-0.2" exponent="1" name="cloudiness"/>
        </RegressionTable>
      </RegressionModel>
    </Segment>
  </Segmentation>
</MiningModel>

The seventh example is an ensemble of three trees combined by weightedMajorityVote. Each tree's vote is weighted according to the node the record is assigned to: the node's entityId is mapped to a weight through MapValues, and the resulting field is referenced by the segment's VariableWeight element.

<MiningModel functionName="classification" algorithmName="Random Forests">
  <MiningSchema>
    <MiningField name="SPECIES" usageType="predicted" missingValueTreatment="asIs" invalidValueTreatment="asMissing"/>
    <MiningField name="SEPALLEN" usageType="active" invalidValueTreatment="asMissing" importance="0.010374" missingValueReplacement="5.8"/>
    <MiningField name="SEPALWID" usageType="active" invalidValueTreatment="asMissing" importance="0.000788" missingValueReplacement="3"/>
    <MiningField name="PETALLEN" usageType="active" invalidValueTreatment="asMissing" importance="1" missingValueReplacement="4.35"/>
    <MiningField name="PETALWID" usageType="active" invalidValueTreatment="asMissing" importance="0.168946" missingValueReplacement="1.3"/>
  </MiningSchema>
  <Output>
    <OutputField name="PROB_1" optype="continuous" dataType="double" feature="probability" targetField="SPECIES" value="1"/>
    <OutputField name="PROB_2" optype="continuous" dataType="double" feature="probability" targetField="SPECIES" value="2"/>
    <OutputField name="PROB_3" optype="continuous" dataType="double" feature="probability" targetField="SPECIES" value="3"/>
    <OutputField name="PREDICTION" optype="categorical" dataType="integer" targetField="SPECIES" feature="predictedValue"/>
  </Output>
  <Segmentation multipleModelMethod="weightedMajorityVote">
    <Segment id="1">
      <True/>
      <TreeModel modelName="Tree1" functionName="classification" splitCharacteristic="binarySplit">
        <MiningSchema>
          <MiningField name="SPECIES" usageType="predicted"/>
          <MiningField name="SEPALLEN" usageType="active"/>
          <MiningField name="PETALLEN" usageType="active"/>
          <MiningField name="PETALWID" usageType="active"/>
        </MiningSchema>
        <Output>
          <OutputField name="PredTree1" optype="categorical" dataType="integer" feature="predictedValue"/>
          <OutputField name="NodeTree1" optype="categorical" dataType="integer" feature="entityId"/>
          <OutputField name="WEIGHT1" displayName="Tree Weight" optype="continuous" dataType="double" feature="transformedValue" isFinalResult="false">
            <MapValues outputColumn="weight" defaultValue="1" dataType="double">
              <FieldColumnPair field="NodeTree1" column="node"/>
              <InlineTable>
                <row><node>1</node><weight>2.5</weight></row>
                <row><node>2</node><weight>4.75</weight></row>
                <row><node>3</node><weight>1.5</weight></row>
                <row><node>4</node><weight>0.666667</weight></row>
                <row><node>5</node><weight>0.666667</weight></row>
                <row><node>6</node><weight>0.4</weight></row>
                <row><node>7</node><weight>0.4</weight></row>
              </InlineTable>
            </MapValues>
          </OutputField>
        </Output>
        <Node id="-1">
          <True/>
          <Node id="1" score="1">
            <SimplePredicate field="PETALLEN" operator="lessOrEqual" value="2.45000004768"/>
          </Node>
          <Node id="-2">
            <True/>
            <Node id="-3">
              <SimplePredicate field="PETALLEN" operator="lessOrEqual" value="4.75"/>
              <Node id="2" score="3">
                <SimplePredicate field="SEPALLEN" operator="lessOrEqual" value="4.94999980927"/>
              </Node>
              <Node id="3" score="2">
                <True/>
              </Node>
            </Node>
            <Node id="-4">
              <True/>
              <Node id="-5">
                <SimplePredicate field="PETALWID" operator="lessOrEqual" value="1.75"/>
                <Node id="-6">
                  <SimplePredicate field="SEPALLEN" operator="lessOrEqual" value="6.5"/>
                  <Node id="4" score="3">
                    <SimplePredicate field="PETALWID" operator="lessOrEqual" value="1.54999995232"/>
                  </Node>
                  <Node id="5" score="2">
                    <True/>
                  </Node>
                </Node>
                <Node id="6" score="2">
                  <True/>
                </Node>
              </Node>
              <Node id="7" score="3">
                <True/>
              </Node>
            </Node>
          </Node>
        </Node>
      </TreeModel>
      <VariableWeight field="WEIGHT1"/>
    </Segment>
    <Segment id="2">
      <True/>
      <TreeModel modelName="Tree2" functionName="classification" splitCharacteristic="binarySplit">
        <MiningSchema>
          <MiningField name="SPECIES" usageType="predicted"/>
          <MiningField name="SEPALLEN" usageType="active"/>
          <MiningField name="PETALLEN" usageType="active"/>
          <MiningField name="PETALWID" usageType="active"/>
        </MiningSchema>
        <Output>
          <OutputField name="PredTree2" optype="categorical" dataType="integer" feature="predictedValue"/>
          <OutputField name="NodeTree2" optype="categorical" dataType="integer" feature="entityId"/>
          <OutputField name="WEIGHT2" displayName="Tree Weight" optype="continuous" dataType="double" feature="transformedValue" isFinalResult="false">
            <MapValues outputColumn="weight" defaultValue="1" dataType="double">
              <FieldColumnPair field="NodeTree2" column="node"/>
              <InlineTable>
                <row><node>1</node><weight>2.0</weight></row>
                <row><node>2</node><weight>0.5</weight></row>
                <row><node>3</node><weight>0.666667</weight></row>
                <row><node>4</node><weight>1.0</weight></row>
              </InlineTable>
            </MapValues>
          </OutputField>
        </Output>
        <Node id="-1">
          <True/>
          <Node id="1" score="1">
            <SimplePredicate field="PETALLEN" operator="lessOrEqual" value="2.59999990463"/>
          </Node>
          <Node id="-2">
            <True/>
            <Node id="-3">
              <SimplePredicate field="PETALWID" operator="lessOrEqual" value="1.75"/>
              <Node id="2" score="2">
                <SimplePredicate field="SEPALLEN" operator="lessOrEqual" value="7.09999990463"/>
              </Node>
              <Node id="3" score="3">
                <True/>
              </Node>
            </Node>
            <Node id="4" score="3">
              <True/>
            </Node>
          </Node>
        </Node>
      </TreeModel>
      <VariableWeight field="WEIGHT2"/>
    </Segment>
    <Segment id="3">
      <True/>
      <TreeModel modelName="Tree3" functionName="classification" splitCharacteristic="binarySplit">
        <MiningSchema>
          <MiningField name="SPECIES" usageType="predicted"/>
          <MiningField name="SEPALWID" usageType="active"/>
          <MiningField name="PETALLEN" usageType="active"/>
          <MiningField name="PETALWID" usageType="active"/>
        </MiningSchema>
        <Output>
          <OutputField name="PredTree3" optype="categorical" dataType="integer" feature="predictedValue"/>
          <OutputField name="NodeTree3" optype="categorical" dataType="integer" feature="entityId"/>
          <OutputField name="WEIGHT3" displayName="Tree Weight" optype="continuous" dataType="double" feature="transformedValue" isFinalResult="false">
            <MapValues outputColumn="weight" defaultValue="1" dataType="double">
              <FieldColumnPair field="NodeTree3" column="node"/>
              <InlineTable>
                <row><node>1</node><weight>2.0</weight></row>
                <row><node>2</node><weight>1.0</weight></row>
                <row><node>3</node><weight>0.5</weight></row>
                <row><node>4</node><weight>0.5</weight></row>
                <row><node>5</node><weight>1.0</weight></row>
              </InlineTable>
            </MapValues>
          </OutputField>
        </Output>
        <Node id="-1">
          <True/>
          <Node id="1" score="1">
            <SimplePredicate field="PETALLEN" operator="lessOrEqual" value="2.34999990463"/>
          </Node>
          <Node id="-2">
            <True/>
            <Node id="-3">
              <SimplePredicate field="PETALLEN" operator="lessOrEqual" value="4.94999980927"/>
              <Node id="2" score="2">
                <SimplePredicate field="PETALWID" operator="lessOrEqual" value="1.65000009537"/>
              </Node>
              <Node id="-4">
                <True/>
                <Node id="3" score="3">
                  <SimplePredicate field="SEPALWID" operator="lessOrEqual" value="3"/>
                </Node>
                <Node id="4" score="2">
                  <True/>
                </Node>
              </Node>
            </Node>
            <Node id="5" score="3">
              <True/>
            </Node>
          </Node>
        </Node>
      </TreeModel>
      <VariableWeight field="WEIGHT3"/>
    </Segment>
  </Segmentation>
</MiningModel>

Model Composition (Deprecated in PMML 4.1)

NOTE: In PMML 4.1, the Model Composition approach was deprecated because the Segmentation approach allows a wider range of models to be used more reliably. For more on deprecation, see Conformance.
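Two general variants of Model Composition of decision trees and simple regression are supported: model sequencing, in which the results of one model are used as input to another model, and model selection, in which one of several models is selected based on decision rules.

Model composition uses three syntactic concepts. First, the essential elements of a model are captured in embedded model elements (Regression and DecisionTree) that can be included in other models. Second, an embedded model can define new fields through ResultField elements, similar to derived fields. Third, a Node of a decision tree can contain an embedded model.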
For example, in a sequence of models, a field can be defined by a regression equation and then used as an ordinary input field in a decision tree. The basic idea is to capture the essential elements of a model, in this case a regression model, and use them to define new fields, much as a derived field is defined.

Mining models and their corresponding embedded elements

The first step in making models reusable in other models is the definition of 'model expression' elements that can be embedded in another model. PMML defines two such elements: Regression, the embedded counterpart of RegressionModel, and DecisionTree, the embedded counterpart of TreeModel.
An EmbeddedModel does not contain a MiningSchema; there is only one MiningSchema, at the top level.

<xs:group name="EmbeddedModel">
  <xs:sequence>
    <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    <xs:choice>
      <xs:element ref="Regression"/>
      <xs:element ref="DecisionTree"/>
    </xs:choice>
  </xs:sequence>
</xs:group>

The element ResultField is very similar to OutputField and DerivedField. It allows an embedded model to define a new field that can be used by a subsequent model.

<xs:element name="ResultField">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="name" type="FIELD-NAME" use="required"/>
    <xs:attribute name="displayName" type="xs:string"/>
    <xs:attribute name="optype" type="OPTYPE"/>
    <xs:attribute name="dataType" type="DATATYPE"/>
    <xs:attribute name="feature" type="RESULT-FEATURE"/>
    <xs:attribute name="value" type="xs:string"/>
  </xs:complexType>
</xs:element>

Model selection is enabled by allowing an EmbeddedModel within a tree Node.

The element Regression contains the essential elements of a RegressionModel:

<xs:element name="Regression">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Output" minOccurs="0"/>
      <xs:element ref="ModelStats" minOccurs="0"/>
      <xs:element ref="Targets" minOccurs="0"/>
      <xs:element ref="LocalTransformations" minOccurs="0"/>
      <xs:element ref="ResultField" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="RegressionTable" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="modelName" type="xs:string"/>
    <xs:attribute name="functionName" type="MINING-FUNCTION" use="required"/>
    <xs:attribute name="algorithmName" type="xs:string"/>
    <xs:attribute name="normalizationMethod" type="REGRESSIONNORMALIZATIONMETHOD" default="none"/>
  </xs:complexType>
</xs:element>

ResultField elements define named results, as described above.
The element DecisionTree contains the essential elements of a TreeModel:

<xs:element name="DecisionTree">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Output" minOccurs="0"/>
      <xs:element ref="ModelStats" minOccurs="0"/>
      <xs:element ref="Targets" minOccurs="0"/>
      <xs:element ref="LocalTransformations" minOccurs="0"/>
      <xs:element ref="ResultField" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Node"/>
    </xs:sequence>
    <xs:attribute name="modelName" type="xs:string"/>
    <xs:attribute name="functionName" type="MINING-FUNCTION" use="required"/>
    <xs:attribute name="algorithmName" type="xs:string"/>
    <xs:attribute name="missingValueStrategy" type="MISSING-VALUE-STRATEGY" default="none"/>
    <xs:attribute name="missingValuePenalty" type="PROB-NUMBER" default="1.0"/>
    <xs:attribute name="noTrueChildStrategy" type="NO-TRUE-CHILD-STRATEGY" default="returnNullPrediction"/>
    <xs:attribute name="splitCharacteristic" default="multiSplit">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="binarySplit"/>
          <xs:enumeration value="multiSplit"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:element>

Regression and DecisionTree can only be used inside a MiningModel.

Model Sequencing for Input Transformations

The following example demonstrates how a regression equation can be used to define an input transformation for another model, here a decision tree (DecisionTree, the embedded form of TreeModel).
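Before the PMML itself, here is a minimal Python sketch of the computation that the model-sequencing example encodes. The mapping is done by hand, not generated from the XML, and the function names are hypothetical. The sketch mirrors three pieces of the example:

- the DerivedField "mc" from LocalTransformations;
- the embedded Regression that defines ResultField "term";
- the DecisionTree that splits on "term".

Gender values outside the DataDictionary are deliberately not handled.

```python
# Hypothetical sketch of the model-sequencing example: regression output
# "term" becomes an ordinary input to the decision tree. Names are illustrative.

MC_TABLE = {"female": 1.0, "male": 0.0}  # InlineTable of DerivedField "mc"


def derive_mc(gender):
    # MapValues with mapMissingTo="-1": a missing gender maps to -1.
    # Values outside the DataDictionary are not handled in this sketch.
    if gender is None:
        return -1.0
    return MC_TABLE[gender]


def regression_term(age, income, mc):
    # Embedded Regression, ResultField "term": 2.34 + 0.03*income + 1.23*age*mc
    return 2.34 + 0.03 * income + 1.23 * age * mc


def decision_tree(term):
    # Embedded DecisionTree: term < 42 -> 32.32, term >= 42 -> 78.91
    return 32.32 if term < 42 else 78.91


def score_weight(age, income, gender):
    term = regression_term(age, income, derive_mc(gender))
    return decision_tree(term)


print(score_weight(30, 100, "male"))    # term is about 5.34, prints 32.32
print(score_weight(30, 100, "female"))  # term is about 42.24, prints 78.91
```

Keeping each PMML element as its own function makes the sequencing explicit. The tree never sees `age`, `income` or `gender` directly, only the derived `term`.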
<PMML xmlns="https://www.dmg.org/PMML-4_4" version="4.4">
  <Header copyright="DMG.org"/>
  <DataDictionary numberOfFields="4">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="gender" optype="categorical" dataType="string">
      <Value value="female"/>
      <Value value="male"/>
    </DataField>
    <DataField name="weight" optype="continuous" dataType="double"/>
  </DataDictionary>
  <MiningModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="gender"/>
      <MiningField name="weight" usageType="target"/>
    </MiningSchema>
    <LocalTransformations>
      <DerivedField name="mc" optype="continuous" dataType="double">
        <MapValues outputColumn="mapped" mapMissingTo="-1">
          <FieldColumnPair field="gender" column="sourceval"/>
          <InlineTable>
            <row><sourceval>female</sourceval><mapped>1</mapped></row>
            <row><sourceval>male</sourceval><mapped>0</mapped></row>
          </InlineTable>
        </MapValues>
      </DerivedField>
    </LocalTransformations>
    <Regression functionName="regression">
      <ResultField name="term" feature="predictedValue"/>
      <RegressionTable intercept="2.34">
        <NumericPredictor name="income" coefficient="0.03"/>
        <PredictorTerm coefficient="1.23">
          <FieldRef field="age"/>
          <FieldRef field="mc"/>
        </PredictorTerm>
      </RegressionTable>
    </Regression>
    <DecisionTree functionName="regression">
      <Node score="0.0">
        <True/>
        <Node score="32.32">
          <SimplePredicate field="term" operator="lessThan" value="42"/>
        </Node>
        <Node score="78.91">
          <SimplePredicate field="term" operator="greaterOrEqual" value="42"/>
        </Node>
      </Node>
    </DecisionTree>
  </MiningModel>
</PMML>

Remarks:
Model selection through tree models

Model selection through a tree model in PMML allows multiple 'embedded models', also known as model expressions, to be combined with decision logic that selects one of the models depending on the current input values. The following example shows how Regression elements are used within the nodes of a decision tree:

<PMML xmlns="https://www.dmg.org/PMML-4_4" version="4.4">
  <Header copyright="DMG.org"/>
  <DataDictionary numberOfFields="4">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="gender" optype="categorical" dataType="string">
      <Value value="female"/>
      <Value value="male"/>
    </DataField>
    <DataField name="weight" optype="continuous" dataType="double"/>
  </DataDictionary>
  <MiningModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="income"/>
      <MiningField name="gender"/>
      <MiningField name="weight" usageType="target"/>
    </MiningSchema>
    <LocalTransformations>
      <DerivedField name="mc" optype="continuous" dataType="double">
        <MapValues outputColumn="mapped" mapMissingTo="-1">
          <FieldColumnPair field="gender" column="sourceval"/>
          <InlineTable>
            <row><sourceval>female</sourceval><mapped>1</mapped></row>
            <row><sourceval>male</sourceval><mapped>0</mapped></row>
          </InlineTable>
        </MapValues>
      </DerivedField>
    </LocalTransformations>
    <DecisionTree functionName="regression">
      <Node score="0.0">
        <True/>
        <Node score="0.0">
          <SimplePredicate field="age" operator="lessOrEqual" value="50"/>
          <Regression functionName="regression">
            <RegressionTable intercept="0.0">
              <NumericPredictor name="income" coefficient="0.03"/>
              <PredictorTerm coefficient="1.23">
                <FieldRef field="age"/>
                <FieldRef field="mc"/>
              </PredictorTerm>
            </RegressionTable>
          </Regression>
        </Node>
        <Node score="0.0">
          <SimplePredicate field="age" operator="greaterThan" value="50"/>
          <Regression functionName="regression">
            <RegressionTable intercept="2.22">
              <NumericPredictor name="income" coefficient="0.01"/>
              <PredictorTerm coefficient="-0.11">
                <FieldRef field="age"/>
                <FieldRef field="mc"/>
              </PredictorTerm>
            </RegressionTable>
          </Regression>
        </Node>
      </Node>
    </DecisionTree>
  </MiningModel>
</PMML>
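The model-selection example above can be read as "the tree picks which regression scores the record". Below is a minimal Python sketch of that reading, written by hand from the PMML; the function names are hypothetical. Returning `None` for a missing `age` is an assumption: with the default `missingValueStrategy="none"`, we expect neither `age` predicate to be true, so no prediction is made.

```python
# Hypothetical sketch of model selection through a tree: the DecisionTree
# chooses which embedded Regression scores the record. Names are illustrative.

MC_TABLE = {"female": 1.0, "male": 0.0}  # InlineTable of DerivedField "mc"


def derive_mc(gender):
    # DerivedField "mc" from the example's LocalTransformations;
    # mapMissingTo="-1" handles a missing gender.
    if gender is None:
        return -1.0
    return MC_TABLE[gender]


def score_weight(age, income, gender):
    if age is None:
        # Assumption: neither "age" predicate can be true for a missing
        # age, so this sketch returns no prediction.
        return None
    mc = derive_mc(gender)
    if age <= 50:
        # Node age <= 50: intercept 0.0, income 0.03, age*mc 1.23
        return 0.0 + 0.03 * income + 1.23 * age * mc
    # Node age > 50: intercept 2.22, income 0.01, age*mc -0.11
    return 2.22 + 0.01 * income - 0.11 * age * mc


print(round(score_weight(40, 1000, "female"), 2))  # 30 + 49.2, prints 79.2
print(round(score_weight(60, 1000, "female"), 2))  # 2.22 + 10 - 6.6, prints 5.62
```

Unlike the sequencing example, only one regression is evaluated per record here. The node `score="0.0"` values are effectively placeholders, because the embedded Regression supplies the prediction.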