PMML 4.3 - Sequence Rules

A Sequence Rule model represents rules over ordered sets of items. For instance, a rule can express that customers who purchase products A and B tend to buy product C at some later time.
Sequences are defined by Itemsets which in turn can contain one or more Items. SequenceRules define the relationship between Sequences. In addition, constraints regarding the time between the appearance of Itemsets or Sequences can be specified.

SequenceModel

A SequenceModel requires that the MiningSchema has a field with a usageType of group. This field groups the Itemsets into transaction groups. If a MiningField with a usageType of order exists, it defines the chronology of the Itemsets within a transaction group. The dataType of its respective DataField specifies the unit of measure for all times given within the model. E.g., for dataType="dateDaysSince[1970]" the resulting unit of time is days.
If there is no MiningField with a usageType of order, it is assumed that all transactions took place at equidistant times. The unit of measure for the times in the model is of no interest in that case. Times are then represented by integer values, and successive transactions within a group are spaced 1 time unit apart.
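The fallback for a missing order field can be sketched as follows. This is a minimal illustration, not part of the specification: the function name `implicit_times` and the input layout (a mapping from group id to that group's transactions in the order they appear in the data) are assumptions.

```python
from typing import Dict, List, Tuple


def implicit_times(groups: Dict[str, List[str]]) -> Dict[str, List[Tuple[int, str]]]:
    """Assign integer times to transactions when no 'order' MiningField exists.

    Assumption: each list holds the transactions of one group in the order
    they appear in the data. Starting the count at 0 is an arbitrary choice;
    only the spacing of 1 between successive transactions matters.
    """
    return {
        group_id: [(t, transaction) for t, transaction in enumerate(transactions)]
        for group_id, transactions in groups.items()
    }


timed = implicit_times({"customer-1": ["TA-a", "TA-b", "TA-c"]})
print(timed["customer-1"])
```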
A sequence model consists of a number of major parts:

<xs:element name="SequenceModel">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="MiningSchema"/>
      <xs:element ref="ModelStats" minOccurs="0"/>
      <xs:element ref="LocalTransformations" minOccurs="0"/>
      <xs:element ref="Constraints" minOccurs="0"/>
      <xs:element ref="Item" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Itemset" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="SetPredicate" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Sequence" maxOccurs="unbounded"/>
      <xs:element ref="SequenceRule" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="modelName" type="xs:string"/>
    <xs:attribute name="functionName" type="MINING-FUNCTION" use="required"/>
    <xs:attribute name="algorithmName" type="xs:string"/>
    <xs:attribute name="numberOfTransactions" type="INT-NUMBER"/>
    <xs:attribute name="maxNumberOfItemsPerTransaction" type="INT-NUMBER"/>
    <xs:attribute name="avgNumberOfItemsPerTransaction" type="REAL-NUMBER"/>      
    <xs:attribute name="numberOfTransactionGroups" type="INT-NUMBER"/>
    <xs:attribute name="maxNumberOfTAsPerTAGroup" type="INT-NUMBER"/>
    <xs:attribute name="avgNumberOfTAsPerTAGroup" type="REAL-NUMBER"/>
    <xs:attribute name="isScorable" type="xs:boolean" default="true"/>
  </xs:complexType>
</xs:element>
  • Item is defined in the Association Model.
  • Itemset is defined in the Association Model.
  • SetPredicate is a set of predicates made up of simple boolean expressions.
  • Sequence is an ordered collection of Itemsets or SetPredicates. There will be at least one Sequence.
  • SequenceRule describes the relationship between two sequences.

Attribute descriptions for SequenceModel:

  • numberOfTransactions: the number of objects in the data the model was built on, e.g., unique customers or visitors.
  • maxNumberOfItemsPerTransaction: the maximum number of events (e.g., visits) per object.
  • avgNumberOfItemsPerTransaction: the average number of events that make up the object.
  • numberOfTransactionGroups: total number of transaction groups in the training data.
  • maxNumberOfTAsPerTAGroup: maximum number of transactions for all transaction groups.
  • avgNumberOfTAsPerTAGroup: average number of transactions for all transaction groups.
  • isScorable: This attribute indicates if the model is valid for scoring. If this attribute is true or if it is missing, then the model should be processed normally. However, if the attribute is false, then the model producer has indicated that this model is intended for information purposes only and should not be used to generate results. In order to be valid PMML, all required elements and attributes must be present, even for non-scoring models. For more details, see General Structure.

Note that these attributes are for information only (except isScorable).

Attributes in the element Constraints represent global constraints from the model build-phase that apply to all Items and Itemsets. If not present, then there were no constraints during the model build-phase.

<xs:element name="Constraints">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="minimumNumberOfItems" type="INT-NUMBER" default="1"/>
    <xs:attribute name="maximumNumberOfItems" type="INT-NUMBER"/>
    <xs:attribute name="minimumNumberOfAntecedentItems" type="INT-NUMBER" default="1"/>
    <xs:attribute name="maximumNumberOfAntecedentItems" type="INT-NUMBER"/>
    <xs:attribute name="minimumNumberOfConsequentItems" type="INT-NUMBER" default="1"/>
    <xs:attribute name="maximumNumberOfConsequentItems" type="INT-NUMBER"/>
    <xs:attribute name="minimumSupport" type="REAL-NUMBER" default="0"/>
    <xs:attribute name="minimumConfidence" type="REAL-NUMBER" default="0"/>
    <xs:attribute name="minimumLift" type="REAL-NUMBER" default="0"/>
    <xs:attribute name="minimumTotalSequenceTime" type="REAL-NUMBER" default="0"/>
    <xs:attribute name="maximumTotalSequenceTime" type="REAL-NUMBER"/>
    <xs:attribute name="minimumItemsetSeparationTime" type="REAL-NUMBER" default="0"/>
    <xs:attribute name="maximumItemsetSeparationTime" type="REAL-NUMBER"/>
    <xs:attribute name="minimumAntConsSeparationTime" type="REAL-NUMBER" default="0"/>
    <xs:attribute name="maximumAntConsSeparationTime" type="REAL-NUMBER"/>
  </xs:complexType>
</xs:element>

Attribute description:

  • minimumNumberOfItems: the minimum number of Items in a sequence.
  • maximumNumberOfItems: the maximum number of Items in a sequence.
  • minimumNumberOfAntecedentItems: the minimum number of Items in a sequence's antecedent that was used during the model build-phase to filter the rules.
  • maximumNumberOfAntecedentItems: the maximum number of Items in a sequence's antecedent that was used during the model build-phase to filter the rules. If not present, then there was no limit.
  • minimumNumberOfConsequentItems: the minimum number of Items in a sequence's consequent that was used during the model build-phase to filter the rules.
  • maximumNumberOfConsequentItems: the maximum number of Items in a sequence's consequent that was used during the model build-phase to filter the rules. If not present, then there was no limit.
  • minimumSupport: the minimum support that was used during the model build-phase to filter the rules.
  • minimumConfidence: the minimum confidence that was used during the model build-phase to filter the rules.
  • minimumLift: the minimum lift that was used during the model build-phase to filter the rules.
  • minimumTotalSequenceTime: the minimum total elapsed time from beginning to the end of a sequence rule that was used during the model build-phase to filter the rules.
  • maximumTotalSequenceTime: the maximum total elapsed time from beginning to the end of a sequence rule that was used during the model build-phase to filter the rules. If not present then there was no limit.
  • minimumItemsetSeparationTime: the minimum time allowed between two Itemsets in a sequence rule's antecedent that was used during the model build-phase to filter the rules.
  • maximumItemsetSeparationTime: the maximum time allowed between two Itemsets in a sequence rule's antecedent that was used during the model build-phase to filter the rules. If not present then there was no limit.
  • minimumAntConsSeparationTime: minimum time between antecedent and consequent Sequence in a SequenceRule that was used during the model build-phase to filter the rules.
  • maximumAntConsSeparationTime: maximum time between antecedent and consequent Sequence in a SequenceRule that was used during the model build-phase to filter the rules. If not present then there was no limit.
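To illustrate how a model producer might apply a few of these thresholds while filtering candidate rules, here is a minimal sketch. The class and function names are hypothetical, only five of the attributes are modeled, and treating the bounds as inclusive is an assumption because the text above does not say. The rule values passed in at the end are made up for the demonstration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Constraints:
    # Defaults follow the schema above; None stands for an absent attribute,
    # which the text defines as "no limit".
    minimum_support: float = 0.0
    minimum_confidence: float = 0.0
    minimum_lift: float = 0.0
    minimum_total_sequence_time: float = 0.0
    maximum_total_sequence_time: Optional[float] = None


def keeps_rule(c: Constraints, support: float, confidence: float,
               lift: float, total_time: float) -> bool:
    """Return True if a candidate rule survives the build-phase filters.

    Assumption: all bounds are inclusive.
    """
    if support < c.minimum_support or confidence < c.minimum_confidence:
        return False
    if lift < c.minimum_lift or total_time < c.minimum_total_sequence_time:
        return False
    if (c.maximum_total_sequence_time is not None
            and total_time > c.maximum_total_sequence_time):
        return False
    return True


constraints = Constraints(minimum_support=0.2, minimum_confidence=0.5)
print(keeps_rule(constraints, support=0.25, confidence=0.66667, lift=1.2, total_time=6.0))
print(keeps_rule(constraints, support=0.10, confidence=0.66667, lift=1.2, total_time=6.0))
```

A consumer reading the model does not need to re-apply these checks; the attributes only record what the producer did.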

SetPredicate

Note: SetPredicate is deprecated as of PMML 3.1 and should not be used anymore!
<xs:simpleType name="ELEMENT-ID">  
  <xs:restriction base="xs:string">
  </xs:restriction>
</xs:simpleType>

<xs:element name="SetPredicate">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:group ref="STRING-ARRAY"/>
    </xs:sequence>
    <xs:attribute name="id" type="ELEMENT-ID" use="required"/>
    <xs:attribute name="field" type="FIELD-NAME" use="required"/>
    <xs:attribute name="operator" type="xs:string" fixed="supersetOf"/>
  </xs:complexType>
</xs:element>

A SetPredicate element holds a boolean expression made up of a field, a comparison operator, and one or more values. The value(s) are written in the form of an array.

Attribute description:

  • id: An element ID uniquely identifying a predicate set. (Referenced in Sequences by setId.)
  • field: The subject of the predicate statement. Usually this name refers to one of the DerivedField elements in the TransformationDictionary.
  • operator: The association between the subject of the predicate statement and the array of values.

Note that a SetPredicate compares two sets while a SimpleSetPredicate (as defined in the tree model) checks membership of a single value in a set.
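The difference between the two kinds of predicate can be sketched in a few lines. The function names are hypothetical, and the direction of the supersetOf comparison (the field's set of values must contain every value in the array) is this sketch's reading of the operator name rather than something the text above spells out.

```python
from typing import Iterable


def set_predicate_superset_of(field_values: Iterable[str], array_values: Iterable[str]) -> bool:
    """Set-versus-set comparison, as in a SetPredicate with operator="supersetOf".

    Assumed reading: true when the field's values include every array value.
    """
    return set(field_values) >= set(array_values)


def simple_set_membership(field_value: str, array_values: Iterable[str]) -> bool:
    """Single-value membership test, in the spirit of a SimpleSetPredicate."""
    return field_value in set(array_values)


print(set_predicate_superset_of({"Cream", "Tonic water", "Cider"}, ["Cream", "Cider"]))
print(simple_set_membership("Cream", ["Cream", "Cider"]))
```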

Delimiter & Time

<xs:simpleType name="DELIMITER">
  <xs:restriction base="xs:string">
    <xs:enumeration value="sameTimeWindow"/>
    <xs:enumeration value="acrossTimeWindows"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="GAP">
  <xs:restriction base="xs:string">
    <xs:enumeration value="true"/>
    <xs:enumeration value="false"/>
    <xs:enumeration value="unknown"/>
  </xs:restriction>
</xs:simpleType>

<xs:element name="Delimiter">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="delimiter" type="DELIMITER" use="required"/>
    <xs:attribute name="gap" type="GAP" use="required"/>
  </xs:complexType>
</xs:element>

Delimiter is the separation between two Itemsets in a Sequence, or between two Sequences in a SequenceRule.

Attribute description:

  • delimiter: states whether this Itemset or SetPredicate occurred within the same event or time period (as defined by a time window) as the previous one. E.g., if items are purchased during the same visit, delimiter would be sameTimeWindow. If items are purchased in separate visits, the value for delimiter would be acrossTimeWindows.
  • gap: Indicates whether additional Itemsets or SetPredicates can be present to match the respective sequence. true represents an open sequence, which allows for gaps between sequences (as does unknown). In a closed sequence the gap is set to false, indicating that the two Sets or Sequences being described are consecutive sets in the data. E.g., if the sequence is A→B→C→D→E, then the sequence B→D would only match if gap is specified as true or unknown.
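The effect of gap on matching can be sketched as follows. This is a simplified illustration under stated assumptions: every element of the data and of the pattern is a single label compared by equality (a real scorer would compare Itemsets), the pattern is non-empty, the delimiter attribute is ignored, and the function name `matches` is hypothetical.

```python
from typing import Sequence


def matches(data: Sequence[str], pattern: Sequence[str], gaps: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `data` under the given gap settings.

    gaps[i] is the gap value ("true", "false" or "unknown") of the Delimiter
    between pattern[i] and pattern[i + 1]; "unknown" is treated like "true".
    """
    def match_from(pos: int, idx: int) -> bool:
        # pattern[idx] has been matched at data[pos]; place the remainder.
        if idx == len(pattern) - 1:
            return True
        if gaps[idx] == "false":
            # Closed sequence: the next set must be the immediate successor.
            candidates = range(pos + 1, min(pos + 2, len(data)))
        else:
            # Open sequence: any later position is acceptable.
            candidates = range(pos + 1, len(data))
        return any(
            data[nxt] == pattern[idx + 1] and match_from(nxt, idx + 1)
            for nxt in candidates
        )

    return any(
        data[start] == pattern[0] and match_from(start, 0)
        for start in range(len(data))
    )


data = ["A", "B", "C", "D", "E"]
print(matches(data, ["B", "D"], ["true"]))   # gap allowed between B and D
print(matches(data, ["B", "D"], ["false"]))  # B and D would have to be adjacent
```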
<xs:element name="Time">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="min" type="NUMBER"/>
    <xs:attribute name="max" type="NUMBER"/>
    <xs:attribute name="mean" type="NUMBER"/>
    <xs:attribute name="standardDeviation" type="NUMBER"/>
  </xs:complexType>
</xs:element>

The Time element carries informational statistics only; it does not imply any constraints.

The following attributes apply either to the Itemsets in a Sequence or to the AntecedentSequence and ConsequentSequence of a SequenceRule:

  • min: the minimum elapsed time in between.
  • max: the maximum elapsed time in between.
  • mean: the mean elapsed time in between.
  • standardDeviation: the standard deviation of the elapsed time in between.
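As a rough sketch of how a producer might derive these four attributes from observed elapsed times, consider the following. The function name is hypothetical, the sample values are made up, and the use of the population standard deviation is an assumption, since the text does not say whether the population or the sample form is meant.

```python
import statistics
from typing import Dict, Sequence


def time_attributes(elapsed: Sequence[float]) -> Dict[str, float]:
    """Summarize elapsed times into the attributes of a Time element.

    Assumption: standardDeviation is the population standard deviation.
    """
    return {
        "min": min(elapsed),
        "max": max(elapsed),
        "mean": statistics.fmean(elapsed),
        "standardDeviation": statistics.pstdev(elapsed),
    }


stats = time_attributes([5, 8, 7, 6, 8])  # made-up elapsed times, in the model's time unit
print(stats)
```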

Sequence

<xs:group name="FOLLOW-SET">
  <xs:sequence>
    <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element ref="Delimiter"/>
    <xs:element ref="Time" minOccurs="0"/>
    <xs:element ref="SetReference"/>
  </xs:sequence>
</xs:group>

<xs:element name="Sequence">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="SetReference"/>
      <xs:sequence minOccurs="0" maxOccurs="unbounded">
        <xs:group ref="FOLLOW-SET"/>
      </xs:sequence>
      <xs:element ref="Time" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="id" type="ELEMENT-ID" use="required"/>
    <xs:attribute name="numberOfSets" type="INT-NUMBER"/>
    <xs:attribute name="occurrence" type="INT-NUMBER"/>
    <xs:attribute name="support" type="REAL-NUMBER"/>
  </xs:complexType>
</xs:element>

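Each Sequence consists of an initial SetReference, optionally followed by further SetReferences, each of which is preceded by a Delimiter and an optional Time element.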

The Time element between Delimiter and SetReference gives statistics on the elapsed time between each Itemset.

The Time element after the final SetReference gives statistics on the total elapsed time from the first to the last Itemset in the Sequence.

Attribute description:

  • id: the unique ID of this sequence. (Referenced in SequenceRules by seqId).
  • numberOfSets: the number of Itemsets or SetPredicates in this sequence.
  • occurrence: the number of objects in the data for which this sequence holds true.
  • support: the ratio of the number of objects in the data for which this sequence holds true, to the total number of objects in the data.
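The relationship between occurrence and support can be written down directly. The function name and the numbers below are illustrative assumptions, not values taken from a real model.

```python
def sequence_support(occurrence: int, number_of_objects: int) -> float:
    """Support of a sequence: occurrence divided by the total number of objects."""
    if number_of_objects <= 0:
        raise ValueError("number_of_objects must be positive")
    if not 0 <= occurrence <= number_of_objects:
        raise ValueError("occurrence must lie between 0 and number_of_objects")
    return occurrence / number_of_objects


# Hypothetical figures: the sequence holds for 42 out of 175 objects.
print(sequence_support(42, 175))
```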
<xs:element name="SetReference">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="setId" type="ELEMENT-ID" use="required"/>
  </xs:complexType>
</xs:element>

The SetReference refers (or points) to a previously defined set. That set will be either a SetPredicate or an Itemset (which will contain ItemRef elements).

Attribute description:

  • setId: a pointer to the id of an Itemset or SetPredicate.

Sequence Rules

<xs:element name="SequenceRule">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="AntecedentSequence"/>
      <xs:element ref="Delimiter"/>
      <xs:element ref="Time" minOccurs="0"/>
      <xs:element ref="ConsequentSequence"/>
      <xs:element ref="Time" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="id" type="ELEMENT-ID" use="required"/>
    <xs:attribute name="numberOfSets" type="INT-NUMBER" use="required"/>
    <xs:attribute name="occurrence" type="INT-NUMBER" use="required"/>
    <xs:attribute name="support" type="REAL-NUMBER" use="required"/>
    <xs:attribute name="confidence" type="REAL-NUMBER" use="required"/>
    <xs:attribute name="lift" type="REAL-NUMBER"/>
  </xs:complexType>
</xs:element>

A Sequence Rule consists of an antecedent Sequence and a consequent Sequence, separated by a Delimiter. The Time element between AntecedentSequence and ConsequentSequence gives statistics on the elapsed time between the antecedent and the consequent, while the Time element after ConsequentSequence gives statistics on the total elapsed time from the first to the last Itemset in the sequence rule.

Attribute description:

  • id: the unique ID of this SequenceRule.
  • numberOfSets: the total number of sets in both the antecedent and consequent Sequences.
  • occurrence: the number of objects in the data for which the antecedent and consequent Sequences hold true.
  • support: the ratio of the number of objects in the data for which the antecedent and consequent Sequences hold true, to the total number of objects in the data.
  • confidence: probability of the consequent following the antecedent. Calculated as the number of occurrences of a sequence rule divided by the number of occurrences of the antecedent.
  • lift: the ratio between the actual support of a SequenceRule and its expected support. The expected support of a SequenceRule A→C with nA antecedent item sets and nC consequent item sets is support(A) * support(C) / binomialCoefficient(nA+nC, nC).

Note: Compared to the formula for the lift of an AssociationRule, there is an additional correction factor binomialCoefficient(nA+nC, nC). This factor accounts for the fact that there are binomialCoefficient(nA+nC, nC) possible time orders in which the antecedent and the consequent sequence can be interleaved in a transaction group, and only one of them contributes to the support of the sequence rule: the order in which the first consequent item set occurs after the last antecedent item set.
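The confidence and lift definitions above translate into a short calculation. The following is a sketch with hypothetical function names and made-up input values; it uses `math.comb` from the Python standard library for the binomial coefficient.

```python
from math import comb


def confidence(rule_occurrence: int, antecedent_occurrence: int) -> float:
    """Occurrences of the sequence rule divided by occurrences of the antecedent."""
    return rule_occurrence / antecedent_occurrence


def expected_support(support_a: float, support_c: float, n_a: int, n_c: int) -> float:
    """support(A) * support(C) / binomialCoefficient(nA + nC, nC)."""
    return support_a * support_c / comb(n_a + n_c, n_c)


def lift(rule_support: float, support_a: float, support_c: float,
         n_a: int, n_c: int) -> float:
    """Actual support of the rule divided by its expected support."""
    return rule_support / expected_support(support_a, support_c, n_a, n_c)


# Made-up values: one antecedent item set and one consequent item set,
# so the correction factor is comb(2, 1) = 2.
print(expected_support(0.4, 0.5, n_a=1, n_c=1))
print(lift(0.2, 0.4, 0.5, n_a=1, n_c=1))
print(confidence(6, 9))
```

With nA = nC = 1 there are two possible orders (antecedent first or consequent first), which is why the expected support is half of support(A) * support(C).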

Antecedent and Consequent Sequences

<xs:group name="SEQUENCE">
  <xs:sequence>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:element ref="SequenceReference"/>
    <xs:element ref="Time" minOccurs="0"/>
  </xs:sequence>
</xs:group>

<xs:element name="SequenceReference">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="seqId" type="ELEMENT-ID" use="required"/>
  </xs:complexType>
</xs:element>

<xs:element name="AntecedentSequence">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="SEQUENCE"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="ConsequentSequence">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="SEQUENCE"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Attribute description:

  • seqId: a pointer to the id attribute of a previously defined Sequence.

Example

<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <Header copyright="DMG.org"/>
  <DataDictionary numberOfFields="5">
    <DataField name="CUSTOMER_ID" displayName="CUSTOMER_ID" optype="categorical" dataType="integer"/>
    <DataField name="TRANSDATE" displayName="TRANSDATE" optype="continuous" dataType="dateDaysSince[0]"/>
    <DataField name="ITEMID" displayName="ITEMID" optype="categorical" dataType="string"/>
    <DataField name="STOREID" displayName="STOREID" optype="categorical" dataType="string"/>
    <DataField name="TRANSID" displayName="TRANSID" optype="categorical" dataType="string"/>
  </DataDictionary>
  <SequenceModel functionName="sequences" numberOfTransactions="175">
    <MiningSchema>
      <MiningField name="CUSTOMER_ID" usageType="group"/>
      <MiningField name="TRANSDATE" usageType="order"/>
      <MiningField name="ITEMID"/>
      <MiningField name="STOREID"/>
      <MiningField name="TRANSID"/>
    </MiningSchema>

    <Constraints minimumSupport="0.2" minimumConfidence="0.5"/>

    <Item id="0" value="177" mappedValue="Cognac"/>
    <Item id="1" value="129" mappedValue="Cream"/>
    <Item id="2" value="144" mappedValue="Tonic water"/>
    <Item id="3" value="174" mappedValue="Vodka"/>
    <Item id="4" value="108" mappedValue="Cider"/>
    <Item id="5" value="172" mappedValue="Scotch Whisky"/>
    <Item id="6" value="130" mappedValue="Root Beer"/>

    <Itemset id="0" support="0.0628571428571429" numberOfItems="1">
      <ItemRef itemRef="0"/>
    </Itemset>
    <Itemset id="1" support="0.24" numberOfItems="2">
      <ItemRef itemRef="1"/>
      <ItemRef itemRef="2"/>
    </Itemset>
    <Itemset id="2" support="0.0628571428571429" numberOfItems="3">
      <ItemRef itemRef="3"/>
      <ItemRef itemRef="4"/>
      <ItemRef itemRef="5"/>
    </Itemset>
    <Itemset id="3" support="0.0628571428571429" numberOfItems="1">
      <ItemRef itemRef="6"/>
    </Itemset>

    <Sequence id="0" numberOfSets="1" occurrence="5" support="0.02">
      <SetReference setId="0"/>
    </Sequence>
    <Sequence id="1" numberOfSets="2" occurrence="6" support="0.25">
      <SetReference setId="0"/>
      <Delimiter delimiter="acrossTimeWindows" gap="unknown"/>
      <SetReference setId="2"/>
    </Sequence>
    <Sequence id="2" numberOfSets="1" occurrence="5" support="0.45">
      <SetReference setId="1"/>
    </Sequence>
    <Sequence id="3" numberOfSets="1" occurrence="15" support="0.2">
      <SetReference setId="3"/>
    </Sequence>

    <SequenceRule id="0" numberOfSets="2" occurrence="5" support="0.20833" confidence="0.55556">
      <AntecedentSequence>
        <SequenceReference seqId="0"/>
      </AntecedentSequence>
      <Delimiter delimiter="acrossTimeWindows" gap="unknown"/>
      <Time min="5" max="8" mean="6.8"/>
      <ConsequentSequence>
        <SequenceReference seqId="2"/>
      </ConsequentSequence>
    </SequenceRule>
    <SequenceRule id="1" numberOfSets="2" occurrence="6" support="0.25" confidence="0.66667">
      <AntecedentSequence>
        <SequenceReference seqId="1"/>
      </AntecedentSequence>
      <Delimiter delimiter="acrossTimeWindows" gap="unknown"/>
      <Time min="2" max="8" mean="6.16667"/>
      <ConsequentSequence>
        <SequenceReference seqId="3"/>
      </ConsequentSequence>
    </SequenceRule>
    <SequenceRule id="2" numberOfSets="2" occurrence="5" support="0.20833" confidence="0.55556">
      <AntecedentSequence>
        <SequenceReference seqId="2"/>
      </AntecedentSequence>
      <Delimiter delimiter="acrossTimeWindows" gap="unknown"/>
      <Time min="2" max="8" mean="6.6"/>
      <ConsequentSequence>
        <SequenceReference seqId="3"/>
      </ConsequentSequence>
    </SequenceRule>
    <SequenceRule id="3" numberOfSets="2" occurrence="14" support="0.58333" confidence="0.73684">
      <AntecedentSequence>
        <SequenceReference seqId="3"/>
      </AntecedentSequence>
      <Delimiter delimiter="acrossTimeWindows" gap="unknown"/>
      <Time min="1" max="10" mean="6.14286"/>
      <ConsequentSequence>
        <SequenceReference seqId="0"/>
      </ConsequentSequence>
    </SequenceRule>
  </SequenceModel>
</PMML>
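To show how the id references in such a model chain together (SequenceRule to Sequence to Itemset to Item), here is a minimal reading sketch using Python's standard `xml.etree.ElementTree`. The embedded document is a trimmed excerpt of the example above; it omits the Header, DataDictionary, and MiningSchema and is therefore not valid PMML on its own. The function name `describe_rules` and the output format are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-4_3"}

# Trimmed excerpt of the example model; not schema-valid by itself.
PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
  <SequenceModel functionName="sequences">
    <Item id="0" value="177" mappedValue="Cognac"/>
    <Item id="1" value="129" mappedValue="Cream"/>
    <Item id="2" value="144" mappedValue="Tonic water"/>
    <Itemset id="0" numberOfItems="1"><ItemRef itemRef="0"/></Itemset>
    <Itemset id="1" numberOfItems="2"><ItemRef itemRef="1"/><ItemRef itemRef="2"/></Itemset>
    <Sequence id="0"><SetReference setId="0"/></Sequence>
    <Sequence id="2"><SetReference setId="1"/></Sequence>
    <SequenceRule id="0" numberOfSets="2" occurrence="5" support="0.20833" confidence="0.55556">
      <AntecedentSequence><SequenceReference seqId="0"/></AntecedentSequence>
      <Delimiter delimiter="acrossTimeWindows" gap="unknown"/>
      <ConsequentSequence><SequenceReference seqId="2"/></ConsequentSequence>
    </SequenceRule>
  </SequenceModel>
</PMML>"""


def describe_rules(pmml_text: str) -> list:
    """Render each SequenceRule as 'antecedent => consequent' using item names."""
    model = ET.fromstring(pmml_text).find("p:SequenceModel", NS)
    # Item id -> readable name (mappedValue if present, else the raw value).
    items = {i.get("id"): i.get("mappedValue") or i.get("value")
             for i in model.findall("p:Item", NS)}
    # Itemset id -> list of item names.
    itemsets = {s.get("id"): [items[r.get("itemRef")] for r in s.findall("p:ItemRef", NS)]
                for s in model.findall("p:Itemset", NS)}
    # Sequence id -> ordered list of item sets.
    sequences = {q.get("id"): [itemsets[r.get("setId")] for r in q.findall("p:SetReference", NS)]
                 for q in model.findall("p:Sequence", NS)}

    def render(seq_id: str) -> str:
        return " -> ".join("{" + ", ".join(s) + "}" for s in sequences[seq_id])

    lines = []
    for rule in model.findall("p:SequenceRule", NS):
        ante = rule.find("p:AntecedentSequence/p:SequenceReference", NS).get("seqId")
        cons = rule.find("p:ConsequentSequence/p:SequenceReference", NS).get("seqId")
        lines.append(f"{render(ante)} => {render(cons)} (confidence {rule.get('confidence')})")
    return lines


rules = describe_rules(PMML)
print("\n".join(rules))
```

The sketch does not handle SetReferences that point to a SetPredicate; a lookup of such an id in `itemsets` would raise a KeyError.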