PMML 4.0 - Sequence Rules
A Sequence Rules model represents rules over ordered sets of items.
For instance, a rule can express that after purchasing products A and B,
customers tend to buy product C sooner or later as well.
Sequences are defined by Itemsets, which in turn contain one or more Items.
SequenceRules define the relationship between Sequences. In addition, constraints
on the time between the appearance of Itemsets or Sequences can be specified.
SequenceModel
A Sequence model requires that the MiningSchema contain a field with usageType="group". This field groups the Itemsets into transaction groups. If a MiningField
with usageType="order" exists, it defines the chronology of the Itemsets within a transaction group. The dataType of its corresponding DataField specifies the unit of measure for all times given
within the model. E.g., for dataType="dateDaysSince[1970]" the resulting time unit is days.
If there is no MiningField with usageType="order", all transactions are assumed to have taken place at equidistant points in time. The unit of measure for the times in the model is irrelevant in
that case; times are represented by integer values, and successive transactions within a group are spaced 1 time unit apart.
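The grouping and timing rules above can be illustrated with a short Python sketch. It is not part of PMML. The function `build_timelines` and the `(group, orderValue, items)` tuple layout are hypothetical. Starting the equidistant times at 0 is also an assumption, because the text only fixes the spacing of 1 time unit.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, Iterable, List, Optional, Tuple

# One transaction: (value of the group field, value of the order field or None, items).
Transaction = Tuple[Hashable, Optional[float], Iterable[str]]
# A transaction group's history: (time, items) pairs in chronological order.
Timeline = List[Tuple[float, FrozenSet[str]]]


def build_timelines(
    transactions: Iterable[Transaction], has_order_field: bool
) -> Dict[Hashable, Timeline]:
    """Group transactions by the group field and attach a time to each one."""
    timelines: Dict[Hashable, Timeline] = defaultdict(list)
    for group, order_value, items in transactions:
        if has_order_field:
            if order_value is None:
                raise ValueError(f"missing order value in group {group!r}")
            time = order_value  # unit follows the order field's dataType
        else:
            # Equidistant time (assumed to start at 0): the n-th transaction gets time n.
            time = len(timelines[group])
        timelines[group].append((time, frozenset(items)))
    if has_order_field:
        # The order field, not input order, defines the chronology.
        for timeline in timelines.values():
            timeline.sort(key=lambda entry: entry[0])
    return dict(timelines)
```

With an order field of type dateDaysSince[1970], the difference between two times in a timeline is a number of days. Without one, consecutive transactions of a group differ by exactly 1.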
A sequence model consists of a number of major parts:
<xs:element name="SequenceModel">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="MiningSchema"/>
<xs:element ref="ModelStats" minOccurs="0"/>
<xs:element ref="LocalTransformations" minOccurs="0" />
<xs:element ref="Constraints" minOccurs="0"/>
<xs:element ref="Item" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="Itemset" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="SetPredicate" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="Sequence" maxOccurs="unbounded"/>
<xs:element ref="SequenceRule" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="modelName" type="xs:string"/>
<xs:attribute name="functionName" type="MINING-FUNCTION" use="required"/>
<xs:attribute name="algorithmName" type="xs:string"/>
<xs:attribute name="numberOfTransactions" type="INT-NUMBER"/>
<xs:attribute name="maxNumberOfItemsPerTransaction" type="INT-NUMBER"/>
<xs:attribute name="avgNumberOfItemsPerTransaction" type="REAL-NUMBER"/>
<xs:attribute name="numberOfTransactionGroups" type="INT-NUMBER"/>
<xs:attribute name="maxNumberOfTAsPerTAGroup" type="INT-NUMBER"/>
<xs:attribute name="avgNumberOfTAsPerTAGroup" type="REAL-NUMBER"/>
</xs:complexType>
</xs:element>
Item is defined in the Association Model.
Itemset is defined in the Association Model.
SetPredicate is a set of predicates made up of simple boolean expressions.
Sequence is an ordered collection of Itemsets or SetPredicates. A model contains at least one Sequence.
SequenceRule describes the relationship between two Sequences.
Attribute descriptions for SequenceModel:
numberOfTransactions : the number of objects in the data the model was built on, e.g., unique customers or visitors.
maxNumberOfItemsPerTransaction : the maximum number of events (e.g., visits) per object.
avgNumberOfItemsPerTransaction : the average number of events that make up the object.
numberOfTransactionGroups : the total number of transaction groups in the training data.
maxNumberOfTAsPerTAGroup : the maximum number of transactions in any transaction group.
avgNumberOfTAsPerTAGroup : the average number of transactions per transaction group.
Note that these attributes are for information only.
Attributes in the Constraints element represent global constraints from the model build phase that apply to all Items and Itemsets. If the element is not present, no constraints were applied during the model build phase.
<xs:element name="Constraints">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="minimumNumberOfItems" type="INT-NUMBER" default="1"/>
<xs:attribute name="maximumNumberOfItems" type="INT-NUMBER"/>
<xs:attribute name="minimumNumberOfAntecedentItems" type="INT-NUMBER" default="1"/>
<xs:attribute name="maximumNumberOfAntecedentItems" type="INT-NUMBER"/>
<xs:attribute name="minimumNumberOfConsequentItems" type="INT-NUMBER" default="1"/>
<xs:attribute name="maximumNumberOfConsequentItems" type="INT-NUMBER"/>
<xs:attribute name="minimumSupport" type="REAL-NUMBER" default="0"/>
<xs:attribute name="minimumConfidence" type="REAL-NUMBER" default="0"/>
<xs:attribute name="minimumLift" type="REAL-NUMBER" default="0"/>
<xs:attribute name="minimumTotalSequenceTime" type="REAL-NUMBER" default="0"/>
<xs:attribute name="maximumTotalSequenceTime" type="REAL-NUMBER"/>
<xs:attribute name="minimumItemsetSeparationTime" type="REAL-NUMBER" default="0"/>
<xs:attribute name="maximumItemsetSeparationTime" type="REAL-NUMBER"/>
<xs:attribute name="minimumAntConsSeparationTime" type="REAL-NUMBER" default="0"/>
<xs:attribute name="maximumAntConsSeparationTime" type="REAL-NUMBER"/>
</xs:complexType>
</xs:element>
Attribute description:
minimumNumberOfItems : the minimum number of Items in a sequence.
maximumNumberOfItems : the maximum number of Items in a sequence.
minimumNumberOfAntecedentItems : the minimum number of Items in a sequence rule's antecedent that was used during the model build phase to filter the rules.
maximumNumberOfAntecedentItems : the maximum number of Items in a sequence rule's antecedent that was used during the model build phase to filter the rules. If not present, there was no limit.
minimumNumberOfConsequentItems : the minimum number of Items in a sequence rule's consequent that was used during the model build phase to filter the rules.
maximumNumberOfConsequentItems : the maximum number of Items in a sequence rule's consequent that was used during the model build phase to filter the rules. If not present, there was no limit.
minimumSupport : the minimum support that was used during the model build phase to filter the rules.
minimumConfidence : the minimum confidence that was used during the model build phase to filter the rules.
minimumLift : the minimum lift that was used during the model build phase to filter the rules.
minimumTotalSequenceTime : the minimum total elapsed time from the beginning to the end of a sequence rule that was used during the model build phase to filter the rules.
maximumTotalSequenceTime : the maximum total elapsed time from the beginning to the end of a sequence rule that was used during the model build phase to filter the rules. If not present, there was no limit.
minimumItemsetSeparationTime : the minimum time allowed between two Itemsets in a sequence rule's antecedent that was used during the model build phase to filter the rules.
maximumItemsetSeparationTime : the maximum time allowed between two Itemsets in a sequence rule's antecedent that was used during the model build phase to filter the rules. If not present, there was no limit.
minimumAntConsSeparationTime : the minimum time between the antecedent and consequent Sequence in a SequenceRule that was used during the model build phase to filter the rules.
maximumAntConsSeparationTime : the maximum time between the antecedent and consequent Sequence in a SequenceRule that was used during the model build phase to filter the rules. If not present, there was no limit.
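The following Python sketch shows how the non-temporal filters in Constraints could be applied to a candidate rule. It covers only the schema defaults, where a missing maximum means "no limit". The classes `Constraints` and `RuleStats` and the function `passes` are hypothetical illustrations, not a PMML API.

Two simplifications apply. The time-based constraints are omitted. A rule that has no lift value is assumed not to be rejected by minimumLift.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Constraints:
    # Defaults mirror the schema; None means the attribute is absent (no limit).
    minimumNumberOfAntecedentItems: int = 1
    maximumNumberOfAntecedentItems: Optional[int] = None
    minimumNumberOfConsequentItems: int = 1
    maximumNumberOfConsequentItems: Optional[int] = None
    minimumSupport: float = 0.0
    minimumConfidence: float = 0.0
    minimumLift: float = 0.0


@dataclass
class RuleStats:
    antecedent_items: int
    consequent_items: int
    support: float
    confidence: float
    lift: Optional[float] = None  # lift is optional on SequenceRule


def _within(value: float, low: float, high: Optional[float]) -> bool:
    """Inclusive range check where high=None means unbounded."""
    return value >= low and (high is None or value <= high)


def passes(c: Constraints, r: RuleStats) -> bool:
    """Return True if a rule would survive the (non-temporal) build-phase filters."""
    if not _within(r.antecedent_items, c.minimumNumberOfAntecedentItems,
                   c.maximumNumberOfAntecedentItems):
        return False
    if not _within(r.consequent_items, c.minimumNumberOfConsequentItems,
                   c.maximumNumberOfConsequentItems):
        return False
    if r.support < c.minimumSupport or r.confidence < c.minimumConfidence:
        return False
    # Assumption: a rule without a lift value is not rejected by minimumLift.
    return r.lift is None or r.lift >= c.minimumLift
```

For example, `Constraints(minimumSupport=0.2, minimumConfidence=0.5)` corresponds to the Constraints element in the example at the end of this section.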
SetPredicate
Note: SetPredicate is deprecated as of PMML 3.1 and should no longer be used.
<xs:simpleType name="ELEMENT-ID">
<xs:restriction base="xs:string">
</xs:restriction>
</xs:simpleType>
<xs:element name="SetPredicate">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:group ref="STRING-ARRAY"/>
</xs:sequence>
<xs:attribute name="id" type="ELEMENT-ID" use="required"/>
<xs:attribute name="field" type="FIELD-NAME" use="required"/>
<xs:attribute name="operator" type="xs:string" fixed="supersetOf"/>
</xs:complexType>
</xs:element>
A SetPredicate element consists of a boolean expression made up of a field, a comparison operator, and a value. The values are written in the form of an array.
Attribute description:
id : an element ID uniquely identifying a predicate set (referenced in Sequences by setId).
field : the subject of the predicate statement. Usually this name refers to one of the DerivedField elements in the TransformationDictionary.
operator : the relation between the subject of the predicate statement and the array of values. The schema fixes its value to supersetOf.
Note that a SetPredicate compares two sets, while a SimpleSetPredicate (as defined in the tree model) checks membership of a single value in a set.
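The difference between the two predicate types can be illustrated with Python sets. In this sketch, a Python set stands in for the set-valued field and a list stands in for the STRING-ARRAY content. Both function names are hypothetical.

```python
from typing import Iterable, Set


def set_predicate_superset_of(field_value: Set[str], array: Iterable[str]) -> bool:
    """SetPredicate (operator="supersetOf"): the field's set must contain every array value."""
    return set(field_value).issuperset(array)


def simple_set_predicate_is_in(field_value: str, array: Iterable[str]) -> bool:
    """SimpleSetPredicate with booleanOperator="isIn": one value must occur in the array."""
    return field_value in set(array)
```

For instance, a basket {Cream, Tonic water, Vodka} satisfies a supersetOf predicate over [Cream, Tonic water]. The single value Vodka is not in that array.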
Delimiter & Time
<xs:simpleType name="DELIMITER">
<xs:restriction base="xs:string">
<xs:enumeration value="sameTimeWindow"/>
<xs:enumeration value="acrossTimeWindows"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="GAP">
<xs:restriction base="xs:string">
<xs:enumeration value="true"/>
<xs:enumeration value="false"/>
<xs:enumeration value="unknown"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="Delimiter">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="delimiter" type="DELIMITER" use="required"/>
<xs:attribute name="gap" type="GAP" use="required"/>
</xs:complexType>
</xs:element>
Delimiter is the separation between two Itemsets in a Sequence, or between two Sequences in a SequenceRule.
Attribute description:
delimiter : states whether or not this Itemset or SetPredicate occurred within the same event or time period (as defined by a time window) as the previous one.
E.g., if items are purchased during the same visit, delimiter would be sameTimeWindow. If items are purchased in separate
visits, the value for delimiter would be acrossTimeWindows.
gap : indicates whether additional Itemsets or SetPredicates may be present between the described ones while still matching the respective sequence. true represents an open sequence,
which allows for gaps between sets (as does unknown). In a closed sequence, gap is set to false, indicating
that the two sets or Sequences being described are consecutive in the data.
E.g., if the sequence is A→B→C→D→E, then the sequence B→D would only match if gap is specified as
true or unknown.
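The gap semantics can be sketched in Python as follows. This is an illustration under several simplifying assumptions:

- A single gap value applies to every Delimiter in the pattern.
- Time windows and the delimiter attribute are ignored.
- An Itemset "matches" a transaction if the transaction contains all of its items.

The function `matches` is hypothetical.

```python
from typing import FrozenSet, Sequence

Itemset = FrozenSet[str]


def matches(data: Sequence[Itemset], pattern: Sequence[Itemset], gap: str) -> bool:
    """Check whether pattern occurs in data under one gap setting for all Delimiters."""
    if gap not in ("true", "false", "unknown"):
        raise ValueError(f"invalid gap value: {gap!r}")
    if gap in ("true", "unknown"):
        # Open sequence: greedily match each pattern set to the earliest
        # later transaction that contains it; skipped transactions are gaps.
        pos = 0
        for wanted in pattern:
            while pos < len(data) and not wanted <= data[pos]:
                pos += 1
            if pos == len(data):
                return False
            pos += 1
        return True
    # Closed sequence (gap="false"): the pattern must match consecutive transactions.
    n = len(pattern)
    return any(
        all(w <= d for w, d in zip(pattern, data[i:i + n]))
        for i in range(len(data) - n + 1)
    )
```

On the data A→B→C→D→E, the pattern B→D matches with gap="true" or gap="unknown" but not with gap="false". The pattern B→C matches in all three cases.

Greedy earliest matching is sufficient in the open case. Each pattern set is tested independently, so taking the earliest usable transaction never rules out a later match.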
<xs:element name="Time">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="min" type="NUMBER"/>
<xs:attribute name="max" type="NUMBER"/>
<xs:attribute name="mean" type="NUMBER"/>
<xs:attribute name="standardDeviation" type="NUMBER"/>
</xs:complexType>
</xs:element>
The Time element only provides statistics for information; it does not imply any constraints.
Its attributes describe the time between Itemsets in a Sequence, or between the AntecedentSequence and the ConsequentSequence of a SequenceRule:
min : the minimum time in between.
max : the maximum time in between.
mean : the mean time in between.
standardDeviation : the standard deviation of the time in between.
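The following Python sketch summarizes a list of observed elapsed times (one per supporting object) into the four Time attributes and renders them as an XML element. It is an illustration only, and `time_stats` and `to_time_element` are hypothetical names. The text does not say which standard-deviation estimator is meant, so the population standard deviation used here is an assumption.

```python
import statistics
import xml.etree.ElementTree as ET
from typing import Dict, Sequence


def time_stats(elapsed: Sequence[float]) -> Dict[str, float]:
    """Summarize elapsed times as the attributes of a Time element."""
    if not elapsed:
        raise ValueError("no observations")
    return {
        "min": min(elapsed),
        "max": max(elapsed),
        "mean": statistics.fmean(elapsed),
        # Assumption: population standard deviation; the estimator is unspecified.
        "standardDeviation": statistics.pstdev(elapsed),
    }


def to_time_element(stats: Dict[str, float]) -> str:
    """Serialize the statistics as <Time .../>, using compact number formatting."""
    elem = ET.Element("Time", {name: format(value, "g") for name, value in stats.items()})
    return ET.tostring(elem, encoding="unicode")
```

For elapsed times of 2, 4, 6 and 8 units, this yields min="2", max="8" and mean="5".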
Sequence
<xs:group name="FOLLOW-SET">
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element ref="Delimiter"/>
<xs:element ref="Time" minOccurs="0"/>
<xs:element ref="SetReference"/>
</xs:sequence>
</xs:group>
<xs:element name="Sequence">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element ref="SetReference"/>
<xs:sequence minOccurs="0" maxOccurs="unbounded">
<xs:group ref="FOLLOW-SET"/>
</xs:sequence>
<xs:element ref="Time" minOccurs="0"/>
</xs:sequence>
<xs:attribute name="id" type="ELEMENT-ID" use="required"/>
<xs:attribute name="numberOfSets" type="INT-NUMBER"/>
<xs:attribute name="occurrence" type="INT-NUMBER"/>
<xs:attribute name="support" type="REAL-NUMBER"/>
</xs:complexType>
</xs:element>
Each Sequence consists of one or more SetReference elements. Each SetReference after the first is preceded by a Delimiter and an optional Time element (the FOLLOW-SET group).
The Time element between a Delimiter and a SetReference gives statistics on the elapsed time between consecutive Itemsets.
The Time element after the final SetReference gives statistics on the total elapsed time from the first to the last Itemset in the Sequence.
Attribute description:
id : the unique ID of this sequence (referenced in SequenceRules by seqId).
numberOfSets : the number of Itemsets or SetPredicates in this sequence.
occurrence : the number of objects in the data for which this sequence holds true.
support : the ratio of the number of objects in the data for which this sequence holds true to the total number of objects in the data.
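The structure of a Sequence element can be checked with Python's standard `xml.etree.ElementTree`. This sketch walks the children of a Sequence and collects the referenced set IDs and the Delimiters between them. It then compares the count of sets with numberOfSets.

The function names are hypothetical. Computing support as occurrence divided by the total number of objects follows the attribute description above, with the total passed in by the caller.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def local(tag: str) -> str:
    """Drop a '{namespace}' prefix so both plain and namespaced PMML work."""
    return tag.rsplit("}", 1)[-1]


def read_sequence(xml_text: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (setIds in order, (delimiter, gap) pairs between consecutive sets)."""
    seq = ET.fromstring(xml_text)
    set_ids: List[str] = []
    delimiters: List[Tuple[str, str]] = []
    for child in seq:
        name = local(child.tag)
        if name == "SetReference":
            set_ids.append(child.get("setId"))
        elif name == "Delimiter":
            delimiters.append((child.get("delimiter"), child.get("gap")))
        # Extension and Time children carry no structure and are skipped here.
    if len(delimiters) != len(set_ids) - 1:
        raise ValueError("each SetReference after the first needs one Delimiter")
    declared = seq.get("numberOfSets")
    if declared is not None and int(declared) != len(set_ids):
        raise ValueError(f"numberOfSets={declared} but found {len(set_ids)} sets")
    return set_ids, delimiters


def support(occurrence: int, total_objects: int) -> float:
    """Share of objects (transaction groups) for which the sequence holds."""
    return occurrence / total_objects
```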
<xs:element name="SetReference">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="setId" type="ELEMENT-ID" use="required"/>
</xs:complexType>
</xs:element>
A SetReference points to a previously defined set. That set is either a SetPredicate or an Itemset (which contains ItemRef elements).
Attribute description:
setId : a pointer to the id of an Itemset or SetPredicate.
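Resolving a SetReference down to item values can be sketched in Python. The sketch follows setId to an Itemset, then each ItemRef to an Item. It handles Itemsets only, because SetPredicate is deprecated. The helper names are hypothetical. Displaying mappedValue in preference to value is a presentation choice of this sketch.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def local(tag: str) -> str:
    """Drop a '{namespace}' prefix so both plain and namespaced PMML work."""
    return tag.rsplit("}", 1)[-1]


def resolve_itemsets(model_xml: str) -> Dict[str, List[str]]:
    """Map each Itemset id to the display names of the Items it references."""
    model = ET.fromstring(model_xml)
    items: Dict[str, str] = {}
    itemsets: Dict[str, List[str]] = {}
    # The schema places all Item elements before all Itemset elements,
    # so every itemRef can be resolved in a single pass.
    for child in model:
        name = local(child.tag)
        if name == "Item":
            # Prefer the human-readable mappedValue when it is present.
            items[child.get("id")] = child.get("mappedValue") or child.get("value")
        elif name == "Itemset":
            itemsets[child.get("id")] = [
                items[ref.get("itemRef")] for ref in child if local(ref.tag) == "ItemRef"
            ]
    return itemsets


def describe(set_ids: List[str], itemsets: Dict[str, List[str]]) -> str:
    """Render the sets referenced by a Sequence, e.g. '{A} -> {B, C}'."""
    return " -> ".join("{" + ", ".join(itemsets[s]) + "}" for s in set_ids)
```

Applied to the Items and Itemsets of the example at the end of this section, Sequence 1 (setIds 0 and 2) would be rendered as {Cognac} -> {Vodka, Cider, Scotch Whisky}.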
Sequence Rules
<xs:element name="SequenceRule">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element ref="AntecedentSequence"/>
<xs:element ref="Delimiter"/>
<xs:element ref="Time" minOccurs="0"/>
<xs:element ref="ConsequentSequence"/>
<xs:element ref="Time" minOccurs="0"/>
</xs:sequence>
<xs:attribute name="id" type="ELEMENT-ID" use="required"/>
<xs:attribute name="numberOfSets" type="INT-NUMBER" use="required"/>
<xs:attribute name="occurrence" type="INT-NUMBER" use="required"/>
<xs:attribute name="support" type="REAL-NUMBER" use="required"/>
<xs:attribute name="confidence" type="REAL-NUMBER" use="required"/>
<xs:attribute name="lift" type="REAL-NUMBER"/>
</xs:complexType>
</xs:element>
A SequenceRule consists of an antecedent Sequence and a consequent Sequence, separated by a Delimiter.
The Time element between AntecedentSequence and ConsequentSequence gives statistics on the elapsed time between the antecedent and the consequent, while the Time element after ConsequentSequence gives statistics on the total elapsed time from the first to the last Itemset in the sequence rule.
Attribute description:
id : the unique ID of this SequenceRule.
numberOfSets : the total number of sets in both the antecedent and consequent Sequences.
occurrence : the number of objects in the data for which the antecedent and consequent Sequences hold true.
support : the ratio of the number of objects in the data for which the antecedent and consequent Sequences hold true to the total number of objects in the data.
confidence : the probability of the consequent following the antecedent, calculated as the number of occurrences of the sequence rule divided by the number of occurrences of the antecedent.
lift : the ratio between the actual support of a SequenceRule and its expected support. The expected support of a SequenceRule A→C with nA antecedent item sets and nC consequent item sets is
expectedSupport(A→C) = support(A) * support(C) / binomialCoefficient(nA+nC, nC),
so that lift = support(A→C) / expectedSupport(A→C).
Note: Compared to the formula for the lift of an AssociationRule, there is an additional correction factor
binomialCoefficient(nA+nC, nC). This factor accounts for the fact that there are binomialCoefficient(nA+nC, nC) different possible
time orders in which the antecedent and the consequent sequence can be realized in a transaction group. Only one of
them contributes to the support of the sequence rule, namely the time order in which the first consequent item set occurs
after the last antecedent item set.
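The confidence and lift formulas above translate directly into Python. This is a minimal sketch with hypothetical function names, using `math.comb` (Python 3.8+) for the binomial coefficient.

```python
from math import comb


def confidence(rule_occurrence: int, antecedent_occurrence: int) -> float:
    """Occurrences of the whole rule divided by occurrences of the antecedent."""
    return rule_occurrence / antecedent_occurrence


def expected_support(support_a: float, support_c: float, n_a: int, n_c: int) -> float:
    """Support expected under independence, divided by the number of time orders.

    Only one of comb(n_a + n_c, n_c) interleavings of the antecedent and
    consequent item sets puts all of C after all of A.
    """
    return support_a * support_c / comb(n_a + n_c, n_c)


def lift(rule_support: float, support_a: float, support_c: float, n_a: int, n_c: int) -> float:
    """Actual rule support relative to its expected support."""
    return rule_support / expected_support(support_a, support_c, n_a, n_c)
```

As a worked example with illustrative numbers, take support(A)=0.4, support(C)=0.5 and nA=nC=1. The correction factor is binomialCoefficient(2, 1) = 2, so the expected support is 0.1. A rule support of 0.2 then gives a lift of 2.0, whereas the uncorrected AssociationRule formula would give 1.0.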
Antecedent & Consequent Sequences
<xs:group name="SEQUENCE">
<xs:sequence>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:element ref="SequenceReference"/>
<xs:element ref="Time" minOccurs="0"/>
</xs:sequence>
</xs:group>
<xs:element name="SequenceReference">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="seqId" type="ELEMENT-ID" use="required"/>
</xs:complexType>
</xs:element>
<xs:element name="AntecedentSequence">
<xs:complexType>
<xs:sequence>
<xs:group ref="SEQUENCE"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="ConsequentSequence">
<xs:complexType>
<xs:sequence>
<xs:group ref="SEQUENCE"/>
</xs:sequence>
</xs:complexType>
</xs:element>
Attribute description:
seqId : a pointer to the id attribute of a previously defined Sequence.
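The link between a SequenceRule and its Sequences via seqId can be checked in Python. The sketch extracts the antecedent and consequent seqIds. It then verifies that the rule's numberOfSets equals the sum of the referenced Sequences' numberOfSets, as the attribute description requires.

The function names are hypothetical. The caller is assumed to supply the numberOfSets value of each Sequence as a dict.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def local(tag: str) -> str:
    """Drop a '{namespace}' prefix so both plain and namespaced PMML work."""
    return tag.rsplit("}", 1)[-1]


def rule_seq_ids(rule_xml: str) -> Tuple[str, str]:
    """Return (antecedent seqId, consequent seqId) of a SequenceRule."""
    rule = ET.fromstring(rule_xml)
    found: Dict[str, str] = {}
    for child in rule:
        role = local(child.tag)
        if role in ("AntecedentSequence", "ConsequentSequence"):
            # The SEQUENCE group holds exactly one SequenceReference.
            refs = [r for r in child if local(r.tag) == "SequenceReference"]
            found[role] = refs[0].get("seqId")
    return found["AntecedentSequence"], found["ConsequentSequence"]


def number_of_sets_consistent(rule_xml: str, sets_per_sequence: Dict[str, int]) -> bool:
    """numberOfSets of the rule must equal the sets of both referenced Sequences."""
    antecedent, consequent = rule_seq_ids(rule_xml)
    declared = int(ET.fromstring(rule_xml).get("numberOfSets"))
    return declared == sets_per_sequence[antecedent] + sets_per_sequence[consequent]
```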
Example
<?xml version="1.0"?>
<PMML version="4.0" xmlns="http://www.dmg.org/PMML-4_0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Header copyright="DMG.org"/>
<DataDictionary numberOfFields="5">
<DataField name="CUSTOMER_ID" displayName="CUSTOMER_ID" optype="categorical" dataType="integer"/>
<DataField name="TRANSDATE" displayName="TRANSDATE" optype="continuous" dataType="dateDaysSince[0]"/>
<DataField name="ITEMID" displayName="ITEMID" optype="categorical" dataType="string"/>
<DataField name="STOREID" displayName="STOREID" optype="categorical" dataType="string"/>
<DataField name="TRANSID" displayName="TRANSID" optype="categorical" dataType="string"/>
</DataDictionary>
<SequenceModel functionName="sequences" numberOfTransactions="175">
<MiningSchema>
<MiningField name="CUSTOMER_ID" usageType="group"/>
<MiningField name="TRANSDATE" usageType="order"/>
<MiningField name="ITEMID"/>
<MiningField name="STOREID"/>
<MiningField name="TRANSID"/>
</MiningSchema>
<Constraints minimumSupport="0.2" minimumConfidence="0.5"/>
<Item id="0" value="177" mappedValue="Cognac"/>
<Item id="1" value="129" mappedValue="Cream"/>
<Item id="2" value="144" mappedValue="Tonic water"/>
<Item id="3" value="174" mappedValue="Vodka"/>
<Item id="4" value="108" mappedValue="Cider"/>
<Item id="5" value="172" mappedValue="Scotch Whisky"/>
<Item id="6" value="130" mappedValue="Root Beer"/>
<Itemset id="0" support="0.0628571428571429" numberOfItems="1">
<ItemRef itemRef="0"/>
</Itemset>
<Itemset id="1" support="0.24" numberOfItems="2">
<ItemRef itemRef="1"/>
<ItemRef itemRef="2"/>
</Itemset>
<Itemset id="2" support="0.0628571428571429" numberOfItems="3">
<ItemRef itemRef="3"/>
<ItemRef itemRef="4"/>
<ItemRef itemRef="5"/>
</Itemset>
<Itemset id="3" support="0.0628571428571429" numberOfItems="1">
<ItemRef itemRef="6"/>
</Itemset>
<Sequence id="0" numberOfSets="1" occurrence="5" support="0.02">
<SetReference setId="0"/>
</Sequence>
<Sequence id="1" numberOfSets="2" occurrence="6" support="0.25">
<SetReference setId="0"/>
<Delimiter delimiter="acrossTimeWindows" gap="unknown"/>
<SetReference setId="2"/>
</Sequence>
<Sequence id="2" numberOfSets="1" occurrence="5" support="0.45">
<SetReference setId="1"/>
</Sequence>
<Sequence id="3" numberOfSets="1" occurrence="15" support="0.2">
<SetReference setId="3"/>
</Sequence>
<SequenceRule id="0" numberOfSets="2" occurrence="5" support="0.20833" confidence="0.55556">
<AntecedentSequence>
<SequenceReference seqId="0"/>
</AntecedentSequence>
<Delimiter delimiter="acrossTimeWindows" gap="unknown"/>
<Time min="5" max="8" mean="6.8"/>
<ConsequentSequence>
<SequenceReference seqId="2"/>
</ConsequentSequence>
</SequenceRule>
<SequenceRule id="1" numberOfSets="3" occurrence="6" support="0.25" confidence="0.66667">
<AntecedentSequence>
<SequenceReference seqId="1"/>
</AntecedentSequence>
<Delimiter delimiter="acrossTimeWindows" gap="unknown"/>
<Time min="2" max="8" mean="6.16667"/>
<ConsequentSequence>
<SequenceReference seqId="3"/>
</ConsequentSequence>
</SequenceRule>
<SequenceRule id="2" numberOfSets="2" occurrence="5" support="0.20833" confidence="0.55556">
<AntecedentSequence>
<SequenceReference seqId="2"/>
</AntecedentSequence>
<Delimiter delimiter="acrossTimeWindows" gap="unknown"/>
<Time min="2" max="8" mean="6.6"/>
<ConsequentSequence>
<SequenceReference seqId="3"/>
</ConsequentSequence>
</SequenceRule>
<SequenceRule id="3" numberOfSets="2" occurrence="14" support="0.58333" confidence="0.73684">
<AntecedentSequence>
<SequenceReference seqId="3"/>
</AntecedentSequence>
<Delimiter delimiter="acrossTimeWindows" gap="unknown"/>
<Time min="1" max="10" mean="6.14286"/>
<ConsequentSequence>
<SequenceReference seqId="0"/>
</ConsequentSequence>
</SequenceRule>
</SequenceModel>
</PMML>