PMML 3.0 - Sequence Rules
The basic data model consists of a sequence object, identified by the "Primary Key" that has a number of events attributed to it, defined by the "Secondary Key". Each event consists of a set of ordered items. An "Order Field" defines the order of the items within an event, with an optional qualifier in the form of an attribute name.
SequenceModel
A Sequence mining model consists of a number of major parts:
<xs:element name="SequenceModel">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="MiningSchema"/>
<xs:element ref="ModelStats" minOccurs="0"/>
<xs:element ref="LocalTransformations" minOccurs="0" />
<xs:element ref="Item" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="Itemset" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="SetPredicate" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="Sequence" maxOccurs="unbounded"/>
<xs:element ref="SequenceRule" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="modelName" type="xs:string"/>
<xs:attribute name="functionName" type="MINING-FUNCTION" use="required"/>
<xs:attribute name="algorithmName" type="xs:string"/>
<xs:attribute name="numberOfTransactions" type="INT-NUMBER" use="required"/>
<xs:attribute name="maxNumberOfItemsPerTransaction" type="INT-NUMBER"/>
<xs:attribute name="avgNumberOfItemsPerTransaction" type="REAL-NUMBER"/>
<xs:attribute name="minimumSupport" type="REAL-NUMBER" use="required"/>
<xs:attribute name="minimumConfidence" type="REAL-NUMBER" use="required"/>
<xs:attribute name="lengthLimit" type="INT-NUMBER"/>
<xs:attribute name="numberOfItems" type="INT-NUMBER" use="required"/>
<xs:attribute name="numberOfSets" type="INT-NUMBER" use="required"/>
<xs:attribute name="numberOfSequences" type="INT-NUMBER" use="required"/>
<xs:attribute name="numberOfRules" type="INT-NUMBER" use="required"/>
<xs:attribute name="timeWindowWidth" type="INT-NUMBER"/>
<xs:attribute name="minimumTime" type="INT-NUMBER"/>
<xs:attribute name="maximumTime" type="INT-NUMBER"/>
</xs:complexType>
</xs:element>
Extension provides the capability to extend the content of a model.
MiningSchema lists the fields that are used in this model.
This is a subset of the fields defined in the data dictionary and the
transformation dictionary. Each transformation in the transformation
dictionary is applied to DataField values from the data dictionary
and provides a new field for use in the model.
Item is defined in the Association Model.
Itemset is defined in the Association Model.
SetPredicate is a set of predicates made up of simple boolean
expressions.
Sequence is an ordered collection of SetPredicates
or Itemsets. There will be at least one Sequence.
SequenceRule describes the relationship between two
sequences.
Attribute description:
numberOfTransactions : the number of objects in the data (e.g., unique customers or visitors).
maxNumberOfItemsPerTransaction : the maximum number of events (e.g., visits) per object.
avgNumberOfItemsPerTransaction : the average number of events (e.g., visits) per object.
minimumSupport : the minimum support for a sequence to be discovered.
minimumConfidence : the minimum confidence for a rule to be discovered.
lengthLimit : the maximum length of a sequence to be discovered.
numberOfItems : total number of unique items (e.g., pages on a site).
numberOfSets : total number of sets.
numberOfSequences : total number of sequences discovered.
numberOfRules : total number of rules discovered.
timeWindowWidth : this may be used to separate items associated with an object into discrete events, but only if no clear key already exists for the separate events. Two consecutive items must have a time gap of less than this value to be considered as being part of the same event.
minimumTime : minimum time between items as defined above.
maximumTime : maximum time between items as defined above.
SetPredicate
<xs:simpleType name="ELEMENT-ID">
<xs:restriction base="xs:string">
</xs:restriction>
</xs:simpleType>
<xs:element name="SetPredicate">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:group ref="STRING-ARRAY"/>
</xs:sequence>
<xs:attribute name="id" type="ELEMENT-ID" use="required"/>
<xs:attribute name="field" type="FIELD-NAME" use="required"/>
<xs:attribute name="operator" type="xs:string" fixed="supersetOf"/>
</xs:complexType>
</xs:element>
Attribute description:
id : An element ID uniquely identifying a predicate set. (Referenced in Sequences by setId.)
field : The subject of the predicate statement. Usually this name refers to one of the DerivedField elements in the TransformationDictionary.
operator : The relationship between the subject of the predicate statement and the array of values. The schema fixes this value to supersetOf.
Note that a SetPredicate compares two sets while a SimpleSetPredicate (as defined in the tree model) checks membership of a single value in a set.
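As an illustration of the supersetOf comparison, the fragment below is a minimal sketch. The id sp100, the field name transaction, and the page names are illustrative assumptions, not values required by the schema.

```xml
<!-- Hypothetical predicate: holds when the set-valued field "transaction"
     contains every value listed in the array. -->
<SetPredicate id="sp100" field="transaction" operator="supersetOf">
<Array n="2" type="string"> offer.html products.html </Array>
</SetPredicate>
<!-- transaction = { index.html, offer.html, products.html } : predicate holds -->
<!-- transaction = { offer.html } : predicate does not hold (products.html is missing) -->
```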
Delimiter & Time
<xs:simpleType name="DELIMITER">
<xs:restriction base="xs:string">
<xs:enumeration value="sameTimeWindow"/>
<xs:enumeration value="acrossTimeWindows"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="GAP">
<xs:restriction base="xs:string">
<xs:enumeration value="true"/>
<xs:enumeration value="false"/>
<xs:enumeration value="unknown"/>
</xs:restriction>
</xs:simpleType>
<xs:element name="Delimiter">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="delimiter" type="DELIMITER" use="required"/>
<xs:attribute name="gap" type="GAP" use="required"/>
</xs:complexType>
</xs:element>
Attribute description:
delimiter : states whether this set occurred within the same event or time period (as defined by a time window, e.g., a session) as the previous one.
gap : the possible existence of SetPredicates between this and the previous Set or Sequence. True represents an open sequence, which allows for gaps between sequences (as does unknown). In a closed sequence the gap is set to false, indicating that the two Sets or Sequences being described are consecutive Sets in the data.
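The combinations of the two attributes can be read as in the sketch below. The comments paraphrase the attribute descriptions above and are an interpretation, not normative text.

```xml
<!-- Same event as the previous set, and directly consecutive (closed sequence). -->
<Delimiter delimiter="sameTimeWindow" gap="false"/>
<!-- A different event from the previous set; other sets may occur in between (open sequence). -->
<Delimiter delimiter="acrossTimeWindows" gap="true"/>
<!-- Same event as the previous set; whether other sets intervene is not known. -->
<Delimiter delimiter="sameTimeWindow" gap="unknown"/>
```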
<xs:element name="Time">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="min" type="NUMBER" use="required"/>
<xs:attribute name="max" type="NUMBER" use="required"/>
<xs:attribute name="mean" type="NUMBER"/>
</xs:complexType>
</xs:element>
Attribute description:
min : the minimum time between Sets in a Sequence (or between an antecedent and consequent Sequence in a Rule).
max : the maximum time between Sets in a Sequence (or between an antecedent and consequent Sequence in a Rule).
mean : the mean time between Sets in a Sequence (or between an antecedent and consequent Sequence in a Rule).
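A Time element might look like the sketch below. The values are hypothetical. The schema shown here does not define a time unit; the example model in this document records the unit through an Extension on the corresponding MiningField.

```xml
<!-- Hypothetical values: between 0 and 2 time units elapse, 0.5 on average. -->
<Time min="0" max="2" mean="0.5"/>
```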
Sequence
<xs:group name="FOLLOW-SET">
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element ref="Delimiter"/>
<xs:element ref="SetReference"/>
</xs:sequence>
</xs:group>
<xs:element name="Sequence">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element ref="SetReference"/>
<xs:sequence minOccurs="0" maxOccurs="unbounded">
<xs:group ref="FOLLOW-SET"/>
</xs:sequence>
</xs:sequence>
<xs:attribute name="id" type="ELEMENT-ID" use="required"/>
<xs:attribute name="numberOfSets" type="INT-NUMBER"/>
<xs:attribute name="occurrence" type="INT-NUMBER"/>
<xs:attribute name="support" type="REAL-NUMBER"/>
</xs:complexType>
</xs:element>
Each Sequence consists of a SetReference
and optional FOLLOW-SET(s). A FOLLOW-SET is
another SetReference preceded by a Delimiter.
Attribute description:
id : the unique ID of this sequence. (Referenced in SequenceRules by seqId).
numberOfSets : the number of SetPredicates and/or Itemsets in this sequence.
occurrence : the number of objects in the data for which this sequence holds true.
support : the ratio of the number of objects in the data for which this sequence holds true, to the total number of objects in the data.
<xs:element name="SetReference">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="setId" type="ELEMENT-ID" use="required"/>
</xs:complexType>
</xs:element>
The SetReference refers (or points) to a previously
defined set. That set will be either a SetPredicate
or an Itemset (which will contain ItemRef elements).
Attribute description:
setId : a pointer to the id attribute of a SetPredicate or Itemset.
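The fragment below sketches a Sequence whose SetReferences point to one Itemset and one SetPredicate. All ids and values are hypothetical, and the Item, Itemset, and ItemRef attributes are assumed to be as defined in the Association Model.

```xml
<!-- Item and Itemset come from the Association Model; ids and values are hypothetical. -->
<Item id="i1" value="offer.html"/>
<Item id="i2" value="products.html"/>
<Itemset id="is1" numberOfItems="2">
<ItemRef itemRef="i1"/>
<ItemRef itemRef="i2"/>
</Itemset>
<SetPredicate id="sp1" field="transaction" operator="supersetOf">
<Array n="1" type="string"> checkout.html </Array>
</SetPredicate>
<Sequence id="seq1" numberOfSets="2">
<!-- setId="is1" resolves to the Itemset -->
<SetReference setId="is1"/>
<Delimiter delimiter="acrossTimeWindows" gap="true"/>
<!-- setId="sp1" resolves to the SetPredicate -->
<SetReference setId="sp1"/>
</Sequence>
```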
Sequence Rules
<xs:element name="SequenceRule">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
<xs:element ref="AntecedentSequence"/>
<xs:element ref="Delimiter"/>
<xs:element ref="Time" minOccurs="0" maxOccurs="unbounded"/>
<xs:element ref="ConsequentSequence"/>
</xs:sequence>
<xs:attribute name="id" type="ELEMENT-ID" use="required"/>
<xs:attribute name="numberOfSets" type="INT-NUMBER" use="required"/>
<xs:attribute name="occurrence" type="INT-NUMBER" use="required"/>
<xs:attribute name="support" type="REAL-NUMBER" use="required"/>
<xs:attribute name="confidence" type="REAL-NUMBER" use="required"/>
</xs:complexType>
</xs:element>
A Sequence Rule consists of an antecedent Sequence
and a consequent Sequence, separated by a delimiter and, possibly, time.
Attribute description:
id : the unique ID of this sequence rule.
numberOfSets : the total number of sets in both the antecedent and consequent Sequences.
occurrence : the number of objects in the data for which the antecedent and consequent Sequences hold true.
support : the ratio of the number of objects in the data for which the antecedent and consequent Sequences hold true, to the total number of objects in the data.
confidence : the probability of the consequent Sequence following the antecedent Sequence. Calculated as the number of occurrences of the rule (antecedent followed by consequent) divided by the number of occurrences of the antecedent Sequence.
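As a worked check using the figures from the example model in this document (rule001 with occurrence 20, its antecedent seq001 with occurrence 80, and numberOfTransactions 100), the two ratios come out as follows. The names inside the formulas are shorthand for the attributes, not schema names.

```latex
\[
\text{support} = \frac{\text{occurrence}(\text{rule})}{\text{numberOfTransactions}}
               = \frac{20}{100} = 0.20
\]
\[
\text{confidence} = \frac{\text{occurrence}(\text{rule})}{\text{occurrence}(\text{antecedent})}
                  = \frac{20}{80} = 0.25
\]
```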
Antecedent & Consequent Sequences
<xs:group name="SEQUENCE">
<xs:sequence>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:element ref="SequenceReference"/>
<xs:element ref="Time" minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
</xs:group>
<xs:element name="SequenceReference">
<xs:complexType>
<xs:sequence>
<xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
</xs:sequence>
<xs:attribute name="seqId" type="ELEMENT-ID" use="required"/>
</xs:complexType>
</xs:element>
<xs:element name="AntecedentSequence">
<xs:complexType>
<xs:sequence>
<xs:group ref="SEQUENCE"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:element name="ConsequentSequence">
<xs:complexType>
<xs:sequence>
<xs:group ref="SEQUENCE"/>
</xs:sequence>
</xs:complexType>
</xs:element>
Attribute description:
seqId : a pointer to the id attribute of a previously defined Sequence.
Example
The example below represents the following scenario:
Visitors that come from { index.html } will do the following (with a confidence of 0.25):
Visit { offer.html, kdnuggets.com } in the same visit, and without visiting another site;
Within 2 days they will return to { products.html } without visiting other sites between;
Visit { basket.html } visiting at least one site beforehand;
Finally, go directly to { checkout.html }.
<?xml version="1.0" ?>
<PMML version="3.0">
<Header copyright="DMG.org" description="example model for sequences"/>
<DataDictionary numberOfFields="4">
<DataField name="visitor" optype="categorical"/>
<DataField name="visit" optype="categorical"/>
<DataField name="time" optype="categorical"/>
<DataField name="page" optype="categorical"/>
</DataDictionary>
<TransformationDictionary>
<DerivedField name="transaction">
<Aggregate field="page" function="multiset" groupField="visit"/>
</DerivedField>
</TransformationDictionary>
<SequenceModel functionName="sequences"
numberOfTransactions="100" minimumSupport="0.20"
minimumConfidence="0.25" numberOfItems="6"
numberOfSets="5" numberOfSequences="3" numberOfRules="1">
<MiningSchema>
<MiningField name="visitor" usageType="order"/>
<MiningField name="visit" usageType="group"/>
<MiningField name="time" usageType="active">
<Extension name="unit" value="days"/>
</MiningField>
<MiningField name="page" usageType="active"/>
</MiningSchema>
<!-- ========== Predicates ========== -->
<SetPredicate id="sp001" field="transaction" operator="supersetOf">
<Array n="1" type="string"> index.html </Array>
</SetPredicate>
<SetPredicate id="sp002" field="transaction" operator="supersetOf">
<Array n="2" type="string"> offer.html kdnuggets.com </Array>
</SetPredicate>
<SetPredicate id="sp003" field="transaction" operator="supersetOf">
<Array n="1" type="string"> products.html </Array>
</SetPredicate>
<SetPredicate id="sp004" field="transaction" operator="supersetOf">
<Array n="1" type="string"> basket.html </Array>
</SetPredicate>
<SetPredicate id="sp005" field="transaction" operator="supersetOf">
<Array n="1" type="string"> checkout.html </Array>
</SetPredicate>
<!-- ========== Sequences ========== -->
<Sequence id="seq001" numberOfSets="1" occurrence="80" support="0.80">
<SetReference setId="sp001"/>
</Sequence>
<Sequence id="seq002" numberOfSets="4" occurrence="40" support="0.40">
<SetReference setId="sp002"/>
<Delimiter delimiter="acrossTimeWindows" gap="false"/>
<SetReference setId="sp003"/>
<Delimiter delimiter="sameTimeWindow" gap="true"/>
<SetReference setId="sp004"/>
<Delimiter delimiter="sameTimeWindow" gap="false"/>
<SetReference setId="sp005"/>
</Sequence>
<Sequence id="seq003" numberOfSets="5" occurrence="20" support="0.20">
<SetReference setId="sp001"/>
<Delimiter delimiter="sameTimeWindow" gap="unknown"/>
<SetReference setId="sp002"/>
<Delimiter delimiter="acrossTimeWindows" gap="false"/>
<SetReference setId="sp003"/>
<Delimiter delimiter="sameTimeWindow" gap="true"/>
<SetReference setId="sp004"/>
<Delimiter delimiter="sameTimeWindow" gap="false"/>
<SetReference setId="sp005"/>
</Sequence>
<!-- ========== SequenceRules ========== -->
<SequenceRule id="rule001" numberOfSets="5" occurrence="20"
support="0.20" confidence="0.25">
<Extension name="qWeight" value="0.5"/>
<Extension name="attrWeight" value="0.5"/>
<Extension name="seqWeight" value="0.5"/>
<AntecedentSequence>
<SequenceReference seqId="seq001"/>
</AntecedentSequence>
<Delimiter delimiter="sameTimeWindow" gap="unknown"/>
<Time min="0" max="0"/>
<ConsequentSequence>
<SequenceReference seqId="seq002"/>
<Time min="0" max="2"/>
<!-- time between "sp002" and "sp003" in sequence "seq002" -->
<Time min="0" max="0"/>
<!-- time between "sp003" and "sp004" in sequence "seq002" -->
<Time min="0" max="0"/>
<!-- time between "sp004" and "sp005" in sequence "seq002" -->
</ConsequentSequence>
</SequenceRule>
</SequenceModel>
</PMML>