PMML 3.0 - Sequence Rules

The basic data model consists of a sequence object, identified by the "Primary Key", to which a number of events are attributed; each event is identified by the "Secondary Key". Each event consists of a set of ordered items. An "Order Field" defines the order of the items within an event, with an optional qualifier in the form of an attribute name.

SequenceModel

A Sequence mining model consists of a number of major parts:


  <xs:element name="SequenceModel">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="MiningSchema"/>
        <xs:element ref="ModelStats" minOccurs="0"/>
        <xs:element ref="LocalTransformations" minOccurs="0" />
        <xs:element ref="Item" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="Itemset" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="SetPredicate" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="Sequence" maxOccurs="unbounded"/>
        <xs:element ref="SequenceRule" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="modelName" type="xs:string"/>
      <xs:attribute name="functionName" type="MINING-FUNCTION" use="required"/>
      <xs:attribute name="algorithmName" type="xs:string"/>
      <xs:attribute name="numberOfTransactions" type="INT-NUMBER" use="required"/>
      <xs:attribute name="maxNumberOfItemsPerTransaction" type="INT-NUMBER"/>
      <xs:attribute name="avgNumberOfItemsPerTransaction" type="REAL-NUMBER"/>
      <xs:attribute name="minimumSupport" type="REAL-NUMBER" use="required"/>
      <xs:attribute name="minimumConfidence" type="REAL-NUMBER" use="required"/>
      <xs:attribute name="lengthLimit" type="INT-NUMBER"/>
      <xs:attribute name="numberOfItems" type="INT-NUMBER" use="required"/>
      <xs:attribute name="numberOfSets" type="INT-NUMBER" use="required"/>
      <xs:attribute name="numberOfSequences" type="INT-NUMBER" use="required"/>
      <xs:attribute name="numberOfRules" type="INT-NUMBER" use="required"/>
      <xs:attribute name="timeWindowWidth" type="INT-NUMBER"/>
      <xs:attribute name="minimumTime" type="INT-NUMBER"/>
      <xs:attribute name="maximumTime" type="INT-NUMBER"/>
    </xs:complexType>
  </xs:element> 

Extension provides the capability to extend the content of a model.
MiningSchema lists the fields that are used in this model. This is a subset of the fields as defined in the data dictionary and the transformation dictionary. The transformations in the transformation dictionary will have been carried out on one of the DataField values in the data dictionary, providing new fields for use in the model.
Item is defined in the Association Model.
Itemset is defined in the Association Model.
SetPredicate is a set of predicates made up of simple boolean expressions.
Sequence is an ordered collection of SetPredicates or Itemsets. There will be at least one Sequence.
SequenceRule describes the relationship between two sequences.

Attribute description:

numberOfTransactions : the number of objects in the data (e.g., unique customers or visitors).
maxNumberOfItemsPerTransaction : the maximum number of events (e.g., visits) per object.
avgNumberOfItemsPerTransaction : the average number of events that make up the object.
minimumSupport : the minimum support for a sequence to be discovered.
minimumConfidence : the minimum confidence for a rule to be discovered.
lengthLimit : the maximum length of a sequence to be discovered.
numberOfItems : total number of unique items (e.g., pages on a site).
numberOfSets : total number of sets.
numberOfSequences : total number of sequences discovered.
numberOfRules : total number of rules discovered.
timeWindowWidth : this may be used to separate items associated with an object into discrete events, but only if no clear key already exists for the separate events. Two consecutive items must have a time gap of less than this value to be considered as being part of the same event.
minimumTime : minimum time between items as defined above.
maximumTime : maximum time between items as defined above.
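
The role of timeWindowWidth can be illustrated with a minimal Python sketch. It assumes that the items of one object arrive as (timestamp, item) pairs and that timestamps and the window width share one unit; the function name split_into_events and the sample data are hypothetical and not part of PMML.

```python
from typing import List, Tuple


def split_into_events(items: List[Tuple[float, str]],
                      time_window_width: float) -> List[List[str]]:
    """Group the (timestamp, item) pairs of one object into events.

    Two consecutive items stay in the same event only if their time gap
    is strictly less than time_window_width (assumed reading of the text).
    """
    events: List[List[str]] = []
    previous_time = None
    for timestamp, item in sorted(items):
        if previous_time is not None and timestamp - previous_time < time_window_width:
            events[-1].append(item)   # small gap: same event
        else:
            events.append([item])     # first item or large gap: new event
        previous_time = timestamp
    return events


# Hypothetical page views of one visitor, timestamps in minutes.
page_views = [(0, "index.html"), (5, "offer.html"),
              (40, "products.html"), (43, "basket.html")]
print(split_into_events(page_views, 10))
# [['index.html', 'offer.html'], ['products.html', 'basket.html']]
```

When the data already carries a clear key for the separate events, this splitting step is not needed.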

SetPredicate


  <xs:simpleType name="ELEMENT-ID">  
    <xs:restriction base="xs:string">
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="SetPredicate">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
        <xs:group ref="STRING-ARRAY"/>
      </xs:sequence>
      <xs:attribute name="id" type="ELEMENT-ID" use="required"/>
      <xs:attribute name="field" type="FIELD-NAME" use="required"/>
      <xs:attribute name="operator" type="xs:string" fixed="supersetOf"/>
    </xs:complexType>
  </xs:element>

SetPredicate elements consist of a boolean expression. This is made up of a field, a comparison operator, and a value. The value(s) will be written in the form of an array.

Attribute description:

id : An element ID uniquely identifying a predicate set. (Referenced in Sequences by setId.)
field : The subject of the predicate statement. Usually this name refers to one of the DerivedField elements in the TransformationDictionary.
operator : The association between the subject of the predicate statement and the array of values.
Note that a SetPredicate compares two sets, while a SimpleSetPredicate (as defined in the tree model) checks membership of a single value in a set.
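
The difference between the two kinds of predicate can be sketched in Python as follows. This is an illustration only: the helper names superset_of and is_in are hypothetical, and the field value is modelled as a plain set, which ignores any duplicate items.

```python
from typing import Iterable


def superset_of(field_values: Iterable[str], array_values: Iterable[str]) -> bool:
    """SetPredicate with operator="supersetOf": set against set.

    True when every value of the predicate's array occurs among the
    values of the field.
    """
    return set(field_values) >= set(array_values)


def is_in(field_value: str, array_values: Iterable[str]) -> bool:
    """Membership test in the style of a SimpleSetPredicate: value against set."""
    return field_value in set(array_values)


transaction = ["offer.html", "kdnuggets.com", "about.html"]
print(superset_of(transaction, ["offer.html", "kdnuggets.com"]))  # True
print(superset_of(transaction, ["offer.html", "basket.html"]))    # False
print(is_in("offer.html", ["offer.html", "kdnuggets.com"]))       # True
```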

Delimiter & Time


  <xs:simpleType name="DELIMITER">
    <xs:restriction base="xs:string">
      <xs:enumeration value="sameTimeWindow"/>
      <xs:enumeration value="acrossTimeWindows"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="GAP">
    <xs:restriction base="xs:string">
      <xs:enumeration value="true"/>
      <xs:enumeration value="false"/>
      <xs:enumeration value="unknown"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:element name="Delimiter">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute name="delimiter" type="DELIMITER" use="required"/>
      <xs:attribute name="gap" type="GAP" use="required"/>
    </xs:complexType>
  </xs:element>

Delimiter is the separation between two Sets in a Sequence, or between two Sequences in a SequenceRule.

Attribute description:

delimiter : states whether this Set occurred within the same event or time period (e.g., session), as defined by a time window, as the previous one.
gap : the possible existence of SetPredicates between this and the previous Set or Sequence. True represents an open sequence, which allows for gaps between sequences (as does unknown). In a closed sequence the gap is set to false, indicating that the two Sets or Sequences being described are consecutive Sets in the data.
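
One possible reading of the gap attribute is sketched below in Python. The sketch rests on several assumptions that the text does not settle: each event is a set of items, every Set of the pattern must be matched by a different event, true and unknown are both treated as "gaps allowed", and the delimiter attribute is ignored. The function name sequence_holds is hypothetical.

```python
from typing import List, Set, Tuple


def sequence_holds(events: List[Set[str]],
                   first_set: Set[str],
                   follow_sets: List[Tuple[str, Set[str]]]) -> bool:
    """Check whether a pattern of Sets occurs in an ordered list of events.

    follow_sets holds (gap, set) pairs with gap in {"true", "false", "unknown"}.
    gap "false" requires a match in the directly following event; any other
    value allows unmatched events in between.
    """
    def match_rest(index: int, remaining: List[Tuple[str, Set[str]]]) -> bool:
        if not remaining:
            return True
        gap, wanted = remaining[0]
        if gap == "false":
            candidates = range(index + 1, min(index + 2, len(events)))
        else:
            candidates = range(index + 1, len(events))
        return any(events[i] >= wanted and match_rest(i, remaining[1:])
                   for i in candidates)

    return any(events[i] >= first_set and match_rest(i, follow_sets)
               for i in range(len(events)))


events = [{"products.html"}, {"about.html"}, {"basket.html"}, {"checkout.html"}]
open_pattern = [("true", {"basket.html"}), ("false", {"checkout.html"})]
closed_pattern = [("false", {"basket.html"}), ("false", {"checkout.html"})]
print(sequence_holds(events, {"products.html"}, open_pattern))    # True
print(sequence_holds(events, {"products.html"}, closed_pattern))  # False
```

The closed pattern fails because about.html sits between products.html and basket.html, which gap="false" does not permit under this reading.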


  <xs:element name="Time">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute name="min" type="NUMBER" use="required"/>
      <xs:attribute name="max" type="NUMBER" use="required"/>
      <xs:attribute name="mean" type="NUMBER"/>
    </xs:complexType>
  </xs:element>

Attribute description:

min : the minimum time between Sets in a Sequence (or between an antecedent and consequent Sequence in a Rule).
max : the maximum time between Sets in a Sequence (or between an antecedent and consequent Sequence in a Rule).
mean : the mean time between Sets in a Sequence (or between an antecedent and consequent Sequence in a Rule).

Sequence


  <xs:group name="FOLLOW-SET">
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
      <xs:element ref="Delimiter"/>
      <xs:element ref="SetReference"/>
    </xs:sequence>
  </xs:group>

  <xs:element name="Sequence">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
        <xs:element ref="SetReference"/>
        <xs:sequence minOccurs="0" maxOccurs="unbounded">
          <xs:group ref="FOLLOW-SET"/>
        </xs:sequence>
      </xs:sequence>
      <xs:attribute name="id" type="ELEMENT-ID" use="required"/>
      <xs:attribute name="numberOfSets" type="INT-NUMBER"/>
      <xs:attribute name="occurrence" type="INT-NUMBER"/>
      <xs:attribute name="support" type="REAL-NUMBER"/>
    </xs:complexType>
  </xs:element>


Each Sequence consists of a SetReference and optional FOLLOW-SET(s). A FOLLOW-SET is another SetReference preceded by a Delimiter.

Attribute description:

id : the unique ID of this sequence. (Referenced in SequenceRules by seqId).
numberOfSets : the number of SetPredicates and/or Itemsets in this sequence.
occurrence : the number of objects in the data for which this sequence holds true.
support : the ratio of the number of objects in the data for which this sequence holds true, to the total number of objects in the data.
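
As an illustration of how occurrence and support relate, the following Python sketch counts the objects for which a sequence holds. The data layout (a dictionary from object key to a list of events), the function name and the sample visitors are assumptions made for this example only.

```python
from typing import Callable, Dict, List, Set, Tuple


def occurrence_and_support(objects: Dict[str, List[Set[str]]],
                           holds: Callable[[List[Set[str]]], bool]) -> Tuple[int, float]:
    """Return (occurrence, support) of a sequence over all objects.

    occurrence counts the objects for which the sequence holds;
    support divides that count by the total number of objects.
    """
    occurrence = sum(1 for events in objects.values() if holds(events))
    return occurrence, occurrence / len(objects)


# Hypothetical data: four visitors, each with a list of visits (events).
visitors = {
    "v1": [{"index.html"}, {"offer.html"}],
    "v2": [{"index.html"}],
    "v3": [{"products.html"}],
    "v4": [{"index.html", "basket.html"}],
}


def visited_index(events: List[Set[str]]) -> bool:
    """A one-set sequence: some event contains index.html."""
    return any("index.html" in event for event in events)


print(occurrence_and_support(visitors, visited_index))  # (3, 0.75)
```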


  <xs:element name="SetReference">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute name="setId" type="ELEMENT-ID" use="required"/>
    </xs:complexType>
  </xs:element>

The SetReference refers (or points) to a previously defined set. That set will be either a SetPredicate or an Itemset (which will contain ItemRef elements).

Attribute description:

setId : a pointer to the id attribute of a SetPredicate or Itemset.


Sequence Rules


  <xs:element name="SequenceRule">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
        <xs:element ref="AntecedentSequence"/>
        <xs:element ref="Delimiter"/>
        <xs:element ref="Time" minOccurs="0" maxOccurs="unbounded"/>
        <xs:element ref="ConsequentSequence"/>
      </xs:sequence>
      <xs:attribute name="id" type="ELEMENT-ID" use="required"/>
      <xs:attribute name="numberOfSets" type="INT-NUMBER" use="required"/>
      <xs:attribute name="occurrence" type="INT-NUMBER" use="required"/>
      <xs:attribute name="support" type="REAL-NUMBER" use="required"/>
      <xs:attribute name="confidence" type="REAL-NUMBER" use="required"/>
    </xs:complexType>
  </xs:element>


A Sequence Rule consists of an antecedent Sequence and a consequent Sequence, separated by a delimiter and, possibly, time.

Attribute description:

id : the unique ID of this sequence rule.
numberOfSets : the total number of sets in both the antecedent and consequent Sequences.
occurrence : the number of objects in the data for which the antecedent and consequent Sequences hold true.
support : the ratio of the number of objects in the data for which the antecedent and consequent Sequences hold true, to the total number of objects in the data.
confidence : probability of the consequent following the antecedent. Calculated as the number of occurrences of the rule (the antecedent followed by the consequent) divided by the number of occurrences of the antecedent.
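
The two ratios can be checked with a few lines of Python. The figures are those of the example model in this section (100 objects, 80 occurrences of the antecedent seq001, 20 occurrences of rule001); the function names are hypothetical.

```python
def rule_support(rule_occurrence: int, number_of_transactions: int) -> float:
    """Share of all objects for which antecedent and consequent both hold."""
    return rule_occurrence / number_of_transactions


def rule_confidence(rule_occurrence: int, antecedent_occurrence: int) -> float:
    """Share of the antecedent's occurrences in which the consequent follows."""
    return rule_occurrence / antecedent_occurrence


print(rule_support(20, 100))    # 0.2
print(rule_confidence(20, 80))  # 0.25
```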


Antecedent & Consequent Sequences


  <xs:group name="SEQUENCE">
    <xs:sequence>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
      <xs:element ref="SequenceReference"/>
      <xs:element ref="Time" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:group>

  <xs:element name="SequenceReference">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
      <xs:attribute name="seqId" type="ELEMENT-ID" use="required"/>
    </xs:complexType>
  </xs:element>

  <xs:element name="AntecedentSequence">
    <xs:complexType>
      <xs:sequence>
        <xs:group ref="SEQUENCE"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="ConsequentSequence">
    <xs:complexType>
      <xs:sequence>
        <xs:group ref="SEQUENCE"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

Attribute description:

seqId : a pointer to the id attribute of a previously defined sequence.
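
How a consumer might follow the seqId and setId pointers is sketched below with Python's standard xml.etree.ElementTree module. The fragment is deliberately trimmed and is not a schema-valid SequenceModel; the function name resolve_rule is hypothetical, and splitting the Array text on whitespace is a simplification that would not handle quoted strings containing spaces.

```python
import xml.etree.ElementTree as ET

# Trimmed fragment for illustration only (required attributes omitted).
FRAGMENT = """
<SequenceModel>
  <SetPredicate id="sp001" field="transaction" operator="supersetOf">
    <Array n="1" type="string"> index.html </Array>
  </SetPredicate>
  <SetPredicate id="sp002" field="transaction" operator="supersetOf">
    <Array n="2" type="string"> offer.html kdnuggets.com </Array>
  </SetPredicate>
  <Sequence id="seq001"><SetReference setId="sp001"/></Sequence>
  <Sequence id="seq002"><SetReference setId="sp002"/></Sequence>
  <SequenceRule id="rule001" numberOfSets="2" occurrence="20"
                support="0.20" confidence="0.25">
    <AntecedentSequence><SequenceReference seqId="seq001"/></AntecedentSequence>
    <Delimiter delimiter="sameTimeWindow" gap="unknown"/>
    <ConsequentSequence><SequenceReference seqId="seq002"/></ConsequentSequence>
  </SequenceRule>
</SequenceModel>
"""


def resolve_rule(model: ET.Element, rule_id: str):
    """Return (antecedent, consequent), each a list of item lists."""
    # setId -> items of the referenced SetPredicate
    sets = {sp.get("id"): sp.find("Array").text.split()
            for sp in model.iter("SetPredicate")}
    # seqId -> ordered list of resolved sets
    sequences = {seq.get("id"): [sets[ref.get("setId")]
                                 for ref in seq.iter("SetReference")]
                 for seq in model.iter("Sequence")}
    rule = next(r for r in model.iter("SequenceRule") if r.get("id") == rule_id)
    antecedent = rule.find("AntecedentSequence/SequenceReference").get("seqId")
    consequent = rule.find("ConsequentSequence/SequenceReference").get("seqId")
    return sequences[antecedent], sequences[consequent]


model = ET.fromstring(FRAGMENT)
print(resolve_rule(model, "rule001"))
# ([['index.html']], [['offer.html', 'kdnuggets.com']])
```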


Example

The example below represents the following scenario:

Visitors that come from { index.html } will do the following (with a confidence of 0.25):
Visit { offer.html, kdnuggets.com } in the same visit, and without visiting another site;
Within 2 days they will return to { products.html } without visiting other sites in between;
Visit { basket.html }, visiting at least one site beforehand;
Finally, go directly to { checkout.html }.


  <?xml version="1.0" ?>
  <PMML version="3.0">
    <Header copyright="DMG.org" description="example model for sequences"/>
    <DataDictionary numberOfFields="4">
      <DataField name="visitor" optype="categorical"/>
      <DataField name="visit" optype="categorical"/>
      <DataField name="time" optype="categorical"/>
      <DataField name="page" optype="categorical"/>
    </DataDictionary>
   
    <TransformationDictionary>
      <DerivedField name="transaction">
        <Aggregate field="page" function="multiset" groupField="visit"/>
      </DerivedField>
    </TransformationDictionary>
   
    <SequenceModel functionName="sequences"
        numberOfTransactions="100" minimumSupport="0.20"
        minimumConfidence="0.25" numberOfItems="6"
        numberOfSets="5" numberOfSequences="3" numberOfRules="1">
    
      <MiningSchema>
        <MiningField name="visitor" usageType="order"/>
        <MiningField name="visit" usageType="group"/>
        <MiningField name="time" usageType="active">
          <Extension name="unit" value="days"/>
        </MiningField>
        <MiningField name="page" usageType="active"/>
      </MiningSchema>
   
      <!-- ========== Predicates ========== -->
      <SetPredicate id="sp001" field="transaction" operator="supersetOf">
        <Array n="1" type="string"> index.html </Array>
      </SetPredicate>
    
      <SetPredicate id="sp002" field="transaction" operator="supersetOf">
        <Array n="2" type="string"> offer.html kdnuggets.com </Array>
      </SetPredicate>
    
      <SetPredicate id="sp003" field="transaction" operator="supersetOf">
        <Array n="1" type="string"> products.html </Array>
      </SetPredicate>
    
      <SetPredicate id="sp004" field="transaction" operator="supersetOf">
        <Array n="1" type="string"> basket.html </Array>
      </SetPredicate>
    
      <SetPredicate id="sp005" field="transaction" operator="supersetOf">
        <Array n="1" type="string"> checkout.html </Array>
      </SetPredicate>
  
      <!-- ========== Sequences ========== -->
      <Sequence id="seq001" numberOfSets="1" occurrence="80" support="0.80">
        <SetReference setId="sp001"/>
      </Sequence>
    
      <Sequence id="seq002" numberOfSets="4" occurrence="40" support="0.40">
        <SetReference setId="sp002"/>
        <Delimiter delimiter="acrossTimeWindows" gap="false"/>
        <SetReference setId="sp003"/>
        <Delimiter delimiter="sameTimeWindow" gap="true"/>
        <SetReference setId="sp004"/>
        <Delimiter delimiter="sameTimeWindow" gap="false"/>
        <SetReference setId="sp005"/>
      </Sequence>
    
      <Sequence id="seq003" numberOfSets="5" occurrence="20" support="0.20">
        <SetReference setId="sp001"/>
        <Delimiter delimiter="sameTimeWindow" gap="unknown"/>
        <SetReference setId="sp002"/>
        <Delimiter delimiter="acrossTimeWindows" gap="false"/>
        <SetReference setId="sp003"/>
        <Delimiter delimiter="sameTimeWindow" gap="true"/>
        <SetReference setId="sp004"/>
        <Delimiter delimiter="sameTimeWindow" gap="false"/>
        <SetReference setId="sp005"/>
      </Sequence>
    
      <!-- ========== SequenceRules ========== -->
      <SequenceRule id="rule001" numberOfSets="5" occurrence="20" 
            support="0.20" confidence="0.25">
        <Extension name="qWeight" value="0.5"/>
        <Extension name="attrWeight" value="0.5"/>
        <Extension name="seqWeight" value="0.5"/>
    
        <AntecedentSequence>
          <SequenceReference seqId="seq001"/>
        </AntecedentSequence>
   
        <Delimiter delimiter="sameTimeWindow" gap="unknown"/>
        <Time min="0" max="0"/>
    
        <ConsequentSequence>
          <SequenceReference seqId="seq002"/>
          <Time min="0" max="2"/> 
              <!-- time between "sp002" and "sp003" in sequence "seq002" -->
          <Time min="0" max="0"/>
              <!-- time between "sp003" and "sp004" in sequence "seq002" -->
          <Time min="0" max="0"/>
              <!-- time between "sp004" and "sp005" in sequence "seq002" -->
        </ConsequentSequence>
    
      </SequenceRule>
   
    </SequenceModel>
  </PMML>
