Sequence Rules

## PMML 3.2 - Sequence Rules

A Sequence Rule model represents rules over ordered sets of items. For instance, a rule can express that after purchasing products A and B, customers tend to buy product C sooner or later as well.
Sequences are defined by Itemsets which in turn can contain one or more Items. SequenceRules define the relationship between Sequences. In addition, constraints regarding the time between the appearance of Itemsets or Sequences can be specified.

### SequenceModel

A Sequence mining model requires that the MiningSchema has a field with usageType of group. It groups the Itemsets into transaction groups. If a MiningField with a usageType of order exists, it defines the chronology of the Itemsets within a transaction group. The dataType of its respective DataField specifies the unit of measure for all times given within the model. E.g., for dataType="dateDaysSince[1970]" the resulting unit of time is days.
If there is no MiningField with a usageType of order, it is assumed that the transactions are equally spaced in time. The unit of measure for the times in the model is of no interest in that case. Furthermore, times are represented by integer values and successive transactions within a group are spaced 1 time unit apart.
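The two timing cases above can be sketched as follows. This is only an illustration, not part of the PMML specification: the helper names (`days_since_1970`, `transaction_times`, `separations`) are hypothetical, and starting the default integer times at 0 is an assumption, since the text only fixes the spacing of 1 time unit.

```python
from datetime import date

EPOCH = date(1970, 1, 1)  # origin assumed for dataType="dateDaysSince[1970]"


def days_since_1970(d):
    """Express a calendar date as a whole number of days since the epoch."""
    return (d - EPOCH).days


def transaction_times(order_values=None, n_transactions=0):
    """Return one time value per transaction of a single transaction group.

    With an order field, its values are used (sorted chronologically).
    Without one, integer times spaced 1 unit apart are assigned; starting
    at 0 is an assumption made for this sketch.
    """
    if order_values is not None:
        return sorted(order_values)
    return list(range(n_transactions))


def separations(times):
    """Elapsed time between successive transactions."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


# With an order field of type dateDaysSince[1970], separations are in days.
visits = [days_since_1970(date(2007, 1, 1)), days_since_1970(date(2007, 1, 8))]
print(separations(transaction_times(order_values=visits)))

# Without an order field, every separation is 1 time unit.
print(separations(transaction_times(n_transactions=4)))
```

Note that the separations do not depend on the chosen origin, only on the unit implied by the dataType.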
A sequence model consists of a number of major parts:

 ``` ```

Item is defined in the Association Model.
Itemset is defined in the Association Model.
SetPredicate is a set of predicates made up of simple boolean expressions.
Sequence is an ordered collection of Itemsets or SetPredicates. There will be at least one Sequence.
SequenceRule describes the relationship between two sequences.

Attribute descriptions for SequenceModel:
numberOfTransactions : the number of objects in the data the model was built on, e.g., unique customers or visitors.
maxNumberOfItemsPerTransaction : the maximum number of events (e.g., visits) per object.
avgNumberOfItemsPerTransaction : the average number of events that make up the object.
numberOfTransactionGroups : total number of transaction groups in the training data.
maxNumberOfTAPerTAGroup : maximum number of transactions for all transaction groups.
avgNumberOfTAPerTAGroup : average number of transactions for all transaction groups.
Note that these attributes are for information only.

Attributes in the element Constraints represent global constraints from the model build-phase that apply to all Items and Itemsets. If not present, then there were no constraints during the model build-phase.
 ``` ```
Attribute description:
minimumNumberOfItems : the minimum number of Items in a sequence.
maximumNumberOfItems : the maximum number of Items in a sequence.
minimumNumberOfAntecedentItems : the minimum number of Items in a sequence's antecedent that was used during the model build-phase to filter the rules.
maximumNumberOfAntecedentItems : the maximum number of Items in a sequence's antecedent that was used during the model build-phase to filter the rules. If not present, then there was no limit.
minimumNumberOfConsequentItems : the minimum number of Items in a sequence's consequent that was used during the model build-phase to filter the rules.
maximumNumberOfConsequentItems : the maximum number of Items in a sequence's consequent that was used during the model build-phase to filter the rules. If not present, then there was no limit.
minimumSupport : the minimum support that was used during the model build-phase to filter the rules.
minimumConfidence : the minimum confidence that was used during the model build-phase to filter the rules.
minimumLift : the minimum lift that was used during the model build-phase to filter the rules.
minimumTotalSequenceTime : the minimum total elapsed time from beginning to the end of a sequence rule that was used during the model build-phase to filter the rules.
maximumTotalSequenceTime : the maximum total elapsed time from beginning to the end of a sequence rule that was used during the model build-phase to filter the rules. If not present, then there was no limit.
minimumItemsetSeparationTime : the minimum time allowed between two Itemsets in a sequence rule's antecedent that was used during the model build-phase to filter the rules.
maximumItemsetSeparationTime : the maximum time allowed between two Itemsets in a sequence rule's antecedent that was used during the model build-phase to filter the rules. If not present, then there was no limit.
minimumAntConsSeparationTime : the minimum time between antecedent and consequent Sequence in a SequenceRule that was used during the model build-phase to filter the rules.
maximumAntConsSeparationTime : the maximum time between antecedent and consequent Sequence in a SequenceRule that was used during the model build-phase to filter the rules. If not present, then there was no limit.
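As a rough illustration of how such build-phase constraints could act as a rule filter, the sketch below checks a candidate rule against a subset of the attributes listed above. It is not taken from the specification: the function `passes`, the rule dictionary keys (`support`, `confidence`, `lift`, `totalTime`) and the choice of inclusive bounds are all assumptions. An attribute that is missing from the constraints is treated as "no limit".

```python
# Constraint attribute -> key of the (hypothetical) rule record it applies to.
MINIMUMS = {
    "minimumSupport": "support",
    "minimumConfidence": "confidence",
    "minimumLift": "lift",
    "minimumTotalSequenceTime": "totalTime",
}
MAXIMUMS = {
    "maximumTotalSequenceTime": "totalTime",
}


def passes(rule, constraints):
    """Return True if the rule satisfies every constraint that is present.

    Bounds are treated as inclusive, which is an assumption of this sketch.
    """
    for attribute, key in MINIMUMS.items():
        if attribute in constraints and rule[key] < constraints[attribute]:
            return False
    for attribute, key in MAXIMUMS.items():
        if attribute in constraints and rule[key] > constraints[attribute]:
            return False
    return True


rule = {"support": 0.1, "confidence": 0.5, "lift": 1.5, "totalTime": 10}
print(passes(rule, {}))                               # no constraints at all
print(passes(rule, {"minimumSupport": 0.2}))          # support too low
print(passes(rule, {"maximumTotalSequenceTime": 7}))  # rule spans too much time
```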

### SetPredicate

Note: SetPredicate is deprecated as of PMML 3.1 and should not be used anymore!
 ``` ```
SetPredicate elements consist of a boolean expression. This is made up of a field, a comparison operator, and a value. The value(s) will be written in the form of an array.

Attribute description:

id : An element ID uniquely identifying a predicate set. (Referenced in Sequences by setId.)
field : The subject of the predicate statement. Usually this name refers to one of the DerivedField elements in the TransformationDictionary.
operator : The association between the subject of the predicate statement and the array of values.
Note that a SetPredicate compares two sets while a SimpleSetPredicate (as defined in the tree model) checks membership of a single value in a set.

### Delimiter & Time

 ``` ```
Delimiter is the separation between two Itemsets in a Sequence, or between two Sequences in a SequenceRule.

Attribute description:

delimiter : states whether this Itemset or SetPredicate occurred within the same event or time period (as defined by a time window) as the previous one. E.g., if items are purchased during the same visit, delimiter would be sameTimeWindow. If items are purchased in separate visits, the value for delimiter would be acrossTimeWindow.
gap : Indicates whether additional Itemsets or SetPredicates can be present to match the respective sequence. true represents an open sequence, which allows for gaps between sequences (as does unknown). In a closed sequence the gap is set to false, indicating that the two Sets or Sequences being described are consecutive sets in the data. E.g., if the sequence is A→B→C→D→E, then the sequence B→D would only match if gap is specified as true or unknown.
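The effect of gap on matching can be sketched in a few lines. This is a simplified illustration, not the normative matching procedure: the function `matches` is hypothetical, every element stands for one whole set, a single gap value is applied to all transitions, and the delimiter attribute is ignored.

```python
def matches(data, pattern, gap="true"):
    """Check whether `pattern` occurs in `data` under the given gap setting.

    gap="false": the pattern elements must be consecutive in the data.
    gap="true" or "unknown": other elements may appear in between.
    """
    if gap not in ("true", "false", "unknown"):
        raise ValueError("gap must be 'true', 'false' or 'unknown'")
    if gap == "false":
        n = len(pattern)
        # Closed sequence: look for the pattern as a contiguous run.
        return any(data[i:i + n] == pattern for i in range(len(data) - n + 1))
    # Open sequence: pattern elements must appear in order, gaps allowed.
    remaining = iter(data)
    return all(element in remaining for element in pattern)


data = list("ABCDE")  # the sequence A, B, C, D, E from the text
print(matches(data, list("BD"), gap="true"))     # True
print(matches(data, list("BD"), gap="unknown"))  # True
print(matches(data, list("BD"), gap="false"))    # False
print(matches(data, list("BC"), gap="false"))    # True
```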
 ``` ```
The Time element contains statistics for information only; it does not imply any constraints.
The following attributes apply either to Itemsets in a Sequence or AntecedentSequence and ConsequentSequence:
min : the minimum time in between.
max : the maximum time in between.
mean : the mean time in between.
standardDeviation : the standard deviation of the time in between.
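A minimal sketch of how these four statistics might be derived from observed elapsed times is shown below. The function name `time_stats` is hypothetical, and using the population standard deviation (`statistics.pstdev`) is an assumption: the text does not say whether the population or the sample formula is intended.

```python
import statistics


def time_stats(elapsed_times):
    """Summarise observed elapsed times with the attributes of a Time element."""
    return {
        "min": min(elapsed_times),
        "max": max(elapsed_times),
        "mean": statistics.fmean(elapsed_times),
        # Population standard deviation; an assumption of this sketch.
        "standardDeviation": statistics.pstdev(elapsed_times),
    }


# Elapsed times (in the model's time unit) observed between two Itemsets.
print(time_stats([2, 4, 4, 4, 5, 5, 7, 9]))
```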

### Sequence

 ``` ```

Each Sequence mainly consists of a SetReference.
The Time element between Delimiter and SetReference gives statistics on the elapsed time between each Itemset.
The Time element after the final SetReference gives statistics on the total elapsed time from the first to the last Itemset in the Sequence.

Attribute description:
id : the unique ID of this sequence. (Referenced in SequenceRules by seqId).
numberOfSets : the number of Itemsets or SetPredicates in this sequence.
occurrence : the number of objects in the data for which this sequence holds true.
support : the ratio of the number of objects in the data for which this sequence holds true, to the total number of objects in the data.
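The relation between occurrence and support can be illustrated by counting over a small data set. The sketch below makes several assumptions that are not stated in the text: each "object" is one transaction group, a sequence "holds true" for a group when its Itemsets appear in order as subsets of the group's transactions, and gaps between them are allowed. The names `holds` and `occurrence_and_support` are hypothetical.

```python
def holds(group, sequence):
    """True if the itemsets of `sequence` appear in order within `group`.

    `group` is a chronologically ordered list of transactions (sets of
    items); each itemset must be contained in a later transaction than
    the one matched by the previous itemset.
    """
    remaining = iter(group)
    return all(
        any(wanted <= transaction for transaction in remaining)
        for wanted in sequence
    )


def occurrence_and_support(groups, sequence):
    """Count the groups for which the sequence holds and derive the support."""
    occurrence = sum(holds(group, sequence) for group in groups.values())
    return occurrence, occurrence / len(groups)


groups = {
    "c1": [{"A", "B"}, {"C"}],
    "c2": [{"A"}, {"C"}],
    "c3": [{"C"}, {"A", "B"}],
    "c4": [{"A", "B"}, {"D"}, {"C"}],
}
# The sequence {A, B} followed by {C} holds for c1 and c4 only.
print(occurrence_and_support(groups, [{"A", "B"}, {"C"}]))  # (2, 0.5)
```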
 ``` ```
The SetReference refers (or points) to a previously defined set. That set will be either a SetPredicate or an Itemset (which will contain ItemRef elements).

Attribute description:
setId : a pointer to the id of an Itemset or SetPredicate.

### Sequence Rules

 ``` ```

A Sequence Rule consists of an antecedent Sequence and a consequent Sequence, separated by a Delimiter. The Time element between AntecedentSequence and ConsequentSequence gives statistics on the elapsed time between the antecedent and the consequent, while the Time element after ConsequentSequence gives statistics on the total elapsed time from the first to the last Itemset in the sequence rule.

Attribute description:
id : the unique ID of this SequenceRule.
numberOfSets : the total number of sets in both the antecedent and consequent Sequences.
occurrence : the number of objects in the data for which the antecedent and consequent Sequences hold true.
support : the ratio of the number of objects in the data for which the antecedent and consequent Sequences hold true, to the total number of objects in the data.
confidence : probability of the consequent following the antecedent. Calculated as the number of occurrences of a sequence rule divided by the number of occurrences of the antecedent.
lift : the ratio between the actual support of a SequenceRule and its expected support. The expected support of a SequenceRule A→C with nA antecedent item sets and nC consequent item sets is support(A) * support(C) / binomialCoefficient( nA+nC, nC ).
Note: Compared to the formula for the lift of an AssociationRule, there is an additional correction factor binomialCoefficient( nA+nC, nC ). This factor accounts for the fact that there are binomialCoefficient( nA+nC, nC ) different possibilities for the time order in which the antecedent and the consequent sequence can be realized in a transaction group, and only one of them contributes to the support of the sequence rule, namely the time order in which the first consequent item set occurs after the last antecedent item set.
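The confidence and lift definitions above translate directly into arithmetic. The following sketch uses hypothetical function and parameter names and made-up input numbers; only the formulas themselves come from the text. It relies on `math.comb`, which requires Python 3.8 or later.

```python
from math import comb


def confidence(rule_occurrence, antecedent_occurrence):
    """Occurrences of the rule divided by occurrences of its antecedent."""
    return rule_occurrence / antecedent_occurrence


def expected_support(support_a, support_c, n_a, n_c):
    """support(A) * support(C) / binomialCoefficient(nA + nC, nC)."""
    return support_a * support_c / comb(n_a + n_c, n_c)


def lift(rule_support, support_a, support_c, n_a, n_c):
    """Actual support of the rule relative to its expected support."""
    return rule_support / expected_support(support_a, support_c, n_a, n_c)


# Illustrative numbers: the antecedent has 2 item sets, the consequent 1,
# so the correction factor is binomialCoefficient(3, 1) = 3.
print(confidence(rule_occurrence=20, antecedent_occurrence=40))
print(lift(rule_support=0.1, support_a=0.4, support_c=0.5, n_a=2, n_c=1))
```

With these inputs the expected support is 0.4 * 0.5 / 3, so a rule support of 0.1 is 1.5 times what would be expected if antecedent and consequent were unrelated.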

### Antecedent & Consequent Sequences

 ``` ```
Attribute description:
seqId : a pointer to the id attribute of a previously defined Sequence.

## Example

 ```
```