PMML 4.4.1 - Sequence Rules

A Sequence Rule model represents rules for sequences of item sets. For
instance, a rule can express that after purchasing products A and
B, customers tend to buy product C sooner or later as
well.

SequenceModel

A Sequence model requires that the MiningSchema has a field with
usageType of group. This field groups the Itemsets into
transaction groups. If a MiningField labeled as order
exists, it defines the chronology of the Itemsets within a
transaction group. The dataType of its respective DataField
specifies the unit of measure for all times given within the model. E.g., for
dataType="dateDaysSince[1970]" the resulting time unit is
days.

<xs:element name="SequenceModel">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="MiningSchema"/>
      <xs:element ref="ModelStats" minOccurs="0"/>
      <xs:element ref="LocalTransformations" minOccurs="0"/>
      <xs:element ref="Constraints" minOccurs="0"/>
      <xs:element ref="Item" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Itemset" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="SetPredicate" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Sequence" maxOccurs="unbounded"/>
      <xs:element ref="SequenceRule" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="modelName" type="xs:string"/>
    <xs:attribute name="functionName" type="MINING-FUNCTION" use="required"/>
    <xs:attribute name="algorithmName" type="xs:string"/>
    <xs:attribute name="numberOfTransactions" type="xs:nonNegativeInteger"/>
    <xs:attribute name="maxNumberOfItemsPerTransaction" type="xs:nonNegativeInteger"/>
    <xs:attribute name="avgNumberOfItemsPerTransaction" type="REAL-NUMBER"/>
    <xs:attribute name="numberOfTransactionGroups" type="xs:nonNegativeInteger"/>
    <xs:attribute name="maxNumberOfTAsPerTAGroup" type="xs:nonNegativeInteger"/>
    <xs:attribute name="avgNumberOfTAsPerTAGroup" type="REAL-NUMBER"/>
    <xs:attribute name="isScorable" type="xs:boolean" default="true"/>
  </xs:complexType>
</xs:element>

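As a minimal sketch of the grouping and ordering requirement described above, the fragment below declares one field with usageType group and one with usageType order. The field names CUSTOMER_ID, VISIT_DAY and ITEM are hypothetical and chosen only for illustration. Because VISIT_DAY is declared with dataType="dateDaysSince[1970]", every time value in such a model would be read as a number of days.

```xml
<!-- Hypothetical field names; only the usageType and dataType values matter here. -->
<DataDictionary numberOfFields="3">
  <DataField name="CUSTOMER_ID" optype="categorical" dataType="integer"/>
  <!-- The dataType of the order field fixes the time unit of the model: days. -->
  <DataField name="VISIT_DAY" optype="continuous" dataType="dateDaysSince[1970]"/>
  <DataField name="ITEM" optype="categorical" dataType="string"/>
</DataDictionary>

<!-- Inside the SequenceModel element: -->
<MiningSchema>
  <!-- Required: groups the Itemsets into transaction groups. -->
  <MiningField name="CUSTOMER_ID" usageType="group"/>
  <!-- Optional: defines the chronology of the Itemsets within a transaction group. -->
  <MiningField name="VISIT_DAY" usageType="order"/>
  <MiningField name="ITEM"/>
</MiningSchema>
```
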
Attribute descriptions for SequenceModel:
Note that these attributes are for information only (except isScorable).

Attributes in the element Constraints represent global constraints from the model build-phase that apply to all Items and Itemsets. If not present, then there were no constraints during the model build-phase.

<xs:element name="Constraints">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="minimumNumberOfItems" type="xs:nonNegativeInteger" default="1"/>
    <xs:attribute name="maximumNumberOfItems" type="xs:nonNegativeInteger"/>
    <xs:attribute name="minimumNumberOfAntecedentItems" type="xs:nonNegativeInteger" default="1"/>
    <xs:attribute name="maximumNumberOfAntecedentItems" type="xs:nonNegativeInteger"/>
    <xs:attribute name="minimumNumberOfConsequentItems" type="xs:nonNegativeInteger" default="1"/>
    <xs:attribute name="maximumNumberOfConsequentItems" type="xs:nonNegativeInteger"/>
    <xs:attribute name="minimumSupport" type="REAL-NUMBER" default="0"/>
    <xs:attribute name="minimumConfidence" type="REAL-NUMBER" default="0"/>
    <xs:attribute name="minimumLift" type="REAL-NUMBER" default="0"/>
    <xs:attribute name="minimumTotalSequenceTime" type="REAL-NUMBER" default="0"/>
    <xs:attribute name="maximumTotalSequenceTime" type="REAL-NUMBER"/>
    <xs:attribute name="minimumItemsetSeparationTime" type="REAL-NUMBER" default="0"/>
    <xs:attribute name="maximumItemsetSeparationTime" type="REAL-NUMBER"/>
    <xs:attribute name="minimumAntConsSeparationTime" type="REAL-NUMBER" default="0"/>
    <xs:attribute name="maximumAntConsSeparationTime" type="REAL-NUMBER"/>
  </xs:complexType>
</xs:element>

Attribute description:

SetPredicate

Note: SetPredicate is deprecated as of PMML 3.1 and should not be used anymore!

<xs:simpleType name="ELEMENT-ID">
  <xs:restriction base="xs:string">
  </xs:restriction>
</xs:simpleType>

<xs:element name="SetPredicate">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:group ref="STRING-ARRAY"/>
    </xs:sequence>
    <xs:attribute name="id" type="ELEMENT-ID" use="required"/>
    <xs:attribute name="field" type="FIELD-NAME" use="required"/>
    <xs:attribute name="operator" type="xs:string" fixed="supersetOf"/>
  </xs:complexType>
</xs:element>

SetPredicate elements consist of a boolean expression. This is made up of a field, a comparison operator, and a value. The value(s) will be written in the form of an array.

Attribute description:

Note that a SetPredicate compares two sets while a SimpleSetPredicate (as defined in the tree model) checks membership of a single value in a set.

Delimiter & Time

<xs:simpleType name="DELIMITER">
  <xs:restriction base="xs:string">
    <xs:enumeration value="sameTimeWindow"/>
    <xs:enumeration value="acrossTimeWindows"/>
  </xs:restriction>
</xs:simpleType>

<xs:simpleType name="GAP">
  <xs:restriction base="xs:string">
    <xs:enumeration value="true"/>
    <xs:enumeration value="false"/>
    <xs:enumeration value="unknown"/>
  </xs:restriction>
</xs:simpleType>

<xs:element name="Delimiter">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="delimiter" type="DELIMITER" use="required"/>
    <xs:attribute name="gap" type="GAP" use="required"/>
  </xs:complexType>
</xs:element>

Delimiter is the separation between two Itemsets in a Sequence, or between two Sequences in a SequenceRule.

Attribute description:

<xs:element name="Time">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="min" type="NUMBER"/>
    <xs:attribute name="max" type="NUMBER"/>
    <xs:attribute name="mean" type="NUMBER"/>
    <xs:attribute name="standardDeviation" type="NUMBER"/>
  </xs:complexType>
</xs:element>

Time carries statistics for information only; it does not imply any constraints. The following attributes apply either to the Itemsets in a Sequence or to the AntecedentSequence and ConsequentSequence of a SequenceRule:

Sequence

<xs:group name="FOLLOW-SET">
  <xs:sequence>
    <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element ref="Delimiter"/>
    <xs:element ref="Time" minOccurs="0"/>
    <xs:element ref="SetReference"/>
  </xs:sequence>
</xs:group>

<xs:element name="Sequence">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="SetReference"/>
      <xs:sequence minOccurs="0" maxOccurs="unbounded">
        <xs:group ref="FOLLOW-SET"/>
      </xs:sequence>
      <xs:element ref="Time" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="id" type="ELEMENT-ID" use="required"/>
    <xs:attribute name="numberOfSets" type="xs:nonNegativeInteger"/>
    <xs:attribute name="occurrence" type="INT-NUMBER"/>
    <xs:attribute name="support" type="REAL-NUMBER"/>
  </xs:complexType>
</xs:element>

Each Sequence consists of a first SetReference, optionally followed by any number of further SetReferences, each of which is preceded by a Delimiter and an optional Time element (the FOLLOW-SET group). The Time element between Delimiter and SetReference gives statistics on the elapsed time between the two adjacent Itemsets. The Time element after the final SetReference gives statistics on the total elapsed time from the first to the last Itemset in the Sequence.

Attribute description:

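To make the placement of the two kinds of Time element concrete, here is a hypothetical three-itemset Sequence written against the schema above. The ids (seq_ABC, A, B, C) and all numeric values are invented for illustration and do not come from a real model; the referenced Itemsets are assumed to be defined earlier in the same SequenceModel.

```xml
<Sequence id="seq_ABC" numberOfSets="3" occurrence="12" support="0.08">
  <SetReference setId="A"/>

  <!-- First FOLLOW-SET: Delimiter, optional Time, SetReference. -->
  <Delimiter delimiter="acrossTimeWindows" gap="unknown"/>
  <!-- Elapsed time between Itemset A and Itemset B. -->
  <Time min="1" max="5" mean="2.5"/>
  <SetReference setId="B"/>

  <!-- Second FOLLOW-SET. -->
  <Delimiter delimiter="acrossTimeWindows" gap="unknown"/>
  <!-- Elapsed time between Itemset B and Itemset C. -->
  <Time min="2" max="9" mean="4.0"/>
  <SetReference setId="C"/>

  <!-- Trailing Time: total elapsed time from Itemset A to Itemset C. -->
  <Time min="3" max="14" mean="6.5"/>
</Sequence>
```

Both kinds of Time element are optional, so a Sequence that records no timing statistics would simply omit them and keep only the SetReference and Delimiter elements.
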
<xs:element name="SetReference">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="setId" type="ELEMENT-ID" use="required"/>
  </xs:complexType>
</xs:element>

The SetReference refers (or points) to a previously defined set. That set will be either a SetPredicate or an Itemset (which will contain ItemRef elements).

Attribute description:

Sequence Rules

<xs:element name="SequenceRule">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
      <xs:element ref="AntecedentSequence"/>
      <xs:element ref="Delimiter"/>
      <xs:element ref="Time" minOccurs="0"/>
      <xs:element ref="ConsequentSequence"/>
      <xs:element ref="Time" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="id" type="ELEMENT-ID" use="required"/>
    <xs:attribute name="numberOfSets" type="xs:nonNegativeInteger" use="required"/>
    <xs:attribute name="occurrence" type="INT-NUMBER" use="required"/>
    <xs:attribute name="support" type="REAL-NUMBER" use="required"/>
    <xs:attribute name="confidence" type="REAL-NUMBER" use="required"/>
    <xs:attribute name="lift" type="REAL-NUMBER"/>
  </xs:complexType>
</xs:element>

A Sequence Rule consists of an antecedent Sequence and a consequent Sequence, separated by a Delimiter. The Time element between AntecedentSequence and ConsequentSequence gives statistics on the elapsed time between the antecedent and the consequent, while the Time element after ConsequentSequence gives statistics on the total elapsed time from the first to the last Itemset in the sequence rule.

Attribute description:

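The element order in a SequenceRule can be sketched with a hypothetical rule such as the one below. The ids (rule_1, seq_AB, seq_C) and all numbers are invented for illustration. The fragment assumes that seq_AB is a previously defined two-itemset Sequence and seq_C a one-itemset Sequence, and that numberOfSets counts the Itemsets of both sequences together; that reading is an assumption.

```xml
<!-- numberOfSets="3" assumes the count covers antecedent and consequent together. -->
<SequenceRule id="rule_1" numberOfSets="3" occurrence="12" support="0.08" confidence="0.6">
  <AntecedentSequence>
    <SequenceReference seqId="seq_AB"/>
  </AntecedentSequence>

  <Delimiter delimiter="acrossTimeWindows" gap="unknown"/>
  <!-- Elapsed time between the antecedent and the consequent. -->
  <Time min="1" max="7" mean="3.2"/>

  <ConsequentSequence>
    <SequenceReference seqId="seq_C"/>
  </ConsequentSequence>

  <!-- Total elapsed time from the first to the last Itemset of the rule. -->
  <Time min="2" max="15" mean="6.0"/>
</SequenceRule>
```

The optional lift attribute is left out here; id, numberOfSets, occurrence, support and confidence are the attributes the schema marks as required.
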
Note: Compared to the formula for the lift of an AssociationRule, there is an additional correction factor binomialCoefficient( nA+nC, nC ). This factor accounts for the fact that there are binomialCoefficient( nA+nC, nC ) different possibilities for the time order in which the antecedent and the consequent sequence can be realized in a transaction group, and only one of them contributes to the support of the sequence, namely the time order in which the first consequent item set occurs after the last antecedent item set.

Antecedent and Consequent Sequences

<xs:group name="SEQUENCE">
  <xs:sequence>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:element ref="SequenceReference"/>
    <xs:element ref="Time" minOccurs="0"/>
  </xs:sequence>
</xs:group>

<xs:element name="SequenceReference">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="Extension" minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="seqId" type="ELEMENT-ID" use="required"/>
  </xs:complexType>
</xs:element>

<xs:element name="AntecedentSequence">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="SEQUENCE"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:element name="ConsequentSequence">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="SEQUENCE"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

Attribute description:

Example

<PMML xmlns="https://www.dmg.org/PMML-4_4" version="4.4">
  <Header copyright="DMG.org"/>
  <DataDictionary numberOfFields="5">
    <DataField name="CUSTOMER_ID" displayName="CUSTOMER_ID" optype="categorical" dataType="integer"/>
    <DataField name="TRANSDATE" displayName="TRANSDATE" optype="continuous" dataType="dateDaysSince[0]"/>
    <DataField name="ITEMID" displayName="ITEMID" optype="categorical" dataType="string"/>
    <DataField name="STOREID" displayName="STOREID" optype="categorical" dataType="string"/>
    <DataField name="TRANSID" displayName="TRANSID" optype="categorical" dataType="string"/>
  </DataDictionary>
  <SequenceModel functionName="sequences" numberOfTransactions="175">
    <MiningSchema>
      <MiningField name="CUSTOMER_ID" usageType="group"/>
      <MiningField name="TRANSDATE" usageType="order"/>
      <MiningField name="ITEMID"/>
      <MiningField name="STOREID"/>
      <MiningField name="TRANSID"/>
    </MiningSchema>
    <Constraints minimumSupport="0.2" minimumConfidence="0.5"/>
    <Item id="0" value="177" mappedValue="Cognac"/>
    <Item id="1" value="129" mappedValue="Cream"/>
    <Item id="2" value="144" mappedValue="Tonic water"/>
    <Item id="3" value="174" mappedValue="Vodka"/>
    <Item id="4" value="108" mappedValue="Cider"/>
    <Item id="5" value="172" mappedValue="Scotch Whisky"/>
    <Item id="6" value="130" mappedValue="Root Beer"/>
    <Itemset id="0" support="0.0628571428571429" numberOfItems="1">
      <ItemRef itemRef="0"/>
    </Itemset>
    <Itemset id="1" support="0.24" numberOfItems="2">
      <ItemRef itemRef="1"/>
      <ItemRef itemRef="2"/>
    </Itemset>
    <Itemset id="2" support="0.0628571428571429" numberOfItems="3">
      <ItemRef itemRef="3"/>
      <ItemRef itemRef="4"/>
      <ItemRef itemRef="5"/>
    </Itemset>
    <Itemset id="3" support="0.0628571428571429" numberOfItems="1">
      <ItemRef itemRef="6"/>
    </Itemset>
    <Sequence id="0" numberOfSets="1" occurrence="5" support="0.02">
      <SetReference setId="0"/>
    </Sequence>
    <Sequence id="1" numberOfSets="2" occurrence="6" support="0.25">
      <SetReference setId="0"/>
      <Delimiter delimiter="acrossTimeWindows" gap="unknown"/>
      <SetReference setId="2"/>
    </Sequence>
    <Sequence id="2" numberOfSets="1" occurrence="5" support="0.45">
      <SetReference setId="1"/>
    </Sequence>
    <Sequence id="3" numberOfSets="1" occurrence="15" support="0.2">
      <SetReference setId="3"/>
    </Sequence>
    <SequenceRule id="0" numberOfSets="2" occurrence="5" support="0.20833" confidence="0.55556">
      <AntecedentSequence>
        <SequenceReference seqId="0"/>
      </AntecedentSequence>
      <Delimiter delimiter="acrossTimeWindows" gap="unknown"/>
      <Time min="5" max="8" mean="6.8"/>
      <ConsequentSequence>
        <SequenceReference seqId="2"/>
      </ConsequentSequence>
    </SequenceRule>
    <SequenceRule id="1" numberOfSets="2" occurrence="6" support="0.25" confidence="0.66667">
      <AntecedentSequence>
        <SequenceReference seqId="1"/>
      </AntecedentSequence>
      <Delimiter delimiter="acrossTimeWindows" gap="unknown"/>
      <Time min="2" max="8" mean="6.16667"/>
      <ConsequentSequence>
        <SequenceReference seqId="3"/>
      </ConsequentSequence>
    </SequenceRule>
    <SequenceRule id="2" numberOfSets="2" occurrence="5" support="0.20833" confidence="0.55556">
      <AntecedentSequence>
        <SequenceReference seqId="2"/>
      </AntecedentSequence>
      <Delimiter delimiter="acrossTimeWindows" gap="unknown"/>
      <Time min="2" max="8" mean="6.6"/>
      <ConsequentSequence>
        <SequenceReference seqId="3"/>
      </ConsequentSequence>
    </SequenceRule>
    <SequenceRule id="3" numberOfSets="2" occurrence="14" support="0.58333" confidence="0.73684">
      <AntecedentSequence>
        <SequenceReference seqId="3"/>
      </AntecedentSequence>
      <Delimiter delimiter="acrossTimeWindows" gap="unknown"/>
      <Time min="1" max="10" mean="6.14286"/>
      <ConsequentSequence>
        <SequenceReference seqId="0"/>
      </ConsequentSequence>
    </SequenceRule>
  </SequenceModel>
</PMML>